/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2017 by Delphix. All rights reserved.
 */

/*
 * Storage Pool Checkpoint
 *
 * A storage pool checkpoint can be thought of as a pool-wide snapshot or
 * a stable version of extreme rewind that guarantees no blocks from the
 * checkpointed state will have been overwritten. It remembers the entire
 * state of the storage pool (e.g. snapshots, dataset names, etc.) from the
 * point that it was taken and the user can rewind back to that point even if
 * they applied destructive operations on their datasets or even enabled new
 * zpool on-disk features. If a pool has a checkpoint that is no longer
 * needed, the user can discard it.
 *
 * == On disk data structures used ==
 *
 * - The pool has a new feature flag and a new entry in the MOS. The feature
 *   flag is set to active when we create the checkpoint and remains active
 *   until the checkpoint is fully discarded. The entry in the MOS config
 *   (DMU_POOL_ZPOOL_CHECKPOINT) is populated with the uberblock that
 *   references the state of the pool when we take the checkpoint. The entry
 *   remains populated until we start discarding the checkpoint or we rewind
 *   back to it.
 *
 * - Each vdev contains a vdev-wide space map while the pool has a checkpoint,
 *   which persists until the checkpoint is fully discarded. The space map
 *   contains entries that have been freed in the current state of the pool
 *   but we want to keep around in case we decide to rewind to the checkpoint.
 *   [see vdev_checkpoint_sm]
 *
 * - Each metaslab's ms_sm space map behaves the same as without the
 *   checkpoint, with the only exception being the scenario when we free
 *   blocks that belong to the checkpoint. In this case, these blocks remain
 *   ALLOCATED in the metaslab's space map and they are added as FREE in the
 *   vdev's checkpoint space map.
 *
 * - Each uberblock has a field (ub_checkpoint_txg) which holds the txg in
 *   which the uberblock was checkpointed. For normal uberblocks this field
 *   is 0.
 *
 * == Overview of operations ==
 *
 * - To create a checkpoint, we first wait for the current TXG to be synced,
 *   so we can use the most recently synced uberblock (spa_ubsync) as the
 *   checkpointed uberblock. Then we use an early synctask to place that
 *   uberblock in the MOS config, increment the feature flag for the
 *   checkpoint (marking it active), and set spa_checkpoint_txg (see its use
 *   below) to the TXG of the checkpointed uberblock. We use an early
 *   synctask for the aforementioned operations to ensure that no blocks
 *   were dirtied between the current TXG and the TXG of the checkpointed
 *   uberblock (i.e. the previous TXG).
 *
 * - When a checkpoint exists, we need to ensure that the blocks that
 *   belong to the checkpoint are freed but never reused. This means that
 *   these blocks should never end up in the ms_allocatable or the ms_freeing
 *   trees of a metaslab. Therefore, whenever there is a checkpoint the new
 *   ms_checkpointing tree is used in addition to the aforementioned ones.
 *
 *   Whenever a block is freed and we find out that it is referenced by the
 *   checkpoint (we find out by comparing its birth to spa_checkpoint_txg),
 *   we place it in the ms_checkpointing tree instead of the ms_freeing tree.
 *   This way, we divide the blocks that are being freed into checkpointed
 *   and not-checkpointed blocks.
 *
 *   In order to persist these frees, we write the extents from the
 *   ms_freeing tree to the ms_sm as usual, and the extents from the
 *   ms_checkpointing tree to the vdev_checkpoint_sm. This way, these
 *   checkpointed extents will remain allocated in the metaslab's ms_sm space
 *   map, and therefore won't be reused [see metaslab_sync()]. In addition,
 *   when we discard the checkpoint, we can find the entries that have
 *   actually been freed in vdev_checkpoint_sm.
 *   [see spa_checkpoint_discard_thread_sync()]
 *
 * - To discard the checkpoint we use an early synctask to delete the
 *   checkpointed uberblock from the MOS config, set spa_checkpoint_txg to 0,
 *   and wake up the discarding zthr thread (an open-context async thread).
 *   We use an early synctask to ensure that the operation happens before any
 *   new data end up in the checkpoint's data structures.
 *
 *   Once the synctask is done and the discarding zthr is awake, we discard
 *   the checkpointed data over multiple TXGs by having the zthr prefetch
 *   entries from vdev_checkpoint_sm and then start a synctask that places
 *   them as free blocks into their respective ms_allocatable and ms_sm
 *   structures.
 *   [see spa_checkpoint_discard_thread()]
 *
 *   When there are no entries left in the vdev_checkpoint_sm of all
 *   top-level vdevs, a final synctask runs that decrements the feature flag.
 *
 * - To rewind to the checkpoint, we first use the current uberblock and
 *   open the MOS so we can access the checkpointed uberblock from the MOS
 *   config. After we retrieve the checkpointed uberblock, we use it as the
 *   current uberblock for the pool by writing it to disk with an updated
 *   TXG, opening its version of the MOS, and moving on as usual from there.
 *   [see spa_ld_checkpoint_rewind()]
 *
 *   An important note on rewinding to the checkpoint has to do with how we
 *   handle ZIL blocks. In the scenario of a rewind, we clear out any ZIL
 *   blocks that have not been claimed by the time we took the checkpoint
 *   as they should no longer be valid.
 *   [see comment in zil_claim()]
 *
 * == Miscellaneous information ==
 *
 * - In the hypothetical event that we take a checkpoint, remove a vdev,
 *   and attempt to rewind, the rewind would fail as the checkpointed
 *   uberblock would reference data in the removed device. For this reason
 *   and others of similar nature, we disallow the following operations that
 *   can change the config:
 *	vdev removal and attach/detach, mirror splitting, and pool reguid.
 *
 * - As most of the checkpoint logic is implemented in the SPA and doesn't
 *   distinguish datasets when it comes to space accounting, having a
 *   checkpoint can potentially break the boundaries set by dataset
 *   reservations.
 */
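
/*
 * Illustrative sketch (not part of ZFS and never compiled here): a
 * userland model of the free-routing rule described above. A freed
 * block whose birth TXG is at or before spa_checkpoint_txg belongs to
 * the checkpoint, so it goes to the checkpointing tree and stays
 * allocated in ms_sm. Any other freed block goes to ms_freeing. The
 * type and function names are hypothetical. A checkpoint_txg of 0 is
 * taken to mean "no checkpoint", matching how discarding resets
 * spa_checkpoint_txg. The real decision is made in the metaslab code
 * and may check further conditions.
 */
#if 0
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for where a freed extent is routed. */
typedef enum free_dest {
	FREE_TO_FREEING,	/* ordinary free: space becomes reusable */
	FREE_TO_CHECKPOINTING	/* checkpointed: stays allocated in ms_sm */
} free_dest_t;

/*
 * birth_txg models the freed block's birth TXG; checkpoint_txg models
 * spa_checkpoint_txg, where 0 means the pool has no checkpoint.
 */
static free_dest_t
classify_free(uint64_t birth_txg, uint64_t checkpoint_txg)
{
	if (checkpoint_txg != 0 && birth_txg <= checkpoint_txg)
		return (FREE_TO_CHECKPOINTING);
	return (FREE_TO_FREEING);
}
```
#endif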

#include <sys/dmu_tx.h>
#include <sys/dsl_dir.h>
#include <sys/dsl_synctask.h>
#include <sys/metaslab_impl.h>
#include <sys/spa.h>
#include <sys/spa_impl.h>
#include <sys/spa_checkpoint.h>
#include <sys/vdev_impl.h>
#include <sys/zap.h>
#include <sys/zfeature.h>

/*
 * The following parameter limits the amount of memory to be used for the
 * prefetching of the checkpoint space map done on each vdev while
 * discarding the checkpoint.
 *
 * It exists because top-level vdevs with long checkpoint space maps can
 * potentially take up a lot of memory, depending on the amount of
 * checkpointed data that has been freed within them while the pool had a
 * checkpoint.
 */
uint64_t zfs_spa_discard_memory_limit = 16 * 1024 * 1024;

int
spa_checkpoint_get_stats(spa_t *spa, pool_checkpoint_stat_t *pcs)
{
	if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT))
		return (SET_ERROR(ZFS_ERR_NO_CHECKPOINT));

	bzero(pcs, sizeof (pool_checkpoint_stat_t));

	int error = zap_contains(spa_meta_objset(spa),
	    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT);
	ASSERT(error == 0 || error == ENOENT);

	if (error == ENOENT)
		pcs->pcs_state = CS_CHECKPOINT_DISCARDING;
	else
		pcs->pcs_state = CS_CHECKPOINT_EXISTS;

	pcs->pcs_space = spa->spa_checkpoint_info.sci_dspace;
	pcs->pcs_start_time = spa->spa_checkpoint_info.sci_timestamp;

	return (0);
}
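
/*
 * Illustrative sketch (hypothetical names, not compiled): how
 * spa_checkpoint_get_stats() derives the reported state from two facts.
 * The first is whether the POOL_CHECKPOINT feature is active. The second
 * is whether the MOS directory still holds DMU_POOL_ZPOOL_CHECKPOINT.
 * The booleans stand in for spa_feature_is_active() and zap_contains().
 * The assertion encodes an assumed invariant, taken from the overview at
 * the top of this file, that the MOS entry only exists while the feature
 * is active.
 */
#if 0
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the outcomes of spa_checkpoint_get_stats(). */
typedef enum ckpt_state {
	CKPT_NONE,		/* models the ZFS_ERR_NO_CHECKPOINT return */
	CKPT_EXISTS,		/* models CS_CHECKPOINT_EXISTS */
	CKPT_DISCARDING		/* models CS_CHECKPOINT_DISCARDING */
} ckpt_state_t;

static ckpt_state_t
ckpt_state(bool feature_active, bool mos_entry_present)
{
	if (!feature_active) {
		/* Assumed invariant: no MOS entry without the feature. */
		assert(!mos_entry_present);
		return (CKPT_NONE);
	}
	/* Entry removed but feature still active: discard in progress. */
	return (mos_entry_present ? CKPT_EXISTS : CKPT_DISCARDING);
}
```
#endif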

static void
spa_checkpoint_discard_complete_sync(void *arg, dmu_tx_t *tx)
{
	spa_t *spa = arg;

	spa->spa_checkpoint_info.sci_timestamp = 0;

	spa_feature_decr(spa, SPA_FEATURE_POOL_CHECKPOINT, tx);

	spa_history_log_internal(spa, "spa discard checkpoint", tx,
	    "finished discarding checkpointed state from the pool");
}

typedef struct spa_checkpoint_discard_sync_callback_arg {
	vdev_t		*sdc_vd;
	uint64_t	sdc_txg;
	uint64_t	sdc_entry_limit;
} spa_checkpoint_discard_sync_callback_arg_t;

static int
spa_checkpoint_discard_sync_callback(space_map_entry_t *sme, void *arg)
{
	spa_checkpoint_discard_sync_callback_arg_t *sdc = arg;
	vdev_t *vd = sdc->sdc_vd;
	metaslab_t *ms = vd->vdev_ms[sme->sme_offset >> vd->vdev_ms_shift];
	uint64_t end = sme->sme_offset + sme->sme_run;

	if (sdc->sdc_entry_limit == 0)
		return (EINTR);

	/*
	 * Since the space map is not condensed, we know that
	 * none of its entries crosses the boundaries of its
	 * respective metaslab.
	 *
	 * That said, there is no fundamental requirement that
	 * the checkpoint's space map entries should not cross
	 * metaslab boundaries. So if needed we could add code
	 * that handles metaslab-crossing segments in the future.
	 */
	VERIFY3U(sme->sme_type, ==, SM_FREE);
	VERIFY3U(sme->sme_offset, >=, ms->ms_start);
	VERIFY3U(end, <=, ms->ms_start + ms->ms_size);

	/*
	 * At this point we should not be processing any
	 * other frees concurrently, so the lock is technically
	 * unnecessary. We use the lock anyway though to
	 * potentially save ourselves from future headaches.
	 */
	mutex_enter(&ms->ms_lock);
	if (range_tree_is_empty(ms->ms_freeing))
		vdev_dirty(vd, VDD_METASLAB, ms, sdc->sdc_txg);
	range_tree_add(ms->ms_freeing, sme->sme_offset, sme->sme_run);
	mutex_exit(&ms->ms_lock);

	ASSERT3U(vd->vdev_spa->spa_checkpoint_info.sci_dspace, >=,
	    sme->sme_run);
	ASSERT3U(vd->vdev_stat.vs_checkpoint_space, >=, sme->sme_run);

	vd->vdev_spa->spa_checkpoint_info.sci_dspace -= sme->sme_run;
	vd->vdev_stat.vs_checkpoint_space -= sme->sme_run;
	sdc->sdc_entry_limit--;

	return (0);
}
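
/*
 * Illustrative sketch (hypothetical names, not compiled): a userland
 * model of how spa_checkpoint_discard_sync_callback() cooperates with
 * the incremental destroy of the space map. Entries are consumed from
 * the end of the map. Each one decrements the checkpointed-space
 * counter (modeling sci_dspace) and is handed back as free space
 * (modeling range_tree_add() on ms_freeing) until the entry budget
 * reaches zero. At that point EINTR stops the walk and the remaining
 * entries wait for a later sync pass. Metaslab lookup, locking and
 * vs_checkpoint_space are left out of the model.
 */
#if 0
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a FREE space map entry. */
typedef struct sm_free_entry {
	uint64_t offset;
	uint64_t run;
} sm_free_entry_t;

/* Hypothetical stand-in for spa_checkpoint_discard_sync_callback_arg_t. */
typedef struct discard_arg {
	uint64_t entry_limit;	/* like sdc_entry_limit */
	uint64_t dspace;	/* like sci_dspace */
	uint64_t freed;		/* bytes handed back, like ms_freeing */
} discard_arg_t;

/* Refuse once the budget is spent, otherwise move the space over. */
static int
discard_cb(const sm_free_entry_t *e, discard_arg_t *a)
{
	if (a->entry_limit == 0)
		return (EINTR);
	assert(a->dspace >= e->run);
	a->dspace -= e->run;
	a->freed += e->run;
	a->entry_limit--;
	return (0);
}

/*
 * Walk entries from the end towards the beginning; *remaining is set
 * to the number of entries still left (not consumed) afterwards.
 */
static int
destroy_from_end(const sm_free_entry_t *ents, size_t n, discard_arg_t *a,
    size_t *remaining)
{
	size_t i = n;
	int err = 0;

	while (i > 0) {
		err = discard_cb(&ents[i - 1], a);
		if (err != 0)
			break;
		i--;
	}
	*remaining = i;
	return (err);
}
```
#endif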

static void
spa_checkpoint_accounting_verify(spa_t *spa)
{
	vdev_t *rvd = spa->spa_root_vdev;
	uint64_t ckpoint_sm_space_sum = 0;
	uint64_t vs_ckpoint_space_sum = 0;

	for (uint64_t c = 0; c < rvd->vdev_children; c++) {
		vdev_t *vd = rvd->vdev_child[c];

		if (vd->vdev_checkpoint_sm != NULL) {
			ckpoint_sm_space_sum +=
			    -space_map_allocated(vd->vdev_checkpoint_sm);
			vs_ckpoint_space_sum +=
			    vd->vdev_stat.vs_checkpoint_space;
			ASSERT3U(ckpoint_sm_space_sum, ==,
			    vs_ckpoint_space_sum);
		} else {
			ASSERT0(vd->vdev_stat.vs_checkpoint_space);
		}
	}
	ASSERT3U(spa->spa_checkpoint_info.sci_dspace, ==, ckpoint_sm_space_sum);
}
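
/*
 * Illustrative sketch (hypothetical names, not compiled): why the loop
 * above negates space_map_allocated(). A space map's allocated counter
 * grows with ALLOC entries and shrinks with FREE entries. A checkpoint
 * space map only ever holds FREE entries (the discard callback VERIFYs
 * SM_FREE), so its counter is zero or negative. Its negation is the
 * checkpointed space that vs_checkpoint_space and sci_dspace should
 * agree with. The signed in-memory counter here is a simplification of
 * the on-disk accounting.
 */
#if 0
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SM_ALLOC_ENT, SM_FREE_ENT } sm_type_t;	/* hypothetical */

typedef struct sm_ent {
	sm_type_t type;
	uint64_t run;
} sm_ent_t;

/* ALLOC entries add to the counter, FREE entries subtract from it. */
static int64_t
sm_allocated(const sm_ent_t *ents, size_t n)
{
	int64_t alloc = 0;

	for (size_t i = 0; i < n; i++) {
		if (ents[i].type == SM_ALLOC_ENT)
			alloc += (int64_t)ents[i].run;
		else
			alloc -= (int64_t)ents[i].run;
	}
	return (alloc);
}

/* A FREE-only (checkpoint) map: the negated counter is its space. */
static uint64_t
checkpoint_space(const sm_ent_t *ents, size_t n)
{
	int64_t alloc = sm_allocated(ents, n);

	assert(alloc <= 0);
	return ((uint64_t)-alloc);
}
```
#endif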

static void
spa_checkpoint_discard_thread_sync(void *arg, dmu_tx_t *tx)
{
	vdev_t *vd = arg;
	int error;

	/*
	 * The space map callback is applied only to non-debug entries.
	 * Because the number of debug entries is less than or equal to
	 * the number of non-debug entries, we want to ensure that we only
	 * read what we prefetched from open-context.
	 *
	 * Thus, we set the maximum entries that the space map callback
	 * will be applied to be half the entries that could fit in the
	 * imposed memory limit.
	 *
	 * Note that since this is a conservative estimate we also
	 * assume the worst case in our computation, where each
	 * entry is two words.
	 */
	uint64_t max_entry_limit =
	    (zfs_spa_discard_memory_limit / (2 * sizeof (uint64_t))) >> 1;

	/*
	 * Iterate from the end of the space map towards the beginning,
	 * placing its entries on ms_freeing and removing them from the
	 * space map. The iteration stops if one of the following
	 * conditions is true:
	 *
	 * 1] We reached the beginning of the space map. At this point
	 *    the space map should be completely empty and
	 *    space_map_incremental_destroy should have returned 0.
	 *    The next step would be to free and close the space map
	 *    and remove its entry from its vdev's top zap. This allows
	 *    spa_checkpoint_discard_thread() to move on to the next vdev.
	 *
	 * 2] We reached the memory limit (amount of memory used to hold
	 *    space map entries in memory) and space_map_incremental_destroy
	 *    returned EINTR. This means that there are entries remaining
	 *    in the space map that will be cleared in a future invocation
	 *    of this function by spa_checkpoint_discard_thread().
	 */
	spa_checkpoint_discard_sync_callback_arg_t sdc;
	sdc.sdc_vd = vd;
	sdc.sdc_txg = tx->tx_txg;
	sdc.sdc_entry_limit = max_entry_limit;

	uint64_t words_before =
	    space_map_length(vd->vdev_checkpoint_sm) / sizeof (uint64_t);

	error = space_map_incremental_destroy(vd->vdev_checkpoint_sm,
	    spa_checkpoint_discard_sync_callback, &sdc, tx);

	uint64_t words_after =
	    space_map_length(vd->vdev_checkpoint_sm) / sizeof (uint64_t);

#ifdef DEBUG
	spa_checkpoint_accounting_verify(vd->vdev_spa);
#endif

	zfs_dbgmsg("discarding checkpoint: txg %llu, vdev id %llu, "
	    "deleted %llu words - %llu words are left",
	    (u_longlong_t)tx->tx_txg, (u_longlong_t)vd->vdev_id,
	    (u_longlong_t)(words_before - words_after),
	    (u_longlong_t)words_after);

	if (error != EINTR) {
		if (error != 0) {
			zfs_panic_recover("zfs: error %d was returned "
			    "while incrementally destroying the checkpoint "
			    "space map of vdev %llu\n",
			    error, (u_longlong_t)vd->vdev_id);
		}
		ASSERT0(words_after);
		ASSERT0(space_map_allocated(vd->vdev_checkpoint_sm));
		ASSERT0(space_map_length(vd->vdev_checkpoint_sm));

		space_map_free(vd->vdev_checkpoint_sm, tx);
		space_map_close(vd->vdev_checkpoint_sm);
		vd->vdev_checkpoint_sm = NULL;

		VERIFY0(zap_remove(spa_meta_objset(vd->vdev_spa),
		    vd->vdev_top_zap, VDEV_TOP_ZAP_POOL_CHECKPOINT_SM, tx));
	}
}
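
/*
 * Worked arithmetic for max_entry_limit above, as a small userland
 * sketch (hypothetical names, not compiled). The default
 * zfs_spa_discard_memory_limit is 16 MiB and the worst-case entry size
 * is two 64-bit words (16 bytes), so 16777216 / 16 = 1048576 entries fit
 * in the prefetched region. Halving that to leave room for debug
 * entries gives a budget of 524288 callback invocations per sync pass.
 */
#if 0
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the default zfs_spa_discard_memory_limit. */
#define	DISCARD_MEMORY_LIMIT	(16ULL * 1024 * 1024)

/*
 * Same expression as in spa_checkpoint_discard_thread_sync(): count
 * worst-case two-word entries, then halve the count because up to one
 * debug entry may accompany each entry the callback actually sees.
 */
static uint64_t
max_entry_limit(uint64_t memory_limit)
{
	return ((memory_limit / (2 * sizeof (uint64_t))) >> 1);
}
```
#endif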

static boolean_t
spa_checkpoint_discard_is_done(spa_t *spa)
{
	vdev_t *rvd = spa->spa_root_vdev;

	ASSERT(!spa_has_checkpoint(spa));
	ASSERT(spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT));

	for (uint64_t c = 0; c < rvd->vdev_children; c++) {
		if (rvd->vdev_child[c]->vdev_checkpoint_sm != NULL)
			return (B_FALSE);
		ASSERT0(rvd->vdev_child[c]->vdev_stat.vs_checkpoint_space);
	}

	return (B_TRUE);
}

/* ARGSUSED */
boolean_t
spa_checkpoint_discard_thread_check(void *arg, zthr_t *zthr)
{
	spa_t *spa = arg;

	if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT))
		return (B_FALSE);

	if (spa_has_checkpoint(spa))
		return (B_FALSE);

	return (B_TRUE);
}

void
spa_checkpoint_discard_thread(void *arg, zthr_t *zthr)
{
	spa_t *spa = arg;
	vdev_t *rvd = spa->spa_root_vdev;

	for (uint64_t c = 0; c < rvd->vdev_children; c++) {
		vdev_t *vd = rvd->vdev_child[c];

		while (vd->vdev_checkpoint_sm != NULL) {
			space_map_t *checkpoint_sm = vd->vdev_checkpoint_sm;
			int numbufs;
			dmu_buf_t **dbp;

			if (zthr_iscancelled(zthr))
				return;

			ASSERT3P(vd->vdev_ops, !=, &vdev_indirect_ops);

			uint64_t size = MIN(space_map_length(checkpoint_sm),
			    zfs_spa_discard_memory_limit);
			uint64_t offset =
			    space_map_length(checkpoint_sm) - size;

			/*
			 * Ensure that the part of the space map that will
			 * be destroyed by the synctask is prefetched in
			 * memory before the synctask runs.
			 */
			int error = dmu_buf_hold_array_by_bonus(
			    checkpoint_sm->sm_dbuf, offset, size,
			    B_TRUE, FTAG, &numbufs, &dbp);
			if (error != 0) {
				zfs_panic_recover("zfs: error %d was returned "
				    "while prefetching checkpoint space map "
				    "entries of vdev %llu\n",
				    error, (u_longlong_t)vd->vdev_id);
			}

			VERIFY0(dsl_sync_task(spa->spa_name, NULL,
			    spa_checkpoint_discard_thread_sync, vd,
			    0, ZFS_SPACE_CHECK_NONE));

			dmu_buf_rele_array(dbp, numbufs, FTAG);
		}
	}

	VERIFY(spa_checkpoint_discard_is_done(spa));
	VERIFY0(spa->spa_checkpoint_info.sci_dspace);
	VERIFY0(dsl_sync_task(spa->spa_name, NULL,
	    spa_checkpoint_discard_complete_sync, spa,
	    0, ZFS_SPACE_CHECK_NONE));
}
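
/*
 * Illustrative sketch (hypothetical names, not compiled): the prefetch
 * window computed by spa_checkpoint_discard_thread(). The sync task
 * consumes the space map from its end, so the thread holds the last
 * MIN(length, limit) bytes and the window starts at length - size.
 * For example, a 40 MiB checkpoint space map with the default 16 MiB
 * limit gives a window that starts at offset 24 MiB. A map smaller than
 * the limit is held in full, starting at offset 0.
 */
#if 0
```c
#include <assert.h>
#include <stdint.h>

static uint64_t
u64_min(uint64_t a, uint64_t b)
{
	return (a < b ? a : b);
}

/* Compute the tail of the space map that gets prefetched. */
static void
prefetch_window(uint64_t sm_length, uint64_t limit, uint64_t *offset,
    uint64_t *size)
{
	*size = u64_min(sm_length, limit);
	*offset = sm_length - *size;	/* no underflow: size <= length */
	assert(*offset + *size == sm_length);
}
```
#endif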

/* ARGSUSED */
static int
spa_checkpoint_check(void *arg, dmu_tx_t *tx)
{
	spa_t *spa = dmu_tx_pool(tx)->dp_spa;

	if (!spa_feature_is_enabled(spa, SPA_FEATURE_POOL_CHECKPOINT))
		return (SET_ERROR(ENOTSUP));

	if (!spa_top_vdevs_spacemap_addressable(spa))
		return (SET_ERROR(ZFS_ERR_VDEV_TOO_BIG));

	if (spa->spa_vdev_removal != NULL)
		return (SET_ERROR(ZFS_ERR_DEVRM_IN_PROGRESS));

	if (spa->spa_checkpoint_txg != 0)
		return (SET_ERROR(ZFS_ERR_CHECKPOINT_EXISTS));

	if (spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT))
		return (SET_ERROR(ZFS_ERR_DISCARDING_CHECKPOINT));

	return (0);
}
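spa_checkpoint_check() tests its preconditions in a fixed order, and that order matters for the last two checks. The feature is also active while a checkpoint exists. Checking `spa_checkpoint_txg` first is therefore what makes the function report ZFS_ERR_CHECKPOINT_EXISTS instead of ZFS_ERR_DISCARDING_CHECKPOINT. Below is a minimal standalone model of that ordering. The types `pool_state_t` and `ckpt_err_t` and the function `checkpoint_check()` are hypothetical stand-ins for the spa fields and the ZFS error codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical codes standing in for ENOTSUP and the ZFS_ERR_* values. */
typedef enum ckpt_err {
	CKPT_OK = 0,
	CKPT_ENOTSUP,
	CKPT_VDEV_TOO_BIG,
	CKPT_DEVRM_IN_PROGRESS,
	CKPT_EXISTS,
	CKPT_DISCARDING
} ckpt_err_t;

/* Hypothetical snapshot of the pool state that the check consults. */
typedef struct pool_state {
	bool feature_enabled;		/* pool_checkpoint feature enabled */
	bool feature_active;		/* refcount > 0: checkpointed state */
	bool spacemaps_addressable;	/* top-level vdevs not too big */
	bool removal_in_progress;	/* a device removal is running */
	uint64_t checkpoint_txg;	/* 0 when no checkpoint is in place */
} pool_state_t;

static ckpt_err_t
checkpoint_check(const pool_state_t *ps)
{
	/* A checkpoint txg is only ever set while the feature is active. */
	assert(ps->checkpoint_txg == 0 || ps->feature_active);

	if (!ps->feature_enabled)
		return (CKPT_ENOTSUP);
	if (!ps->spacemaps_addressable)
		return (CKPT_VDEV_TOO_BIG);
	if (ps->removal_in_progress)
		return (CKPT_DEVRM_IN_PROGRESS);
	if (ps->checkpoint_txg != 0)
		return (CKPT_EXISTS);
	/* Active feature but no txg: an earlier checkpoint is being discarded. */
	if (ps->feature_active)
		return (CKPT_DISCARDING);
	return (CKPT_OK);
}
```

Swapping the last two `if` statements would make an existing checkpoint look like a discard in progress. That is why the txg test has to come first.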

/* ARGSUSED */
static void
spa_checkpoint_sync(void *arg, dmu_tx_t *tx)
{
	dsl_pool_t *dp = dmu_tx_pool(tx);
	spa_t *spa = dp->dp_spa;
	uberblock_t checkpoint = spa->spa_ubsync;

	/*
	 * At this point, there should not be a checkpoint in the MOS.
	 */
	ASSERT3U(zap_contains(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_ZPOOL_CHECKPOINT), ==, ENOENT);

	ASSERT0(spa->spa_checkpoint_info.sci_timestamp);
	ASSERT0(spa->spa_checkpoint_info.sci_dspace);

	/*
	 * Since the checkpointed uberblock is the one that just got synced
	 * (we use spa_ubsync), its txg must be equal to the txg number of
	 * the txg we are syncing, minus 1.
	 */
	ASSERT3U(checkpoint.ub_txg, ==, spa->spa_syncing_txg - 1);

	/*
	 * Once the checkpoint is in place, we need to ensure that none of
	 * its blocks will be marked for reuse after they have been freed.
	 * When there is a checkpoint and a block is freed, we compare its
	 * birth txg to the txg of the checkpointed uberblock to see if the
	 * block is part of the checkpoint or not. Therefore, we have to set
	 * spa_checkpoint_txg before any frees happen in this txg (which is
	 * why this is done as an early synctask, as explained in the comment
	 * in spa_checkpoint()).
	 */
	spa->spa_checkpoint_txg = checkpoint.ub_txg;
	spa->spa_checkpoint_info.sci_timestamp = checkpoint.ub_timestamp;

	checkpoint.ub_checkpoint_txg = checkpoint.ub_txg;
	VERIFY0(zap_add(spa->spa_dsl_pool->dp_meta_objset,
	    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT,
	    sizeof (uint64_t), sizeof (uberblock_t) / sizeof (uint64_t),
	    &checkpoint, tx));

	/*
	 * Increment the feature refcount and thus activate the feature.
	 * Note that the feature will be deactivated when we've
	 * completely discarded all checkpointed state (both vdev
	 * space maps and uberblock).
	 */
	spa_feature_incr(spa, SPA_FEATURE_POOL_CHECKPOINT, tx);

	spa_history_log_internal(spa, "spa checkpoint", tx,
	    "checkpointed uberblock txg=%llu",
	    (u_longlong_t)checkpoint.ub_txg);
}
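The comment above relies on a single comparison: when a block is freed while a checkpoint exists, its birth txg is compared with the txg of the checkpointed uberblock. A minimal sketch follows, assuming that a block born at or before that txg belongs to the checkpoint and must not be reused. The assumption rests on the fact that a block being freed now was still live when the checkpoint was taken. `classify_free()` and `checkpoint_txg_for()` are hypothetical helpers, not ZFS functions. The second one only restates the `spa_syncing_txg - 1` assertion.

```c
#include <assert.h>
#include <stdint.h>

/* Where a freed block's space goes (hypothetical model). */
typedef enum free_dest {
	FREE_REUSABLE,		/* space may be allocated again */
	FREE_TO_CHECKPOINT	/* space must be kept for a possible rewind */
} free_dest_t;

/*
 * Classify a freed block by its birth txg. A checkpoint_txg of 0 means
 * that no checkpoint exists, so every freed block is reusable.
 */
static free_dest_t
classify_free(uint64_t birth_txg, uint64_t checkpoint_txg)
{
	if (checkpoint_txg != 0 && birth_txg <= checkpoint_txg)
		return (FREE_TO_CHECKPOINT);
	return (FREE_REUSABLE);
}

/* The checkpointed uberblock is the last synced one: syncing txg - 1. */
static uint64_t
checkpoint_txg_for(uint64_t syncing_txg)
{
	assert(syncing_txg > 1);
	return (syncing_txg - 1);
}
```

Under this model, a block written in the txg that is syncing when the checkpoint is taken is not part of the checkpoint, so freeing it later returns its space for reuse.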

/*
 * Create a checkpoint for the pool.
 */
int
spa_checkpoint(const char *pool)
{
	int error;
	spa_t *spa;

	error = spa_open(pool, &spa, FTAG);
	if (error != 0)
		return (error);

	mutex_enter(&spa->spa_vdev_top_lock);

	/*
	 * Wait for the current syncing txg to finish so the latest synced
	 * uberblock (spa_ubsync) has all the changes that we expect
	 * to see if we were to revert later to the checkpoint. In other
	 * words, we want the checkpointed uberblock to include/reference
	 * all the changes that were pending at the time that we issued
	 * the checkpoint command.
	 */
	txg_wait_synced(spa_get_dsl(spa), 0);

	/*
	 * As the checkpointed uberblock references blocks from the previous
	 * txg (spa_ubsync), we want to ensure that we are not freeing any of
	 * these blocks in the same txg that the following synctask will
	 * run. Thus, we run it as an early synctask, so the dirty changes
	 * that are synced to disk afterwards during zios and other synctasks
	 * do not reuse checkpointed blocks.
	 */
	error = dsl_early_sync_task(pool, spa_checkpoint_check,
	    spa_checkpoint_sync, NULL, 0, ZFS_SPACE_CHECK_NORMAL);

	mutex_exit(&spa->spa_vdev_top_lock);

	spa_close(spa, FTAG);
	return (error);
}
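A toy model of a single txg sync shows why dsl_early_sync_task() is needed. It assumes, as the comments describe, that early synctasks run before the txg's frees are processed and ordinary synctasks run after them. The sketch counts the checkpointed blocks that would be released for reuse in each case. The function `sync_txg()` is hypothetical and ignores everything else that happens during a real sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified sync of one txg (hypothetical model): early synctasks run
 * first, then the txg's pending frees are processed, then ordinary
 * synctasks run. `free_births` holds the birth txgs of the blocks freed
 * in this txg. Returns how many of those frees belonged to the
 * checkpoint but were processed while no checkpoint txg was set yet,
 * i.e. blocks that would wrongly become reusable.
 */
static size_t
sync_txg(uint64_t syncing_txg, bool checkpoint_is_early,
    const uint64_t *free_births, size_t nfrees)
{
	uint64_t checkpoint_txg = 0;		/* 0 means "no checkpoint" */
	uint64_t target = syncing_txg - 1;	/* txg of spa_ubsync */
	size_t leaked = 0;

	assert(syncing_txg > 1);

	if (checkpoint_is_early)
		checkpoint_txg = target;	/* early synctask */

	for (size_t i = 0; i < nfrees; i++) {
		bool in_checkpoint = free_births[i] <= target;
		bool kept = checkpoint_txg != 0 &&
		    free_births[i] <= checkpoint_txg;

		if (in_checkpoint && !kept)
			leaked++;	/* would be reused: checkpoint damaged */
	}

	if (!checkpoint_is_early)
		checkpoint_txg = target;	/* ordinary synctask: too late */

	assert(checkpoint_txg == target);
	return (leaked);
}
```

With the early synctask, every checkpointed free is preserved. With an ordinary synctask, blocks born at or before the checkpointed txg are processed while the checkpoint txg is still 0, which is exactly the reuse the comment warns against.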

/* ARGSUSED */
static int
spa_checkpoint_discard_check(void *arg, dmu_tx_t *tx)
{
	spa_t *spa = dmu_tx_pool(tx)->dp_spa;

	if (!spa_feature_is_active(spa, SPA_FEATURE_POOL_CHECKPOINT))
		return (SET_ERROR(ZFS_ERR_NO_CHECKPOINT));

	if (spa->spa_checkpoint_txg == 0)
		return (SET_ERROR(ZFS_ERR_DISCARDING_CHECKPOINT));

	VERIFY0(zap_contains(spa_meta_objset(spa),
	    DMU_POOL_DIRECTORY_OBJECT, DMU_POOL_ZPOOL_CHECKPOINT));

	return (0);
}
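The two checks in spa_checkpoint_discard_check() use the feature refcount and `spa_checkpoint_txg` to tell three states apart:

- No checkpoint: the feature is inactive.
- A checkpoint is in place: the feature is active and the txg is set.
- A discard is in progress: the feature is active and the txg has been cleared by spa_checkpoint_discard_sync(). This lasts until the background discard completes and the feature is deactivated.

Below is a small state-machine sketch of that lifecycle. The names `ckpt_t`, `ckpt_state()`, `discard_check()`, `discard_start()` and `discard_complete()` are hypothetical, and plain integers stand in for the ZFS error codes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Checkpoint lifecycle derived from the two fields the checks consult. */
typedef enum ckpt_state {
	CS_NONE,	/* feature inactive */
	CS_PRESENT,	/* feature active, checkpoint txg set */
	CS_DISCARDING	/* feature active, txg cleared, state still on disk */
} ckpt_state_t;

typedef struct ckpt {
	bool feature_active;
	uint64_t checkpoint_txg;
} ckpt_t;

static ckpt_state_t
ckpt_state(const ckpt_t *c)
{
	if (!c->feature_active) {
		assert(c->checkpoint_txg == 0);
		return (CS_NONE);
	}
	return (c->checkpoint_txg != 0 ? CS_PRESENT : CS_DISCARDING);
}

/* Models spa_checkpoint_discard_check(): 0 on success, else an error. */
static int
discard_check(const ckpt_t *c)
{
	switch (ckpt_state(c)) {
	case CS_NONE:
		return (1);	/* stands in for ZFS_ERR_NO_CHECKPOINT */
	case CS_DISCARDING:
		return (2);	/* stands in for ZFS_ERR_DISCARDING_CHECKPOINT */
	default:
		return (0);
	}
}

/* Models spa_checkpoint_discard_sync(): clear the txg, keep the feature. */
static void
discard_start(ckpt_t *c)
{
	assert(discard_check(c) == 0);
	c->checkpoint_txg = 0;
}

/* Once all checkpointed state is gone, the feature is deactivated. */
static void
discard_complete(ckpt_t *c)
{
	assert(ckpt_state(c) == CS_DISCARDING);
	c->feature_active = false;
}
```

Keeping the feature active while the txg is already cleared is what lets a second discard request fail with the "discarding" error instead of "no checkpoint".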

/* ARGSUSED */
static void
spa_checkpoint_discard_sync(void *arg, dmu_tx_t *tx)
{
	spa_t *spa = dmu_tx_pool(tx)->dp_spa;

	VERIFY0(zap_remove(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT,
	    DMU_POOL_ZPOOL_CHECKPOINT, tx));

	spa->spa_checkpoint_txg = 0;

	zthr_wakeup(spa->spa_checkpoint_discard_zthr);

	spa_history_log_internal(spa, "spa discard checkpoint", tx,
	    "started discarding checkpointed state from the pool");
}

/*
 * Discard the checkpoint from a pool.
 */
int
spa_checkpoint_discard(const char *pool)
{
	/*
	 * Similarly to spa_checkpoint(), we want our synctask to run
	 * before any pending dirty data are written to disk so they
	 * won't end up in the checkpoint's data structures (e.g.
	 * ms_checkpointing and vdev_checkpoint_sm) and re-create any
	 * space maps that the discarding open-context thread has
	 * deleted.
	 * [see spa_checkpoint_discard_sync and spa_checkpoint_discard_thread]
	 */
	return (dsl_early_sync_task(pool, spa_checkpoint_discard_check,
	    spa_checkpoint_discard_sync, NULL, 0,
	    ZFS_SPACE_CHECK_DISCARD_CHECKPOINT));
}