The two callers to a mirror job (drive-mirror and blockdev-mirror) set
zero_target precisely when sync mode == FULL, with the one exception
that drive-mirror skips zeroing the target if it was newly created and
reads as zero. But given the previous patch, that exception is
equally captured by target_is_zero.
Meanwhile, there is another slight wrinkle, fortunately caught by
iotest 185: if the caller uses "sync":"top" but the source has no
backing file, the code in blockdev.c was changing sync to be FULL, but
only after it had set zero_target=false. In mirror.c, prior to recent
patches, this didn't matter: the only places that inspected sync were
setting is_none_mode (both TOP and FULL had set that to false), and
mirror_start() setting base = mode == MIRROR_SYNC_MODE_TOP ?
bdrv_backing_chain_next(bs) : NULL. But now that we are passing sync
around, the slammed sync mode would result in a new pre-zeroing pass
even when the user had passed "sync":"top" in an effort to skip
pre-zeroing. Fortunately, the assignment of base when bs has no
backing chain still works out to NULL if we don't slam things. So
with the forced change of sync ripped out of blockdev.c, the sync mode
is passed through the full callstack unmolested, and we can now
reliably reconstruct the same settings as what used to be passed in by
zero_target=false, without the redundant parameter.
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20250509204341.3553601-24-eblake@redhat.com>
Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[eblake: Fix regression in iotest 185]
Signed-off-by: Eric Blake <eblake@redhat.com>
QEMU has an optimization for a just-created drive-mirror destination
that is not possible for blockdev-mirror (which can't create the
destination) - any time we know the destination starts life as all
zeroes, we can skip a pre-zeroing pass on the destination. Recent
patches have added an improved heuristic for detecting if a file
contains all zeroes, and we plan to use that heuristic in upcoming
patches. But since a heuristic cannot quickly detect all scenarios,
and there may be cases where the caller is aware of information that
QEMU cannot learn quickly, it makes sense to have a way to tell QEMU
to assume facts about the destination that can make the mirror
operation faster. Given our existing example of "qemu-img convert
--target-is-zero", it is time to expose this override in QMP for
blockdev-mirror as well.
This patch results in some slight redundancy between the older
s->zero_target (set any time mode==FULL and the destination image was
not just created - ie. clear if drive-mirror is asking to skip the
pre-zero pass) and the newly-introduced s->target_is_zero (in addition
to the QMP override, it is set when drive-mirror creates the
destination image); this will be cleaned up in the next patch.
There is also a subtlety that we must consider. When drive-mirror is
passing target_is_zero on behalf of a just-created image, we know the
image is sparse (skipping the pre-zeroing keeps it that way), so it
doesn't matter whether the destination also has "discard":"unmap" and
"detect-zeroes":"unmap". But now that we are letting the user set the
knob for target-is-zero, if the user passes a pre-existing file that
is fully allocated, it is fine to leave the file fully allocated under
"detect-zeroes":"on", but if the file is open with
"detect-zeroes":"unmap", we should really be trying harder to punch
holes in the destination for every region of zeroes copied from the
source. The easiest way to do this is to still run the pre-zeroing
pass (turning the entire destination file sparse before populating
just the allocated portions of the source), even though that currently
results in double I/O to the portions of the file that are allocated.
A later patch will add further optimizations to reduce redundant
zeroing I/O during the mirror operation.
Since "target-is-zero":true is designed for optimizations, it is okay
to silently ignore the parameter rather than erroring if the user ever
sets the parameter in a scenario where the mirror job can't exploit it
(for example, when doing "sync":"top" instead of "sync":"full", we
can't pre-zero, so setting the parameter won't make a speed
difference).
Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20250509204341.3553601-23-eblake@redhat.com>
Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This patch extends the blockdev-backup QMP command to allow users to specify
how to behave when IO errors occur during copy-before-write operations.
Previously, the behavior was fixed and could not be controlled by the user.
The new 'on-cbw-error' option can be set to one of two values:
- 'break-guest-write': Forwards the IO error to the guest and triggers
the on-source-error policy. This preserves snapshot integrity at the
expense of guest IO operations.
- 'break-snapshot': Allows the guest OS to continue running normally,
but invalidates the snapshot and aborts related jobs. This prioritizes
guest operation over backup consistency.
This enhancement provides more flexibility for backup operations in different
environments where requirements for guest availability versus backup
consistency may vary.
The default behavior remains unchanged to maintain backward compatibility.
Signed-off-by: Raman Dzehtsiar <Raman.Dzehtsiar@gmail.com>
Message-ID: <20250414090025.828660-1-Raman.Dzehtsiar@gmail.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
[vsementsov: fix long lines]
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Add a parameter that enables discard-after-copy. That is mostly useful
in "push backup with fleecing" scheme, when source is snapshot-access
format driver node, based on copy-before-write filter snapshot-access
API:
[guest] [snapshot-access] ~~ blockdev-backup ~~> [backup target]
| |
| root | file
v v
[copy-before-write]
| |
| file | target
v v
[active disk] [temp.img]
In this case discard-after-copy does two things:
- discard data in temp.img to save disk space
- avoid further copy-before-write operation in discarded area
Note that we have to declare WRITE permission on source in
copy-before-write filter, for discard to work. Still we can't take it
unconditionally, as it will break normal backup from RO source. So, we
have to add a parameter and pass it thorough bdrv_open flags.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20240313152822.626493-5-vsementsov@yandex-team.ru>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Introduce a new flag 'backing-mask-protocol' for the block-stream QMP
command which instructs the internals to use 'raw' instead of the
protocol driver in case when a image is used without a dummy 'raw'
wrapper.
The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.
The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <bbee9a0a59748a8893289bf8249f568f0d587e62.1701796348.git.pkrempa@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Introduce a new flag 'backing-mask-protocol' for the block-commit QMP
command which instructs the internals to use 'raw' instead of the
protocol driver in case when a image is used without a dummy 'raw'
wrapper.
The flag is designed such that it can be always asserted by management
tools even when there isn't any update to backing files.
The flag will be used by libvirt so that the backing images still
reference the proper format even when libvirt will stop using the dummy
raw driver (raw driver with no other config). Libvirt needs this so that
the images stay compatible with older libvirt versions which didn't
expect that a protocol driver name can appear in the backing file format
field.
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <2cb46e37093ce793ea1604abc8bbb90f4c8e434b.1701796348.git.pkrempa@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_skip_implicit_filters() need to hold a reader lock for the graph
because it calls bdrv_filter_child(), which accesses bs->file/backing.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-8-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_attach_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231027155333.420094-5-kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
This happens to result in BlockDriver.bdrv_set_perm() to be called with
the graph lock held. For consistency, make it the same for all of the
BlockDriver callbacks for updating permissions and annotate the function
pointers with GRAPH_RDLOCK_PTR.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-15-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230911094620.45040-14-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_recurse_can_replace() need to hold a reader lock for the graph
because it accesses the children list of a node.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20230504115750.54437-20-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
We have two inclusion loops:
block/block.h
-> block/block-global-state.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
block/block.h
-> block/block-io.h
-> block/block-common.h
-> block/blockjob.h
-> block/block.h
I believe these go back to Emanuele's reorganization of the block API,
merged a few months ago in commit d7e2fe4aac.
Fortunately, breaking them is merely a matter of deleting unnecessary
includes from headers, and adding them back in places where they are
now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
Remove the old assert_bdrv_graph_writable, and replace it with
the new version using graph-lock API.
See the function documentation for more information.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20221207131838.239125-14-kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Cleaned up with scripts/clean-header-guards.pl.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-5-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Rename the type to be reused. Old name is "what is it for". To be
natively reused for other needs, let's name it exactly "what is it".
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org>
Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru>
[eblake: Adjust S-o-b to Vladimir's new email, with permission]
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: John Snow <jsnow@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
We want to be sure that the functions that write the child and
parent list of a bs are under BQL and drain.
BQL prevents from concurrent writings from the GS API, while
drains protect from I/O.
TODO: drains are missing in some functions using this assert.
Therefore a proper assertion will fail. Because adding drains
requires additional discussions, they will be added in future
series.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-15-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Similarly to the previous patch, split block_int.h
in block_int-io.h and block_int-global-state.h
block_int-common.h contains the structures shared between
the two headers, and the functions that can't be categorized as
I/O or global state.
Assertions are added in the next patch.
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20220303151616.325444-12-eesposit@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>