qemu-cr16/block
Hanna Czenczek b002acacc1 Revert "nvme: Fix coroutine waking"
This reverts commit 0f142cbd91.

Said commit changed the replay_bh_schedule_oneshot_event() in
nvme_rw_cb() to aio_co_wake(), allowing the request coroutine to be
entered directly (instead of only being scheduled for later execution).
This can cause the device to become stalled like so:

It is possible that after completion the request coroutine goes on to
submit another request without yielding, e.g. a flush after a write to
emulate FUA.  This will likely cause a nested nvme_process_completion()
call because nvme_rw_cb() itself is called from there.

(After submitting a request, we invoke nvme_process_completion() through
defer_call(); but the fact that nvme_process_completion() ran in the
first place indicates that we are not in a call-deferring section, so
defer_call() will call nvme_process_completion() immediately.)

If this inner nvme_process_completion() loop then processes any
completions, it will write the final completion queue (CQ) head index to
the CQ head doorbell, and subsequently execution will return to the
outer nvme_process_completion() loop.  Even if this loop now finds no
further completions, it still processed at least one completion before,
or it would not have called the nvme_rw_cb() which led to nesting.
Therefore, it will now write the exact same CQ head index value to the
doorbell, which effectively is an unrecoverable error[1].

Therefore, nesting of nvme_process_completion() does not work at this
point.  Reverting said commit removes the nesting (by scheduling the
request coroutine instead of entering it immediately), and so fixes the
stall.
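
The sequence above can be reproduced with a toy model.  The following is
only a sketch under simplifying assumptions, not the code in block/nvme.c:
every name in it (ToyQueue, toy_submit(), toy_request_done(), toy_run(),
...) is hypothetical, the "device" completes each request the moment it is
submitted, and a doorbell write that repeats the previous value is merely
counted instead of wedging the queue.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy completion queue; all names are hypothetical, not QEMU's. */
typedef struct ToyQueue {
    int cq_head;          /* next completion entry to consume */
    int cq_tail;          /* one past the last entry the "device" posted */
    int last_doorbell;    /* value last written to the CQ head doorbell */
    int invalid_writes;   /* doorbell writes repeating the previous value */
    bool enter_directly;  /* true: enter the request coroutine right away;
                           * false: only schedule it for later */
    int deferred_submits; /* follow-up submissions scheduled for later */
    int flushes_left;     /* follow-up requests still to be submitted */
} ToyQueue;

static void toy_process_completion(ToyQueue *q);

/* Model of the CQ head doorbell: count same-value writes. */
static void toy_ring_doorbell(ToyQueue *q)
{
    if (q->cq_head == q->last_doorbell) {
        q->invalid_writes++;
    }
    q->last_doorbell = q->cq_head;
}

/* Submit one request; the toy device completes it at once. */
static void toy_submit(ToyQueue *q)
{
    q->cq_tail++;
    /* Assumption: we are outside a call-deferring section, so the
     * deferred completion processing runs immediately. */
    toy_process_completion(q);
}

/* Stand-in for the completion callback waking the request coroutine. */
static void toy_request_done(ToyQueue *q)
{
    if (q->flushes_left == 0) {
        return;
    }
    q->flushes_left--;
    if (q->enter_directly) {
        toy_submit(q);          /* nests inside toy_process_completion() */
    } else {
        q->deferred_submits++;  /* runs after the outer loop has returned */
    }
}

static void toy_process_completion(ToyQueue *q)
{
    bool progress = false;

    while (q->cq_head != q->cq_tail) {
        assert(q->cq_head < q->cq_tail);
        q->cq_head++;
        progress = true;
        toy_request_done(q);
    }
    if (progress) {
        toy_ring_doorbell(q);
    }
}

/* Run one write followed by one flush; return the invalid write count. */
int toy_run(bool enter_directly)
{
    ToyQueue q = { .enter_directly = enter_directly, .flushes_left = 1 };

    toy_submit(&q);
    while (q.deferred_submits > 0) {
        q.deferred_submits--;
        toy_submit(&q);
    }
    return q.invalid_writes;
}
```

In this model, toy_run(true) returns 1: the nested call writes head index 2
to the doorbell, and the outer call then writes 2 again.  toy_run(false)
returns 0, because the follow-up request is only submitted once the outer
loop has finished and written its own, different head index.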

On the downside, reverting said commit breaks multiqueue for nvme, but it
is better to have single-queue working than neither.  For 11.0, we will
have a solution that makes both work.

A side note: There is a comment in nvme_process_completion() above
qemu_bh_schedule() that claims nesting works, as long as it is done
through the completion_bh.  I am quite sure that is not true, for two
reasons:
- The problem described above, which is even worse when going through
  nvme_process_completion_bh() because that function unconditionally
  writes to the CQ head doorbell,
- nvme_process_completion_bh() never takes q->lock, so
  nvme_process_completion() unlocking it will likely abort.

Given the lack of reports of such aborts, I believe that completion_bh
simply is unused in practice.

[1] See the NVMe Base Specification revision 2.3, page 180, figure 152:
    “Invalid Doorbell Write Value: A host attempted to write an invalid
    doorbell value. Some possible causes of this error are: [...] the
    value written is the same as the previously written doorbell value.”

    To even be notified of this error, we would need to send an
    Asynchronous Event Request to the admin queue (p. 178ff), which we
    don’t do, and then to handle it, we would need to delete and
    recreate the queue (p. 88, section 3.3.1.2 Queue Usage).

Cc: qemu-stable@nongnu.org
Reported-by: Lukáš Doktor <ldoktor@redhat.com>
Tested-by: Lukáš Doktor <ldoktor@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Message-id: 20251215141540.88915-1-hreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-12-15 09:50:41 -05:00
export block/export: Add option to allow export of inactive nodes 2025-02-06 14:46:40 +01:00
monitor block/monitor: Use hmp_handle_error to report error 2025-10-29 12:10:09 +01:00
accounting.c block: enable stats-intervals for storage devices 2025-10-29 12:10:09 +01:00
aio_task.c block: Remove unused aio_task_pool_empty 2024-09-30 10:53:18 +03:00
amend.c block: Mark BlockDriver callbacks for amend job GRAPH_RDLOCK 2023-05-10 14:16:54 +02:00
backup.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
blkdebug.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
blkio.c include/system: Move exec/memory.h to system/memory.h 2025-04-23 14:08:21 -07:00
blklogwrites.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
blkreplay.c blkreplay: Run BH in coroutine’s AioContext 2025-11-18 18:01:55 +01:00
blkverify.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
block-backend.c block: use pwrite_zeroes_alignment when writing first sector 2025-11-25 15:26:22 +01:00
block-copy.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
block-gen.h block-coroutine-wrapper.py: support also basic return types 2022-12-15 16:07:43 +01:00
block-ram-registrar.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
bochs.c block: replace TABs with space 2025-11-11 22:06:09 +01:00
cloop.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
commit.c block/commit: mark commit_abort() as GRAPH_UNLOCKED 2025-07-14 15:42:13 +02:00
copy-before-write.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
copy-before-write.h blockdev-backup: Add error handling option for copy-before-write jobs 2025-05-12 18:19:31 +03:00
copy-on-read.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
copy-on-read.h block: Mark bdrv_(un)freeze_backing_chain() and callers GRAPH_RDLOCK 2023-11-07 19:14:19 +01:00
coroutines.h block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
create.c qemu/compiler: Absorb 'clang-tsa.h' 2025-03-06 14:21:25 +01:00
crypto.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
crypto.h block: Support detached LUKS header creation using qemu-img 2024-02-09 12:50:37 +00:00
curl.c curl: Fix coroutine waking 2025-11-18 18:01:50 +01:00
dirty-bitmap.c block: Mark bdrv_*_dirty_bitmap() and callers GRAPH_RDLOCK 2023-02-23 19:49:32 +01:00
dmg-bz2.c
dmg-lzfse.c block/dmg: Ignore C99 prototype declaration mismatch from <lzfse.h> 2023-03-30 15:03:36 +02:00
dmg.c block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
dmg.h block/dmg: Declare a type definition for DMG uncompress function 2023-04-24 13:53:44 -04:00
file-posix.c file-posix: Handle suspended dm-multipath better for SG_IO 2025-12-04 18:34:15 +01:00
file-win32.c block: replace TABs with space 2025-11-11 22:06:09 +01:00
filter-compress.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
gluster.c gluster: Do not move coroutine into BDS context 2025-11-18 18:01:50 +01:00
graph-lock.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
io.c block/io: Take reqs_lock for tracked_requests 2025-11-18 18:01:54 +01:00
io_uring.c block/io_uring: use non-vectored read/write when possible 2025-11-11 22:06:09 +01:00
iscsi-opts.c modules: add block module annotations 2021-07-09 18:20:27 +02:00
iscsi.c iscsi: Create AIO BH in original AioContext 2025-11-18 18:01:57 +01:00
linux-aio.c block: skip automatic zero-init of large array in ioq_submit 2025-06-12 13:39:08 -04:00
meson.build include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
mirror.c block: drop wrapper for bdrv_set_backing_hd_drained() 2025-07-14 15:41:58 +02:00
nbd.c treewide: handle result of qio_channel_set_blocking() 2025-09-19 12:46:07 +01:00
nfs.c nfs: Run co BH CB in the coroutine’s AioContext 2025-11-18 18:01:50 +01:00
null.c null-aio: Run CB in original AioContext 2025-11-18 18:01:57 +01:00
nvme.c Revert "nvme: Fix coroutine waking" 2025-12-15 09:50:41 -05:00
parallels-ext.c qapi/crypto: Rename QCryptoHashAlgorithm to *Algo, and drop prefix 2024-09-10 14:02:16 +02:00
parallels.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
parallels.h block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
preallocate.c block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
progress_meter.c coroutine: Clean up superfluous inclusion of qemu/lockable.h 2023-01-19 10:18:28 +01:00
qapi-system.c qapi: Move include/qapi/qmp/ to include/qobject/ 2025-02-10 15:33:16 +01:00
qapi.c qemu-img info: Optionally show block limits 2025-10-29 12:10:10 +01:00
qcow.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
qcow2-bitmap.c block/qcow2-bitmap: Replace g_memdup() by g_memdup2() 2024-05-08 19:11:34 +02:00
qcow2-cache.c qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK 2023-10-12 16:31:33 +02:00
qcow2-cluster.c qcow2: put discards in discard queue when discard-no-unref is enabled 2025-11-11 22:06:09 +01:00
qcow2-refcount.c qcow2: put discards in discard queue when discard-no-unref is enabled 2025-11-11 22:06:09 +01:00
qcow2-snapshot.c include: Rename sysemu/ -> system/ 2024-12-20 17:44:56 +01:00
qcow2-threads.c thread-pool: avoid passing the pool parameter every time 2023-04-25 13:17:28 +02:00
qcow2.c qcow2: Schedule cache-clean-timer in realtime 2025-11-18 18:01:55 +01:00
qcow2.h qcow2: Fix cache_clean_timer 2025-11-18 18:01:55 +01:00
qed-check.c qed: mark more functions as coroutine_fns and GRAPH_RDLOCK 2023-06-28 09:46:20 +02:00
qed-cluster.c
qed-l2-cache.c osdep: Move memalign-related functions to their own header 2022-03-07 13:16:49 +00:00
qed-table.c block: use bdrv_co_debug_event in coroutine context 2023-06-28 09:46:34 +02:00
qed.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
qed.h block: Protect bs->file with graph_lock 2023-11-08 17:56:18 +01:00
quorum.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
raw-format.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
rbd.c rbd: Run co BH CB in the coroutine’s AioContext 2025-11-18 18:01:50 +01:00
replication.c block: mark bdrv_reopen_queue() and bdrv_reopen_multiple() as GRAPH_UNLOCKED 2025-07-14 15:42:05 +02:00
reqlist.c block/reqlist: allow adding overlapping requests 2024-09-30 10:53:18 +03:00
snapshot-access.c block: Expand block status mode from bool to flags 2025-05-14 15:33:34 -05:00
snapshot.c block: add bdrv_graph_wrlock_drained() convenience wrapper 2025-07-14 15:40:58 +02:00
ssh.c ssh: Run restart_coroutine in current AioContext 2025-11-18 18:01:55 +01:00
stream.c block/stream: mark stream_prepare() as GRAPH_UNLOCKED 2025-07-14 15:42:04 +02:00
throttle-groups.c qom: Make InterfaceInfo[] uses const 2025-04-25 17:00:41 +02:00
throttle.c block: Take graph lock for most of .bdrv_open 2023-11-08 17:56:18 +01:00
trace-events block/io_uring: use aio_add_sqe() 2025-11-11 22:06:09 +01:00
trace.h trace: switch position of headers to what Meson requires 2020-08-21 06:18:24 -04:00
vdi.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
vhdx-endian.c
vhdx-log.c vhdx: Take locks for accessing bs->file 2023-11-08 17:56:18 +01:00
vhdx.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
vhdx.h vhdx: Take locks for accessing bs->file 2023-11-08 17:56:18 +01:00
vmdk.c Fix const qualifier build errors with recent glibc 2025-12-09 21:00:15 +01:00
vpc.c block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
vvfat.c Fix const qualifier build errors with recent glibc 2025-12-09 21:00:15 +01:00
win32-aio.c win32-aio: Run CB in original context 2025-11-18 18:01:57 +01:00
write-threshold.c block: remove AioContext locking 2023-12-21 22:49:27 +01:00