qemu-cr16/include
Eric Blake 9d86181874 qio: Protect NetListener callback with mutex
Without a mutex, NetListener can run into this data race between a
thread changing the async callback callback function to use when a
client connects, and the thread servicing polling of the listening
sockets:

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f1, ...);
           => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
       poll()
          => event POLLIN on socket
               => ref(GSourceCallback)
               => if (lstnr->io_func) // while lstnr->io_func is f1
                    ...interrupt..

  Thread 1:
       qio_net_listener_set_client_func(lstnr, f2, ...);
          => foreach sock: socket
               => g_source_unref(sock_src)
          => foreach sock: socket
               => object_ref(lstnr)
               => sock_src = qio_channel_socket_add_watch_source(sock, ...., lstnr, object_unref);

  Thread 2:
               => call lstnr->io_func(lstnr->io_data) // now sees f2
               => return dispatch(sock)
               => unref(GSourceCallback)
                  => destroy-notify
                     => object_unref

Found by inspection; I did not spend the time trying to add sleeps or
execute under gdb to try and actually trigger the race in practice.
This is a SEGFAULT waiting to happen if f2 can become NULL because
thread 1 deregisters the user's callback while thread 2 is trying to
service the callback.  Other messes are also theoretically possible,
such as running callback f1 with an opaque pointer that should only be
passed to f2 (if the client code were to use more than just a binary
choice between a single async function or NULL).

Mitigating factor: if the code that modifies the QIONetListener can
only be reached by the same thread that is executing the polling and
async callbacks, then we are not in a two-thread race documented above
(even though poll can see two clients trying to connect in the same
window of time, any changes made to the listener by the first async
callback will be completed before the thread moves on to the second
client).  However, QEMU is complex enough that this is hard to
generically analyze.  If QMP commands (like nbd-server-stop) are run
in the main loop and the listener uses the main loop, things should be
okay.  But when a client uses an alternative GMainContext, or if
servicing a QMP command hands off to a coroutine to avoid blocking, I
am unable to state with certainty whether a given net listener can be
modified by a thread different from the polling thread running
callbacks.

At any rate, it is worth having the API be robust.  To ensure that
modifying a NetListener can be safely done from any thread, add a
mutex that guarantees atomicity to all members of a listener object
related to callbacks.  This problem has been present since
QIONetListener was introduced.

Note that this does NOT prevent the case of a second round of the
user's old async callback being invoked with the old opaque data, even
when the user has already tried to change the async callback during
the first async callback; it is only about ensuring that there is no
sharding (the eventual io_func(io_data) call that does get made will
correspond to a particular combination that the user had requested at
some point in time, and not be sharded to a combination that never
existed in practice).  In other words, this patch maintains the status
quo that a user's async callback function already needs to be robust
to parallel clients landing in the same window of poll servicing, even
when only one client is desired, if that particular listener can be
amended in a thread other than the one doing the polling.

CC: qemu-stable@nongnu.org
Fixes: 53047392 ("io: introduce a network socket listener API", v2.12.0)
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-ID: <20251113011625.878876-20-eblake@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
[eblake: minor commit message wording improvements]
Signed-off-by: Eric Blake <eblake@redhat.com>
2025-11-13 08:46:41 -06:00
..
accel accel/tcg: Add cpu_atomic_*_mmu for 16-byte xchg, fetch_and, fetch_or 2025-08-30 16:37:23 +01:00
authz
block block: Allow drivers to control protocol prefix at creation 2025-11-11 22:06:09 +01:00
chardev char: rename CharBackend->CharFrontend 2025-10-28 14:49:52 +01:00
crypto crypto: support upto 5 parallel certificate identities 2025-11-03 10:45:55 +00:00
disas disas: Fix build against Capstone v6 (again) 2024-11-05 10:09:59 +00:00
exec exec/cpu: Declare cpu_memory_rw_debug() in 'hw/core/cpu.h' and document 2025-11-03 11:59:32 +01:00
fpu fpu: Move m68k_denormal fmt flag into floatx80_behaviour 2025-02-25 15:32:57 +00:00
gdbstub gdbstub/helpers: Replace TARGET_BIG_ENDIAN -> target_big_endian() 2025-07-15 02:56:39 -04:00
hw intel_iommu: Handle PASID cache invalidation 2025-11-09 08:23:48 -05:00
io qio: Protect NetListener callback with mutex 2025-11-13 08:46:41 -06:00
libdecnumber include/libdecnumber: replace FSF postal address with licenses URL 2025-06-26 00:42:37 +02:00
migration migration: vmsd errp handlers: return bool 2025-11-03 16:04:10 -05:00
monitor qdev: add qdev_find_default_bus() 2025-10-29 22:53:41 +04:00
net qom: remove redundant typedef when use OBJECT_DECLARE_SIMPLE_TYPE 2025-10-28 08:08:04 +01:00
qapi error: Kill @error_warn 2025-10-01 08:33:24 +02:00
qemu Merge crypto and other misc fixes / features 2025-11-04 15:17:31 +01:00
qobject qobject: make refcount atomic 2025-10-28 13:02:26 +01:00
qom qom: reverse order of instance_post_init calls 2025-05-20 08:18:53 +02:00
scsi
semihosting include/semihosting/common-semi: extract common_semi API 2025-09-26 09:55:19 +01:00
standard-headers linux-headers: Update to Linux v6.18-rc3 2025-10-30 10:33:55 +08:00
system igvm: add support for initial register state load in native mode 2025-11-03 07:38:53 +01:00
tcg tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128 2025-08-30 16:37:24 +01:00
ui ui/gtk: Add scale option 2025-07-15 10:22:33 +04:00
user accel/tcg: Add clear_flags argument to page_set_flags 2025-10-14 07:30:39 -07:00
elf.h elf: Add EF_MIPS_ARCH_ASE definitions 2025-09-02 17:57:05 +02:00
glib-compat.h include/glib-compat.h: Poison g_list_sort and g_slist_sort 2025-05-06 16:02:04 +02:00
qemu-io.h
qemu-main.h ui & main loop: Redesign of system-specific main thread event handling 2024-12-31 21:21:34 +01:00