qemu-cr16

Peter Xu b63a2e9e4b 2025-07-11 10:37:38 -03:00

migration/postcopy: Optimize blocktime fault tracking with hashtable

Currently, the postcopy blocktime feature maintains vCPU fault
information using an array (vcpu_addr[]). It has two issues.

Issue 1: Performance Concern
============================

The old algorithm was almost OK and fast on inserts, except that the
lookup is slow and won't scale if there are a lot of vCPUs: when a page
is copied during postcopy, mark_postcopy_blocktime_end() will walk the
whole array trying to find which vCPUs are blocked by the address. So it
needs constant O(N) walk for each page resolution.

Alexey (the author of postcopy blocktime) mentioned the perf issue and
how to optimize it in a piece of comment in the page resolution path.
The comment was (interestingly..) not complete, but it's relatively
clear what he wanted to say about this perf issue.

Issue 2: Wrong Accounting on re-entrancies
==========================================

People might think that each vCPU should only and always get one fault
at a time, so that when the blocktime layer captured one fault on one
vCPU, we should never see another fault message on this vCPU.

It's almost correct, except in some extreme rare cases.

Case 1: it's possible the fault thread processes the userfaultfd
messages too fast so it can see >1 messages on one vCPU before the
previous one was resolved.

Case 2: it's theoretically also possible one vCPU can get even more than
one message on the same fault address if a fault is retried by the
kernel (e.g., handle_userfault() got interrupted before page
resolution).

As this info might be important, instead of using commit message, I put
more details into the code as comment, when introducing an array
maintaining concurrent faults on one vCPU. Please refer to the comments
for details on both cases, especially case 1 which can be tricky.

Case 1 sounds rare, but it can be easily reproduced locally for me when
we run blocktime together with the migration-test on the vanilla
postcopy.

New Design
==========

This patch should do almost what Alexey mentioned, but slightly
differently: instead of having an array to maintain vCPU fault
addresses, for each of the fault message we push a message into a hash,
indexed by the fault address.

With the hash, it can replace the old two structs: both the vcpu_addr[]
array, and also the array to store the start time of the fault. However
due to above we need one more counter array to account concurrent faults
on the same vCPU - that should even be needed in the old code, it's just
that the old code was buggy and it will blindly overwrite an existing
entry.. now we'll start to really track everything.

The hash structure might be more efficient than tree to maintain such
addr->(cpu, fault_time) information, so that the insert() and lookup()
paths should ideally both be ~O(1). After all, we do not need to sort.

Here we need to do one remove() though after the lookup(). It could be
slow but only if many vCPUs faulted exactly on the same address (so when
the list of cpu entries is long), which should be unlikely. Even with
that, it's still a worst case O(N) (consider 400 vCPUs faulted on the
same address and how likely is it..) rather than a constant O(N)
complexity.

When at it, touch up the tracepoints to make them slightly more useful.
One tracepoint is added when walking all the fault entries.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/r/20250613141217.474825-13-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>

.github/workflows	github: fix config mistake preventing repo lockdown commenting	2022-04-26 16:12:26 +01:00
.gitlab/issue_templates	.gitlab/issue_templates: Move suggestions into comments	2022-12-15 15:19:24 +01:00
.gitlab-ci.d	gitlab: mark s390x-system to allow failures	2025-07-02 09:56:37 +01:00
accel	Accelerators patches	2025-07-07 09:18:34 -04:00
audio	audio: Reset rate control when adding bytes	2025-05-25 15:25:21 +02:00
authz	qom: Make InterfaceInfo[] uses const	2025-04-25 17:00:41 +02:00
backends	iommufd: preserve DMA mappings	2025-07-03 13:42:28 +02:00
block	block: skip automatic zero-init of large array in ioq_submit	2025-06-12 13:39:08 -04:00
bsd-user	accel: Propagate AccelState to AccelClass::init_machine()	2025-07-04 15:22:02 +02:00
chardev	chardev/char-socket: skip automatic zero-init of large array	2025-06-12 13:39:08 -04:00
common-user	common-user/host/riscv: use tail pseudoinstruction for calling tail	2025-05-19 13:38:04 +10:00
configs	hw/riscv: Initial support for BOSC's Xiangshan Kunminghu FPGA prototype	2025-07-04 21:09:49 +10:00
contrib	contrib: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
crypto	crypto: fully drop built-in cipher provider	2025-05-22 11:24:25 +01:00
disas	include/system: Move exec/memory.h to system/memory.h	2025-04-23 14:08:21 -07:00
docs	migration: Rename save_live_complete_precopy to save_complete	2025-07-11 10:37:36 -03:00
dump	cleanup: Drop pointless return at end of function	2025-04-24 09:33:42 +02:00
ebpf	ebpf: improve trace event coverage to all key operations	2024-10-28 14:37:25 +08:00
fpu	fpu: Build only once	2025-02-25 15:32:57 +00:00
fsdev	9pfs: Introduce futimens file op	2025-05-05 11:28:29 +02:00
gdb-xml	gdbstub: update aarch64-core.xml	2025-06-07 16:40:44 +01:00
gdbstub	gdbstub: Expose gdb_write_register function to consumers of gdbstub	2025-07-02 10:09:48 +01:00
host/include	host/include/loongarch64: Fix inline assembly compatibility with Clang	2025-03-21 11:31:56 +08:00
hw	migration: Rename save_live_complete_precopy to save_complete	2025-07-11 10:37:36 -03:00
include	migration: Rename save_live_complete_precopy to save_complete	2025-07-11 10:37:36 -03:00
io	io: Add helper for setting socket send buffer size	2025-05-29 16:37:15 -05:00
libdecnumber	libdecnumber: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
linux-headers	update Linux headers to v6.16-rc3	2025-06-20 13:25:59 +02:00
linux-user	target-arm queue:	2025-07-07 09:22:41 -04:00
migration	migration/postcopy: Optimize blocktime fault tracking with hashtable	2025-07-11 10:37:38 -03:00
monitor	monitor/hmp-cmds-target: add CPU_DUMP_VPU in hmp_info_registers()	2025-07-04 15:37:08 +02:00
nbd	nbd: Set unix socket send buffer on Linux	2025-05-29 16:37:15 -05:00
net	net/stream: skip automatic zero-init of large array	2025-06-12 13:40:16 -04:00
pc-bios	pc-bios/dtb/meson: Prefer target name to be outfile, not infile	2025-06-17 09:54:51 +02:00
plugins	plugins: Add memory hardware address read/write API	2025-07-02 10:09:48 +01:00
po	po: update Italian translation	2024-08-13 19:01:42 +02:00
python	tests/functional: Add hvf_available() helper	2025-07-01 17:22:27 +01:00
qapi	migration/postcopy: Report fault latencies in blocktime	2025-07-11 10:37:38 -03:00
qga	qga/vss-win32: Add VSS provider unregistration retry	2025-06-30 13:17:10 +03:00
qobject	qapi: Move include/qapi/qmp/ to include/qobject/	2025-02-10 15:33:16 +01:00
qom	qom: reverse order of instance_post_init calls	2025-05-20 08:18:53 +02:00
replay	include/exec: Split out icount.h	2025-04-23 14:08:44 -07:00
roms	seabios: update submodule to 1.17.0	2025-06-11 09:44:02 +02:00
rust	rust: hpet: fix new warning	2025-06-20 13:25:59 +02:00
scripts	scripts: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
scsi	qom: Make InterfaceInfo[] uses const	2025-04-25 17:00:41 +02:00
semihosting	semihosting/uaccess: Compile once	2025-07-02 10:09:48 +01:00
stats	qapi: Move include/qapi/qmp/ to include/qobject/	2025-02-10 15:33:16 +01:00
storage-daemon	storage-daemon/qapi/qapi-schema: Add a proper introduction	2025-04-08 09:04:34 +02:00
stubs	qapi: make s390x specific CPU commands unconditionally available	2025-05-28 18:56:08 +02:00
subprojects	subprojects: add the foreign crate	2025-06-05 20:24:51 +02:00
system	Accelerators patches	2025-07-07 09:18:34 -04:00
target	target-arm queue:	2025-07-07 09:22:41 -04:00
tcg	tcg: Fix constant propagation in tcg_reg_alloc_dup	2025-06-30 07:42:56 -06:00
tests	migration/postcopy: Report fault latencies in blocktime	2025-07-11 10:37:38 -03:00
tools	cleanup: Re-run return_directly.cocci	2025-04-24 09:33:24 +02:00
trace	meson: fix Windows build	2025-06-16 13:16:27 -04:00
ui	ui/vnc: Update display update interval when VM state changes to RUNNING	2025-06-23 16:03:59 -04:00
util	util/rcu.c: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
.b4-config	b4: Drop linktrailermask	2025-07-03 13:42:28 +02:00
.dir-locals.el
.editorconfig	editorconfig: update for perl scripts	2025-01-17 10:45:38 +00:00
.exrc
.gdbinit	.gdbinit: load QEMU sub-commands when gdb starts	2017-06-07 14:38:45 +01:00
.git-blame-ignore-revs	metadata: add .git-blame-ignore-revs	2023-04-04 15:56:44 +01:00
.gitattributes	rust: patch bilge-impl to allow compilation with 1.63.0	2024-11-05 14:18:16 +01:00
.gitignore	configure: rename --enable-pypi to --enable-download, control subprojects too	2023-06-06 16:30:01 +02:00
.gitlab-ci.yml	docs: Document GitLab custom CI/CD variables	2021-07-29 07:56:01 +02:00
.gitmodules	meson: subprojects: replace berkeley-{soft,test}float-3 with wraps	2023-06-06 16:30:01 +02:00
.gitpublish	Add a git-publish configuration file	2018-03-05 09:03:17 +00:00
.mailmap	MAINTAINERS: Update Akihiko Odaki's affiliation	2025-06-11 13:08:31 +02:00
.patchew.yml	scripts/checkpatch: roll diff tweaking into checkpatch itself	2021-06-25 10:08:33 +01:00
.readthedocs.yml	readthodocs: fully specify a build environment	2024-01-12 13:23:48 +00:00
.travis.yml	travis.yml: Remove the aarch64 job	2025-06-11 12:17:17 +02:00
block.c	block: move drain outside of quorum_del_child()	2025-06-04 18:16:34 +02:00
blockdev-nbd.c	qapi: merge common parts of NbdServerOptions and nbd-server-start data	2025-03-04 16:44:48 -06:00
blockdev.c	blockdev: drain while unlocked in external_snapshot_action()	2025-06-04 18:16:34 +02:00
blockjob.c	block: move drain outside of bdrv_root_unref_child()	2025-06-04 18:16:34 +02:00
clippy.toml	rust: use native Meson support for clippy and rustdoc	2025-06-03 22:42:18 +02:00
configure	build, dockerfiles: add support for detecting rustdoc	2025-06-03 22:42:18 +02:00
COPYING	COPYING: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
COPYING.LIB	COPYING: replace FSF postal address with licenses URL	2025-06-26 00:42:37 +02:00
cpu-common.c	cpus: Prefer cached CpuClass over CPU_GET_CLASS() macro	2025-03-09 17:00:47 +01:00
cpu-target.c	qemu: Convert target_name() to TargetInfo API	2025-04-25 17:09:58 +02:00
event-loop-base.c	qom: Make InterfaceInfo[] uses const	2025-04-25 17:00:41 +02:00
gitdm.config	contrib/gitdm: add group map for AMD	2023-03-22 15:08:26 +00:00
hmp-commands-info.hx	accel/tcg: Remove 'info opcount' and @x-query-opcount	2025-07-04 14:43:45 +02:00
hmp-commands.hx	hmp/migration: Fix "migrate" command's documentation	2024-05-08 09:22:37 -03:00
iothread.c	qom: Have class_init() take a const data argument	2025-04-25 17:00:41 +02:00
job-qmp.c	qapi job: Elide redundant has_FOO in generated C	2022-12-14 20:04:47 +01:00
job.c	test-bdrv-drain: Fix data races	2025-04-08 15:00:01 +02:00
Kconfig	build-sys: Add rust feature option	2024-10-07 16:41:58 +02:00
Kconfig.host	pvg: do not enable it on cross-architecture targets	2025-02-25 16:18:11 +01:00
LICENSE	tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing	2019-11-11 15:11:21 +01:00
MAINTAINERS	MAINTAINERS: Add me as reviewer of overall accelerators section	2025-07-04 15:37:08 +02:00
Makefile	Makefile: prune quilt source files for cscope	2025-07-03 13:42:28 +02:00
meson.build	accel/hvf: Trace VM memory mapping	2025-07-01 15:08:33 +01:00
meson_options.txt	meson: drop --enable-avx* options	2025-05-12 10:35:25 +02:00
module-common.c	all: Clean up includes	2016-02-04 17:41:30 +00:00
os-posix.c	os: add an ability to lock memory on_fault	2025-02-12 11:36:01 -05:00
os-wasm.c	include/qemu/osdep.h: Add Emscripten-specific OS dependencies	2025-05-06 16:02:04 +02:00
os-win32.c	include: Rename sysemu/ -> system/	2024-12-20 17:44:56 +01:00
page-target.c	page-vary: Move and rename qemu_target_page_bits_min	2025-04-23 15:04:57 -07:00
page-vary-common.c	Remove qemu-common.h include from most units	2022-04-06 14:31:55 +02:00
page-vary-target.c	page-vary: Restrict scope of TARGET_PAGE_BITS_MIN	2025-04-23 15:04:57 -07:00
pythondeps.toml	meson: update to version 1.8.1	2025-06-03 22:42:18 +02:00
qemu-bridge-helper.c	qemu-bridge-helper: relocate path to default ACL	2020-09-30 19:11:36 +02:00
qemu-edid.c	qemu-edid: Restrict input parameter -d to avoid division by zero	2022-10-12 13:38:15 +02:00
qemu-img-cmds.hx	docs/devel/docs: Document .hx file syntax	2024-01-15 17:12:22 +00:00
qemu-img.c	block/snapshot: move drain outside of read-locked bdrv_snapshot_delete()	2025-06-04 18:16:33 +02:00
qemu-io-cmds.c	qapi: Move include/qapi/qmp/ to include/qobject/	2025-02-10 15:33:16 +01:00
qemu-io.c	qapi: Move include/qapi/qmp/ to include/qobject/	2025-02-10 15:33:16 +01:00
qemu-keymap.c	cleanup: Drop pointless return at end of function	2025-04-24 09:33:42 +02:00
qemu-nbd.c	nbd: Defer trace init until after daemonization	2025-03-05 13:00:22 -06:00
qemu-options.hx	qemu-options.hx: Fix reversed description of icount sleep behavior	2025-06-13 10:05:19 +01:00
qemu.nsi	pc-bios: Move device tree files in their own subdir	2025-04-25 17:09:58 +02:00
qemu.sasl	sasl: remove comment about obsolete kerberos versions	2021-06-14 13:28:50 +01:00
README.rst	README.rst: add the missing punctuations	2024-07-17 14:04:15 +03:00
replication.c	replication: move include out of root directory	2021-05-26 14:49:46 +02:00
target-info-stub.c	qemu: Introduce target_long_bits()	2025-04-30 12:51:51 -07:00
target-info.c	qemu: Introduce target_long_bits()	2025-04-30 12:51:51 -07:00
trace-events	system/dma-helpers.c: Move trace events to system/trace-events	2024-11-19 14:14:13 +00:00
VERSION	Open 10.1 development tree	2025-04-22 15:09:23 -04:00
version.rc	configure: remove CONFIG_FILEVERSION and CONFIG_PRODUCTVERSION	2021-01-02 21:03:37 +01:00

README.rst

===========
QEMU README
===========

QEMU is a generic and open source machine & userspace emulator and
virtualizer.

QEMU is capable of emulating a complete machine in software without any
need for hardware virtualization support. By using dynamic translation,
it achieves very good performance. QEMU can also integrate with the Xen
and KVM hypervisors to provide emulated hardware while allowing the
hypervisor to manage the CPU. With hypervisor support, QEMU can achieve
near native performance for CPUs. When QEMU emulates CPUs directly it is
capable of running operating systems made for one machine (e.g. an ARMv7
board) on a different machine (e.g. an x86_64 PC board).

QEMU is also capable of providing userspace API virtualization for Linux
and BSD kernel interfaces. This allows binaries compiled against one
architecture ABI (e.g. the Linux PPC64 ABI) to be run on a host using a
different architecture ABI (e.g. the Linux x86_64 ABI). This does not
involve any hardware emulation, simply CPU and syscall emulation.

QEMU aims to fit into a variety of use cases. It can be invoked directly
by users wishing to have full control over its behaviour and settings.
It also aims to facilitate integration into higher level management
layers, by providing a stable command line interface and monitor API.
It is commonly invoked indirectly via the libvirt library when using
open source applications such as oVirt, OpenStack and virt-manager.

QEMU as a whole is released under the GNU General Public License,
version 2. For full licensing details, consult the LICENSE file.


Documentation
=============

Documentation is hosted online at
`<https://www.qemu.org/documentation/>`_. The documentation for the
current development version, available at
`<https://www.qemu.org/docs/master/>`_, is generated from the ``docs/``
folder in the source tree, and is built by `Sphinx
<https://www.sphinx-doc.org/en/master/>`_.


Building
========

QEMU is multi-platform software intended to be buildable on all modern
Linux platforms, macOS, Win32 (via the Mingw64 toolchain) and a variety
of other UNIX targets. The simple steps to build QEMU are:


.. code-block:: shell

  mkdir build
  cd build
  ../configure
  make

Additional information can also be found online via the QEMU website:

* `<https://wiki.qemu.org/Hosts/Linux>`_
* `<https://wiki.qemu.org/Hosts/Mac>`_
* `<https://wiki.qemu.org/Hosts/W32>`_


Submitting patches
==================

The QEMU source code is maintained under the Git version control system.

.. code-block:: shell

   git clone https://gitlab.com/qemu-project/qemu.git

When submitting patches, one common approach is to use 'git
format-patch' and/or 'git send-email' to format & send the mail to the
qemu-devel@nongnu.org mailing list. All patches submitted must contain
a 'Signed-off-by' line from the author. Patches should follow the
guidelines set out in the `style section
<https://www.qemu.org/docs/master/devel/style.html>`_ of
the Developers Guide.
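
As a minimal sketch of the signing and formatting steps, the commands
below use a throwaway repository in place of a QEMU checkout. The
author identity, file name, commit subject and ``outgoing`` directory
are placeholders chosen for illustration, and the final send step is
left commented out because it depends on a working local mail setup.

.. code-block:: shell

  # Sketch only: a scratch repository stands in for a QEMU checkout.
  workdir=$(mktemp -d)
  cd "$workdir"
  git init -q demo
  cd demo
  # Placeholder identity; it is used to build the Signed-off-by line.
  git config user.name "Example Author"
  git config user.email "author@example.com"
  echo "hello" > file.txt
  git add file.txt
  # -s appends a Signed-off-by trailer to the commit message.
  git commit -q -s -m "demo: add file.txt"
  # Write the newest commit as a mail-formatted patch under outgoing/.
  git format-patch -1 -o outgoing
  # Sending requires a configured 'git send-email', so it is not run here:
  # git send-email --to=qemu-devel@nongnu.org outgoing/*.patch

The generated patch file carries the 'Signed-off-by' line in its
message, so it can be checked locally before anything is sent.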

Additional information on submitting patches can be found online via
the QEMU website:

* `<https://wiki.qemu.org/Contribute/SubmitAPatch>`_
* `<https://wiki.qemu.org/Contribute/TrivialPatches>`_

The QEMU website is also maintained under source control.

.. code-block:: shell

  git clone https://gitlab.com/qemu-project/qemu-web.git

* `<https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/>`_

A 'git-publish' utility was created to make the above process less
cumbersome, and is highly recommended for making regular contributions,
or even just for sending consecutive patch series revisions. It also
requires a working 'git send-email' setup, and by default doesn't
automate everything, so you may want to go through the above steps
manually once.

For installation instructions, please go to:

* `<https://github.com/stefanha/git-publish>`_

The workflow with 'git-publish' is:

.. code-block:: shell

  $ git checkout master -b my-feature
  $ # work on new commits, add your 'Signed-off-by' lines to each
  $ git publish

Your patch series will be sent and tagged as my-feature-v1, in case you
need to refer back to it in the future.

Sending v2:

.. code-block:: shell

  $ git checkout my-feature # same topic branch
  $ # making changes to the commits (using 'git rebase', for example)
  $ git publish

Your patch series will be sent with a 'v2' tag in the subject and the git
tip will be tagged as my-feature-v2.

Bug reporting
=============

The QEMU project uses GitLab issues to track bugs. Bugs
found when running code built from QEMU git or upstream released sources
should be reported via:

* `<https://gitlab.com/qemu-project/qemu/-/issues>`_

If using QEMU via an operating system vendor pre-built binary package, it
is preferable to report bugs to the vendor's own bug tracker first. If
the bug is also known to affect the latest upstream code, it can also be
reported via GitLab.

For additional information on bug reporting consult:

* `<https://wiki.qemu.org/Contribute/ReportABug>`_


ChangeLog
=========

For version history and release notes, please visit
`<https://wiki.qemu.org/ChangeLog/>`_ or look at the git history for
more detailed information.


Contact
=======

The QEMU community can be contacted in a number of ways, with the two
main methods being email and IRC:

* `<mailto:qemu-devel@nongnu.org>`_
* `<https://lists.nongnu.org/mailman/listinfo/qemu-devel>`_
* #qemu on irc.oftc.net

Information on additional methods of contacting the community can be
found online via the QEMU website:

* `<https://wiki.qemu.org/Contribute/StartHere>`_