docs/devel: convert and update MTTCG design document

Do a light conversion to .rst and clean-up some of the language at the
start now MTTCG has been merged for a while.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20200709141327.14631-2-alex.bennee@linaro.org>
Alex Bennée 2020-07-09 15:13:15 +01:00
parent 78441c04ca
commit c8c06e520d
2 changed files with 34 additions and 19 deletions


@@ -23,6 +23,7 @@ Contents:
 decodetree
 secure-coding-practices
 tcg
+multi-thread-tcg
 tcg-plugins
 bitops
 reset


@@ -1,15 +1,17 @@
-Copyright (c) 2015-2016 Linaro Ltd.
+..
+  Copyright (c) 2015-2020 Linaro Ltd.
 
-This work is licensed under the terms of the GNU GPL, version 2 or
-later. See the COPYING file in the top-level directory.
+  This work is licensed under the terms of the GNU GPL, version 2 or
+  later. See the COPYING file in the top-level directory.
 
 Introduction
 ============
 
-This document outlines the design for multi-threaded TCG system-mode
-emulation. The current user-mode emulation mirrors the thread
-structure of the translated executable. Some of the work will be
-applicable to both system and linux-user emulation.
+This document outlines the design for multi-threaded TCG (a.k.a MTTCG)
+system-mode emulation. user-mode emulation has always mirrored the
+thread structure of the translated executable although some of the
+changes done for MTTCG system emulation have improved the stability of
+linux-user emulation.
 
 The original system-mode TCG implementation was single threaded and
 dealt with multiple CPUs with simple round-robin scheduling. This
@@ -21,9 +23,18 @@ vCPU Scheduling
 ===============
 
 We introduce a new running mode where each vCPU will run on its own
-user-space thread. This will be enabled by default for all FE/BE
-combinations that have had the required work done to support this
-safely.
+user-space thread. This is enabled by default for all FE/BE
+combinations where the host memory model is able to accommodate the
+guest (TCG_GUEST_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO is zero) and the
+guest has had the required work done to support this safely
+(TARGET_SUPPORTS_MTTCG).
+
+System emulation will fall back to the original round robin approach
+if:
+
+* forced by --accel tcg,thread=single
+* enabling --icount mode
+* 64 bit guests on 32 bit hosts (TCG_OVERSIZED_GUEST)
 
 In the general case of running translated code there should be no
 inter-vCPU dependencies and all vCPUs should be able to run at full
@@ -61,7 +72,9 @@ have their block-to-block jumps patched.
 Global TCG State
 ----------------
 
-### User-mode emulation
+User-mode emulation
+~~~~~~~~~~~~~~~~~~~
+
 We need to protect the entire code generation cycle including any post
 generation patching of the translated code. This also implies a shared
 translation buffer which contains code running on all cores. Any
@@ -78,9 +91,11 @@ patching.
 
 Code generation is serialised with mmap_lock().
 
-### !User-mode emulation
+!User-mode emulation
+~~~~~~~~~~~~~~~~~~~~
+
 Each vCPU has its own TCG context and associated TCG region, thereby
-requiring no locking.
+requiring no locking during translation.
 
 Translation Blocks
 ------------------
@@ -92,6 +107,7 @@ including:
 
 - debugging operations (breakpoint insertion/removal)
 - some CPU helper functions
+- linux-user spawning it's first thread
 
 This is done with the async_safe_run_on_cpu() mechanism to ensure all
 vCPUs are quiescent when changes are being made to shared global
@@ -250,8 +266,10 @@ to enforce a particular ordering of memory operations from the point
 of view of external observers (e.g. another processor core). They can
 apply to any memory operations as well as just loads or stores.
 
-The Linux kernel has an excellent write-up on the various forms of
-memory barrier and the guarantees they can provide [1].
+The Linux kernel has an excellent `write-up
+<https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt>`
+on the various forms of memory barrier and the guarantees they can
+provide.
 
 Barriers are often wrapped around synchronisation primitives to
 provide explicit memory ordering semantics. However they can be used
@@ -352,7 +370,3 @@ an exclusive lock which ensures all emulation is serialised.
 While the atomic helpers look good enough for now there may be a need
 to look at solutions that can more closely model the guest
 architectures semantics.
-
-==========
-
-[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
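
Review note on the vCPU Scheduling hunk: the new default-selection rule reads more easily as code. The following is a minimal sketch of that rule as the updated text describes it, not QEMU's actual implementation. The struct, the function names and the MO_* bit values are hypothetical and exist only for illustration; the only parts taken from the document are the mask test (guest ordering requirements and-ed with the complement of the host guarantees must be zero) and the three fallback conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ordering bits for illustration; not QEMU's definitions. */
enum {
    MO_LD_LD = 1 << 0,  /* loads stay ordered before later loads   */
    MO_LD_ST = 1 << 1,  /* loads stay ordered before later stores  */
    MO_ST_LD = 1 << 2,  /* stores stay ordered before later loads  */
    MO_ST_ST = 1 << 3,  /* stores stay ordered before later stores */
    MO_ALL   = 0xf
};

/* Hypothetical bundle of the inputs named in the design text. */
struct mttcg_config {
    uint32_t guest_mo;          /* orderings the guest requires (assumed)  */
    uint32_t host_mo;           /* orderings the host guarantees (assumed) */
    bool target_supports_mttcg; /* stands in for TARGET_SUPPORTS_MTTCG     */
    bool icount_enabled;        /* --icount mode                           */
    bool oversized_guest;       /* stands in for TCG_OVERSIZED_GUEST       */
    bool forced_single;         /* --accel tcg,thread=single               */
};

/* True when every ordering the guest requires is guaranteed by the host. */
static bool host_can_order_guest(uint32_t guest_mo, uint32_t host_mo)
{
    return (guest_mo & ~host_mo) == 0;
}

/* Decide between one thread per vCPU and the round-robin fallback. */
static bool use_mttcg(const struct mttcg_config *cfg)
{
    /* Any of the three listed conditions forces round-robin scheduling. */
    if (cfg->forced_single || cfg->icount_enabled || cfg->oversized_guest) {
        return false;
    }
    return cfg->target_supports_mttcg &&
           host_can_order_guest(cfg->guest_mo, cfg->host_mo);
}
```

Under these assumptions, a guest that requires every ordering except store-then-load passes the check on a host that guarantees all four, and falls back to round-robin on a host that guarantees only load-load ordering.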