CPUID[4].EAX[bits 25:14] is used to represent the cache topology for
Intel CPUs.
After cache models have topology information, we can use
CPUCacheInfo.share_level to decide which topology level to be encoded
into CPUID[4].EAX[bits 25:14].
And since with the helper max_processor_ids_for_cache(), the filed
CPUID[4].EAX[bits 25:14] (original virable "num_apic_ids") is parsed
based on cpu topology levels, which are verified when parsing -smp, it's
no need to check this value by "assert(num_apic_ids > 0)" again, so
remove this assert().
Additionally, wrap the encoding of CPUID[4].EAX[bits 31:26] into a
helper to make the code cleaner.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-21-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, by default, the cache topology is encoded as:
1. i/d cache is shared in one core.
2. L2 cache is shared in one core.
3. L3 cache is shared in one die.
This default general setting has caused a misunderstanding, that is, the
cache topology is completely equated with a specific cpu topology, such
as the connection between L2 cache and core level, and the connection
between L3 cache and die level.
In fact, the settings of these topologies depend on the specific
platform and are not static. For example, on Alder Lake-P, every
four Atom cores share the same L2 cache.
Thus, we should explicitly define the corresponding cache topology for
different cache models to increase scalability.
Except legacy_l2_cache_cpuid2 (its default topo level is
CPU_TOPO_LEVEL_UNKNOW), explicitly set the corresponding topology level
for all other cache models. In order to be compatible with the existing
cache topology, set the CPU_TOPO_LEVEL_CORE level for the i/d cache, set
the CPU_TOPO_LEVEL_CORE level for L2 cache, and set the
CPU_TOPO_LEVEL_DIE level for L3 cache.
The field for CPUID[4].EAX[bits 25:14] or CPUID[0x8000001D].EAX[bits
25:14] will be set based on CPUCacheInfo.share_level.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20240424154929.1487382-20-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce module-id to be consistent with the module-id field in
CpuInstanceProperties.
Following the legacy smp check rules, also add the module_id validity
into x86_cpu_pre_plug().
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-17-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linux kernel (from v6.4, with commit edc0a2b595765 ("x86/topology: Fix
erroneous smp_num_siblings on Intel Hybrid platforms") is able to
handle platforms with Module level enumerated via CPUID.1F.
Expose the module level in CPUID[0x1F] if the machine has more than 1
modules.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-15-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Support module level in i386 cpu topology structure "X86CPUTopoInfo".
Since x86 does not yet support the "modules" parameter in "-smp",
X86CPUTopoInfo.modules_per_die is currently always 1.
Therefore, the module level width in APIC ID, which can be calculated by
"apicid_bitwidth_for_count(topo_info->modules_per_die)", is always 0 for
now, so we can directly add APIC ID related helpers to support module
level parsing.
In addition, update topology structure in test-x86-topo.c.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-14-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Intel CPUs implement module level on hybrid client products (e.g.,
ADL-N, MTL, etc) and E-core server products.
A module contains a set of cores that share certain resources (in
current products, the resource usually includes L2 cache, as well as
module scoped features and MSRs).
Module level support is the prerequisite for L2 cache topology on
module level. With module level, we can implement the Guest's CPU
topology and future cache topology to be consistent with the Host's on
Intel hybrid client/E-core server platforms.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Co-developed-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-13-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
At present, the subleaf 0x02 of CPUID[0x1F] is bound to the "die" level.
In fact, the specific topology level exposed in 0x1F depends on the
platform's support for extension levels (module, tile and die).
To help expose "module" level in 0x1F, decouple CPUID[0x1F] subleaf
with specific topology level.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-12-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CPUID[0xB] defines SMT, Core and Invalid types, and this leaf is shared
by Intel and AMD CPUs.
But for extended topology levels, Intel CPU (in CPUID[0x1F]) and AMD CPU
(in CPUID[0x80000026]) have the different definitions with different
enumeration values.
Though CPUID[0x80000026] hasn't been implemented in QEMU, to avoid
possible misunderstanding, split topology types of CPUID[0x1F] from the
definitions of CPUID[0xB] and introduce CPUID[0x1F]-specific topology
types.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-11-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently, QEMU checks the specify number of topology domains to detect
if there's extended topology levels (e.g., checking nr_dies).
With this bitmap, the extended CPU topology (the levels other than SMT,
core and package) could be easier to detect without touching the
topology details.
This is also in preparation for the follow-up to decouple CPUID[0x1F]
subleaf with specific topology level.
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-10-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In cpu_x86_cpuid(), there are many variables in representing the cpu
topology, e.g., topo_info, cs->nr_cores and cs->nr_threads.
Since the names of cs->nr_cores and cs->nr_threads do not accurately
represent its meaning, the use of cs->nr_cores or cs->nr_threads is
prone to confusion and mistakes.
And the structure X86CPUTopoInfo names its members clearly, thus the
variable "topo_info" should be preferred.
In addition, in cpu_x86_cpuid(), to uniformly use the topology variable,
replace env->dies with topo_info.dies_per_pkg as well.
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-9-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The commit 8f4202fb10 ("i386: Populate AMD Processor Cache Information
for cpuid 0x8000001D") adds the cache topology for AMD CPU by encoding
the number of sharing threads directly.
From AMD's APM, NumSharingCache (CPUID[0x8000001D].EAX[bits 25:14])
means [1]:
The number of logical processors sharing this cache is the value of
this field incremented by 1. To determine which logical processors are
sharing a cache, determine a Share Id for each processor as follows:
ShareId = LocalApicId >> log2(NumSharingCache+1)
Logical processors with the same ShareId then share a cache. If
NumSharingCache+1 is not a power of two, round it up to the next power
of two.
From the description above, the calculation of this field should be same
as CPUID[4].EAX[bits 25:14] for Intel CPUs. So also use the offsets of
APIC ID to calculate this field.
[1]: APM, vol.3, appendix.E.4.15 Function 8000_001Dh--Cache Topology
Information
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240424154929.1487382-8-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Refer to the fixes of cache_info_passthrough ([1], [2]) and SDM, the
CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26] should use the
nearest power-of-2 integer.
The nearest power-of-2 integer can be calculated by pow2ceil() or by
using APIC ID offset/width (like L3 topology using 1 << die_offset [3]).
But in fact, CPUID.04H:EAX[bits 25:14] and CPUID.04H:EAX[bits 31:26]
are associated with APIC ID. For example, in linux kernel, the field
"num_threads_sharing" (Bits 25 - 14) is parsed with APIC ID. And for
another example, on Alder Lake P, the CPUID.04H:EAX[bits 31:26] is not
matched with actual core numbers and it's calculated by:
"(1 << (pkg_offset - core_offset)) - 1".
Therefore the topology information of APIC ID should be preferred to
calculate nearest power-of-2 integer for CPUID.04H:EAX[bits 25:14] and
CPUID.04H:EAX[bits 31:26]:
1. d/i cache is shared in a core, 1 << core_offset should be used
instead of "cs->nr_threads" in encode_cache_cpuid4() for
CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14].
2. L2 cache is supposed to be shared in a core as for now, thereby
1 << core_offset should also be used instead of "cs->nr_threads" in
encode_cache_cpuid4() for CPUID.04H.02H:EAX[bits 25:14].
3. Similarly, the value for CPUID.04H:EAX[bits 31:26] should also be
calculated with the bit width between the package and SMT levels in
the APIC ID (1 << (pkg_offset - core_offset) - 1).
In addition, use APIC ID bits calculations to replace "pow2ceil()" for
cache_info_passthrough case.
[1]: efb3934adf ("x86: cpu: make sure number of addressable IDs for processor cores meets the spec")
[2]: d7caf13b5f ("x86: cpu: fixup number of addressable IDs for logical processors sharing cache")
[3]: d65af288a8 ("i386: Update new x86_apicid parsing rules with die_offset support")
Fixes: 7e3482f824 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Message-ID: <20240424154929.1487382-7-zhao1.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For i-cache and d-cache, current QEMU hardcodes the maximum IDs for CPUs
sharing cache (CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits
25:14]) to 0, and this means i-cache and d-cache are shared in the SMT
level.
This is correct if there's single thread per core, but is wrong for the
hyper threading case (one core contains multiple threads) since the
i-cache and d-cache are shared in the core level other than SMT level.
For AMD CPU, commit 8f4202fb10 ("i386: Populate AMD Processor Cache
Information for cpuid 0x8000001D") has already introduced i/d cache
topology as core level by default.
Therefore, in order to be compatible with both multi-threaded and
single-threaded situations, we should set i-cache and d-cache be shared
at the core level by default.
This fix changes the default i/d cache topology from per-thread to
per-core. Potentially, this change in L1 cache topology may affect the
performance of the VM if the user does not specifically specify the
topology or bind the vCPU. However, the way to achieve optimal
performance should be to create a reasonable topology and set the
appropriate vCPU affinity without relying on QEMU's default topology
structure.
Fixes: 7e3482f824 ("i386: Helpers to encode cache information consistently")
Suggested-by: Robert Hoo <robert.hu@linux.intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20240424154929.1487382-6-zhao1.liu@intel.com>
[Add compat property. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
LAM uses CR3[61] and CR3[62] to configure/enable LAM on user pointers.
LAM uses CR4[28] to configure/enable LAM on supervisor pointers.
For CR3 LAM bits, no additional handling needed:
- TCG
LAM is not supported for TCG of target-i386. helper_write_crN() and
helper_vmrun() check max physical address bits before calling
cpu_x86_update_cr3(), no change needed, i.e. CR3 LAM bits are not allowed
to be set in TCG.
- gdbstub
x86_cpu_gdb_write_register() will call cpu_x86_update_cr3() to update cr3.
Allow gdb to set the LAM bit(s) to CR3, if vcpu doesn't support LAM,
KVM_SET_SREGS will fail as other reserved bits.
For CR4 LAM bit, its reservation depends on vcpu supporting LAM feature or
not.
- TCG
LAM is not supported for TCG of target-i386. helper_write_crN() and
helper_vmrun() check CR4 reserved bit before calling cpu_x86_update_cr4(),
i.e. CR4 LAM bit is not allowed to be set in TCG.
- gdbstub
x86_cpu_gdb_write_register() will call cpu_x86_update_cr4() to update cr4.
Mask out LAM bit on CR4 if vcpu doesn't support LAM.
- x86_cpu_reset_hold() doesn't need special handling.
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240112060042.19925-3-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Linear Address Masking (LAM) is a new Intel CPU feature, which allows
software to use of the untranslated address bits for metadata.
The bit definition:
CPUID.(EAX=7,ECX=1):EAX[26]
Add CPUID definition for LAM.
Note LAM feature is not supported for TCG of target-i386, LAM CPIUD bit
will not be added to TCG_7_1_EAX_FEATURES.
More info can be found in Intel ISE Chapter "LINEAR ADDRESS MASKING(LAM)"
https://cdrdv2.intel.com/v1/dl/getContent/671368
Signed-off-by: Robert Hoo <robert.hu@linux.intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Xuelian Guo <xuelian.guo@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-ID: <20240112060042.19925-2-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The 32-bit AAM/AAD opcodes are using helpers that read and write flags and
env->regs[R_EAX]. Clean them up so that the table correctly includes AX
as a 16-bit input and output.
No real reason to do it to be honest, but they are nice one-output helpers
and it removes the masking of env->regs[R_EAX] that generic load/writeback
code already does.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123912.608497-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
gen_rot_carry and gen_rot_overflow are meant to be called with count == NULL
if the count cannot be zero. However this is not done in gen_ROL and gen_ROR,
and writing everywhere "can_be_zero ? count : NULL" is burdensome and less
readable. Just pass can_be_zero as a separate argument.
gen_RCL and gen_RCR use a conditional branch to skip the computation
if count is zero, so they can pass false unconditionally to gen_rot_overflow.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240522123914.608516-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Use TCG_COND_TST where applicable.
- Use CF_BP_PAGE instead of a local breakpoint search.
- Clean up IAOQ handling during translation.
- Implement CF_PCREL.
- Implement PSW.B.
- Implement PSW.X.
- Log cpu state on interrupt and rfi.
-----BEGIN PGP SIGNATURE-----
iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZEgnwdHHJpY2hhcmQu
aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+43gf8CakQdMSqfGV2nGP+
7wWZOAV04IyfkJ38F/CH0ihUkblEOzXJ1shTFkrHEw257j0D10MctSSbjrqz5BwU
obQcwoVlxzTGXqzhkZ6wagkcqjv3TtlPtznZIk6JssdlrtwIKDmE2/3t1dzHnyBD
WTrS0SK3YvVRovq/ai51raUbiBsNq7XG3skHEsMKsFxp4EaDP5JTbputdQWdffjh
TBmXImhHC3gm09KWIUZwfEBHlaa7YXk2orzB8kBE8S2kQj9vrGXEaC4jYnBcQLPw
NDDkBYRqxHYQr0vIAHee+5cUgt1jDBr5rXnAnJwzK0wyEEc4Mi4OTPhNE604iu2y
SDxS8Q==
=A4Qf
-----END PGP SIGNATURE-----
Merge tag 'pull-hppa-20240515' of https://gitlab.com/rth7680/qemu into staging
target/hppa:
- Use TCG_COND_TST where applicable.
- Use CF_BP_PAGE instead of a local breakpoint search.
- Clean up IAOQ handling during translation.
- Implement CF_PCREL.
- Implement PSW.B.
- Implement PSW.X.
- Log cpu state on interrupt and rfi.
# -----BEGIN PGP SIGNATURE-----
#
# iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmZEgnwdHHJpY2hhcmQu
# aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+43gf8CakQdMSqfGV2nGP+
# 7wWZOAV04IyfkJ38F/CH0ihUkblEOzXJ1shTFkrHEw257j0D10MctSSbjrqz5BwU
# obQcwoVlxzTGXqzhkZ6wagkcqjv3TtlPtznZIk6JssdlrtwIKDmE2/3t1dzHnyBD
# WTrS0SK3YvVRovq/ai51raUbiBsNq7XG3skHEsMKsFxp4EaDP5JTbputdQWdffjh
# TBmXImhHC3gm09KWIUZwfEBHlaa7YXk2orzB8kBE8S2kQj9vrGXEaC4jYnBcQLPw
# NDDkBYRqxHYQr0vIAHee+5cUgt1jDBr5rXnAnJwzK0wyEEc4Mi4OTPhNE604iu2y
# SDxS8Q==
# =A4Qf
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 15 May 2024 11:38:04 AM CEST
# gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg: issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [ultimate]
* tag 'pull-hppa-20240515' of https://gitlab.com/rth7680/qemu: (43 commits)
target/hppa: Log cpu state on return-from-interrupt
target/hppa: Log cpu state at interrupt
target/hppa: Implement CF_PCREL
target/hppa: Adjust priv for B,GATE at runtime
target/hppa: Drop tlb_entry return from hppa_get_physical_address
target/hppa: Implement PSW_X
target/hppa: Implement PSW_B
target/hppa: Manage PSW_X and PSW_B in translator
target/hppa: Split PSW X and B into their own field
target/hppa: Improve hppa_cpu_dump_state
target/hppa: Do not mask in copy_iaoq_entry
target/hppa: Store full iaoq_f and page offset of iaoq_b in TB
linux-user/hppa: Force all code addresses to PRIV_USER
target/hppa: Use delay_excp for conditional trap on overflow
target/hppa: Use delay_excp for conditional traps
target/hppa: Introduce DisasDelayException
target/hppa: Remove cond_free
target/hppa: Use TCG_COND_TST* in trans_ftest
target/hppa: Use registerfields.h for FPSR
target/hppa: Use TCG_COND_TST* in trans_bb_imm
...
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that the groundwork has been laid, enabling CF_PCREL within the
translator proper is a simple matter of updating copy_iaoq_entry
and install_iaq_entries.
We also need to modify the unwind info, since we no longer have
absolute addresses to install.
As expected, this reduces the runtime overhead of compilation when
running a Linux kernel with address space randomization enabled.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Do not compile in the priv change based on the first translation;
look up the PTE at execution time. This is required for CF_PCREL,
where a page may be mapped multiple times with different attributes.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The return-by-reference is never used.
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use PAGE_WRITE_INV to temporarily enable write permission
on for a given page, driven by PSW_X being set.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
PSW_B causes B,GATE to trap as an illegal instruction, removing our
previous sequential execution test that was merely an approximation.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
PSW_X is cleared after every instruction, and only set by RFI.
PSW_B is cleared after every non-branch, or branch not taken,
and only set by taken branches. We can clear both bits with a
single store, at most once per TB. Taken branches set PSW_B,
at most once per TB.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Generally, both of these bits are cleared at the end of each
instruction. By separating these, we will be able to clear
both with a single insn, instead of 2 or 3.
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Print both raw IAQ_Front and IAQ_Back as well as the GVAs.
Print control registers in system mode.
Print floating point registers if CPU_DUMP_FPU.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
As with loads and stores, code offsets are kept intact until the
full gva is formed. In qemu, this is in cpu_get_tb_cpu_state.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In preparation for CF_PCREL. store the iaoq_f in 3 parts: high
bits in cs_base, middle bits in pc, and low bits in priv.
For iaoq_b, set a bit for either of space or page differing,
else the page offset.
Install iaq entries before goto_tb. The change to not record
the full direct branch difference in TB means that we have to
store at least iaoq_b before goto_tb. But since a later change
to enable CF_PCREL will require both iaoq_f and iaoq_b to be
updated before goto_tb, go ahead and update both fields now.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The kernel does this along the return path to user mode.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Allow an exception to be emitted at the end of the TranslationBlock,
leaving only the conditional branch inline. Use it for simple
exception instructions like break, which happen to be nullified.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Now that we do not need to free tcg temporaries, the only
thing cond_free does is reset the condition to never.
Instead, simply write a new condition over the old, which
may be simply cond_make_f() for the never condition.
The do_*_cond functions do the right thing with c or cf == 0,
so there's no need for a special case anymore.
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Define all of the context dependent field definitions.
Use FIELD_EX32 and FIELD_DP32 with named fields instead
of extract32 and deposit32 with raw constants.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can directly test bits of a 32-bit comparison without
zero or sign-extending an intermediate result.
We can directly test bit 0 for odd/even.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
We can directly test bits of a 32-bit comparison without
zero or sign-extending an intermediate result.
We can directly test bit 0 for odd/even.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Use 'v' for a variable that needs copying, 't' for a temp that
doesn't need copying, and 'i' for an immediate, and use this
naming for both arguments of the comparison. So:
cond_make_tmp -> cond_make_tt
cond_make_0_tmp -> cond_make_ti
cond_make_0 -> cond_make_vi
cond_make -> cond_make_vv
Pass 0 explictly, rather than implicitly in the function name.
Reviewed-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This is a first step in enabling CF_PCREL, but for now
we regenerate the absolute address before writeback.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Wrap offset and space together in one structure, ensuring
that they're copied together as required.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This simplifies callers, which might otherwise have
to make another copy.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Using umax is clearer than the same operation using movcond.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
This allows unification of BE, BLR, BV, BVE with a common helper.
Since we can now track space with IAQ_Next, we can now let the
TranslationBlock continue across the delay slot with BE, BVE.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Move space assighments to a central location.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add variable to track space changes to IAQ. So far, no such changes
are introduced, but the new checks vs ctx->iasq_b may eliminate an
unnecessary copy to cpu_iasq_f with e.g. BLR.
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>