Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20220426163043.100432-23-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The length of the previous insn may be computed from
the difference of start and end addresses.
Use tcg_constant_i32 while we're at it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use tcg_gen_umin_i32 instead of tcg_gen_movcond_i32.
Use tcg_constant_i32 while we're at it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Instead of computing
tmp1 = shift & 0xff;
dest = (tmp1 > 0x1f ? 0 : value) << (tmp1 & 0x1f)
use
tmpd = value << (shift & 0x1f);
dest = shift & 0xe0 ? 0 : tmpd;
which has a flatter dependency tree.
Use tcg_constant_i32 while we're at it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For aa32, the function has a parameter to use the new el.
For aa64, that never happens.
Use tcg_constant_i32 while we're at it.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Currently we assume all fields are 32-bit.
Prepare for fields of a single byte, using sizeof_field().
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
[PMM: use sizeof_field() instead of raw sizeof()]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Bool is a more appropriate type for this value.
Move the member down in the struct to keep the
bool type members together and remove a hole.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We have fetched and locked the logfile in translator_loop.
Pass the filepointer down to the disas_log hook so that it
need not be fetched and locked again.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220417183019.755276-13-richard.henderson@linaro.org>
Replace a config-time define with a compile time condition
define (compatible with clang and gcc) that must be declared prior to
its usage. This avoids having a global configure time define, but also
prevents from bad usage, if the config header wasn't included before.
This can help to make some code independent from qemu too.
gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[ For the s390x parts I'm involved in ]
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Renaming defines for quad in their various forms so that their signedness is
now explicit.
Done using git grep as suggested by Philippe, with a bit of hand edition to
keep assignments aligned.
Signed-off-by: Frédéric Pétrot <frederic.petrot@univ-grenoble-alpes.fr>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-id: 20220106210108.138226-2-frederic.petrot@univ-grenoble-alpes.fr
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Misaligned thumb PC is architecturally impossible.
Assert is better than proceeding, in case we've missed
something somewhere.
Expand a comment about aligning the pc in gdbstub.
Fail an incoming migrate if a thumb pc is misaligned.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
For A64, any input to an indirect branch can cause this.
For A32, many indirect branch paths force the branch to be aligned,
but BXWritePC does not. This includes the BX instruction but also
other interworking changes to PC. Prior to v8, this case is UNDEFINED.
With v8, this is CONSTRAINED UNPREDICTABLE and may either raise an
exception or force align the PC.
We choose to raise an exception because we have the infrastructure,
it makes the generated code for gen_bx simpler, and it has the
possibility of catching more guest bugs.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Create arm_check_ss_active and arm_check_kernelpage.
Reverse the order of the tests. While it doesn't matter in practice,
because only user-only has a kernel page and user-only never sets
ss_active, ss_active has priority over execution exceptions and it
is best to keep them in the proper order.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Since the mask is a constant value, use tcg_constant_i32()
instead of a TCG temporary.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211029231834.2476117-6-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
When using a constant variable, we can replace the store_cpu_field()
call by store_cpu_field_constant() which avoid using TCG temporaries.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211029231834.2476117-4-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Avoid using a TCG temporary for a read-only constant.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20211029231834.2476117-2-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Our current codegen for MVE always calls out to helper functions,
because some byte lanes might be predicated. The common case is that
in fact there is no predication active and all lanes should be
updated together, so we can produce better code by detecting that and
using the TCG generic vector infrastructure.
Add a TB flag that is set when we can guarantee that there is no
active MVE predication, and a bool in the DisasContext. Subsequent
patches will use this flag to generate improved code for some
instructions.
In most cases when the predication state changes we simply end the TB
after that instruction. For the code called from vfp_access_check()
that handles lazy state preservation and creating a new FP context,
we can usually avoid having to try to end the TB because luckily the
new value of the flag following the register changes in those
sequences doesn't depend on any runtime decisions. We do have to end
the TB if the guest has enabled lazy FP state preservation but not
automatic state preservation, but this is an odd corner case that is
not going to be common in real-world code.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-4-peter.maydell@linaro.org
Currently gen_jmp_tb() assumes that if it is called then the jump it
is handling is the only reason that we might be trying to end the TB,
so it will use goto_tb if it can. This is usually the case: mostly
"we did something that means we must end the TB" happens on a
non-branch instruction. However, there are cases where we decide
early in handling an instruction that we need to end the TB and
return to the main loop, and then the insn is a complex one that
involves gen_jmp_tb(). For instance, for M-profile FP instructions,
in gen_preserve_fp_state() which is called from vfp_access_check() we
want to force an exit to the main loop if lazy state preservation is
active and we are in icount mode.
Make gen_jmp_tb() look at the current value of is_jmp, and only use
goto_tb if the previous is_jmp was DISAS_NEXT or DISAS_TOO_MANY.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210913095440.13462-2-peter.maydell@linaro.org
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
[rth: Split out of a larger patch.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In v8A, the PSTATE.IL bit is set for various kinds of illegal
exception return or mode-change attempts. We already set PSTATE.IL
(or its AArch32 equivalent CPSR.IL) in all those cases, but we
weren't implementing the part of the behaviour where attempting to
execute an instruction with PSTATE.IL takes an immediate exception
with an appropriate syndrome value.
Add a new TB flags bit tracking PSTATE.IL/CPSR.IL, and generate code
to take an exception instead of whatever the instruction would have
been.
PSTATE.IL and CPSR.IL change only on exception entry, attempted
exception exit, and various AArch32 mode changes via cpsr_write().
These places generally already rebuild the hflags, so the only place
we need an extra rebuild_hflags call is in the illegal-return
codepath of the AArch64 exception_return helper.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210821195958.41312-2-richard.henderson@linaro.org
Message-Id: <20210817162118.24319-1-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[rth: Added missing returns; set IL bit in syndrome]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
In v7A, the HSTR register has a TJDBX bit which traps NS EL0/EL1
access to the JOSCR and JMCR trivial Jazelle registers, and also BXJ.
Implement these traps. In v8A this HSTR bit doesn't exist, so don't
trap for v8A CPUs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210816180305.20137-3-peter.maydell@linaro.org
Unlike A-profile, for M-profile the UDIV and SDIV insns can be
configured to raise an exception on division by zero, using the CCR
DIV_0_TRP bit.
Implement support for setting this bit by making the helper functions
raise the appropriate exception.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210730151636.17254-3-peter.maydell@linaro.org
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so
as to predicate any element at index Rn or greater is predicated. As
with VPNOT, this insn itself is predicable and subject to beatwise
execution.
The calculation of the mask is the same as is used to determine
ltpmask in mve_element_mask(), but we precalculate masklen in
generated code to avoid having to have 4 helpers specialized by size.
We put the decode line in with the low-overhead-loop insns in
t32.decode because it's logically part of that collection of insn
patterns, even though it is an MVE only insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
For M-profile, unlike A-profile, the low 2 bits of SP are defined to be
RES0H, which is to say that they must be hardwired to zero so that
guest attempts to write non-zero values to them are ignored.
Implement this behaviour by masking out the low bits:
* for writes to r13 by the gdbstub
* for writes to any of the various flavours of SP via MSR
* for writes to r13 via store_reg() in generated code
Note that all the direct uses of cpu_R[] in translate.c are in places
where the register is definitely not r13 (usually because that has
been checked for as an UNDEFINED or UNPREDICTABLE case and handled as
UNDEF).
All the other writes to regs[13] in C code are either:
* A-profile only code
* writes of values we can guarantee to be aligned, such as
- writes of previous-SP-value plus or minus a 4-aligned constant
- writes of the value in an SP limit register (which we already
enforce to be aligned)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210723162146.5167-2-peter.maydell@linaro.org
The hook is now unused, with breakpoints checked outside translation.
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Just use translator_use_goto_tb directly at the one call site,
rather than maintaining a local wrapper.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Using gen_goto_tb directly misses the single-step check.
Let the branch or debug exception be emitted by arm_tr_tb_stop.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
The root trace-events only declares a single TCG event:
$ git grep -w tcg trace-events
trace-events:115:# tcg/tcg-op.c
trace-events:137:vcpu tcg guest_mem_before(TCGv vaddr, uint16_t info) "info=%d", "vaddr=0x%016"PRIx64" info=%d"
and only a tcg/tcg-op.c uses it:
$ git grep -l trace_guest_mem_before_tcg
tcg/tcg-op.c
therefore it is pointless to include "trace-tcg.h" in each target
(because it is not used). Remove it.
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20210629050935.2570721-1-f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Implement the MVE shifts by register, which perform
shifts on a single general-purpose register.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
Implement the MVE shifts by immediate, which perform shifts
on a single general-purpose register.
These patterns overlap with the long-shift-by-immediates,
so we have to rearrange the grouping a little here.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
Implement the MVE long shifts by register, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
the shift count in another general-purpose register, which might be
either positive or negative.
Like the long-shifts-by-immediate, these encodings sit in the space
that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15.
Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and
also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases),
we have to move the CSEL pattern into the same decodetree group.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
The MVE extension to v8.1M includes some new shift instructions which
sit entirely within the non-coprocessor part of the encoding space
and which operate only on general-purpose registers. They take up
the space which was previously UNPREDICTABLE MOVS and ORRS encodings
with Rm == 13 or 15.
Implement the long shifts by immediate, which perform shifts on a
pair of general-purpose registers treated as a 64-bit quantity, with
an immediate shift count between 1 and 32.
Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for
the Rm==13,15 case, we need to explicitly emit code to UNDEF for the
cases where v8.1M now requires that. (Trying to change MOVS and ORRS
is too difficult, because the functions that generate the code are
shared between a dozen different kinds of arithmetic or logical
instruction for all A32, T16 and T32 encodings, and for some insns
and some encodings Rm==13,15 are valid.)
We make the helper functions we need for UQSHLL and SQSHLL take
a 32-bit value which the helper casts to int8_t because we'll need
these helpers also for the shift-by-register insns, where the shift
count might be < 0 or > 32.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
The A64 AdvSIMD modified-immediate grouping uses almost the same
constant encoding that A32 Neon does; reuse asimd_imm_const() (to
which we add the AArch64-specific case for cmode 15 op 1) instead of
reimplementing it all.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-5-peter.maydell@linaro.org
The function asimd_imm_const() in translate-neon.c is an
implementation of the pseudocode AdvSIMDExpandImm(), which we will
also want for MVE. Move the implementation to translate.c, with a
prototype in translate.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-4-peter.maydell@linaro.org
The new bswap flags can implement the semantics exactly.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Implement the new semantics in the fallback expansion.
Change all callers to supply the flags that keep the
semantics unchanged locally.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Add the framework for decoding MVE insns, with the necessary new
files and the meson.build rules, but no actual content yet.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-11-peter.maydell@linaro.org
Implement the MVE LETP insn. This is like the existing LE loop-end
insn, but it must perform an FPU-enabled check, and on loop-exit it
resets LTPSIZE to 4.
To accommodate the requirement to do something on loop-exit, we drop
the use of condlabel and instead manage both the TB exits manually,
in the same way we already do in trans_WLS().
The other MVE-specific change to the LE insn is that we must raise an
INVSTATE UsageFault insn if LTPSIZE is not 4.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-10-peter.maydell@linaro.org
Implement the MVE DLSTP insn; this is like the existing DLS
insn, except that it must do an FPU access check and it
sets LTPSIZE to the value specified in the insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-9-peter.maydell@linaro.org
Implement the MVE WLSTP insn; this is like the existing WLS insn,
except that it specifies a size value which is used to set
FPSCR.LTPSIZE.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-8-peter.maydell@linaro.org
Implement the MVE LCTP instruction.
We put its decode and implementation with the other
low-overhead-branch insns because although it is only present if MVE
is implemented it is logically in the same group as the other LOB
insns.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-7-peter.maydell@linaro.org
On A-profile, PSR bits [15:10][26:25] are always the IT state bits.
On M-profile, some of the reserved encodings of the IT state are used
to instead indicate partial progress through instructions that were
interrupted partway through by an exception and can be resumed.
These resumable instructions fall into two categories:
(1) load/store multiple instructions, where these bits are called
"ICI" and specify the register in the ldm/stm list where execution
should resume. (Specifically: LDM, STM, VLDM, VSTM, VLLDM, VLSTM,
CLRM, VSCCLRM.)
(2) MVE instructions subject to beatwise execution, where these bits
are called "ECI" and specify which beats in this and possibly also
the following MVE insn have been executed.
There are also a few insns (LE, LETP, and BKPT) which do not use the
ICI/ECI bits but must leave them alone.
Otherwise, we should raise an INVSTATE UsageFault for any attempt to
execute an insn with non-zero ICI/ECI bits.
So far we have been able to ignore ECI/ICI, because the architecture
allows the IMPDEF choice of "always restart load/store multiple from
the beginning regardless of ICI state", so the only thing we have
been missing is that we don't raise the INVSTATE fault for bad guest
code. However, MVE requires that we honour ECI bits and do not
rexecute beats of an insn that have already been executed.
Add the support in the decoder for handling ECI/ICI:
* identify the ECI/ICI case in the CONDEXEC TB flags
* when a load/store multiple insn succeeds, it updates the ECI/ICI
state (both in DisasContext and in the CPU state), and sets a flag
to say that the ECI/ICI state was handled
* if we find that the insn we just decoded did not handle the
ECI/ICI state, we delete all the code that we just generated for
it and instead emit the code to raise the INVFAULT. This allows
us to avoid having to update every non-MVE non-LDM/STM insn to
make it check for "is ECI/ICI set?".
We continue with our existing IMPDEF choice of not caring about the
ICI state for the load/store multiples and simply restarting them
from the beginning. Because we don't allow interrupts in the middle
of an insn, the only way we would see this state is if the guest set
ICI manually on return from an exception handler, so it's a corner
case which doesn't merit optimisation.
ICI update for LDM/STM is simple -- it always zeroes the state. ECI
update for MVE beatwise insns will be a little more complex, since
the ECI state may include information for the following insn.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210614151007.4545-5-peter.maydell@linaro.org
tb_gen_code() assumes that tb->size must never be zero, otherwise it
may produce spurious exceptions. For ARM this may happen when creating
a translation block for the commpage.
Fix by pretending that commpage translation blocks have at least one
instruction.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20210416154939.32404-3-iii@linux.ibm.com>
Signed-off-by: Cornelia Huck <cohuck@redhat.com>
Switch translate-neon.c.inc from being #included into translate.c
to being its own compilation unit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-14-peter.maydell@linaro.org
Make the remaining functions needed by the translate-neon code
global.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-13-peter.maydell@linaro.org
Move the NeonGenThreeOpEnvFn typedef to translate.h together
with the other similar typedefs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210430132740.10391-12-peter.maydell@linaro.org
The VFPGenFixPointFn typedef is unused; delete it.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20210430132740.10391-11-peter.maydell@linaro.org
The function vfp_reg_ptr() is used only in translate-neon.c.inc;
move it there.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-10-peter.maydell@linaro.org
Switch translate-vfp.c.inc from being #included into translate.c
to being its own compilation unit.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-9-peter.maydell@linaro.org
Make the remaining functions which are needed by translate-vfp.c.inc
global.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-8-peter.maydell@linaro.org
The functions vfp_load_reg32(), vfp_load_reg64(), vfp_store_reg32()
and vfp_store_reg64() are used only in translate-vfp.c.inc. Move
them to that file.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-7-peter.maydell@linaro.org
Move the various gen_aa32* functions and macros out of translate.c
and into translate-a32.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-6-peter.maydell@linaro.org
Currently the trans functions for m-nocp.decode all live in
translate-vfp.inc.c; move them out into their own translation unit,
translate-m-nocp.c.
The trans_* functions here are pure code motion with no changes.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-5-peter.maydell@linaro.org
We want to split out the .c.inc files which are currently included
into translate.c so they are separate compilation units. To do this
we need to make some functions which are currently file-local to
translate.c have global scope; create a translate-a32.h paralleling
the existing translate-a64.h as a place for these declarations to
live, so that code moved into the new compilation units can call
them.
The functions made global here are those required by the
m-nocp.decode functions, except that I have converted the whole
family of {read,write}_neon_element* and also both the load_cpu and
store_cpu functions for consistency, even though m-nocp only wants a
few functions from each.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-4-peter.maydell@linaro.org
The unallocated_encoding() function is the same in both
translate-a64.c and translate.c; make the translate.c function global
and drop the translate-a64.c version. To do this we need to also
share gen_exception_insn(), which currently exists in two slightly
different versions for A32 and A64: merge those into a single
function that can work for both.
This will be useful for splitting up translate.c, which will require
unallocated_encoding() to no longer be file-local. It's also
hopefully less confusing to have only one version of the function
rather than two.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-3-peter.maydell@linaro.org
Some of the constant expanders defined in translate.c are generically
useful and will be used by the separate C files for VFP and Neon once
they are created; move the expander definitions to translate.h.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210430132740.10391-2-peter.maydell@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-23-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-20-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Adjust the interface to match what has been done to the
TCGv_i32 load/store functions.
This is less obvious, because at present the only user of
these functions, trans_VLDST_multiple, also wants to manipulate
the endianness to speed up loading multiple bytes. Thus we
retain an "internal" interface which is identical to the
current gen_aa32_{ld,st}_i64 interface.
The "new" interface will gain users as we remove the legacy
interfaces, gen_aa32_ld64 and gen_aa32_st64.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-15-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Just because operating on a TCGv_i64 temporary does not
mean that we're performing a 64-bit operation. Restrict
the frobbing to actual 64-bit operations.
This bug is not currently visible because all current
users of these two functions always pass MO_64.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-14-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This is the only caller. Adjust some commentary to talk
about SCTLR_B instead of the vanishing function.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-13-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Create a finalize_memop function that computes alignment and
endianness and returns the final MemOp for the operation.
Split out gen_aa32_{ld,st}_internal_i32 which bypasses any special
handling of endianness or alignment. Adjust gen_aa32_{ld,st}_i32
so that s->be_data is not added by the callers.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-12-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Use this to signal when memory access alignment is required.
This value comes from the CCR register for M-profile, and
from the SCTLR register for A-profile.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-11-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In preparation for splitting tb->flags across multiple
fields, introduce a structure to hold the value(s).
So far this only migrates the one uint32_t and fixes
all of the places that require adjustment to match.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-6-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We're about to split tbflags into two parts. These macros
will ensure that the correct part is used with the correct
set of bits.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-5-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We're about to rearrange the macro expansion surrounding tbflags,
and this field name will be expanded using the bit definition of
the same name, resulting in a token pasting error.
So PSTATE_SS -> PSTATE__SS in the uses, and document it.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-4-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We're about to rearrange the macro expansion surrounding tbflags,
and this field name will be expanded using the bit definition of
the same name, resulting in a token pasting error.
So SCTLR_B -> SCTLR__B in the 3 uses, and document it.
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210419202257.161730-3-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The Arm ARM specifies that for Thumb encodings of the various plain
store insns, if the Rn field is 1111 then we must UNDEF. This is
different from the Arm encodings, where this case is either
UNPREDICTABLE or has well-defined behaviour. The exclusive stores,
store-release and STRD do not have this UNDEF case for any encoding.
Enforce the UNDEF for this case in the Thumb plain store insns.
Fixes: https://bugs.launchpad.net/qemu/+bug/1922887
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210408162402.5822-1-peter.maydell@linaro.org
This is a left over erroneous check from the days front-ends handled
io start/end themselves. Regardless just because IO could be performed
on the last instruction doesn't obligate the front end to do so.
This fixes an abort faced by the aspeed execute-in-place support which
will necessarily trigger this state (even before the one-shot
CF_LAST_IO fix). The test still seems to hang once it attempts to boot
the Linux kernel but I suspect this is an unrelated issue with icount
and the timer handling code.
The original intention of the cpu_abort (added in commit 2e70f6efa8
when the icount stuff was first added) seems to have been to act as
an assert() to catch an unhandled corner case where the generated code
would be something like:
conditional branch to condlabel if its cc failed
implementation of the insn (a conditional branch or trap)
code emitted by gen_io_end()
condlabel:
gen_goto_tb or equivalent thing to go to next insn
At runtime the cc-failed case would skip over the code emitted by
gen_io_end(), leaving the can_do_io flag incorrectly set.
In commit ba3e792669 we switched to an implementation which
always clears can_do_io at the start of the following TB instead
of trying to clear it at the end of a TB that did IO. So the corner
case that this cpu_abort() was trying to flag is no longer possible,
because the gen_io_end() call has been deleted. We can therefore
safely remove the no-longer-valid assertion.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20210416170207.12504-1-alex.bennee@linaro.org
Cc: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We want to move the semihosting code out of hw/ in the next patch.
This patch contains the mechanical steps, created using:
$ git mv include/hw/semihosting/ include/
$ sed -i s,hw/semihosting,semihosting, $(git grep -l hw/semihosting)
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20210226131356.3964782-2-f4bug@amsat.org>
Message-Id: <20210305135451.15427-2-alex.bennee@linaro.org>
This adds handling for the SCR_EL3.EEL2 bit.
Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Message-id: 20210112104511.36576-17-remi.denis.courmont@huawei.com
[PMM: Applied fixes for review issues noted by RTH:
- check for FEATURE_AARCH64 before checking sel2 isar feature
- correct the commit message subject line]
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
On ARMv8-A, accesses by 32-bit secure EL1 to monitor registers trap to
the upper (64-bit) EL. With Secure EL2 support, we can no longer assume
that that is always EL3, so make room for the value to be computed at
run-time.
Signed-off-by: Rémi Denis-Courmont <remi.denis.courmont@huawei.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210112104511.36576-16-remi.denis.courmont@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
In commit cd8be50e58 we converted the A32 coprocessor
insns to decodetree. This accidentally broke XScale/iWMMXt insns,
because it moved the handling of "cp insns which are handled
by looking up the cp register in the hashtable" from after the
call to the legacy disas_xscale_insn() decode to before it,
with the result that all XScale/iWMMXt insns now UNDEF.
Update valid_cp() so that it knows that on XScale cp 0 and 1
are not standard coprocessor instructions; this will cause
the decodetree trans_ functions to ignore them, so that
execution will correctly get through to the legacy decode again.
Cc: qemu-stable@nongnu.org
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Message-id: 20210108195157.32067-1-peter.maydell@linaro.org
The constant-expander functions like negate, plus_2, etc, are
generally useful; move them up in translate.c so we can use them in
the VFP/Neon decoders as well as in the A32/T32/T16 decoders.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201119215617.29887-9-peter.maydell@linaro.org
In v8.1M the new CLRM instruction allows zeroing an arbitrary set of
the general-purpose registers and APSR. Implement this.
The encoding is a subset of the LDMIA T2 encoding, using what would
be Rn=0b1111 (which UNDEFs for LDMIA).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201119215617.29887-6-peter.maydell@linaro.org
Implement the v8.1M VSCCLRM insn, which zeros floating point
registers if there is an active floating point context.
This requires support in write_neon_element32() for the MO_32
element size, so add it.
Because we want to use arm_gen_condlabel(), we need to move
the definition of that function up in translate.c so it is
before the #include of translate-vfp.c.inc.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201119215617.29887-5-peter.maydell@linaro.org
There is no "version 2" of the "Lesser" General Public License.
It is either "GPL version 2.0" or "Lesser GPL version 2.1".
This patch replaces all occurrences of "Lesser GPL version 2" with
"Lesser GPL version 2.1" in comment section.
Signed-off-by: Chetan Pant <chetan4windows@gmail.com>
Message-Id: <20201023122913.19561-1-chetan4windows@gmail.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Fix code style. Space required before the open parenthesis '('.
Signed-off-by: Xinhao Zhang <zhangxinhao1@huawei.com>
Signed-off-by: Kai Deng <dengkai1@huawei.com>
Message-id: 20201103114529.638233-3-zhangxinhao1@huawei.com
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We can use proper widening loads to extend 32-bit inputs,
and skip the "widenfn" step.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-12-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The only uses of this function are for loading VFP
double-precision values, and nothing to do with NEON.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-10-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Replace all uses of neon_load/store_reg64 within translate-neon.c.inc.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-9-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
The only uses of this function are for loading VFP
single-precision values, and nothing to do with NEON.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-8-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
We can then use this to improve VMOV (scalar to gp) and
VMOV (gp to scalar) so that we simply perform the memory
operation that we wanted, rather than inserting or
extracting from a 32-bit quantity.
These were the last uses of neon_load/store_reg, so remove them.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-7-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Model these off the aa64 read/write_vec_element functions.
Use it within translate-neon.c.inc. The new functions do
not allocate or free temps, so this rearranges the calling
code a bit.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-6-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This seems a bit more readable than using offsetof CPU_DoubleU.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-5-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
These are the only users of neon_reg_offset, so remove that.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-4-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This will shortly have users outside of translate-neon.c.inc.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-3-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
This function makes it clear that we're talking about the whole
register, and not the 32-bit piece at index 0. This fixes a bug
when running on a big-endian host.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201030022618.785675-2-richard.henderson@linaro.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
v8.1M's "low-overhead-loop" extension has three instructions
for looping:
* DLS (start of a do-loop)
* WLS (start of a while-loop)
* LE (end of a loop)
The loop-start instructions are both simple operations to start a
loop whose iteration count (if any) is in LR. The loop-end
instruction handles "decrement iteration count and jump back to loop
start"; it also caches the information about the branch back to the
start of the loop to improve performance of the branch on subsequent
iterations.
As with the branch-future instructions, the architecture permits an
implementation to discard the LO_BRANCH_INFO cache at any time, and
QEMU takes the IMPDEF option to never set it in the first place
(equivalent to discarding it immediately), because for us a "real"
implementation would be unnecessary complexity.
(This implementation only provides the simple looping constructs; the
vector extension MVE (Helium) adds some extra variants to handle
looping across vectors. We'll add those later when we implement
MVE.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201019151301.2046-8-peter.maydell@linaro.org
v8.1M implements a new 'branch future' feature, which is a
set of instructions that request the CPU to perform a branch
"in the future", when it reaches a particular execution address.
In hardware, the expected implementation is that the information
about the branch location and destination is cached and then
acted upon when execution reaches the specified address.
However the architecture permits an implementation to discard
this cached information at any point, and so guest code must
always include a normal branch insn at the branch point as
a fallback. In particular, an implementation is specifically
permitted to treat all BF insns as NOPs (which is equivalent
to discarding the cached information immediately).
For QEMU, implementing this caching of branch information
would be complicated and would not improve the speed of
execution at all, so we make the IMPDEF choice to implement
all BF insns as NOPs.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-7-peter.maydell@linaro.org
The BLX immediate insn in the Thumb encoding always performs
a switch from Thumb to Arm state. This would be totally useless
in M-profile which has no Arm decoder, and so the instruction
does not exist at all there. Make the encoding UNDEF for M-profile.
(This part of the encoding space is used for the branch-future
and low-overhead-loop insns in v8.1M.)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20201019151301.2046-6-peter.maydell@linaro.org
The SMLAD instruction is supposed to:
* signed multiply Rn[15:0] * Rm[15:0]
* signed multiply Rn[31:16] * Rm[31:16]
* perform a signed addition of the products and Ra
* set Rd to the low 32 bits of the theoretical
infinite-precision result
* set the Q flag if the sign-extension of Rd
would differ from the infinite-precision result
(ie on overflow)
Our current implementation doesn't quite do this, though: it performs
an addition of the products setting Q on overflow, and then it adds
Ra, again possibly setting Q. This sometimes incorrectly sets Q when
the architecturally mandated only-check-for-overflow-once algorithm
does not. For instance:
r1 = 0x80008000; r2 = 0x80008000; r3 = 0xffffffff
smlad r0, r1, r2, r3
This is (-32768 * -32768) + (-32768 * -32768) - 1
The products are both 0x4000_0000, so when added together as 32-bit
signed numbers they overflow (and QEMU sets Q), but because the
addition of Ra == -1 brings the total back down to 0x7fff_ffff
there is no overflow for the complete operation and setting Q is
incorrect.
Fix this edge case by resorting to 64-bit arithmetic for the
case where we need to add three values together.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201009144712.11187-1-peter.maydell@linaro.org