ShuriZma/xemu - xemu

Commit Graph

Author	SHA1	Message	Date
Peter Maydell	09b0d9e0ad	target/arm: Prepare bfdotadd() callers for FEAT_EBF support We use bfdotadd() in four callsites for various helper functions. Currently this all assumes that we have the FPCR.EBF=0 semantics. For FPCR.EBF=1 we will need to: * call a different routine to bfdotadd() because we need to do a fused multiply-add rather than separate multiply and add steps * use a different float_status that honours the FPCR rounding mode and denormal-flushing fields * pass in an extra float_status that has been set up to perform round-to-odd rounding To prepare for this, refactor all the callsites so that instead of for (...) { x = bfdotadd(...); } they are: float_status fpst, fpst_odd; if (is_ebf(env, &fpst, &fpst_odd)) { for (...) { x = bfdotadd_ebf(..., &fpst, &fpst_odd); } } else { for (...) { x = bfdotadd(..., &fpst); } } For the moment the is_ebf() function always returns false, sets up fpst for EBF=0 semantics and never sets up fpst_odd; bfdotadd_ebf() will assert if called. We'll fill in the handling for EBF=1 in the next commit. This change should be a zero-behaviour-change refactor. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2024-09-05 13:12:36 +01:00
Peter Maydell	ecabcfa47c	target/arm: Pass env pointer through to sme_bfmopa helper To implement the FEAT_EBF16 semantics, we are going to need the CPUARMState env pointer in every helper function which calls bfdotadd(). Pass the env pointer through from generated code to the sme_bfmopa helper. (We'll add the code that uses it when we've adjusted all the helpers to have access to the env pointer.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2024-09-05 13:12:35 +01:00
Peter Maydell	55f9f4ee01	target/arm: Handle denormals correctly for FMOPA (widening) The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit `207d30b5fd` we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: `207d30b5fd` ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>	2024-08-01 10:15:03 +01:00
Peter Maydell	ea3f5a90f0	target/arm: Fix UMOPA/UMOPS of 16-bit values The UMOPA/UMOPS instructions are supposed to multiply unsigned 8 or 16 bit elements and accumulate the products into a 64-bit element. In the Arm ARM pseudocode, this is done with the usual infinite-precision signed arithmetic. However our implementation doesn't quite get it right, because in the DEF_IMOP_64() macro we do: sum += (NTYPE)(n >> 0) * (MTYPE)(m >> 0); where NTYPE and MTYPE are uint16_t or int16_t. In the uint16_t case, the C usual arithmetic conversions mean the values are converted to "int" type and the multiply is done as a 32-bit multiply. This means that if the inputs are, for example, 0xffff and 0xffff then the result is 0xFFFE0001 as an int, which is then promoted to uint64_t for the accumulation into sum; this promotion incorrectly sign extends the multiply. Avoid the incorrect sign extension by casting to int64_t before the multiply, so we do the multiply as 64-bit signed arithmetic, which is a type large enough that the multiply can never overflow into the sign bit. (The equivalent 8-bit operations in DEF_IMOP_32() are fine, because the 8-bit multiplies can never overflow into the sign bit of a 32-bit integer.) Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2372 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240722172957.1041231-3-peter.maydell@linaro.org	2024-07-29 16:56:46 +01:00
Richard Henderson	3b9991e35c	target/arm: Use set/clear_helper_retaddr in SVE and SME helpers Avoid a race condition with munmap in another thread. Use around blocks that exclusively use "host_fn". Keep the blocks as small as possible, but without setting and clearing for every operation on one page. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2024-07-23 10:56:04 +10:00
Daniyal Khan	31d93fedf4	target/arm: Use float_status copy in sme_fmopa_s We made a copy above because the fp exception flags are not propagated back to the FPST register, but then failed to use the copy. Cc: qemu-stable@nongnu.org Fixes: `558e956c71` ("target/arm: Implement FMOPA, FMOPS (non-widening)") Signed-off-by: Daniyal Khan <danikhan632@gmail.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20240717060149.204788-2-richard.henderson@linaro.org [rth: Split from a larger patch] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-07-18 13:49:30 +01:00
Richard Henderson	d572bcb222	target/arm: Fix 32-bit SMOPA While the 8-bit input elements are sequential in the input vector, the 32-bit output elements are not sequential in the output matrix. Do not attempt to compute 2 32-bit outputs at the same time. Cc: qemu-stable@nongnu.org Fixes: `23a5e3859f` ("target/arm: Implement SME integer outer product") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2083 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240305163931.242795-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-03-07 12:49:16 +00:00
Richard Henderson	855f94eca8	target/arm: Fix SVE/SME gross MTE suppression checks The TBI and TCMA bits are located within mtedesc, not desc. Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Gustavo Romero <gustavo.romero@linaro.org> Message-id: 20240207025210.8837-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2024-02-15 11:30:45 +00:00
Richard Henderson	3efd849573	target/arm: Fix SME FMOPA (16-bit), BFMOPA Perform the loop increment unconditionally, not nested within the predication. Cc: qemu-stable@nongnu.org Fixes: `3916841ac7` ("target/arm: Implement FMOPA, FMOPS (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1985 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20231117193135.1180657-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-11-20 15:17:00 +00:00
Richard Henderson	4b3520fd93	target/arm: Fix SME ST1Q A typo, noted in the bug report, resulting in an incorrect write offset. Cc: qemu-stable@nongnu.org Fixes: `7390e0e9ab` ("target/arm: Implement SME LD1, ST1") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1833 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230818214255.146905-1-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-08-22 17:31:13 +01:00
Claudio Fontana	a3ef070ea9	target/arm: move helpers to tcg/ Signed-off-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2023-02-27 13:27:04 +00:00

11 Commits