Based on GCC docs[1], we use the '-mpower8-vector' flag at config-time
to detect the toolchain support to the bcdsub instruction. LLVM/Clang
supports this flag since version 3.6[2], but the instruction and related
builtins were only added in LLVM 14[3]. In the absence of other means to
detect this support at config-time, we resort to __has_builtin to
identify the presence of __builtin_bcdsub at compile-time. If the
builtin is not available, the instruction is emitted with a ".long".
[1] https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html
[2] 59eb767e11
[3] c933c2eb33
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-5-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Using __int128 with inline asm constraints like "v" generates incorrect
code when compiling with LLVM/Clang (e.g., only one doubleword of the
VSR is loaded). Instead, use a GPR pair to pass the 128-bits value and
load the VSR with mtvsrd/xxmrghd.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Matheus Ferst <matheus.ferst@eldorado.org.br>
Message-Id: <20220304165417.1981159-4-matheus.ferst@eldorado.org.br>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
The commit d03b174a83 (target/ppc: simplify bcdadd/sub functions)
meant to simplify some of the code but it inadvertently altered the
way the CR6 field is set after the operation has overflowed.
The CR6 bits are set based on the *unbounded* result of the operation,
so we need to look at the result before returning from bcd_add_mag,
otherwise we will look at 0 when it overflows.
Consider the following subtraction:
v0 = 0x9999999999999999999999999999999c (maximum positive BCD value)
v1 = 0x0000000000000000000000000000001d (negative one BCD value)
bcdsub. v0,v0,v1,0
The Power ISA 2.07B says:
If the unbounded result is greater than zero, do the following.
If PS=0, the sign code of the result is set to 0b1100.
If PS=1, the sign code of the result is set to 0b1111.
If the operation overflows, CR field 6 is set to 0b0101. Otherwise,
CR field 6 is set to 0b0100.
POWER9 hardware:
vr0 = 0x0000000000000000000000000000000c (positive zero BCD value)
cr6 = 0b0101 (0x5) (positive, overflow)
QEMU:
vr0 = 0x0000000000000000000000000000000c (positive zero BCD value)
cr6 = 0b0011 (0x3) (zero, overflow) <--- wrong
This patch reverts the part of d03b174a83 that introduced the
problem and adds a test-case to avoid further regressions:
before:
$ make run-tcg-tests-ppc64le-linux-user
(...)
TEST bcdsub on ppc64le
bcdsub: qemu/tests/tcg/ppc64le/bcdsub.c:58: test_bcdsub_gt:
Assertion `(cr >> 4) == ((1 << 2) | (1 << 0))' failed.
Fixes: d03b174a83 (target/ppc: simplify bcdadd/sub functions)
Reported-by: Paul Clarke <pc@us.ibm.com>
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Message-Id: <20210222194035.2723056-1-farosas@linux.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>