[x64] Add `GFNI`-based optimization for `VECTOR_SHA_V128(Int8)`

In the `Int8` case of `VECTOR_SHA_V128`, when the shift amount is a constant and every lane holds the same count, a single `gf2p8affineqb` instruction can be emitted that performs the int8 arithmetic shift using GF(2^8) affine arithmetic.
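
As a rough illustration of why this works, the sketch below models one byte of `gf2p8affineqb` in plain C++ and checks that the shift matrix reproduces an int8 arithmetic right shift. The names `SarMatrix`, `Gf2p8AffineByte`, and `MatchesArithmeticShift` are hypothetical helpers written for this note and are not part of the emitter. The sketch assumes shift counts 0 through 7 and special-cases a zero shift so it never shifts a 64-bit value by 64 bits, which is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds the 8x8 bit matrix that makes gf2p8affineqb
// behave like an arithmetic right shift by `shift` on every byte.
inline uint64_t SarMatrix(unsigned shift) {
  assert(shift < 8);  // assumption: the caller has already limited the count
  const uint64_t identity = 0x0102040810204080ULL;   // result bit i = x bit i
  const uint64_t sign_rows = 0x8080808080808080ULL;  // every row selects bit 7
  if (shift == 0) return identity;  // avoids a 64-bit shift by 64
  // Moving the identity rows up by `shift` rows gives a logical right shift;
  // the vacated low rows are filled with the sign-bit selector.
  return (identity << (shift * 8)) | (sign_rows >> (64 - shift * 8));
}

// Software model of gf2p8affineqb for a single byte:
// result bit i = parity(matrix.byte[7 - i] & x) ^ imm bit i.
inline uint8_t Gf2p8AffineByte(uint64_t matrix, uint8_t x, uint8_t imm) {
  uint8_t result = 0;
  for (int bit = 0; bit < 8; ++bit) {
    uint8_t masked = static_cast<uint8_t>((matrix >> ((7 - bit) * 8)) & x);
    masked ^= masked >> 4;  // fold down to a single parity bit
    masked ^= masked >> 2;
    masked ^= masked >> 1;
    result |= static_cast<uint8_t>((masked & 1) << bit);
  }
  return result ^ imm;
}

// Exhaustively compares the model against a sign-extending right shift.
inline bool MatchesArithmeticShift() {
  for (unsigned shift = 0; shift < 8; ++shift) {
    const uint64_t matrix = SarMatrix(shift);
    for (unsigned v = 0; v < 256; ++v) {
      // Expected value: logical shift, then replicate the sign bit on top.
      const unsigned fill = (v & 0x80) ? ((0xFFu << (8 - shift)) & 0xFFu) : 0u;
      const uint8_t expected = static_cast<uint8_t>((v >> shift) | fill);
      if (Gf2p8AffineByte(matrix, static_cast<uint8_t>(v), 0) != expected) {
        return false;
      }
    }
  }
  return true;
}
```

With an immediate of 0, each result bit is simply the parity of one matrix row ANDed with the input byte, so choosing rows that each select exactly one input bit turns the affine transform into a bit permutation with sign replication.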

More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/

As of Dec 2021, Tremont (Lakefield), Jasper Lake, Ice Lake, Tiger Lake, and Rocket Lake support GFNI.
Wunkolo 2021-12-28 20:17:41 -08:00 committed by Rick Gibbed
parent fba23e3e75
commit a9a365aa32
1 changed files with 22 additions and 0 deletions

@@ -1084,6 +1084,28 @@ struct VECTOR_SHA_V128
  static void EmitInt8(X64Emitter& e, const EmitArgType& i) {
    // TODO(benvanik): native version (with shift magic).
    if (i.src2.is_constant) {
      if (e.IsFeatureEnabled(kX64EmitGFNI)) {
        const auto& shamt = i.src2.constant();
        bool all_same = true;
        for (size_t n = 0; n < 16 - 1; ++n) {
          if (shamt.u8[n] != shamt.u8[n + 1]) {
            all_same = false;
            break;
          }
        }
        if (all_same) {
          // Every count is the same, so we can use gf2p8affineqb.
          const uint8_t shift_amount = shamt.u8[0];
          const uint64_t shift_matrix =
              shift_amount < 8
                  ? (0x0102040810204080ULL << (shift_amount * 8)) |
                        (0x8080808080808080ULL >> 8 >> (56 - shift_amount * 8))
                  : 0x8080808080808080ULL;
          e.vgf2p8affineqb(i.dest, i.src1,
                           e.StashConstantXmm(0, vec128q(shift_matrix)), 0);
          return;
        }
      }
      e.lea(e.GetNativeParam(1), e.StashConstantXmm(1, i.src2.constant()));
    } else {
      e.lea(e.GetNativeParam(1), e.StashXmm(1, i.src2));