Jit64: srawx - Optimize shift by constant

More efficient code can be generated if the shift amount is known at
compile time. We can once again take advantage of shifts with the shift
amount in an 8-bit immediate to eliminate ECX as a scratch register,
reducing register pressure and removing the occasional spill. We can
also do 32-bit shifts instead of 64-bit operations.

We recognize four distinct cases:

- The special case where we're dealing with the PowerPC's quirky shift
  amount masking. If the shift amount is a number from 32 to 63, all
  bits are shifted out and the result is either all zeroes or all ones.

Before:
B9 F0 FF FF FF       mov         ecx,0FFFFFFF0h
8B F7                mov         esi,edi
48 C1 E6 20          shl         rsi,20h
48 D3 FE             sar         rsi,cl
8B C6                mov         eax,esi
48 C1 EE 20          shr         rsi,20h
85 F0                test        eax,esi
0F 95 45 58          setne       byte ptr [rbp+58h]

After:
8B F7                mov         esi,edi
C1 FE 1F             sar         esi,1Fh
0F 95 45 58          setne       byte ptr [rbp+58h]

- The shift amount is zero. No calculation needs to be done, just clear
  the carry flag.

Before:
B9 00 00 00 00       mov         ecx,0
49 C1 E5 20          shl         r13,20h
49 D3 FD             sar         r13,cl
41 8B C5             mov         eax,r13d
49 C1 ED 20          shr         r13,20h
44 85 E8             test        eax,r13d
0F 95 45 58          setne       byte ptr [rbp+58h]

After:
C6 45 58 00          mov         byte ptr [rbp+58h],0

- The carry flag doesn't need to be computed. Just do the arithmetic
  shift.

Before:
B9 02 00 00 00       mov         ecx,2
48 C1 E7 20          shl         rdi,20h
48 D3 FF             sar         rdi,cl
48 C1 EF 20          shr         rdi,20h

After:
C1 FF 02             sar         edi,2

- The carry flag must be computed. In addition to the arithmetic shift,
  we shift a copy of the input left by (32 - amount) and AND the two
  results together to find out whether any ones were shifted out of a
  negative value. It's still better than before, because we can do
  32-bit shifts.

Before:
B9 02 00 00 00       mov         ecx,2
49 C1 E5 20          shl         r13,20h
49 D3 FD             sar         r13,cl
41 8B C5             mov         eax,r13d
49 C1 ED 20          shr         r13,20h
44 85 E8             test        eax,r13d
0F 95 45 58          setne       byte ptr [rbp+58h]

After:
41 8B C5             mov         eax,r13d
41 C1 FD 02          sar         r13d,2
C1 E0 1E             shl         eax,1Eh
44 85 E8             test        eax,r13d
0F 95 45 58          setne       byte ptr [rbp+58h]
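
The four cases above can be modelled in plain C++ to check that the
32-bit sequences agree with the definition of the instruction. This is
only a sketch and not part of the commit: the names SrawResult,
sraw_reference, sraw_constant and self_check are hypothetical, and it
assumes that right-shifting a negative signed integer is an arithmetic
shift (guaranteed since C++20, true of common compilers before that).

```cpp
#include <cassert>
#include <cstdint>

// Value and carry (CA) produced by a shift-right-algebraic-word operation.
struct SrawResult
{
  uint32_t value;
  bool carry;
};

// Reference model: sign-extend to 64 bits and shift by the low six bits
// of the shift amount. CA is set when the input is negative and at least
// one 1 bit is shifted out.
SrawResult sraw_reference(uint32_t rs, uint32_t rb)
{
  const uint32_t amount = rb & 0x3f;
  const int64_t wide = static_cast<int32_t>(rs);
  const uint64_t lost = static_cast<uint64_t>(wide) & ((uint64_t{1} << amount) - 1);
  return {static_cast<uint32_t>(wide >> amount), wide < 0 && lost != 0};
}

// Model of the constant-shift cases, using only 32-bit operations.
SrawResult sraw_constant(uint32_t rs, uint32_t rb)
{
  const bool special = (rb & 0x20) != 0;
  const uint32_t amount = rb & 0x1f;
  const int32_t input = static_cast<int32_t>(rs);

  if (special)
  {
    // Shift amounts 32..63: the result is all zeroes or all ones, and
    // the carry is set exactly when the result is non-zero.
    const uint32_t value = static_cast<uint32_t>(input >> 31);
    return {value, value != 0};
  }

  if (amount == 0)
    return {rs, false};  // nothing to shift, carry is cleared

  // Cases three and four share the arithmetic shift; the carry is only
  // worth computing when something consumes it.
  const uint32_t value = static_cast<uint32_t>(input >> amount);
  const uint32_t shifted_out = rs << (32 - amount);  // lost bits, moved to the top
  return {value, (value & shifted_out) != 0};
}

// Compares both models for a handful of inputs and every shift amount.
void self_check()
{
  const uint32_t samples[] = {0u,          1u,          3u,          0x7fffffffu,
                              0x80000000u, 0x80000001u, 0xfffffffcu, 0xffffffffu};
  for (uint32_t rb = 0; rb < 64; ++rb)
  {
    for (uint32_t rs : samples)
    {
      const SrawResult expected = sraw_reference(rs, rb);
      const SrawResult actual = sraw_constant(rs, rb);
      assert(expected.value == actual.value);
      assert(expected.carry == actual.carry);
    }
  }
}
```

The AND works because the arithmetic shift fills the top bits with
copies of the sign bit, while the left shift moves the shifted-out bits
into those same positions. The result is non-zero only when the input is
negative and a 1 bit was lost.
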
Sintendo 2020-11-18 00:03:16 +01:00
parent 17dc870847
commit b968120f8a
1 changed files with 37 additions and 0 deletions

@@ -1907,6 +1907,43 @@ void Jit64::srawx(UGeckoInstruction inst)
   int b = inst.RB;
   int s = inst.RS;
 
+  if (gpr.IsImm(b))
+  {
+    u32 amount = gpr.Imm32(b);
+    RCX64Reg Ra = gpr.Bind(a, RCMode::Write);
+    RCOpArg Rs = gpr.Use(s, RCMode::Read);
+    RegCache::Realize(Ra, Rs);
+
+    if (a != s)
+      MOV(32, Ra, Rs);
+
+    bool special = amount & 0x20;
+    amount &= 0x1f;
+
+    if (special)
+    {
+      SAR(32, Ra, Imm8(31));
+      FinalizeCarry(CC_NZ);
+    }
+    else if (amount == 0)
+    {
+      FinalizeCarry(false);
+    }
+    else if (!js.op->wantsCA)
+    {
+      SAR(32, Ra, Imm8(amount));
+      FinalizeCarry(CC_NZ);
+    }
+    else
+    {
+      MOV(32, R(RSCRATCH), Ra);
+      SAR(32, Ra, Imm8(amount));
+      SHL(32, R(RSCRATCH), Imm8(32 - amount));
+      TEST(32, Ra, R(RSCRATCH));
+      FinalizeCarry(CC_NZ);
+    }
+  }
+  else
   {
     RCX64Reg ecx = gpr.Scratch(ECX);  // no register choice
     RCX64Reg Ra = gpr.Bind(a, RCMode::Write);