Jit64: srawx - Optimize shift by constant

More efficient code can be generated if the shift amount is known at
compile time. We can once again take advantage of shifts with the shift
amount in an 8-bit immediate to eliminate ECX as a scratch register,
reducing register pressure and removing the occasional spill. We can
also do 32-bit shifts instead of 64-bit operations.

We recognize four distinct cases:

- The special case where we're dealing with the PowerPC's quirky shift
  amount masking. If the shift amount is a number from 32 to 63, all
  bits are shifted out and the result is either all zeroes or all ones.

  Before:
  B9 F0 FF FF FF       mov         ecx,0FFFFFFF0h
  8B F7                mov         esi,edi
  48 C1 E6 20          shl         rsi,20h
  48 D3 FE             sar         rsi,cl
  8B C6                mov         eax,esi
  48 C1 EE 20          shr         rsi,20h
  85 F0                test        eax,esi
  0F 95 45 58          setne       byte ptr [rbp+58h]

  After:
  8B F7                mov         esi,edi
  C1 FE 1F             sar         esi,1Fh
  0F 95 45 58          setne       byte ptr [rbp+58h]

- The shift amount is zero. No calculation needs to be done, just clear
  the carry flag.

  Before:
  B9 00 00 00 00       mov         ecx,0
  49 C1 E5 20          shl         r13,20h
  49 D3 FD             sar         r13,cl
  41 8B C5             mov         eax,r13d
  49 C1 ED 20          shr         r13,20h
  44 85 E8             test        eax,r13d
  0F 95 45 58          setne       byte ptr [rbp+58h]

  After:
  C6 45 58 00          mov         byte ptr [rbp+58h],0

- The carry flag doesn't need to be computed. Just do the arithmetic
  shift.

  Before:
  B9 02 00 00 00       mov         ecx,2
  48 C1 E7 20          shl         rdi,20h
  48 D3 FF             sar         rdi,cl
  48 C1 EF 20          shr         rdi,20h

  After:
  C1 FF 02             sar         edi,2

- The carry flag must be computed. In addition to the arithmetic shift,
  we shift a copy of the original value to the left and AND the two
  results together to know if any ones were shifted out. It's still
  better than before, because we can do 32-bit shifts.

  Before:
  B9 02 00 00 00       mov         ecx,2
  49 C1 E5 20          shl         r13,20h
  49 D3 FD             sar         r13,cl
  41 8B C5             mov         eax,r13d
  49 C1 ED 20          shr         r13,20h
  44 85 E8             test        eax,r13d
  0F 95 45 58          setne       byte ptr [rbp+58h]

  After:
  41 8B C5             mov         eax,r13d
  41 C1 FD 02          sar         r13d,2
  C1 E0 1E             shl         eax,1Eh
  44 85 E8             test        eax,r13d
  0F 95 45 58          setne       byte ptr [rbp+58h]
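
The four cases can be checked against a plain C++ model of the instruction. The following is a minimal sketch, not code from this commit: `SrawResult`, `SrawReference` and `SrawConstant` are hypothetical names introduced only for illustration. It assumes that right-shifting a negative signed integer is an arithmetic shift, which C++20 guarantees and mainstream compilers have always done.

```cpp
#include <cstdint>

// Hypothetical helper type: the shifted value plus the carry (XER[CA]) bit.
struct SrawResult
{
  uint32_t value;
  bool carry;
};

// Reference model: the low six bits of rB are the shift amount, and the carry
// is set when rS is negative and at least one 1-bit is shifted out.
SrawResult SrawReference(uint32_t rs, uint32_t rb)
{
  const uint32_t amount = rb & 0x3f;
  const int64_t wide = static_cast<int32_t>(rs);  // sign-extend to 64 bits
  const int64_t shifted = wide >> amount;         // assumed arithmetic shift
  const uint64_t lost_mask = (uint64_t{1} << amount) - 1;
  const bool lost = (static_cast<uint64_t>(wide) & lost_mask) != 0;
  return {static_cast<uint32_t>(shifted), wide < 0 && lost};
}

// Model of the constant-shift path, using only 32-bit operations.
SrawResult SrawConstant(uint32_t rs, uint32_t rb)
{
  const bool special = (rb & 0x20) != 0;
  const uint32_t amount = rb & 0x1f;
  const int32_t value = static_cast<int32_t>(rs);

  if (special)
  {
    // Shift amounts 32..63: the result is 32 copies of the sign bit, and the
    // carry equals the sign bit, so "result != 0" is enough.
    const uint32_t result = static_cast<uint32_t>(value >> 31);
    return {result, result != 0};
  }

  if (amount == 0)
    return {rs, false};  // nothing is shifted out, so the carry is clear

  // amount is 1..31 here, so (32 - amount) is a valid 32-bit shift count.
  // The JIT skips the carry part entirely when the carry is not needed.
  const uint32_t result = static_cast<uint32_t>(value >> amount);
  const uint32_t shifted_out = rs << (32 - amount);  // lost bits, moved to the top
  return {result, (result & shifted_out) != 0};
}
```

For example, `SrawConstant(0xFFFFFFF1, 2)` yields the value `0xFFFFFFFC` with the carry set, because the input is negative and a 1-bit is shifted out. The AND works because the top `amount` bits of the result are copies of the sign bit, so they can only overlap the shifted-out bits when the input is negative.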
2020-11-18 00:03:16 +01:00 · b968120f8a
parent 17dc870847
commit b968120f8a
1 changed file with 37 additions and 0 deletions
--- a/Source/Core/Core/PowerPC/Jit64/Jit_Integer.cpp
+++ b/Source/Core/Core/PowerPC/Jit64/Jit_Integer.cpp
@@ -1907,6 +1907,43 @@ void Jit64::srawx(UGeckoInstruction inst)
  int b = inst.RB;
  int s = inst.RS;

+  if (gpr.IsImm(b))
+  {
+    u32 amount = gpr.Imm32(b);
+    RCX64Reg Ra = gpr.Bind(a, RCMode::Write);
+    RCOpArg Rs = gpr.Use(s, RCMode::Read);
+    RegCache::Realize(Ra, Rs);
+
+    if (a != s)
+      MOV(32, Ra, Rs);
+
+    bool special = amount & 0x20;
+    amount &= 0x1f;
+
+    if (special)
+    {
+      SAR(32, Ra, Imm8(31));
+      FinalizeCarry(CC_NZ);
+    }
+    else if (amount == 0)
+    {
+      FinalizeCarry(false);
+    }
+    else if (!js.op->wantsCA)
+    {
+      SAR(32, Ra, Imm8(amount));
+      FinalizeCarry(CC_NZ);
+    }
+    else
+    {
+      MOV(32, R(RSCRATCH), Ra);
+      SAR(32, Ra, Imm8(amount));
+      SHL(32, R(RSCRATCH), Imm8(32 - amount));
+      TEST(32, Ra, R(RSCRATCH));
+      FinalizeCarry(CC_NZ);
+    }
+  }
+  else
  {
    RCX64Reg ecx = gpr.Scratch(ECX);  // no register choice
    RCX64Reg Ra = gpr.Bind(a, RCMode::Write);