Jit64: fselx - Optimize SSE4.1 packed

Pretty much the same optimization we did for AVX, although slightly more constrained because we're stuck with the two-operand instruction where destination and source have to match. We could also specialize the case where registers b, c, and d are all distinct, but I decided against it since I couldn't find any game that does this. Before: 66 0F 57 C0 xorpd xmm0,xmm0 66 41 0F C2 C1 06 cmpnlepd xmm0,xmm9 41 0F 28 CE movaps xmm1,xmm14 66 41 0F 38 15 CC blendvpd xmm1,xmm12,xmm0 44 0F 28 F1 movaps xmm14,xmm1 After: 66 0F 57 C0 xorpd xmm0,xmm0 66 41 0F C2 C1 06 cmpnlepd xmm0,xmm9 66 45 0F 38 15 F4 blendvpd xmm14,xmm12,xmm0
2020-07-29 17:28:48 +02:00 · 2020-07-29 17:28:48 +02:00 · 31755bc13a
parent afb86a12ab
commit 31755bc13a
1 changed files with 6 additions and 0 deletions
--- a/Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp
+++ b/Source/Core/Core/PowerPC/Jit64/Jit_FloatingPoint.cpp
@ -459,6 +459,12 @@ void Jit64::fselx(UGeckoInstruction inst)
  }
  else if (cpu_info.bSSE4_1)
  {
+    if (packed && d == c)
+    {
+      BLENDVPD(Rd, Rb);
+      return;
+    }
+
    MOVAPD(XMM1, Rc);
    BLENDVPD(XMM1, Rb);
  }