Seems to be pretty high in the profile in some geometry-heavy games like The
Last Story, and the compiler-generated assembly is terrifyingly bad, so
SSE-ize it.
Just use regular boolean negation in our pixel shader's depth test everywhere except on Qualcomm.
This works around a bug in the Intel Windows driver where comparing a boolean value against true or false fails but boolean negation works fine.
Quite silly.
Should fix issues #7830 and #7899.
We try to keep as many registers as possible in callee saved registers, so if we have guest registers in the correct registers and the interpreter
call we are falling back to doesn't need the registers then we can dump just those ones. Which means we don't have to dump 100% of our register state
when falling to the interpreter.
ComputeRC was a bit unclear by using 64bit registers for setting the immediate and then calling SXTW on a 6b4it register which is just a bit obscure.
When the source register is an immediate in cntlzwx, just use the built in GCC function instead of our own implementing for counting leading zeros.
Before block linking was enabled but it wasn't ever implemented.
Implements link blocks and destroy block functions and moves the downcount check in the WriteExit function so it doesn't get overwritten when linking.
Changes the dispatcher to make sure to we are saving the LR(X30) to the stack. Also makes sure to keep the stack aligned.
AArch64's AAPCS64 mandates the stack to be quad-word aligned.
Fixes the dispatcher from infinite looping due to a downcount check jumping to the dispatcher. This was because checking exceptions and the state
pointer wouldn't reset the global conditional flags. So it would leave the timing/exception, jump to the start of the dispatcher and then jump back
again due to the conditional branch.
Removes the REG_AWAY nonsense I was doing. I've got to get the JIT more up to speed before thinking of insane register cache things.
Also fixes a bug in immediate setting where if the register being set to an immediate already had a host register tied to it then it wouldn't free the
register it had. Resulting in register exhaustion.
lmw/stmw weren't properly setting input and output registers since they use multiple registers.
dcbz was just missing a flag in the instruction tables.