The calculation of each address in lmw/stmw currently has a dependency
on the calculation of the previous address. By removing this dependency,
the host CPU should be able to pipeline the loads/stores better. The cost
we pay for this is up to one extra register and one extra MOV instruction
per guest instruction, but often nothing.
Making EmitBackpatchRoutine support using any register as the address
register would let us get rid of the MOV, but I consider that to be too
big of a task to do in one go at the same time as this.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
This reverts commit 351d095fff.
In hindsight, my attempted optimization messes with the return
predictor, unlike real tail calls. So I think it does more bad than
good.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
Use: callstack(0x80000000).
!callstack(value) works as a 'does not contain'.
Add strings to expr.h conditionals.
Use quotations: callstack("anim") to check symbols/name.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.