Over time OnData() has become a huge function-long case statement that
attempts to manage numerous packet-related behaviors, which makes it a
little difficult to reliably ensure certain handling doesn't interfere
with another case's. It's also mildly annoying to navigate due to its
size.
To make it a little easier to read and find the specific behavior, we
can break the relevant pieces of code out into their own functions.
When RenderDoc is attached, wglShareLists fails for some reason (see baldurk/renderdoc#2361). wglCreateContextAttribsARB has a parameter for the share context, so there's no reason to use a separate wglShareLists call.
Co-authored-by: baldurk <baldurk@baldurk.org>
Previous code from #7950 only clamps correctly when the efb copies
left and top coordinates are (0, 0)
Now we should handle all situations.
Spyro: A hero's tail is an example of a game that does an oversized
EFB copy with a non-zero origin.
If W0 is locked when fpr.RW is called, the indirectly called
ConvertSingleToDoubleLower may need to emit a push+pop, so it's
better for fresx/frsqrtex to call RW before locking W0 than after.
This way the address check will take up less icache (since it's
only emitted once for each routine rather than once for each
psq_st instruction), and we also get address checking for psq_l.
Matches Jit64's approach.
The disadvantage: In the slowmem case, the routines have to
push *every* caller-saved register onto the stack, even though
most callers probably don't need it. But at long as the slowmem
case isn't hit frequently, this is fine.
In the case of the JitAsm routines, we can't actually use
backpatching. Still, I would like to gather all the load and
store instructions in one place to make future changes easier.
This adjusts the NaN replacement logic introduced in #9928 to work around the HLSL compiler optimizing away calls to isnan, which caused that functionality to not work with ubershaders on D3D11 and D3D12 (it did work with specialized shaders, despite a warning being logged for both; that warning is also now gone). Note that the `D3DCOMPILE_IEEE_STRICTNESS` flag did not solve this issue, despite the warning suggesting that it might.
Suggested by @kayru and @jamiehayes.
This is a proper fix for the issue that 3071a1d was a workaround for.
It wasn't some kind of bug in the register cache that had laid dormant,
it was a simple mistake made in b24b79e.
Fixes a regression from ecf86bb.
The GPR allocation_order is initialized with only 28 elements,
so the 29th element ends up getting zero initialized.
Very sneaky bug...