This implementation is pretty efficient in my opinion. And "As
long as we aren't falling back to interpreter we're winning a lot"
applies to basically every instruction to some degree anyway.
The dcbz instruction needs to lock W30 so that the slowmem code will
push and pop it when calling into C++. Also, the slowmem code expects
that the address is present in W0, so replace the use of W0 as a scratch
register in the fastmem code with the now locked W30.
We currently have a bug when calling Arm64GPRCache::Flush with
FlushMode::MaintainState, zero free host registers, and at least
one guest register containing an immediate. We end up grabbing
a temporary register from the register cache in order to be
able to write the immediate to memory, but grabbing a temporary
register when there are zero free registers causes the least
recently used register to be flushed in a way which does not
maintain the state of the register cache.
To get around this, require callers to pass in a temporary
register in the GPR MaintainState case. In other cases,
passing in a temporary register is not required but can help
avoid spilling a register (if the caller already had a
temporary register at hand anyway, which in particular will
be the case in my upcoming memcheck pull request).
At extreme angles, there is severe shadow zfighting with fast depth
disabled. Enabling it causes it work on all backends without issue.
Verified in OpenGL, D3D11, D3D12, and Vulkan.
Sin & Punishment: Star Successor has rendering issues in the upper left
part of the screen. While we can't completely fix it, by disabling
Store EFB Copies to Texture only, we can make it from a missing box to
just a couple of lines being visible that make up the copy region.
release-ubu-x64 currently fails with "sorry, unimplemented: non-trivial
designated initializers not supported". pr-ubu-x64 doesn't for some
reason, but we might as well remove the designated initializer.
This fixes various texture offsetting issues with negative texture coordinates (bringing the software renderer in line with the hardware renderers). It also handles the invalid wrap mode accurately (as was done for the hardware renderers in the previous commit). Lastly, it handles wrapping with non-power-of-2 texture sizes in a hardware-accurate way (which is somewhat broken looking, as games aren't supposed to use wrapping with non-power-of-2 sizes); this has not been done for the hardware renderers.
Disables Store EFB Copies to Texture Only. As this game has a lot of
water, the water not rendering correctly provides a pretty significant
gameplay issue.