Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
The SDK seems to write "default" bounding box values before every draw
(1023 0 1023 0 are the only values encountered so far, which happen to
be the extents allowed by the BP registers) to reset the registers for
comparison in the pixel engine, and presumably to detect whether GX has
updated the registers with real values. Handling these writes and
returning them on read when bounding box emulation is disabled or
unsupported, even without computing real values from rendering, seems
to prevent games from corrupting memory or crashing.
This obviously does not fix any effects that rely on bounding box
emulation, but having the game not clobber its own code/data or just
outright crash is a definite improvement.
We also need to ensure the the CPU does not receive stale values
which have been updated by the GPU. Apparently the buffer here
is not coherent on NVIDIA drivers. Not sure if this is a driver
bug/spec violation or not, one would think that
glGetBufferSubData() would invalidate any caches as needed, but
this path is only used on NVIDIA anyway, so it's fine. A point
to note is that according to ARB_debug_report, it's moved from
video to host memory, which would explain why it needs the
cache invalidate.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Looks like the option was added to the Wx UI at commit 198d3b69, which
was a few months after the advancedWidget was originally ported from
Wx to Qt, but before anyone was actually using Qt.
When determinism is enabled, we either want all CPUs to use FMA or
we want no CPUs to use FMA. Until now, Jit64 has been been doing
the latter. However, this is inaccurate behavior, all CPUs since
Haswell support FMA, and getting JitArm64 to match the exact
inaccurate rounding used by Jit64 would be a bit annoying. This
commit switches us over to using FMA on all CPUs when determinism
is enabled, with older CPUs calling the std::fma function.
Adds a new PlatformID for universal builds. This will allow single architecture
builds to be updated through the single architecture path, and universal builds
to be updated with universal builds.