Separated out from my gpu-determinism branch by request. It's not a big
commit; I just like to write long commit messages.
The main reason to kill it is hopefully a slight performance improvement
from avoiding the double switch (especially in single core mode);
however, this also improves cycle calculation, as described below.
- FifoCommandRunnable is removed; in its stead, Decode returns the
number of cycles (which only matters for "sync" GPU mode), or 0 if there
was not enough data, and is also responsible for unknown opcode alerts.
Decode and DecodeSemiNop are almost identical, so the latter is replaced
with a skipped_frame parameter to Decode. Doesn't mean we can't improve
skipped_frame mode to do less work; if, at such a point, branching on it
has too much overhead (it certainly won't now), it can always be changed
to a template parameter.
- FifoCommandRunnable used a fixed, large cycle count for display lists,
regardless of the contents. Presumably the actual hardware's processing
time is mostly the processing time of whatever commands are in the list,
and with this change InterpretDisplayList can just return the list's
cycle count to be added to the total. (Since the calculation for this
is part of Decode, it didn't seem easy to split this change up.)
To facilitate this, Decode also gains an explicit 'end' parameter in
lieu of FifoCommandRunnable's call to GetVideoBufferEndPtr, which can
point to there or to the end of a display list (or elsewhere in
gpu-determinism, but that's another story). Also, as a small
optimization, InterpretDisplayList now calls OpcodeDecoder_Run rather
than having its own Decode loop, to allow Decode to be inlined (haven't
checked whether this actually happens though).
skipped_frame mode still does not traverse display lists and uses the
old fake value of 45 cycles. degasus has suggested that this hack is
not essential for performance and can be removed, but I want to separate
any potential performance impact of that from this commit.
This is no longer required since we don't support x86_32 anymore.
x86_64 implies SSE2 support.
Also this check was a bit messed up and was hitting on Generic builds.
Only show the JIT cores on x86_64(Will have its own issues once we reach that point)
Show AArch64 JIT if running on a AArch64 device(Good luck with that for now. Future proofing though)
A bug that seems to have been uncovered by allowing immediate-address loads.
Super Monkey Ball 2 crashes without this change -- it's possible, however, that
the game actually requires the MMU hack, since it crashed due to accessing an
address in the 0x20000000-0x3fffffff range.
Decreases total Wii state save time (not counting compression) from
~570ms to ~18ms.
The compiler can't remove this check because of potential aliasing; this
might be fixable (e.g. by making mode const), but there is no reason to
have the code work in such a braindead way in the first place.
- DoVoid now uses memcpy.
- DoArray now uses DoVoid on the whole rather than Doing each element
(would fail for an array of STL structures, but we don't have any of
those).
- Do also now uses DoVoid. (In the previous version, it replicated
DoVoid's code in order to ensure each type gets its own implementation,
which for small types then becomes a simple load/store in any modern
compiler. Now DoVoid is __forceinline, which addresses that issue and
shouldn't make a big difference otherwise - perhaps a few extra copies
of the code inlined into DoArray or whatever.)
We were generating a texture without ever setting the data to a known value.
This happened on the old code as well, just that PP shaders are receiving some love and people are using it and noticing some of its issues.