DataReader is generally jank - it has a start and end pointer, but the end pointer is generally not used, and all of the vertex loaders mostly bypassed it anyways.
Wrapper code (the vertex loaer test, as well as Fifo.cpp and OpcodeDecoding.cpp) still uses it, as does the software vertex loader (which is not a subclass of VertexLoader). These can probably be eliminated later.
This more accurately represents what's going on, and also ends at 0 instead of 1, making some indexing operations easier. This also changes it so that position_matrix_index_cache actually starts from index 0 instead of index 1.
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
These vertex formats enable all attributes. Inactive attributes are set
to offset=0, and the smallest type possible. This "optimization" stops
the NV compiler from generating variants of vertex shaders.
Yet another story of games loading weird shit into registers.
For some reason, Burnout 2 would (in rare situations) load invalid
addresses into cp_state.array_bases. What would the real hardware
do in this situation? Who knows, Burnout 2 doesn't actually enable
the vertex array with the invalid address so nothing kinky happens.
But dolphin tries to optimise things and starts using the address
as soon as it is loaded into memory. This causes GetPointer (which is
now much more vocal) to throw an error.
The Fix: We don't call GetPointer until we are sure the vertex array
has been enabled.
This state will be used to calculate sizes for skipping over commands on
a separate thread. An alternative to having these state variables would
be to have the preprocessor stash "state as we go" somewhere, but I
think that would be much uglier.
GetVertexSize now takes an extra argument to determine which state to
use, as does FifoCommandRunnable, which calls it. While I'm modifying
FifoCommandRunnable, I also change it to take a buffer and size as
parameters rather than using g_pVideoData, which will also be necessary
later. I also get rid of an unused overload.
Separated out from my gpu-determinism branch by request. It's not a big
commit; I just like to write long commit messages.
The main reason to kill it is hopefully a slight performance improvement
from avoiding the double switch (especially in single core mode);
however, this also improves cycle calculation, as described below.
- FifoCommandRunnable is removed; in its stead, Decode returns the
number of cycles (which only matters for "sync" GPU mode), or 0 if there
was not enough data, and is also responsible for unknown opcode alerts.
Decode and DecodeSemiNop are almost identical, so the latter is replaced
with a skipped_frame parameter to Decode. Doesn't mean we can't improve
skipped_frame mode to do less work; if, at such a point, branching on it
has too much overhead (it certainly won't now), it can always be changed
to a template parameter.
- FifoCommandRunnable used a fixed, large cycle count for display lists,
regardless of the contents. Presumably the actual hardware's processing
time is mostly the processing time of whatever commands are in the list,
and with this change InterpretDisplayList can just return the list's
cycle count to be added to the total. (Since the calculation for this
is part of Decode, it didn't seem easy to split this change up.)
To facilitate this, Decode also gains an explicit 'end' parameter in
lieu of FifoCommandRunnable's call to GetVideoBufferEndPtr, which can
point to there or to the end of a display list (or elsewhere in
gpu-determinism, but that's another story). Also, as a small
optimization, InterpretDisplayList now calls OpcodeDecoder_Run rather
than having its own Decode loop, to allow Decode to be inlined (haven't
checked whether this actually happens though).
skipped_frame mode still does not traverse display lists and uses the
old fake value of 45 cycles. degasus has suggested that this hack is
not essential for performance and can be removed, but I want to separate
any potential performance impact of that from this commit.
We are used to have a 1:1 mapping of GX vertex formats and the native (OGL + D3D) ones, but there are by far more GX ones.
This new cache maps them directly so that we don't flush on GX vertex format changes as long as the native one doesn't change.
The idea is stolen from galop1n.
This option was known to break every second game and only boost a bit.
It also seems to be broken because of streaming into pinned memory and buffer storage buffers.
v2: also remove dlc_desc