Adds a pass to process driver deficiencies between UID caching and use, allowing a full view of the whole pipeline, since some bugs/workarounds involve interactions between blend modes and the pixel shader
This increases accuracy, fixing the white rendering in Major Minor's Majestic March. However, the hardware backends can only have one viewport and scissor rectangle at a time, while sometimes multiple are needed to accurately emulate what is happening. If possible, this will need to be fixed later.
I think this is a relic of D3D9. D3D11 and D3D12 seem to work fine without it. Plus, ViewportCorrectionMatrix just didn't work correctly (at least with the viewports being generated by the new scissor code).
Fixes a crash that could occur if the static constructor function for
the MainSettings.cpp TU happened to run before the variables in
Common/Version.cpp are initialised. (This is known as the static
initialisation order fiasco.)
By using wrapper functions, those variables are now guaranteed to be
constructed on first use.
We don't use sampler2DMS, but we do use sampler2DMSArray.
I can't reproduce it on my phone, but a user who was running GLES
on a Tegra X1 reported a shader compilation error related to this.
The benefit to exposing this over the raw BP state is that adjustments Dolphin makes, such as LOD biases from arbitrary mipmap detection, will work properly.
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
We also need to ensure the the CPU does not receive stale values
which have been updated by the GPU. Apparently the buffer here
is not coherent on NVIDIA drivers. Not sure if this is a driver
bug/spec violation or not, one would think that
glGetBufferSubData() would invalidate any caches as needed, but
this path is only used on NVIDIA anyway, so it's fine. A point
to note is that according to ARB_debug_report, it's moved from
video to host memory, which would explain why it needs the
cache invalidate.
When trying to do a small optimization in 8a0f5ea, I failed to
take into account that WeakFlush and FlushOne update m_query_count.
Only D3D11 and OGL had this problem, not D3D12 and Vulkan.
The STL has everything we need nowadays.
I have tried to not alter any behavior or semantics with this
change wherever possible. In particular, WriteLow and WriteHigh
in CommandProcessor retain the ability to accidentally undo
another thread's write to the upper half or lower half
respectively. If that should be fixed, it should be done in a
separate commit for clarity. One thing did change: The places
where we were using += on a volatile variable (not an atomic
operation) are now using fetch_add (actually an atomic operation).
Tested with single core and dual core on x86-64 and AArch64.