Adds a pass to process driver deficiencies between UID caching and use, allowing a full view of the whole pipeline, since some bugs/workarounds involve interactions between blend modes and the pixel shader
Spirv-cross's MSL codegen makes the amazing choice of compiling calls to inout functions as `State temp = s; call_function(temp); s = temp`. Not all Metal backends handle this mess well. In particular, it causes register spills on Intel, losing about 5% in performance.
Previously we were using this workaround when using framebuffer fetch
to emulate dual source blending, but it seems like we also need to use
it when using framebuffer fetch to emulate logic ops, otherwise some
Adreno devices get a crash when compiling OpenGL ES ubershaders.
Using the workaround in specialized shaders doesn't seem to be
necessary, but I've made the same change there for consistency.
This gets us closer to fixing https://bugs.dolphin-emu.org/issues/12791
but doesn't actually fix it.
It doesn't make sense for alpha to add the bias ONLY when dividing by 2, while color doesn't apply the bias for divide by 2 only; hardware testing indicates that alpha should have the bias.
This fixes the menus in Mario Kart Wii (https://bugs.dolphin-emu.org/issues/11909) but reintroduces the white rectangle in Fortune Street.
This reverts commit 5aaa5141ed (and several other matching changes elsewhere).
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
This change should have no behavioral differences itself, but allows for changing the behavior of out of bounds tex coord indices more easily in the next commit. Without this change, returning tex0 for out of bounds cases and then applying the fixed-point logic would use the wrong tex dimension info (tex0 with I_TEXDIMS[1] or such), which is inaccurate.
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
Now that we've converted all of the shader generators over to using fmt,
we can drop the old Write() member function and perform a rename
operation on the WriteFmt() to turn it into the new Write() function.
All changes within this are the removal of a <cstdarg> header, since the
previous printf-based Write() required it, and renaming. No functional
changes are made at all.
Completes the migration over to using the fmt-formatting WriteFmt
function. The next PR will rename all usages of WriteFmt, while
simultaneously getting rid of the old printf code.
These are only ever used with ShaderCode instances and nothing else.
Given that, we can convert these helper functions to expect that type of
object as an argument and remove the need for templates, improving
compiler throughput a marginal amount, as the template instantiation
process doesn't need to be performed.
We can also move the definitions of these functions into the cpp file,
which allows us to remove a few inclusions from the ShaderGenCommon
header. This uncovered a few instances of indirect inclusions being
relied upon in other source files.
One other benefit is this allows changes to be made to the definitions
of the functions without needing to recompile all translation units that
make use of these functions, making change testing a little quicker.
Moving the definitions into the cpp file also allows us to completely
hide DefineOutputMember() from external view, given it's only ever used
inside of GenerateVSOutputMembers().
Zero-initialization zeroes out all members and padding bits, so this is
safe to do. While we're at it, also add static assertions that enforce
the necessary requirements of a UID type explicitly within the ShaderUid
class.
This way, we can remove several memset calls around the shader
generation code that makes sure the underlying UID data is zeroed out.
Now our ShaderUid class enforces this for us, so we don't need to care about
it at the usage sites.
The Metal shader compiler fails to compile the atomic instructions
when operating on individual components of a vector. Spltting it
into four variables shouldn't make any difference for other
platforms, as they are accessed independently.
floatindex is clamped to the range [0, 9]. For non-negative numbers
floor() is equivalent to trunc(). Truncation happens implicitly when
converting to uint, so the floor() is unnecessary.