Cel Damage depends on lighting being calculated for the first color channel
even though there is no color in the vertex format (the color defaults to
the material color). If lighting for the channel is not enabled, the vertex
will use the default color as before.
The default value of the color is determined by the number of elements in
the vertex format. This fixes the grey cubes in Super Mario Sunshine.
If the color channel count is zero, we set the color to black before the
end of the vertex shader. It's possible that this would be undefined
behavior on real hardware if TEV used a vertex color index greater than
the channel count.
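Roughly, the vertex shader generator ends up doing something like this (a
sketch with hypothetical output names; out.Write() is the generator's
shader-text writer, and the alpha value here is an assumption):

    // Sketch only: when no color channels are configured, force a known
    // color before the end of the vertex shader.
    if (uid_data->numColorChans == 0)
      out.Write("  o.colors_0 = float4(0.0, 0.0, 0.0, 1.0);\n");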
Now that we've converted all of the shader generators over to using fmt,
we can drop the old Write() member function and rename WriteFmt() to take
its place as the new Write() function.
The only changes here are the removal of the <cstdarg> header (which the
previous printf-based Write() required) and the renaming itself. No
functional changes are made at all.
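For illustration, a hypothetical call site before and after the migration
(these lines are not taken from the actual diff):

    out.Write("float%d tex%d;\n", size, i);  // old: printf-style, required <cstdarg>
    out.Write("float{} tex{};\n", size, i);  // new: fmt-style (formerly WriteFmt())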
Completes the migration over to using the fmt-formatting WriteFmt
function. The next PR will rename all usages of WriteFmt, while
simultaneously getting rid of the old printf code.
Rather than expose the bounding box members directly, we can instead
provide an interface for code to use. This makes it nicer to transition
from global data, as the interface function names are already in
place.
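The rough shape of that interface, with hypothetical names:

    // Call sites go through accessors instead of touching members directly,
    // so the global data behind them can be swapped out later without churn.
    namespace BoundingBox
    {
    u16 GetCoordinate(int index);
    void SetCoordinate(int index, u16 value);
    }  // namespace BoundingBox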
Zero-initialization zeroes out all members and padding bits, so this is
safe to do. While we're at it, also add static assertions that enforce
the necessary requirements of a UID type explicitly within the ShaderUid
class.
This way, we can remove several memset calls around the shader generation
code that make sure the underlying UID data is zeroed out. Our ShaderUid
class now enforces this for us, so we don't need to care about it at the
usage sites.
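A minimal sketch of what those assertions look like (assumed member names,
not the exact class):

    #include <type_traits>

    template <class UidDataType>
    class ShaderUid
    {
      // Enforce the properties that make it valid to compare/hash the raw
      // bytes of the UID data.
      static_assert(std::is_trivially_copyable_v<UidDataType>,
                    "UID data must be trivially copyable");
      static_assert(std::is_standard_layout_v<UidDataType>,
                    "UID data must have a standard layout");

      // Zero-initialization zeroes all members and padding bits, replacing
      // the memset() calls that used to live at every usage site.
      UidDataType data{};
    };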
Many of the arrays defined within this file weren't declared as
immutable, which can inhibit the strings from being placed into the
read-only segment. We can declare them constexpr to make them immutable.
While we're at it, we can use std::array, which allows conditional bounds
checking with standard libraries. The declarations can also be shortened
in the future, once all the platform toolchains we support handle
std::array deduction guides; currently the macOS and FreeBSD builders
fail on them.
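For example (the table name and contents here are illustrative, not the
actual data):

    #include <array>

    // Before: mutable in principle, and the strings may not land in the
    // read-only segment:
    //   static const char* tev_ksel_table_c[] = {"255,255,255", "223,223,223"};

    // After: immutable, with bounds-checked access available via at():
    constexpr std::array<const char*, 2> tev_ksel_table_c{
        "255,255,255",
        "223,223,223",
    };
    // Once every toolchain supports the deduction guides, this can shrink to:
    //   constexpr std::array tev_ksel_table_c{"255,255,255", "223,223,223"};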
If bounding box is enabled when a UID cache is created, then later disabled,
we shouldn't emit the bounding box portion of the shader.
Fixes pipeline creation errors on the D3D12 backend in this case.
The Metal shader compiler fails to compile the atomic instructions when
they operate on individual components of a vector. Splitting the vector
into four variables shouldn't make any difference on other platforms, as
the components are accessed independently.
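Sketched in terms of what the generator emits (member names are
illustrative):

    // Before: one int4 member, updated with atomics on its components,
    // which Metal rejects:
    //   atomicMin(bbox_data.x, pos.x);
    // After: four scalar members, each updated with its own atomic:
    out.Write("SSBO_BINDING(0) buffer BBox {{\n"
              "  int bbox_left, bbox_right, bbox_top, bbox_bottom;\n"
              "}};\n");
    out.Write("  atomicMin(bbox_left, pos.x);\n"
              "  atomicMax(bbox_right, pos.x);\n");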
floatindex is clamped to the range [0, 9]. For non-negative numbers
floor() is equivalent to trunc(). Truncation happens implicitly when
converting to uint, so the floor() is unnecessary.
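In the generated shader the change is essentially this (sketch; the index
variable name is hypothetical):

    // before:
    //   out.Write("  uint index = uint(floor(floatindex));\n");
    // after: the uint() conversion already truncates, and truncation
    // equals floor() on the clamped, non-negative range [0, 9]:
    out.Write("  uint index = uint(floatindex);\n");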
This was causing a warning in the shader compiler, as the rgb components
were not initialized. That shouldn't be an issue, as the rgb components
are not used in the blend equation, only the alpha. However, the lack of
initialization causes crashes in Intel's D3D shader compiler, so we'll
play nice and initialize all the channels.
Tested on Linux with Intel Skylake integrated graphics, with
blend_func_extended force-disabled, as it's the only platform I have
that doesn't crash with ubershaders and supports fb_fetch.
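The fix itself is tiny, something along these lines (sketch; alpha_expr
stands in for however the alpha value is actually computed):

    // rgb is ignored by the blend equation, but leaving it uninitialized
    // crashes Intel's D3D compiler, so write the whole vector:
    out.Write("  ocol1 = float4(0.0, 0.0, 0.0, {});\n", alpha_expr);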
It seems the driver doesn't like inout variables being modified in place,
so instead use temporaries for ocol0/ocol1 and only write them once at
the end of the shader.
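The resulting generator logic is shaped like this (a sketch, not the
actual code):

    // Accumulate into plain temporaries instead of read-modify-writing
    // the inout outputs throughout the shader:
    out.Write("  float4 ocol0_tmp = float4(0.0, 0.0, 0.0, 0.0);\n"
              "  float4 ocol1_tmp = float4(0.0, 0.0, 0.0, 0.0);\n");
    // ... all of the blending math updates ocol0_tmp/ocol1_tmp ...
    // ... and the real outputs are written exactly once, at the end:
    out.Write("  ocol0 = ocol0_tmp;\n"
              "  ocol1 = ocol1_tmp;\n");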
Cel Damage uses the color from the lighting stage of the vertex pipeline
as texture coordinates, but sets numColorChans to zero.
We now calculate the colors in all cases, but override the color before
writing it from the vertex shader if numColorChans is set to a lower value.
Currently, this is only the logic op bit, but this will be extended to
the framebuffer fetch/blend modes. In the future, when/if we move to
VideoCommon pipelines, this state will be part of the pipeline UID
anyway, and we can mask it out in the backend by using a two-level map,
so the shaders/programs are shared.
Some games will set q to a different value than 1.0 through
texture matrix manipulations. It seems the console will still
do the division in that case.
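In the emitted shader this amounts to always performing the divide
(sketch; the coordinate names and index are illustrative):

    // q rides in the third component of the coordinate; divide even when
    // a texture matrix could have made it something other than 1.0:
    out.Write("  o.tex{}.xy = o.tex{}.xy / o.tex{}.z;\n", i, i, i);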
Note: it's not 100% perfect, as some of the GPU capabilities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with different indirect texture targets to map to
the same UID.
Once again, it doesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length), but I added a
bit for per-pixel lighting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generates separate
codepaths for the alpha and color channels, instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post-2008) GPUs generally don't calculate all 4 channels
in a single vector, this optimisation is pointless; the shader compiler
will undo it during the GLSL/HLSL-to-IR step.
Bug Fix: The above optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
equation. But it doesn't look like anything triggered this bug.
Drivers that don't support GL_ARB_shading_language_420pack require that
the storage qualifier be specified even when inside an interface block.
AMD's driver throws a compile error when "centroid in/out" is used within
an interface block.
Our previous behavior was to include the storage qualifier regardless, but
this wasn't working on AMD, so we should instead check for the presence of
the extension and include the qualifier based on that.
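The generator then picks the qualifier from the extension support (a
sketch; supports_420pack and the block contents are hypothetical):

    // Drivers without GL_ARB_shading_language_420pack require an explicit
    // storage qualifier on interface block members, while AMD's driver
    // errors out on it, so emit the qualifier only when the extension is
    // missing:
    const char* qualifier = supports_420pack ? "" : "centroid out ";
    out.Write("out VertexData {{\n"
              "  {}float4 colors_0;\n"
              "}};\n", qualifier);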
OS X's shader compiler has a bug where interface block members don't
properly inherit the layout qualifier from the enclosing interface block.
Work around this limitation by explicitly stating the layout qualifier on
both the interface block and every single member inside of that block.
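The emitted block then states the qualifier in both places, roughly like
this (a sketch; "centroid out" stands in for whichever qualifier the
block actually carries, and the member names are illustrative):

    // State the qualifier on the block and again on every member, since
    // OS X does not propagate it from the block:
    out.Write("centroid out VertexData {{\n"
              "  centroid out float4 colors_0;\n"
              "  centroid out float4 colors_1;\n"
              "}};\n");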
Fast depth is now more accurate than slow depth and should always be used.
The option will be kept in a different form as it is still used as a hack to fix some games.
Also, the slow depth code path will still be relied upon by cards that don't support GL_ARB_clip_control.