At higher resolutions, our bounding box dimensions end up being
slightly larger than original hardware in some cases. This is not
necessarily wrong, it's just an artifact of rendering at a higher
resolution, due to bringing out detail that wouldn't have appeared on
original hardware. It causes a texel to fall partially on what would
have been a single pixel at native resolution, resulting in the
coordinates getting bumped up to the next valid value. In many cases,
these slightly larger bounding boxes are perfectly fine, as games don't
hard-code expected dimensions. It is problematic in Paper Mario TTYD
though, for a somewhat complicated reason.
Paper Mario TTYD frequently uses EFB copies to pre-render a bunch of
animation frames for a character sprite (especially in Chapter 2), so
that it can then render 100 or more of them without bringing the
GameCube to its knees. Based on my observation, the game seems to set
aside a region of memory to store these EFB copies. This region is
obviously fairly small, as the GameCube only has 24MB of RAM. There are
2 rooms in Chapter 2 where you fight a horde of as many as 100 Jabbies,
which are also rendered using EFB copies, so in this room the game ends
up making 130(!) EFB copies just for Puni and Jabbi sprites. This seems
to nearly fill the region of memory it set aside for them.
Unfortunately, our slightly larger bounding boxes at higher resolutions
results in overflowing this memory, causing very strange behavior. Some
EFB copies partially overlap game state, resulting in reading it as a
garbage RGB5A3 texture that constantly changes. Others apparently
somehow trigger a corner case in our persistent buffer mapping, causing
them to partially overwrite earlier EFB copies.
What this change does is only include the screen coordinates that align
with the equivalent native resolution pixel centers, which generally
results in the bounding boxes being more in line with original
hardware. It isn't perfect, but it's enough to fix Paper Mario TTYD's
Jabbi rooms by avoiding the buffer overflow. Notably, it is more
accurate at odd resolutions than at even resolutions. Native resolution
is completely unaffected by this change, as should be the case. This
change may also have a small positive impact on shader performance at
higher resolutions, as there will be less atomic operations performed.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
Co-authored-by: Pokechu22 <Pokechu022@gmail.com>
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
Cel-damage depends on lighting being calculated for the first channel
even though there is no color in the vertex format (defaults to the
material color). If lighting for the channel is not enabled, the vertex
will use the default color as before.
The default value of the color is determined by the number of elements in
the vertex format. This fixes the grey cubes in Super Mario Sunshine.
If the color channel count is zero, we set the color to black before the
end of the vertex shader. It's possible that this would be undefined
behavior on hardware if a vertex color index that was greater than the
channel count was used within TEV.
Now that we've converted all of the shader generators over to using fmt,
we can drop the old Write() member function and perform a rename
operation on the WriteFmt() to turn it into the new Write() function.
All changes within this are the removal of a <cstdarg> header, since the
previous printf-based Write() required it, and renaming. No functional
changes are made at all.
Completes the migration over to using the fmt-formatting WriteFmt
function. The next PR will rename all usages of WriteFmt, while
simultaneously getting rid of the old printf code.
Rather than expose the bounding box members directly, we can instead
provide an interface for code to use. This makes it nicer to transition
from global data, as the interface function names are already in
place.
Zero-initialization zeroes out all members and padding bits, so this is
safe to do. While we're at it, also add static assertions that enforce
the necessary requirements of a UID type explicitly within the ShaderUid
class.
This way, we can remove several memset calls around the shader
generation code that makes sure the underlying UID data is zeroed out.
Now our ShaderUid class enforces this for us, so we don't need to care about
it at the usage sites.
Many of the arrays defined within this file weren't declared as
immutable, which can inhibit the strings being put into the read-only
segment. We can declare them constexpr to make them immutable.
While we're at it, we can use std::array, to allow bounds conditional
bounds checking with standard libraries. The declarations can also be
shortened in the future when all platform toolchain versions we use
support std::array deduction guides. Currently macOS and FreeBSD
builders fail on them.
If bounding box is enabled when a UID cache is created, then later disabled,
we shouldn't emit the bounding box portion of the shader.
Fixes pipeline creation errors on D3D12 backend for this case.
The Metal shader compiler fails to compile the atomic instructions
when operating on individual components of a vector. Spltting it
into four variables shouldn't make any difference for other
platforms, as they are accessed independently.
floatindex is clamped to the range [0, 9]. For non-negative numbers
floor() is equivalent to trunc(). Truncation happens implicitly when
converting to uint, so the floor() is unnecessary.
This was causing a warning in the shader compiler, as the rgb components
were not initialized. Which shouldn't be an issue, as the rgb is not
used in the blend equation, only the alpha. However, the lack of
initialization causes crashes in Intel's D3D shader compiler, so we'll
play nice and initialize all the channels.
Tested on a linux Intel Skylake integrated graphics with
blend_func_extended force-disabled, as it's the only platform I have
that doesn't crash with ubershaders and supports fb_fetch
It seems it doesn't like modifying inout variables in place - so instead
use a temporary for ocol0/ocol1 and only write them once at the end of
the shader
Cel-damage uses the color from the lighting stage of the vertex pipeline
as texture coordinates, but sets numColorChans to zero.
We now calculate the colors in all cases, but override the color before
writing it from the vertex shader if numColorChans is set to a lower value.
Currently, this is only the logic op bit, but this will be extended to
the framebuffer fetch/blend modes. In the future, when/if we move to
VideoCommon pipelines, this state will be part of the pipeline UID
anyway, and we can mask it out in the backend by using a two-level map,
so the shaders/programs are shared.
Some games will set q to a different value than 1.0 through
texture matrix manipulations. It seems the console will still
do the division in that case.
Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with diffrent indirect texture targets to map to
the same UID.
Once again, it dpesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length) but I added a
bit for per-pixel lighiting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generated seperate
codepaths for alpha and color channels instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post 2008) GPUs generally don't calcualte all 4 channels
in a single vector, this optimisation is pointless. The shader compiler
will undo it during the GLSL/HLSL to IR step.
Bug Fix: The about optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
euqation. But doesn't look like anything trigged this bug.