Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with diffrent indirect texture targets to map to
the same UID.
Once again, it dpesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length) but I added a
bit for per-pixel lighiting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generated seperate
codepaths for alpha and color channels instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post 2008) GPUs generally don't calcualte all 4 channels
in a single vector, this optimisation is pointless. The shader compiler
will undo it during the GLSL/HLSL to IR step.
Bug Fix: The about optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
euqation. But doesn't look like anything trigged this bug.
Fixes a major preformance regression in Skies of Arcadia during
battle transisions.
I had plans for a more advanced version of this code after 5.0,
but here is a minimal implemenation for now.
Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
This is an oversight from pr https://github.com/dolphin-emu/dolphin/pull/3266 . Thanks to degasus for pointing this out.
It's possible that MAX_TEXTURE_BINARY_SIZE can be optimised, but i wanted to play it safe considering the 5.0 stable release.
Drivers that don't support GL_ARB_shading_language_420pack require that
the storage qualifier be specified even when inside an interface block.
AMD's driver throws a compile error when "centroid in/out" is used within
an interface block.
Our previous behavior was to include the storage qualifier regardless, but
this wasn't working on AMD, therefore we should check for the presence of
the extension and include based on this, instead.
I'm not entirely sure what is happening, but this optimisation is causing an issue in Sonic Riders: Zero Gravity. Apparently the issue would also be fixed by PR#3747, but this PR should also fix similar issues.
Games that use partial updates might get slower with this, so some performance regression testing would be nice. Games like New Super Mario Bros, RS2, Zelda TP and Silent Hill. Testing with high graphics settings makes sense, since this would mostly end up in more work for the GPU.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.
- remove an outdated comment about the efb to ram and scaled efb restriction
- when upscaling efb copies, mark the new texture as efb copy
- dx12 fixes for the src box, especially the number of layers for 3D
OS X's shader compiler has a bug with interface blocks where interface block members don't properly inherit the layout qualifier from the interface
block.
Work around this limitation by explicitly stating the layout qualifier on both the interface block and every single member inside of that block.
As confirmed by a hardware test if we are using the texgen type of COLOR_STRGBC0/STRGBC1 then it sets the texture coordinates to those values
regardless of what the input form or source row is.
Thanks to Ornox for testing again
Removes a couple asserts in the vertex shader gen when dealing with the input form.
Typically input form ABC1 is used, so it'll pull in the first three elements and always set the fourth to 1.0
The other input form available is AB11, which sets the last two components to 1.0 (Theoretically).
No titles actually use this input form that we know of except for Project M, but it can have some fairly drastic visual differences.
Confirmed correct by hardware test
This should get Donkey Kong Country Returns characters to be as broken as they should be. They will be fixed in a later pr.
Expected result is:
efbtex: characters are always flickering or invisible, no matter what scaling or IR setting
efb2ram: characters are always working properly at 1xIR, no matter what scaling or IR setting
Fast depth is now more accurate than slow depth and should always be used.
The option will be kept in a different form as it is still used as a hack to fix some games.
Also, the slow depth code path will still be relied upon by cards that don't support GL_ARB_clip_control.
CBoot::BootUp() did call CoreTiming::Advance which itself blocks on the GPU,
but the GPU thread wasn't started already. This commit moves the SyncGPU
initialization into the Fifo.cpp file and call it after BootUp().
Setting this is not required anymore as of commit 40cf1bbacc622 of
FFmpeg.
For users of older versions of the libavcodec library we guard the
change with an #if.
(x % y) is not defined in GLSL when sign(x) != sign(y).
This also has the added benefit of behaving the same as sampler wrapping modes, in regards to negative inputs.
This was causing crashes/driver resets when odd-dimension textures were
being loaded, due to the size we were uploading being larger than the size
of the higher-level texture calculated by the runtime.
People who make texture packs usually release them using a specific ID
(for instance SX4E01). Users who have a different version of the game
(like the PAL version SX4P01) then need to rename the custom texture
folder to match. This is a lot simpler than renaming every texture file,
as was required with the old texture format, but it's still something
that users can forget to do. To make that unnecessary, this change makes
it possible to use three-character region-free IDs for custom texture
folders, similarly to how game INIs can use three-character IDs. Once
most people have updated to Dolphin versions that include this change,
those who make texture packs will be able to name them with
three-character IDs, removing the need for users to rename anything.
We don't throttle by frames, we throttle by coretiming speed.
So looking up VI for calculating the speed was just very wrong.
The new ini option is a float, 1.0f for fullspeed.
In the GUI, percentual values are used.
This fixes the crashes occuring at startup with a non-empty shader cache.
Because LinearDiskCache reads/writes to the storage of ShaderUid, ShaderUid must be trivially copyable.
Additionally, adds a static assert to LinearDiskCache to ensure this doesn't happen in the future.
The initialization of ShaderUid data has been moved to the code generation functions, so the above condition holds true.