PABE accumulation blend:
The idea is to output Cs unchanged when As < 1, which we do by manipulating Cd through the src1 output.
This can't be done with reverse subtraction as we want Cd to be 0 when As < 1.
Blend mix is excluded because no games were found that need it; it can be added if one turns up.
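
As a rough illustration of the idea above, the sketch below models an additive blend whose destination factor comes from the second shader output (src1). The function names, the normalized float colour model, and the exact choice of blend factors are assumptions made for illustration, not the renderer's actual code.

```python
def pabe_src1_factor(a_s: float) -> float:
    """Hypothetical src1 output: 0 drops Cd, 1 keeps Cd for accumulation."""
    return 1.0 if a_s >= 1.0 else 0.0


def accumulation_blend(cs: float, cd: float, a_s: float) -> float:
    """Model an ADD blend with src factor ONE and dst factor taken from src1."""
    result = cs * 1.0 + cd * pabe_src1_factor(a_s)
    return min(result, 1.0)  # the render target clamps to [0, 1]
```

With As < 1 the destination term is multiplied by 0, so the output is Cs alone. A reverse-subtract equation has no factor that zeroes Cd this way, which is the limitation mentioned above.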
PABE Disable blending:
We can disable blending here as an optimization since alpha max is 128:
an alpha of 128 is 1 in the formula Cs*Alpha + Cd*(1 - Alpha), which gives us a result of Cs.
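
The arithmetic can be checked with a small model. This is a sketch only: `pabe_blend` is a hypothetical helper, it uses integer GS-style alpha where 128 means 1.0, and it assumes PABE passes Cs through when alpha is below 128.

```python
def gs_blend(cs: int, cd: int, alpha: int) -> int:
    """Cs*Alpha + Cd*(1 - Alpha), rearranged as (Cs - Cd)*Alpha + Cd, 128 == 1.0."""
    return (((cs - cd) * alpha) >> 7) + cd


def pabe_blend(cs: int, cd: int, alpha: int) -> int:
    """Assumed PABE behaviour: blend only when alpha >= 128, else output Cs."""
    if alpha < 128:
        return cs
    return gs_blend(cs, cd, alpha)
```

For every alpha in 0..128 the result equals Cs regardless of Cd, so under these assumptions the blend unit has nothing to do.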
VK/GL/Metal: Get rid of it completely as it doesn't seem needed anymore.
DX: Only enable it in combination with GPU Palette Conversion, as that's when the issue occurs.
Test: See if Metal breaks with no point sampler.
Apparently this causes GPU crashes on RDNA3, and didn't provide any
tangible benefit for NVIDIA.
I'll replace this at some point with dynamic rendering local reads,
either before or after the GPUDevice transition.
Instead of breaking the draw into two passes, which breaks when
fragments overlap each other and blending is enabled, use blending to
leave the value of Ad intact when a pixel fails the alpha test.
In the case of DATE being enabled, prefer PrimID over stencil, since
we are changing Ad on a per-fragment basis, with some fragments not
being modified, stencil DATE will become desynchronized with the value
of Ad.
The idea is to adjust the alpha destination for more
accurate hw blending which will work on all renderers.
Old behavior has Ad in the 0-1 range, whereas blending needs 0-2.
copy rt -> adjust the alpha -> copy back the adjusted alpha -> restore old alpha after blending is done
Since we can't do Cd*(Alpha + 1) - Cs*Alpha in hw blend, we instead adjust the Cs value that will be subtracted,
which gives a better result in hw blend. The result is still wrong, but less wrong than before.
Fixes Colin McRae Rally 2005 on Vulkan.
Possibly others as well on basic blend with barriers or Medium blend with barriers disabled.
Bump shader cache version.
When rt alpha min and max are equal, we know what the Ad value is.
In that case use the Af bit instead and set the AFIX value from the known rt alpha value.
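
A minimal sketch of that substitution follows. `BlendConfig` and `resolve_blend_c` are hypothetical names, and the C field values (0 = As, 1 = Ad, 2 = FIX) follow the GS ALPHA register encoding.

```python
from dataclasses import dataclass

BLEND_C_AS, BLEND_C_AD, BLEND_C_FIX = 0, 1, 2  # C field of the ALPHA register


@dataclass
class BlendConfig:
    c: int     # which alpha source the blend formula uses
    afix: int  # fixed alpha value, only meaningful when c == BLEND_C_FIX


def resolve_blend_c(cfg: BlendConfig, rt_alpha_min: int, rt_alpha_max: int) -> BlendConfig:
    """Swap Ad for a fixed alpha when the rt alpha is known to be constant."""
    if cfg.c == BLEND_C_AD and rt_alpha_min == rt_alpha_max:
        return BlendConfig(c=BLEND_C_FIX, afix=rt_alpha_min)
    return cfg  # Ad varies or is not used, so leave the setup alone
```

For example, `resolve_blend_c(BlendConfig(BLEND_C_AD, 0), 128, 128)` yields a FIX setup with an AFIX of 128, so the draw no longer needs to read the render target for its alpha.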
On OpenGL when BLEND C == 1 but reading the rt is disabled, set the value to 0 instead
of reading an undefined value.