Previous CopyRect function does a memcopy without conversion.
This function will allow to use different format for input/output. Just a
possibility for the future
Basically the code does the alpha multiplication in the shader therefore
the blend unit only does a pure addition. This way the multiplication is
accurate and accurate_blending doesn't requires a costly barrier.
This code also avoid variable duplication to make the code more separated.
Hopefully blending can be done in a separated function
It is preliminary work to support fast color clipping with HDR
v2: fix assertion compilation failure
v3: fix regression in not accurate mode
v3: Cs * As/Af is not an accumulation
Those cases don't need the Cd addition and were already optimized anyway
Fix a regression on GoW2
Do DATE algo selection before blending. This way we can detect bad
interaction.
Regroup all blending/colclip in a single block. Avoid to check abe &&
rt multiple times.
v2: only enable sw blending when abe is true
The updated medium level will run for all sprites. It helps sotc blooming effect and it remains
fast enough to be enabled by default (at least on 3D games)
The new high level will run for all sprites + color clipping
The idea is that sprites are often use for post-processing effect (ofc except 2D games)
Most of the time post-processing supports SW blending with a small speed penality. SW
blending is more accurate so it is better to use it.
Sourceforge was dead for more than a week therefore add the license
information. I could not find the original TGM source (dead link) so I'm not
even sure if this still applies or if the glsl was totally rewritten. None
of the glsl files have a copyright header so it's hard to tell.
Gain: 1% at 4x on SotC (it partially compensates recent additions)
When the color is constant and equal to 128, the MODULATE mode is
equivalent to the DECAL mode. It saves 5 instructions on the FS.
Accurate options do a better jobs. Technically it can still
be useful for old gpu/driver that doesn't support the GL4.5 extension.
On Windows, you can still rely on Dx
On linux, free driver support it (except Intel)
It fixes the bad light on Silent Hill with the SW renderer.
Full story
if Q is NaN, m_vt.m_eq.q becomes wrongly true
/Q will wrongly be optimized in the "Vertex Shader" of the SW
Note: Add an assert for the STQ handler
Code path is quite hot so no need to add extra check for nothing
GS uses integer value and does integer operation too.
This commit trunc the sampled texture, the interpoled fragment color
and the product of the 2.
It impacts negatively the perf of about 3/4% (GPU) but it fixes rendering on
suikoden and potentially some others games too.
Code was completey bitrotten
Code was a partial test (and yet 500 lines already)
Shader is more and more complex and multithreading support greatly
reduce the cost of shader switch
Nvidia allows to get the ASM of the shader of the compiled shader. It is useful
to check the performance.
It also allow me to compile most of shader code path for QA
Dump is enabled in linux replayer + debug_glsl_shader = 2