gsdx ogl: only use the geometry shader to convert big enough draw calls
The purpose of the geometry shader is to reduce bandwidth (72 bytes per
sprite) and CPU load.
Unfortunately it also increases CPU load due to extra shader validations.
So the geometry shader will only be enabled for draw calls with more than
16 sprites (an arbitrary threshold: the smallest number before Shadow
Hearts performance plummets)
v2: don't disable the geometry shader in the replayer.
It makes it easier to spot sprite rendering and to read vertex info manually.
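A minimal sketch of the heuristic described above. The helper name, the
`gs_supported` flag and the `in_replayer` flag are assumptions made for
illustration; only the threshold of 16 sprites comes from the commit text.

```cpp
#include <cassert>
#include <cstddef>

// Threshold quoted in the commit text: draw calls with 16 sprites or
// fewer keep the CPU conversion path.
constexpr std::size_t kMinSpritesForGS = 16;

// Hypothetical helper (not the actual GSdx API): decides whether the
// sprite conversion should run in the geometry shader.
// `in_replayer` models the v2 note: the replayer keeps the shader on.
inline bool UseGeometryShader(std::size_t sprite_count, bool gs_supported,
                              bool in_replayer)
{
    if (!gs_supported)
        return false; // nothing to enable
    if (in_replayer)
        return true;  // always on, regardless of the draw call size
    return sprite_count > kMinSpritesForGS;
}
```

With this shape, a draw call of 17 sprites takes the geometry shader
path while a draw call of 16 sprites stays on the CPU path.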
Strangely, the game uses a large texture to handle the texture buffer.
I think it plays with WMS/WMT. I'm not sure texture shuffling is 100%
correct here, but without it rendering is completely broken.
* 0x9C712FF0, Jak1, EU
* 0x472E7699, Jak1, US
* 0x2479F4A9, Jak2, EU
* 0x12804727, Jak3, EU
* 0xDF659E77, JakX, EU
Please report the CRCs of the US versions too so I can add them.
Please test the shadow rendering (OpenGL HW + accurate blending set to at least basic)
The game sets the framebuffer as an input texture, so I did the same for
OpenGL. The code is protected by a CRC check. It works because the game wants
to sample pixels.
For the record, I tested it on GTA too. It doesn't work as expected because
the game resizes the framebuffer to a smaller one, so there is no
guarantee that a pixel will be read before data is written to it.
Note: it requires accurate blending set to at least basic
Note: I need the CRCs of all Jak games that suffer from this issue. Thank you :)
When the primitive depth is constant and the depth test is greater or equal, we
can execute the depth write after color (the depth status will only depend on
the initial value)
New case for RGB_ONLY ATE:
If the blending equation uses a fixed alpha or the source alpha, we can postpone
the alpha write to a 2nd pass.
If depth can also be postponed, we can guarantee that the values are written in the correct order.
1st pass => do RGB
2nd pass => do Alpha & Depth
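The pass split above can be sketched as a small planning function. All
names here are hypothetical and only model the reasoning in the message
(constant depth with a greater-or-equal test, and blending that does not
read the destination alpha); this is not the GSdx code.

```cpp
#include <cassert>

// Hypothetical model of the state that matters for the RGB_ONLY case.
enum class BlendAlpha { Source, Fixed, Destination };

struct DrawState {
    bool depth_is_constant; // every vertex of the draw call shares one Z
    bool ztest_is_gequal;   // depth test is "greater or equal"
    BlendAlpha blend_alpha; // which alpha the blending equation reads
};

struct WriteMask { bool rgb; bool alpha; bool depth; };

struct TwoPassPlan {
    bool valid;       // false: fall back to the default path
    WriteMask first;  // channels written by pass 1
    WriteMask second; // channels written by pass 2
};

inline bool CanPostponeDepth(const DrawState& s)
{
    return s.depth_is_constant && s.ztest_is_gequal;
}

inline bool CanPostponeAlpha(const DrawState& s)
{
    // If blending read the destination alpha, the RGB of later primitives
    // would depend on alpha that has not been written yet.
    return s.blend_alpha != BlendAlpha::Destination;
}

inline TwoPassPlan PlanRgbOnly(const DrawState& s)
{
    if (!CanPostponeAlpha(s) || !CanPostponeDepth(s))
        return {false, {false, false, false}, {false, false, false}};
    // 1st pass => RGB, 2nd pass => alpha and depth
    return {true, {true, false, false}, {false, true, true}};
}
```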
It fixes Stuntman letter rendering :) The rest of the game is still broken :(
The value seems wrongly rounded and you can't distinguish 0xFFFF from 0xFFFE
Instead, check that depth is constant for the draw call and check the value from the vertex buffer
Fix a recent regression on GTA (and likely various other games)
In FB_ONLY mode the alpha test only affects (discards) the depth value.
If there is no depth buffer, we don't care about depth writes. So the alpha
test is useless and we can do the draw with a single draw call and no program
switch
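A minimal sketch of that check. The enum mirrors the four alpha-test
failure modes of the GS TEST register; the helper names and the
`has_depth_buffer` flag are assumptions made for illustration.

```cpp
#include <cassert>

// Alpha-test failure modes (AFAIL field of the GS TEST register).
enum class AlphaFail { Keep, FbOnly, ZbOnly, RgbOnly };

// Hypothetical helper: in FB_ONLY mode a failed alpha test only blocks
// the depth write, so without a depth buffer the test changes nothing.
inline bool AlphaTestIsUseless(AlphaFail afail, bool has_depth_buffer)
{
    return afail == AlphaFail::FbOnly && !has_depth_buffer;
}

// Draw calls needed for an FB_ONLY draw: 1 when the alpha test can be
// dropped, otherwise the 2 passes of the default algorithm.
inline int DrawCallsForFbOnly(bool has_depth_buffer)
{
    return AlphaTestIsUseless(AlphaFail::FbOnly, has_depth_buffer) ? 1 : 2;
}
```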
Fix letter rendering issues on Kengo/Burnout 3/...
The default algorithm executes the alpha test in 2 passes. However, due to
blending, the color can't be handled accurately.
Fortunately for us, the rendering uses an always-pass depth test, so you
can first execute all the color rendering (which doesn't depend on the alpha
test) and then the depth part, which depends on the alpha test.
* Code was factorized a bit with the help of max_z
* Add an extra optimization if the test is ZTST_GEQUAL and the min z value is
the biggest value. The Z test will always pass.
Note: due to float rounding (23-bit mantissa vs 24-bit depth) the test
is done against 0xFF_FFFE and not 0xFF_FFFF. It is wrong, but the GPU will
also use floats so there will be no impact.
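The always-pass check from the second bullet could look like the sketch
below. The enum follows the GS ZTST encoding; the helper name and the
float `min_z` parameter are assumptions, and the threshold is the
0xFF_FFFE value quoted in the note.

```cpp
#include <cassert>

// GS depth test modes (ZTST field): NEVER, ALWAYS, GEQUAL, GREATER.
enum class ZTest { Never, Always, GEqual, Greater };

// Threshold from the note above: 0xFFFFFE rather than 0xFFFFFF.
constexpr float kAlmostMaxZ = 16777214.0f; // 0xFFFFFE

// Hypothetical helper: true when no fragment of the draw call can fail
// the depth test. `min_z` is the smallest vertex depth of the draw call.
inline bool ZTestAlwaysPasses(ZTest ztst, float min_z)
{
    assert(min_z >= 0.0f); // depth values are unsigned
    if (ztst == ZTest::Always)
        return true;
    // With GEQUAL, a minimum already at the top of the range is greater
    // than or equal to anything stored in the depth buffer.
    return ztst == ZTest::GEqual && min_z >= kAlmostMaxZ;
}
```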
GLLoader: cast passed parameters to required type.
GSDeviceOGL: cast variables to required type and silence warnings.
GSRendererOGL: cast variables to required type and silence warnings.
* Fast accurate DATE is always enabled; it was faster than standard DATE
* The less fast version is always enabled too. It is likely barely used,
so the perf impact will be small on the few games that could hit this path.
Nice rendering has a higher priority
* The "slow" path will depend on the DATE option.
Note: normally it isn't too slow (-10%) if GL_ARB_shader_image_load_store
is supported, but AMD Crimson is an epic fail.
Faster :) Further reduce the cost of accurate DATE
The optimization will clear the stencil to 1. So all pixels will have a
single sample that passes both the depth & stencil tests. No primitives
overlap, so the destination alpha test can be done directly in the
shader.
Games often use DATE to allow a single pixel to pass. If this
use case is detected, the stencil buffer will be cleared after the first pixels
that pass both the depth & stencil tests.
It seems to reduce the load on the GPU.
Note: with the help of texture barriers, maybe we could implement the algorithm
with a single pass.
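To illustrate the "first pixel wins" behaviour, here is a small CPU model
of a stencil that is cleared once a pixel has been written. It is only a
sketch: the types and the function name are hypothetical, the depth test
is left out, and the real work happens on the GPU. In OpenGL terms this
would roughly correspond to glStencilFunc(GL_EQUAL, 1, 1) combined with
glStencilOp(GL_KEEP, GL_KEEP, GL_ZERO), but that mapping is my assumption
and is not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Fragment {
    std::size_t pixel;   // index into the framebuffer
    std::uint32_t color; // value the fragment wants to write
};

// CPU model: stencil[i] == 1 means pixel i may still be written.
// The first fragment that hits a pixel writes its color and clears the
// stencil, so every later fragment on that pixel is rejected.
inline std::vector<std::uint32_t> DrawFirstHitOnly(
    std::vector<std::uint32_t> fb, std::vector<std::uint8_t> stencil,
    const std::vector<Fragment>& frags)
{
    assert(fb.size() == stencil.size());
    for (const Fragment& f : frags) {
        assert(f.pixel < fb.size());
        if (stencil[f.pixel] != 1)
            continue;          // stencil test fails
        fb[f.pixel] = f.color; // first hit wins
        stencil[f.pixel] = 0;  // block later fragments on this pixel
    }
    return fb;
}
```

Pixels whose stencil starts at 0 are never touched, which models pixels
that already fail the destination alpha test.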