When the primitive depth is constant and the depth test is greater or equal, we can
execute the depth write after the color write (the depth test result only depends
on the initial value)
New case for RGB_ONLY ATE:
If the blending equation uses a fixed alpha or the source alpha, we can postpone
the alpha write to a 2nd pass.
If depth can also be postponed, we can guarantee that the final values are correct.
1st pass => do RGB
2nd pass => do Alpha & Depth
It fixes Stuntman letter rendering :) The rest of the game is still broken :(
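The two-pass split described above could be sketched roughly as follows. This is only an illustration of the stated conditions: `DrawInfo`, `Plan`, `plan_passes` and the field names are hypothetical and are not taken from the renderer.

```cpp
// Hypothetical summary of one draw call (assumed names, not renderer state).
struct DrawInfo {
    bool blend_uses_fixed_or_source_alpha; // blending never reads destination alpha
    bool depth_is_constant;                // every vertex of the draw has the same Z
    bool ztst_is_gequal;                   // depth test is "greater or equal"
};

enum class Plan {
    Default,          // keep the generic alpha test handling
    RgbThenAlphaDepth // 1st pass: RGB, 2nd pass: alpha & depth
};

// The alpha write can wait when blending does not depend on destination alpha.
constexpr bool can_postpone_alpha(const DrawInfo& d) {
    return d.blend_uses_fixed_or_source_alpha;
}

// The depth write can wait when the test result only depends on the initial value.
constexpr bool can_postpone_depth(const DrawInfo& d) {
    return d.depth_is_constant && d.ztst_is_gequal;
}

// Pick the two-pass plan only when both writes can be postponed.
constexpr Plan plan_passes(const DrawInfo& d) {
    return (can_postpone_alpha(d) && can_postpone_depth(d))
               ? Plan::RgbThenAlphaDepth
               : Plan::Default;
}
```

Anything that fails one of the two conditions falls back to the default handling in this sketch.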
The value seems wrongly rounded and you can't distinguish 0xFFFF from 0xFFFE
Instead, check that depth is constant for the draw call and read the value from
the vertex buffer
Fix recent regression on GTA (and likely various games)
In FB_ONLY mode a failed alpha test only impacts (discards) the depth value.
If there is no depth buffer, we don't care about depth write. So alpha
test is useless and we can do the draw with a single draw call and no program
switch
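As a rough sketch of the check described above (the enum and function names are hypothetical, and the list of fail modes other than FB_ONLY and RGB_ONLY is an assumption):

```cpp
// Hypothetical mirror of the alpha-fail modes; only FbOnly matters here.
enum class AlphaFail { Keep, FbOnly, ZbOnly, RgbOnly };

// In FB_ONLY mode a failed alpha test only drops the depth write.
// Without a depth buffer there is nothing to drop, so the test can be skipped
// and the draw done in a single call.
constexpr bool alpha_test_is_useless(AlphaFail afail, bool has_depth_buffer) {
    return afail == AlphaFail::FbOnly && !has_depth_buffer;
}
```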
Fix rendering issues on letters in Kengo/Burnout 3/...
The default algo will execute the alpha test in 2 passes. However, due to blending
you can't handle the color accurately.
Fortunately for us, the rendering uses an always-pass depth test so you
can first execute all the color rendering (which doesn't depend on the alpha test)
and then the depth part, which depends on the alpha test.
* Code was factorized a bit with the help of max_z
* Add an extra optimization if the test is ZTST_GEQUAL and the min z value is
the biggest value: the Z test will always pass.
Note: due to float rounding (23-bit mantissa vs 24-bit depth) the test
is done against 0xFF_FFFE and not 0xFF_FFFF. It is wrong but the GPU will
also use float so the impact will be null.
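A minimal sketch of the ZTST_GEQUAL shortcut described above. The enum values, the `ztest_always_passes` name and its parameters are assumptions made for illustration. The comparison against `max_z - 1` follows the note about testing against 0xFF_FFFE.

```cpp
#include <cstdint>

// Hypothetical depth-test modes; names follow the ZTST_* naming used above.
enum ZTest : std::uint32_t { ZTST_NEVER = 0, ZTST_ALWAYS = 1, ZTST_GEQUAL = 2, ZTST_GREATER = 3 };

// Returns true when every fragment of the draw is known to pass the depth test.
// min_z: smallest Z of the draw call (float, as produced by vertex tracing).
// max_z: largest value the depth format can hold, e.g. 0xFFFFFF for 24 bits.
constexpr bool ztest_always_passes(ZTest ztst, float min_z, std::uint32_t max_z) {
    return ztst == ZTST_ALWAYS ||
           (ztst == ZTST_GEQUAL && min_z >= static_cast<float>(max_z - 1));
}
```

When this returns true the draw can be treated as if the depth test were ZTST_ALWAYS.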
GLLoader: cast passed parameters to required type.
GSDeviceOGL: cast variables to required type and silence warnings.
GSRendererOGL: cast variables to required type and silence warnings.
* Fast accurate DATE is always enabled; it was faster than standard DATE
* The less fast version is always enabled too. It is likely barely used
so the perf impact will be small on the few games that could hit this path.
Nice rendering has a higher priority
* The "slow" path will depend on the DATE option.
Note: normally it isn't too slow (-10%) if GL_ARB_shader_image_load_store
is supported, but AMD Crimson is an epic fail.
Faster :) Further reduce the cost of accurate DATE
The optimization will clear the stencil to 1, so all pixels will have a
single sample that passes both the depth & stencil tests. No primitives
overlap, so the destination alpha test can be done directly in the
shader.
Games often use DATE to allow a single pixel to pass. If this
use case is detected, the stencil buffer will be cleared after the first pixels
that pass both the depth & stencil tests.
It seems to reduce the load on the GPU.
Note: with the help of texture barrier, maybe we could implement the algo
with a single pass.
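The "single pixel pass" case can be pictured with a small CPU-side toy model. This is not GPU code and not the renderer's implementation: `Fragment` and `draw_first_only` are hypothetical names, and the depth test is assumed to always pass. The stencil starts at 1 and is cleared by the first fragment that lands on a pixel, so later fragments on the same pixel are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One fragment of the draw: target pixel index and the color it would write.
struct Fragment {
    std::size_t pixel;
    std::uint32_t color;
};

// Toy model of "only the first fragment per pixel passes".
inline std::vector<std::uint32_t> draw_first_only(std::size_t pixel_count,
                                                  const std::vector<Fragment>& frags) {
    std::vector<std::uint8_t> stencil(pixel_count, 1); // stencil cleared to 1
    std::vector<std::uint32_t> color(pixel_count, 0);  // render target
    for (const Fragment& f : frags) {
        assert(f.pixel < pixel_count);
        if (stencil[f.pixel] == 1) {  // stencil test (depth assumed to pass)
            color[f.pixel] = f.color; // first fragment wins
            stencil[f.pixel] = 0;     // clear so later fragments fail
        }
    }
    return color;
}
```

Since at most one fragment per pixel survives, no destination value written by the same draw is ever read back in this model.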
So just reuse the GT hle shader :) The Acid stage is now correct. However it might need
some tuning for other stages.
Still looks awful with upscaled resolution (note: the internal game framebuffer is around 160x128)
This reverts commit 53690cf9d0.
Quoting user:
For aliasing, the option allow of reduce a little but always very
visible compared with DX11 even with anisotropic OFF, , furthermore
many textures bug added with option activated (predictable but not see
on DX11 with anisotropic ON).
TL;DR: it isn't worth it.
Note: it seems to work on DX because DX uses HW texturing in clamp region
mode (and other invalid cases). OpenGL uses SW texturing to ensure accuracy
By default, anisotropic filtering is disabled when textures aren't continuous.
This hack allows forcing it. It can help reduce aliasing but it can create
unexpected effects on texture boundaries.
Again, someone ought to add the option on Windows too
Texture coordinate could be dummy/float/int integral/int normalized.
Old behavior:
* VS was in charge of selecting the texture coordinate
* int integral format wasn't supported
New behavior:
* Always compute all formats
* FS will be in charge of selecting the right format
Impact:
* VS will be slightly slower but it reduces the shader permutations from
a few to 0 (won't be bad for the CPU)
* FS speed isn't impacted as 2 separate code paths were already required
to support both formats
* Rasterizer will be 33% slower but it is unlikely to be the limiting factor of
the GPU
* In future we could directly use the integral format in the FS.
V2: remove useless PSin_t
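The "compute everything in the VS, select in the FS" idea can be sketched on the CPU as below. All names (`TexCoordKind`, `TexCoords`, `vs_compute_all`, `fs_select`) are hypothetical. Treating "int normalized" as the integer coordinate divided by the texture size is an assumption, and no fixed-point handling is modeled.

```cpp
// Hypothetical coordinate kinds, following the list in the message above.
enum class TexCoordKind { Dummy, Float, IntIntegral, IntNormalized };

struct Vec2 { float x, y; };

// What the vertex stage exports under the new behavior: every format, always.
struct TexCoords {
    Vec2 st;            // float coordinates, passed through
    Vec2 uv_integral;   // integer texel coordinates, kept as-is
    Vec2 uv_normalized; // integer texel coordinates scaled by the texture size
};

// Vertex-stage side: no permutation, all formats are computed.
constexpr TexCoords vs_compute_all(Vec2 st, int u, int v, int tex_w, int tex_h) {
    return TexCoords{
        st,
        Vec2{static_cast<float>(u), static_cast<float>(v)},
        Vec2{static_cast<float>(u) / static_cast<float>(tex_w),
             static_cast<float>(v) / static_cast<float>(tex_h)}};
}

// Fragment-stage side: pick the format the draw actually needs.
constexpr Vec2 fs_select(const TexCoords& tc, TexCoordKind kind) {
    return kind == TexCoordKind::Float         ? tc.st
         : kind == TexCoordKind::IntIntegral   ? tc.uv_integral
         : kind == TexCoordKind::IntNormalized ? tc.uv_normalized
                                               : Vec2{0.0f, 0.0f};
}
```

The extra exported values are what makes the vertex and rasterizer stages slightly more expensive in exchange for fewer vertex shader variants.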
It would be on by default. Unsafe & fast path.
The hack is a safety net if someone encounters any issue
v2: update Windows gui file
v3: fix typo in tooltip and linux gui
Initially SW blending was free because the safe fbmask path
already does SW blending.
The unsafe version uses a fast path with limited blending. Therefore
SW blending isn't free anymore.
Improve the speed of the previous speed hack (xenosaga 1)
The hack relies on the undefined behavior of the hardware so it can
potentially generate rendering corruption.
This new hack drops the cache flushing when only the alpha channel is masked.
Alpha is a direct copy of the fragment. Normally masked bits will be constant
everywhere (RT, FS output, texture cache) so it would likely work.
Just in case, the code is only enabled with the new shiny hack
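The "only the alpha channel is masked" condition could be tested along these lines. This is a sketch under assumptions: a 32-bit frame buffer mask where a set bit means the bit is write-protected, with alpha in the top byte. `kAlphaBits` and `only_alpha_masked` are hypothetical names.

```cpp
#include <cstdint>

// Assumed layout: alpha occupies the top 8 bits of a 32-bit pixel.
constexpr std::uint32_t kAlphaBits = 0xFF000000u;

// True when the mask protects some alpha bits and no RGB bits.
constexpr bool only_alpha_masked(std::uint32_t fbmask) {
    return fbmask != 0 && (fbmask & ~kAlphaBits) == 0;
}
```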
runion/rempty/rinter require x < z and y < w
Help issue #762 (accurate blending issue)
If you want to shine, please put better GSVector code (AVX512 is 2 instructions :p)
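The x < z and y < w precondition mentioned above can be illustrated with a scalar sketch. `Rect`, `normalized`, `is_ordered` and this `rinter` are hypothetical stand-ins written for the example. They are not the GSVector implementation, which works on SIMD registers.

```cpp
#include <algorithm>
#include <cassert>

// Rectangle stored as (x, y, z, w) = (left, top, right, bottom).
struct Rect { int x, y, z, w; };

// Reorder the corners so that x <= z and y <= w.
inline Rect normalized(Rect r) {
    return Rect{std::min(r.x, r.z), std::min(r.y, r.w),
                std::max(r.x, r.z), std::max(r.y, r.w)};
}

// Precondition expected by the rectangle helpers.
inline bool is_ordered(const Rect& r) { return r.x < r.z && r.y < r.w; }

// Intersection: only meaningful when both inputs are ordered.
inline Rect rinter(const Rect& a, const Rect& b) {
    assert(is_ordered(a) && is_ordered(b));
    return Rect{std::max(a.x, b.x), std::max(a.y, b.y),
                std::min(a.z, b.z), std::min(a.w, b.w)};
}
```

Calling `normalized` before the set operations is one way to meet the precondition when the corners may arrive swapped.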
Function pointer was mangled to avoid any collision. Nowadays all symbols
are hidden so no risk of collision.
The syntax is nicer; besides, it would allow putting back GLES3.2. I think it
supports most of the used extensions.
glActiveTexture & glBlendColor are provided without symbol query.
A couple of useless members were removed too.
Also fix wnd initialization
Coverity:
CID 146955 (#1 of 1): Uninitialized pointer read (UNINIT)
18. uninit_use: Using uninitialized value wnd[i].
* add lengthy comment to explain the format
* Likely reduces the number of shader permutations
* Avoid slow AEM (on GPU)
Expect regressions because TC needs some fixes
v2: fix palette mode