This reverts commit 53690cf9d0.
Quoting user:
For aliasing, the option allow of reduce a little but always very
visible compared with DX11 even with anisotropic OFF, , furthermore
many textures bug added with option activated (predictable but not see
on DX11 with anisotropic ON).
TL;DR doesn't worth it.
Note: it seem to work on DX because DX uses HW texturing in clamp region
mode (and others invalid case). OpenGL uses SW texturing to ensure accuracy
By default, anisotropic filtering was disabled when textures aren't countinuous.
This hack allows to force it. It can help to reduce aliasing but it would create
unexpected effect on texture boundaries.
Again, someone ought to add the option on Windows too
Texture coordinate could be dummy/float/int integral/int normalized.
Old behavior:
* VS was in charge to select the texture coordinate
* int integral format wasn't supported
New behavior:
* Always compute all formats
* FS will be in charge to select the good format
Impact:
* VS will be slightly slower but it reduces shaders permutation from
little to 0 (won't be bad for CPU)
* FS speed isn't impacted as 2 separate code paths were already required
to support both format
* Rasterizer will be 33% slower but unlikely to be the limited factor of
the GPU
* In future we could directly use the integral format in the FS.
V2: remove useless PSin_t
It would be on by default. Unsafe & fast path.
The hack is a security if someone encounters any issue
v2: update Windows gui file
v3: fix typo in tooltip and linux gui
Initially it was free to do the SW blending because safe fbmask
will already do a sw blending.
Unsafe version uses a fast path with a limited blending. Therefore
SW blending isn't free anymore.
Improve the speed of the previous speed hack (xenosaga 1)
The hack relies on the undefined behavior of the hardware so it can
potentially generate rendering corruption.
This new hack drops the cache flusing when only the alpha channel is masked.
Alpha is a direct copy of the fragment. Normally masked bits will be constant
everywhere (RT, FS output, texture cache) so it would likely work.
Just in case, code is only enabled with the new shiny hack
runion/rempty/rinter requires x < z and y < w
Help issue #762 (accurate blending issue)
If you want to shine, please put better GSVector code (AVX512 is 2 instruction :p)
Function pointer was mangled to avoid any collision. Nowadays all symbols
are hidden so no risk of collision.
Syntax is nicer beside it would allow to put back GLES3.2. I think it
supports most of the used extension.
glActiveTexture & glBlendColor are provided without symbol query.
A couple of useless members were removed too.
Also fix wnd initialization
Coverity:
CID 146955 (#1 of 1): Uninitialized pointer read (UNINIT)
18. uninit_use: Using uninitialized value wnd[i].
* add lengthly comment to explain the format
* Likely reduce the number of shader permutation
* Avoid slow AEM (on GPU)
Expect regressions because TC needs some fixes
v2: fix palette mode
Intially GSBlendStateOGL was an alias of the m_blendMapD3D9 array
The object was replaced by an index in the array. Save 2k of memory duplication.
And too much useless code.
v2: push/pop blending state in DATE stuff
v3: remove m_state which is useless now
The purpose is to emulate correctly destination alpha factor
An alpha channel of 128 is 1.0 in the GS but only ~0.5 in the GPU
I think few draw call use destination alpha so impact on perf must remains small.
Actually it can partially be done with GL_ARB_shader_image_load_store
extension. However all drivers that support shader_image have
texture barrier too.
The idea is to use a floating texture to accumulate the data and
then do a final postprocessing pass to apply the modulo
v2:
* use bounding box to
* fix vertex corruption issue
* use negative number in shader which allow to use half float (+12
fps@4x)
Basically the code does the alpha multiplication in the shader therefore
the blend unit only does a pure addition. This way the multiplication is
accurate and accurate_blending doesn't requires a costly barrier.
This code also avoid variable duplication to make the code more separated.
Hopefully blending can be done in a separated function
It is preliminary work to support fast color clipping with HDR
v2: fix assertion compilation failure
v3: fix regression in not accurate mode
v3: Cs * As/Af is not an accumulation
Those cases don't need the Cd addition and were already optimized anyway
Fix a regression on GoW2
Do DATE algo selection before blending. This way we can detect bad
interaction.
Regroup all blending/colclip in a single block. Avoid to check abe &&
rt multiple times.
v2: only enable sw blending when abe is true
The updated medium level will run for all sprites. It helps sotc blooming effect and it remains
fast enough to be enabled by default (at least on 3D games)
The new high level will run for all sprites + color clipping
The idea is that sprites are often use for post-processing effect (ofc except 2D games)
Most of the time post-processing supports SW blending with a small speed penality. SW
blending is more accurate so it is better to use it.
Gain: 1% at 4x on SotC (it partially compensates recent additions)
When the color is constant and equal to 128, the MODULATE mode is
equivalent to the DECAL mode. It saves 5 instructions on the FS.
Accurate options do a better jobs. Technically it can still
be useful for old gpu/driver that doesn't support the GL4.5 extension.
On Windows, you can still rely on Dx
On linux, free driver support it (except Intel)
Code is not yet enabled because it requires extensive test
The idea is to replace point by a 1 pixels sprite with the help of
a geometry shader. In 4x, point will be replaced by a 4x4 sprite.
// GL42 interact very badly with sw blending. GL42 uses the primitiveID to find the primitive
// that write the bad alpha value. Sw blending will force the draw to run primitive by primitive
// (therefore primitiveID will be constant to 1)
It might help to fix a bit the color on a couple of games
accurate_fbmask = 1
Code uses GL4.5 extensions. So far it seems the effect is ony used a couple
of time and often in non-overlapping primitive. Speed impact will likely remain small
GS doesn't supports texture shuffle/swizzle so it is emulated in a
complex way.
The idea is to read/write the 32 bits color format as a 16 bit format.
This way, RG (16 lsb bits) or BA (16 msb bits) can be read or written with
square texture that targets pixels 1-8 or pixels 8-16.
However shuffle is limited. For example you can copy the green channel
to either the alpha channel or another green channel.
Note: Partial masking of channel is not yet implemented
V2: improve logging
V3: better support of green channel in shader
V4: improve detection of destination (issue due to rounding)
Gow uses 24 bits buffer, so only color is updated but blending is configured as Cd
so it is a NOP
In this case, we don't lookup the target in the texture cache. It reduces the complexity
to handle depth which can be located at same address as RT
Note: please test DX renderer
Please test it!
GS supports 3 formats for the output:
32 bits: normal case
=> no change
24 bits: like 32 bits but without alpha channel
=> mask alpha channel (ie don't write it anymore)
=> Always uses 1.0f as blending coefficient
16 bits: RGB5A1, emulated by a 32 bits openGL texture. I think it will be more correct to use
a real 16 bits GL texture. Unfortunately it would cost several (slow) target conversions.
Anyway as a current solution
=> apply a mask of 0xF8 on color when SW blending is used (improve Castlevania shadow)
unfortunately normal blending mode still uses the full range of colors!
This commit also corrects a couple of blending factor. 128/255 is equivalent to 1.0f in PS2, whereas GPU uses 1.0f. So the blending factor must be 255/128 instead of 2
Note: disable CRC hack and enable accurate_colclip to see Castlevania shadow ^^
(issue #380).
Note2: SW renderer is darker on Castlevania. I don't know why maybe linked to the 16 bits format poorly emulated
The purpose of the code is to support alpha channel
of RT uses as an index for a palette texure.
I'm afraid that code will likely break pure palette texture. Only used
if paltex is enabled
It fixes missing shadow in Star Ocean 3 (issue #374) in Native resolution
with filter = 0 (no filtering) or = 2 (normal fitering)
Rendering explanation:
The game emulates a stencil buffer with the alpha channel
The alpha channel of the RT can contains a palette texture index (format 4HH)
The idea is to have a gradient of value in the palette (16/32/48/...).
This way you can implement a +16/-16 and even wrap the alpha value every time
you hit the pixel.
Bilinear filtering breaks the rendering because it interpolates between counts
so you doesn't have the exact count
Upscaling breaks the rendering because the RT is reused as an input texture. It means
that we need to scale it down which again create some interpolations.
* Dump context before the increase of s_n
=> aligned with the global call number
* Don't print colclip not supported when it is optimized away
* dettach the input texture when it is useless
=> avoid to show a wrong texture in the debugger
This way it will allow to implement all blendings operartion in FS.
Of course it will be slow, but it would be nice for debug and quickly check
game error rendering.
Currently colclip uses 2 passes to wrap the output of blending unit
However some blending mode are only a plain copy (of 0 or Cs or Cd).
So no overflow of [0:255], no need to wrap it
Note: I saw those cases in GoW.
Much faster for small batch that write the alpha value. Code can
be enabled with accurate_date option.
Here a summary of all DATE possibilities:
1/ no overlap of primitive
=> texture barrier (pro no setup of stencil and single draw)
2/ alpha written
=> small batch => texture barrier (primitive by primitive). Done in N-primitive draw calls.
(based on GL_ARB_texture_barrier)
=> bigger batch => compute the first good primitive, slow but only 2 draw calls.
(based on GL_ARB_shader_image_load_store)
=> Otherwise there is the UserHacks_AlphaStencil but it is a hack!
3/ alpha written
=> full setup of stencil ( 2 draw calls)
No barrier => draw all primitives
Barrier but without overlap => draw all primitives
Barrier with overlap => draw primitive by primitve
It will ease the implementation of accurate blending and why not date too