Much faster for small batch that write the alpha value. Code can
be enabled with accurate_date option.
Here a summary of all DATE possibilities:
1/ no overlap of primitive
=> texture barrier (pro no setup of stencil and single draw)
2/ alpha written
=> small batch => texture barrier (primitive by primitive). Done in N-primitive draw calls.
(based on GL_ARB_texture_barrier)
=> bigger batch => compute the first good primitive, slow but only 2 draw calls.
(based on GL_ARB_shader_image_load_store)
=> Otherwise there is the UserHacks_AlphaStencil but it is a hack!
3/ alpha written
=> full setup of stencil ( 2 draw calls)
If there is no overlap, it is allowed to directly read from the render target.
On SotC testcase with 6x scaling: 30fps -> 40fps
Note: it requires GL_ARB_texture_barrier extension so be sure to have a recent driver
Note2: it requires a lots of testing too
Open question: in case of complex date (written alpha)
Will it be faster to split the draw call into multiple call with no
primitive overlap
UserHacks_UnscaleSprite = 1 will unscale flat sprites
UserHacks_UnscaleSprite = 2 will unscale all sprites (don't work well so far)
The idea of the hack is to redo the interpolation of texture coordinate
based on the non-upscaled pixel position.
It avoids various glitches but sprites aren't upscaled anymore (so no
more anti-aliasing, potentially a coefficient can be added).
* separate VS/GS and FS
* separate subroutine part of the FS
It already complex enough without subroutine stuff. Besides I'm not sure
we will keep subroutine on the future.
* Only a single VAO
=> Format is set once
=> Only a single bind at startup
=> GSVertexBufferStateOGL is nearly useless
=> barely faster but better than nothing :)
* properly detect gl nv depth extension
* Always show the hack on the gui. Add a new hack option for DATE (gl4.2) only
* Save the scan mode on linux too (f7)
* hopefully fix some crash on some drivers... (ensure aligment 256 bits alignment, and if not use std memcpy)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5888 96395faa-99c1-11dd-bbfe-3dabce05a288
Best setting if you driver support GL_NV_depth_buffer_float => GL_NV_Depth = 1 & logz = 0
Otherwise => GL_NV_Depth = 0 & logz = 1
Explanation of the bug:
Dx z position ranges from 0.0f to 1.0f (FS ranges 0.0f to 1.0f)
GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)
Why it sucks:
GS small depth value will be "mapped" to -1.0f. In others all small values will be 1.0f! Terrible lost
of accuraccy.
The GL_NV_depth_buffer_float extension allow to set the near plane as -1.0f.
So
"GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)"
will become
"GL z Position ranges from -1.0f to 1.0f (FS ranges -1.0f to 1.0f)"
and therefore
"z posision [0.0f;1.0f] will map to FS [0.0f;1.0f]" as DX
Yes we just get back all precision lost previously :)
However you need hardware (intel?) and driver support (free driver?/gles?) :(
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5860 96395faa-99c1-11dd-bbfe-3dabce05a288
* restore the old fxaa (Asmodeam will be integrated when I got time)
* port the recently added new scanline algo
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5818 96395faa-99c1-11dd-bbfe-3dabce05a288
* try to use more subroutine on VS&PS, unfortunately hit a driver crash!
* Call Attach/DetachContext through GSDevice so I can unmap currently mapped buffer
* Implement glsl part of GL_ARB_bindless texture, again hit another driver crash!
* various fix of GL_ARB_buffer_storage. Basic benchmark show only improvement on 'cold' case, I guess it will improve smoothness
* try to fix GL_clear_texture, no success so far. It seem the extension is limited to basic texture (aka no depth/stencil)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5752 96395faa-99c1-11dd-bbfe-3dabce05a288
* GL_ARB_shader_subroutine for perf
fix for nvidia => add missing shader declaration. Nvidia got +4fps on colin3 :)
For the moment only 2 PS parameters are supported. Code need to be extended to support others games that often
switch shader program (like xenosaga).
require GL4 class hardware and the option override_GL_ARB_shader_subroutine = 1
Note: strangely on AMD linux it is slower!
* GL_ARB_shader_image_load_store for accuraccy (Date)
Use a signed integer texture and reenable color buffer writing
Current status: Amagami_transparency.gs & P3_battle_shadows.gs are now working on Nvidia with a small perf impact.
Current implementation detail:
1/ setup the standard stencil as before
2/ on remaining pixel, draw once to compute first primitive that will write a fail alpha value.
3/ final draw based on primitive id of step 2
Note: I think we would get a bad behavior if depth test&mask are enabled on step 2/3
Note2: on my limited testcase the perf impact was on CPU. It would be possible to merge step1&2 to nullifying it (could
even be faster actually), however it would require more GPU power.
Again require GL4 class hardware. And the option UserHacks_DateGL4 = 1
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5725 96395faa-99c1-11dd-bbfe-3dabce05a288
* some preliminary work to test/benchmark bindless texture in the future (glsl was not yet updated)
Bindless texture allow to get a GPU texture pointer and then set it directly
to the shader as a basic uniform.
=> no more texture unit selection/validation
=> no more texture validation neither texture hash lookup
3rdparty: update gl header to the latest gl4.4
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5720 96395faa-99c1-11dd-bbfe-3dabce05a288
Card that support gs:
remain only a gs to generate sprite from a line. Even dummy gs are costly for the GPU.
Card that don't support gs:
remove useless copy of color for line and triangle primitives
Note for dx: opengl 3.2 (maybe not gles) supports both flat interpolation
convention (GL_FIRST_VERTEX_CONVENTION or GL_LAST_VERTEX_CONVENTION). It might
be possible to shuffle vertex index to put the last vertex in first position.
- buff[0] = head + 0;
- buff[1] = head + 1;
- buff[2] = head + 2;
+ buff[0] = head + 2;
+ buff[1] = head + 1;
+ buff[2] = head + 0;
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5718 96395faa-99c1-11dd-bbfe-3dabce05a288
The idea was to replace shader program swith by pointer function calls inside
shaders. At least parameters that are often changed between draw call. So far
I only ported atst and colclip. Unfortunately code is "slower" (on GSdx standalone).
For the moment keep the code but disabled.
If I understand well the validation of program is done in the "driver thread"
but the additional call are done in the overloaded MTGS thread. Apitrace
profiling shows faster GPU draw calls. Another possibility is that the driver still
need to validate the draw call because of others state change.
Here some stats on colin3 (90 frames):
without subroutine: UseProgram 125246
with subroutine: UseProgram 2906, subroutine 125945 => 3605 extra calls overhead (not
all parameters are ported to subroutine)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5715 96395faa-99c1-11dd-bbfe-3dabce05a288
* preliminary work for GL4.4 extensions (ARB_clear_texture & ARB_multi_bind). Disabled until I got a 4.4 driver
Note: I plan also to use ARB_buffer_storage
* compute texture gl option in the constructor (avoid a couple of swith case)
* redo texture unit management. Unit 0-2 for shaders, Unit 3 for texture operations. MultiBind will allow to bind
shader input without disturbing texture binding points.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5711 96395faa-99c1-11dd-bbfe-3dabce05a288
* add a non-working hack: UserHacks_DateGL4, goal was to replace UserHacks_AlphaStencil
+ Detection of good/bad samples is based on primitive ID variable. However I'm not sure
the behavior is always the same between draw call...Anyway let's keep a copy of the current
work
* Dump integer texture into text csv
* add gl4.2 ARB_shader_image_load_store extension (needed by UserHacks_DateGL4)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5707 96395faa-99c1-11dd-bbfe-3dabce05a288
* uniform was wrongly set before the activation of the program (free driver only)
* Always use only 1 drawbuffer. Easier besides previous setup was wrong for convert:ps_main1
GLES trial (v3):
* add the cmake option GLES_API. Note library (libgles) are hardcoded for the moment
* Disable opengl check
* Disable gl_GetDebugMessageLogARB not supported!
* Emulate gl_DrawElementsBaseVertex, add manually the index offset (surely slow but work)
* Fix hundred of shader error (no implicit cast of integer to float...)
Unfortunately GLES doesn't support dual blending so no blend in hardware renderer. Otherwise it is fine
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5700 96395faa-99c1-11dd-bbfe-3dabce05a288
* replace vertex interface with block interface. It avoid to depends on the ARB_sso extension.
* disable geometry shader on Nvidia & Linux. Slower but better than a black screen !
* default logz to 1, avoid some glitches.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5699 96395faa-99c1-11dd-bbfe-3dabce05a288
* redo glsl2h.pl script to generate only one big glsl headers.
* fix gcc warning in GSVector.h
* fix memory leak of GSDeviceOGL.m_shader
* clean shader compilation function => split generation header & drop malloc stuff
cmake:
* only rebuild shader when asked by the use. Avoid perl dependency to build pcsx2
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5698 96395faa-99c1-11dd-bbfe-3dabce05a288