Function pointer was mangled to avoid any collision. Nowadays all symbols
are hidden so no risk of collision.
Syntax is nicer beside it would allow to put back GLES3.2. I think it
supports most of the used extension.
glActiveTexture & glBlendColor are provided without symbol query.
Actually it can partially be done with GL_ARB_shader_image_load_store
extension. However all drivers that support shader_image have
texture barrier too.
Code was completey bitrotten
Code was a partial test (and yet 500 lines already)
Shader is more and more complex and multithreading support greatly
reduce the cost of shader switch
Nvidia allows to get the ASM of the shader of the compiled shader. It is useful
to check the performance.
It also allow me to compile most of shader code path for QA
Dump is enabled in linux replayer + debug_glsl_shader = 2
Much faster for small batch that write the alpha value. Code can
be enabled with accurate_date option.
Here a summary of all DATE possibilities:
1/ no overlap of primitive
=> texture barrier (pro no setup of stencil and single draw)
2/ alpha written
=> small batch => texture barrier (primitive by primitive). Done in N-primitive draw calls.
(based on GL_ARB_texture_barrier)
=> bigger batch => compute the first good primitive, slow but only 2 draw calls.
(based on GL_ARB_shader_image_load_store)
=> Otherwise there is the UserHacks_AlphaStencil but it is a hack!
3/ alpha written
=> full setup of stencil ( 2 draw calls)
Group opengl calls into a nice name.
Apitrace shows them in a tree format that support folding. Previously it
was a long flat list (10K-40K of lines by frame)
I align the call number with the internal s_n variable. This way it is
easy to map GSdx dump output with the GL debugger :)
ogl_texture_storage 1 creates texture corruption.
Advance date is too slow, code need to be updated (properly) to uses 2 passes only not 3
Maybe one could be enough (sometimes)
Copy the full line into the pbo. Dma will only take GL_UNPACK_ROW_LENGTH
- increase memcpy size by 2 in the pbo
+ single memcpy will be faster and can use sse
Enable buffer_storage extension:
* GL_CLIENT_STORAGE_BIT was required (it is the duty of TexSubImage to copy data into the GPU mem)
* Enable the extension by default
A couple of fallbacks were introduced for the Mesa driver that only support 3.0
DSA will require a recent Mesa which already support GL3.3
Require at least SandyBridge for Intel GPU
* use DebugOutputToFile as a callback of gl error. Add a breakpoint to
find the culprit GL call
* use string instead of char[n]
Note: CheckDebugLog is potentially useless now
* GL_ARB_clip_control: reduce z fighting
* GL_ARB_clear_texture: no real speed gain (but improve code quality)
* GL_ARB_bindless_texture: +1fps (if you're CPU limited)
* Fix to create a 3.0 GLES context
* set a default precision to fix most of shader compilation issue
* Crash later because of GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT
=> need to test opensource driver
Note: DATE is used for shadows (persona 3) and others effect
You can disable it when you disable gl image extension (override_GL_ARB_shader_image_load_store = 0)
You can also enable it on AMD too (set it to 1 this time) but no guarantee.
If you feel the extension is too slow, you can try disable some gl barrier (aka "damn the torpedo full steam ahead!").
It can be done with the option UserHacks_DateGL4 = 1. Otherwise just disable the extension.
Note: don't enable UserHacks_AlphaStencil in the same time. GL_ARB_shader_image_load_store is an alternative implementation.
Enabling both in the same time will lead to undefined (well surely wrong) behavior.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5890 96395faa-99c1-11dd-bbfe-3dabce05a288
* gui refresh
+ Use some tab to reduce heigth for small screen
+ Add logz option
+ remove broken/experimental keyword. GSdx ogl is not too bad ;)
* autodetect GL_NV_depth_buffer_float
Linux tester you are welcome!
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5862 96395faa-99c1-11dd-bbfe-3dabce05a288
Best setting if you driver support GL_NV_depth_buffer_float => GL_NV_Depth = 1 & logz = 0
Otherwise => GL_NV_Depth = 0 & logz = 1
Explanation of the bug:
Dx z position ranges from 0.0f to 1.0f (FS ranges 0.0f to 1.0f)
GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)
Why it sucks:
GS small depth value will be "mapped" to -1.0f. In others all small values will be 1.0f! Terrible lost
of accuraccy.
The GL_NV_depth_buffer_float extension allow to set the near plane as -1.0f.
So
"GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)"
will become
"GL z Position ranges from -1.0f to 1.0f (FS ranges -1.0f to 1.0f)"
and therefore
"z posision [0.0f;1.0f] will map to FS [0.0f;1.0f]" as DX
Yes we just get back all precision lost previously :)
However you need hardware (intel?) and driver support (free driver?/gles?) :(
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5860 96395faa-99c1-11dd-bbfe-3dabce05a288
* allow to control the gl dectection from the gui (+1 for the user-friendliness ;) )
* disable geometry shader by default on Intel driver
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5783 96395faa-99c1-11dd-bbfe-3dabce05a288
* try to use more subroutine on VS&PS, unfortunately hit a driver crash!
* Call Attach/DetachContext through GSDevice so I can unmap currently mapped buffer
* Implement glsl part of GL_ARB_bindless texture, again hit another driver crash!
* various fix of GL_ARB_buffer_storage. Basic benchmark show only improvement on 'cold' case, I guess it will improve smoothness
* try to fix GL_clear_texture, no success so far. It seem the extension is limited to basic texture (aka no depth/stencil)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5752 96395faa-99c1-11dd-bbfe-3dabce05a288
* GL_ARB_shader_subroutine for perf
fix for nvidia => add missing shader declaration. Nvidia got +4fps on colin3 :)
For the moment only 2 PS parameters are supported. Code need to be extended to support others games that often
switch shader program (like xenosaga).
require GL4 class hardware and the option override_GL_ARB_shader_subroutine = 1
Note: strangely on AMD linux it is slower!
* GL_ARB_shader_image_load_store for accuraccy (Date)
Use a signed integer texture and reenable color buffer writing
Current status: Amagami_transparency.gs & P3_battle_shadows.gs are now working on Nvidia with a small perf impact.
Current implementation detail:
1/ setup the standard stencil as before
2/ on remaining pixel, draw once to compute first primitive that will write a fail alpha value.
3/ final draw based on primitive id of step 2
Note: I think we would get a bad behavior if depth test&mask are enabled on step 2/3
Note2: on my limited testcase the perf impact was on CPU. It would be possible to merge step1&2 to nullifying it (could
even be faster actually), however it would require more GPU power.
Again require GL4 class hardware. And the option UserHacks_DateGL4 = 1
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5725 96395faa-99c1-11dd-bbfe-3dabce05a288
zzogl: fix memory leak (fix#1431, #1432)
GSdx ogl: disable geometry shader on Nvidia/Windows (I will wait a 3rd implementation to find which one is correct)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5724 96395faa-99c1-11dd-bbfe-3dabce05a288
* redo most of the texture upload (PBO): colin3 benchmark: 32 fps now (vs 26 fps 2 weeks ago)
* use the cross vendor vsync extension on linux (previous wasn't supported by nvidia)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5721 96395faa-99c1-11dd-bbfe-3dabce05a288
* some preliminary work to test/benchmark bindless texture in the future (glsl was not yet updated)
Bindless texture allow to get a GPU texture pointer and then set it directly
to the shader as a basic uniform.
=> no more texture unit selection/validation
=> no more texture validation neither texture hash lookup
3rdparty: update gl header to the latest gl4.4
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5720 96395faa-99c1-11dd-bbfe-3dabce05a288
Card that support gs:
remain only a gs to generate sprite from a line. Even dummy gs are costly for the GPU.
Card that don't support gs:
remove useless copy of color for line and triangle primitives
Note for dx: opengl 3.2 (maybe not gles) supports both flat interpolation
convention (GL_FIRST_VERTEX_CONVENTION or GL_LAST_VERTEX_CONVENTION). It might
be possible to shuffle vertex index to put the last vertex in first position.
- buff[0] = head + 0;
- buff[1] = head + 1;
- buff[2] = head + 2;
+ buff[0] = head + 2;
+ buff[1] = head + 1;
+ buff[2] = head + 0;
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5718 96395faa-99c1-11dd-bbfe-3dabce05a288
The idea was to replace shader program swith by pointer function calls inside
shaders. At least parameters that are often changed between draw call. So far
I only ported atst and colclip. Unfortunately code is "slower" (on GSdx standalone).
For the moment keep the code but disabled.
If I understand well the validation of program is done in the "driver thread"
but the additional call are done in the overloaded MTGS thread. Apitrace
profiling shows faster GPU draw calls. Another possibility is that the driver still
need to validate the draw call because of others state change.
Here some stats on colin3 (90 frames):
without subroutine: UseProgram 125246
with subroutine: UseProgram 2906, subroutine 125945 => 3605 extra calls overhead (not
all parameters are ported to subroutine)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5715 96395faa-99c1-11dd-bbfe-3dabce05a288
* move most of gl states into a separate namespace. Extend it to depth/stencil/blend micro state
=> save 10,000 opengl call by frame for colin mcrae 3
* Only setup blend state of first drawbuffer
* Don't request anymore a debug context on dev/release build
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5713 96395faa-99c1-11dd-bbfe-3dabce05a288
* clean extension management and fix compilation of previous gl44 code.
* Use pixel buffer object to upload texture data.
=> avoid crash on AMD driver
=> a bit faster and probably got some margins for the future
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5712 96395faa-99c1-11dd-bbfe-3dabce05a288
* preliminary work for GL4.4 extensions (ARB_clear_texture & ARB_multi_bind). Disabled until I got a 4.4 driver
Note: I plan also to use ARB_buffer_storage
* compute texture gl option in the constructor (avoid a couple of swith case)
* redo texture unit management. Unit 0-2 for shaders, Unit 3 for texture operations. MultiBind will allow to bind
shader input without disturbing texture binding points.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5711 96395faa-99c1-11dd-bbfe-3dabce05a288
* add a non-working hack: UserHacks_DateGL4, goal was to replace UserHacks_AlphaStencil
+ Detection of good/bad samples is based on primitive ID variable. However I'm not sure
the behavior is always the same between draw call...Anyway let's keep a copy of the current
work
* Dump integer texture into text csv
* add gl4.2 ARB_shader_image_load_store extension (needed by UserHacks_DateGL4)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5707 96395faa-99c1-11dd-bbfe-3dabce05a288
* use gles header file, disable opengl code (mainly GLX, ARB_sso, geometry shader)
* Define properly the function pointer, GLES use basic linking whereas GL must get the symbol dynamically
* cmake: properly search and set libglesv2.so
* don't use dual source blending => HW renderer work (only miss unimportant FBA)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5701 96395faa-99c1-11dd-bbfe-3dabce05a288
* uniform was wrongly set before the activation of the program (free driver only)
* Always use only 1 drawbuffer. Easier besides previous setup was wrong for convert:ps_main1
GLES trial (v3):
* add the cmake option GLES_API. Note library (libgles) are hardcoded for the moment
* Disable opengl check
* Disable gl_GetDebugMessageLogARB not supported!
* Emulate gl_DrawElementsBaseVertex, add manually the index offset (surely slow but work)
* Fix hundred of shader error (no implicit cast of integer to float...)
Unfortunately GLES doesn't support dual blending so no blend in hardware renderer. Otherwise it is fine
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5700 96395faa-99c1-11dd-bbfe-3dabce05a288
* replace vertex interface with block interface. It avoid to depends on the ARB_sso extension.
* disable geometry shader on Nvidia & Linux. Slower but better than a black screen !
* default logz to 1, avoid some glitches.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5699 96395faa-99c1-11dd-bbfe-3dabce05a288
* remove the possibility to compile shader from file. Some people loads older shaders...
The feature might be readded later
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5697 96395faa-99c1-11dd-bbfe-3dabce05a288
* don't delete the wnd in GSclose. It can still be used later
* Properly detect the GL_ARB_gpu_shader5 extension for Fxaa
* A couple of fix in Create (GSopen1) of GSWndEGL/GSWndOGL
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5674 96395faa-99c1-11dd-bbfe-3dabce05a288
gsdx:
* move gl function loading into GSwnd. Clean the header and avoid to rely on macro.
* Always require libegl for GSdx. You still need to select the API at compilation.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5670 96395faa-99c1-11dd-bbfe-3dabce05a288
* detect Advanced Micro Devices for newer AMD Card...
* Mess with coordinate for StretchRect. Upscaling seem to work now. Surely the biggest diff between ogl and dx...
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5655 96395faa-99c1-11dd-bbfe-3dabce05a288
* factorize sample object creation
* remember frame buffer attachment state
* Use a basic context on EGL. Allow to use Mesa 9.1 on AMD GPU.
* precompile vertex and geometry shader to avoid benchmark polution on replay
* Try harder to detect FGLRX driver on window
* various clean
Remain to fix the coordinate system for upscaling
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5651 96395faa-99c1-11dd-bbfe-3dabce05a288
* Fix shader alpha issue. Hoppefully fix lots of glitches.
* Use glBufferSubData to upload data into the constant buffer instead of map/unmap. More fps :)
* Fix wrong api setup on EGL
* Be more portable on glsl2h
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5650 96395faa-99c1-11dd-bbfe-3dabce05a288
* clean texture management & use different texture unit for various texture operation
* separate sampler and texture setup
* Always disable GL_SCISSOR_TEST before any cleaning to be sure the all pixels are reset
* properly set upscale_multiplier in the linux gui (was mixed with msaa). Unfortunately it is still broken...
* Fix EGL with GSopen1
* Use stdcall for function definition in the replay (avoid segmentation fault)
* Directly use gl_Position in shader instead of an extra user defined variable
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5648 96395faa-99c1-11dd-bbfe-3dabce05a288
* replace most OGL_FREE_DRIVER with a dynamic detection. Remains the context creation. Closed drivers need 3.3
* Add the CopySubImage fallback
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5647 96395faa-99c1-11dd-bbfe-3dabce05a288
* Emulate Geometry Shader from the CPU.
* add some option to override opengl extension detection
* redo shader interface (again) to compile on the free driver
SW renderer is now working on the free driver.
To test it on your linux box use this cmake option -DEGL_API=TRUE
Note: (need opengl 3.0) I test mesa 9.2 git
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5646 96395faa-99c1-11dd-bbfe-3dabce05a288
* fix a bad interaction when GL_ARB_SSO is supported without GL_ARB_shading_language_420pack
* try to replace struct with flat parameter in glsl interface
=> with some hacks of the free driver, I was able to compile SW renderer shader. Unfortunately
the rendering is broken. Maybe my hack :p
* remove EGL context check (wasn't working as expected)
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5644 96395faa-99c1-11dd-bbfe-3dabce05a288
* fix wrong interaction when both GL41 or GL42 aren't supported
* Add a new define to downgrade opengl requirement to test the free driver and EGL
=> As far as I can tell, they only miss geometry shader& GLSL150 (got GLSL140)
* Found a way to avoid crash in AMD driver. Skip the upload of small texture but rendering is wrong now...
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5643 96395faa-99c1-11dd-bbfe-3dabce05a288
* replace both DISABLE_GL41_SSO and DISABLE_GL42 macro with a dynamic check based on the extension support
* glsl2h: use static instead of extern
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5636 96395faa-99c1-11dd-bbfe-3dabce05a288
* Remove bad redefinition of glActiveTexture and glBlendColor pointer
* Remove glew dependency on linux
* s/OGL_DEBUG/ENABLE_OGL_DEBUG/ and enable it on debug
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5512 96395faa-99c1-11dd-bbfe-3dabce05a288
* move GL loading function into its own namespace
* Use manual GL loading on linux too and mostly drop glew dependency. (Not 100% clean we are still depends on some glew define, it will be fixed later)
Note: shaders still need to be copied manually from GSdx/res to bin/plugins
git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5510 96395faa-99c1-11dd-bbfe-3dabce05a288