Commit Graph

272 Commits

Author SHA1 Message Date
Gregory Hainaut 98c74879bf Merge pull request #718 from PCSX2/depth-color-direct-write
Depth color direct write
2015-08-10 15:50:48 +02:00
Gregory Hainaut 9f92f63194 gsdx-ogl: Use GetAlphaMinMax to limit the scope of FULL accurate blending
Provide a massive speed up in this level.
2015-08-09 13:44:31 +02:00
Gregory Hainaut 61694013a5 gsdx-ogl: compact blending parameter structure
Save 656B of data. It is good for the cache.
2015-08-09 13:44:30 +02:00
Gregory Hainaut df3ade896b gsdx-ogl: use integer for blend factor
Integer argument&comparison might be lighter

V2: Forget to change one OMSetBlendState call
2015-08-09 13:44:05 +02:00
Gregory Hainaut 5b57405517 gsdx-ogl: blend management cleanup
* reorder the blend function
* remove OM bsel object
* add a bit to support pabe (miss the glsl part)
2015-08-08 09:18:09 +02:00
Gregory Hainaut 4d12410707 gsdx-ogl: latch constant buffer in rendering object
* Initialization of the object is done once
* Avoid to reupload it when an useless parameter toggle
 => -10% of UBO update
2015-08-08 09:18:09 +02:00
Gregory Hainaut b17803bb34 gsdx-ogl: wipeout of GL_ARB_bindless_texture
Code is completely broken. It doesn't help to improve speed.

Remove 200 lines ;)
2015-08-08 09:16:49 +02:00
Gregory Hainaut b7e16b5989 gsdx-ogl: clean the blending management
Intially GSBlendStateOGL was an alias of the m_blendMapD3D9 array

The object was replaced by an index in the array. Save 2k of memory duplication.
And too much useless code.

v2: push/pop blending state in DATE stuff
v3: remove m_state which is useless now
2015-08-05 22:55:12 +02:00
Gregory Hainaut 717f0fcb4d gsdx-ogl: optimize fbmask setup 2015-08-05 22:55:12 +02:00
Gregory Hainaut 73e2ff6ff6 gsdx-ogl: change PrimitiveOverlap algo from O(n^2) to O(n)
+ only enable this optimization when it is useful (date or sw blending)

Less impact on the perf even for big vertex array
2015-08-05 17:59:55 +02:00
Gregory Hainaut 36554c3375 Merge branch 'hdr-colclip-32bits' 2015-08-04 21:55:40 +02:00
Gregory Hainaut 45bb27d6db gsdx-ogl: extend HDR colclip to 32 bits texture
Unfortunately 16 bits wasn't enough for Castlevania.
2015-08-04 21:54:27 +02:00
Gregory Hainaut c6ff7531fb gsdx-ogl: performance boost on virtuafighter 2015-08-04 21:26:03 +02:00
Gregory Hainaut d80aa0b0bd gsdx-ogl: remove 2 printfs
GL_INS is a better tracing solution
2015-08-04 20:47:02 +02:00
Gregory Hainaut 744f9ebc09 gsdx-ogl: rare corner case when both texture shuffle and date are enabled
In texture shuffle mode the texture data is either RG or BA. It means
that DATE must either checks MSB of G or A.

Close #693
2015-08-04 20:10:44 +02:00
Gregory Hainaut 3784ea768f gsdx: check null pointer when doing a texture clear 2015-08-04 19:26:17 +02:00
Gregory Hainaut 8424c18e9f Merge pull request #688 from PCSX2/hdr-colclip
Hdr colclip
2015-08-02 18:13:28 +02:00
Gregory Hainaut 1f402b1b56 gsdx-ogl: fix bad detection of overlapping
avoid rendering corruption with SW blending
2015-08-01 13:29:25 +02:00
Gregory Hainaut ec007ac8d0 gsdx-ogl: support accurate blending without geometry shader
For the Mesa driver
2015-08-01 13:26:15 +02:00
Gregory Hainaut eb0fa8c7dc gsdx-ogl: fix bad detection of overlapping
avoid rendering corruption with SW blending
2015-08-01 01:27:22 +02:00
Gregory Hainaut 8452d2ccfe gsdx-ogl: fbmask regression! don't use bit operation with integer 2015-07-31 21:10:58 +02:00
Gregory Hainaut fff59f547d gsdx-ogl: fbmask regression! don't use bit operation with integer 2015-07-31 19:43:06 +02:00
Gregory Hainaut a0edcb58af gsdx-ogl: extend cclip blending level with destination alpha blending
The purpose is to emulate correctly destination alpha factor

An alpha channel of 128 is 1.0 in the GS but only ~0.5 in the GPU

I think few draw call use destination alpha so impact on perf must remains small.
2015-07-31 09:45:28 +02:00
Gregory Hainaut 8554f32086 gsdx-ogl: clean the blend table
Remove old shader define
Prefix macro with BLEND_
Add some notes to explain the special symbol
2015-07-31 09:45:28 +02:00
Gregory Hainaut cfd0fd6cc8 gsdx-ogl: remove old colclip algo 2015-07-31 09:45:28 +02:00
Gregory Hainaut 93c47feb7c gsdx-ogl: replace old colclip algo with the HDR algo
Similar speed but more accurate

Allow to clean the code
2015-07-31 09:45:28 +02:00
Gregory Hainaut 83f874db93 gsdx-ogl: remove bsel.ps
Just clear bsel.abe to disable blending
2015-07-31 09:45:28 +02:00
Gregory Hainaut 25298c70f7 gsdx-ogl: move blending management into a separate function 2015-07-31 09:45:28 +02:00
Gregory Hainaut 25bd5f5e85 gsdx-ogl: request texture barrier to emulate accurate date
Actually it can partially be done with GL_ARB_shader_image_load_store
extension. However all drivers that support shader_image have
texture barrier too.
2015-07-31 09:45:28 +02:00
Gregory Hainaut 2901e94ebc gsdx-ogl: always bind the RT as input texture
To avoid code duplication
2015-07-31 09:45:28 +02:00
Gregory Hainaut 1fe3e04ce3 gsdx-ogl: don't alias m_env/m_context variable
It is cumbersome to move code
2015-07-31 09:45:28 +02:00
Gregory Hainaut 8f27a5a92b gsdx-ogl: only enable fast accurate colclip in level3
Until we drop the old method
2015-07-31 09:45:05 +02:00
Gregory Hainaut e026f1bac6 gsdx-ogl: implement a fast accurate colclip algo
The idea is to use a floating texture to accumulate the data and
then do a final postprocessing pass to apply the modulo

v2:
* use bounding box to
* fix vertex corruption issue
* use negative number in shader which allow to use half float (+12
  fps@4x)
2015-07-30 18:34:52 +02:00
Gregory Hainaut aa8f5848d1 gsdx-ogl: always issue a barrier when requested
Safer this way
2015-07-30 18:24:36 +02:00
Gregory Hainaut 88bd0996f5 gsdx-ogl: only print same tex/rt message when prims overlaps
Avoid most of the false positive
2015-07-30 18:24:32 +02:00
Gregory Hainaut ae8df002af gsdx-ogl: optimize Cs * As + Cd and Cs * Af + Cd blending
Basically the code does the alpha multiplication in the shader therefore
the blend unit only does a pure addition. This way the multiplication is
accurate and accurate_blending doesn't requires a costly barrier.

This code also avoid variable duplication to make the code more separated.
Hopefully blending can be done in a separated function

It is preliminary work to support fast color clipping with HDR

v2: fix assertion compilation failure

v3: fix regression in not accurate mode

v3: Cs * As/Af is not an accumulation

Those cases don't need the Cd addition and were already optimized anyway

Fix a regression on GoW2
2015-07-30 18:22:59 +02:00
Gregory Hainaut 12fdc37599 gsdx-ogl: reorganize blending in the renderer
Do DATE algo selection before blending. This way we can detect bad
interaction.

Regroup all blending/colclip in a single block.  Avoid to check abe &&
rt multiple times.

v2: only enable sw blending when abe is true
2015-07-30 18:21:01 +02:00
Gregory Hainaut caadc73e1b gsdx-ogl: add a new level for accurate blending
The updated medium level will run for all sprites. It helps sotc blooming effect and it remains
fast enough to be enabled by default (at least on 3D games)
The new high level will run for all sprites + color clipping
2015-07-30 18:21:01 +02:00
Gregory Hainaut 01a1b1a5e6 gsdx-ogl: add the code to handle point and line in SW blending 2015-07-30 18:21:01 +02:00
Gregory Hainaut a85894e159 gsdx-ogl: move texture shuffle and fbmask into a dedicated function
DrawPrims is really too big now
2015-07-30 18:21:01 +02:00
Gregory Hainaut 5c1b8986c6 gsdx-ogl: use SW blending when no barrier is required
Speed penality is small (only GPU) but it is more accurate
2015-07-30 18:21:00 +02:00
Gregory Hainaut 8c8fe633a5 gsdx-ogl: merge 3 accurate* option into a nice combobox
It is much easier to configure this way
2015-07-30 18:21:00 +02:00
Gregory Hainaut 8da63cf95a gsdx-ogl: try to enable sw blending for sprite rendering
The idea is that sprites are often use for post-processing effect (ofc except 2D games)

Most of the time post-processing supports SW blending with a small speed penality. SW
blending is more accurate so it is better to use it.
2015-07-30 18:21:00 +02:00
Gregory Hainaut cb4af8fe83 gsdx-ogl: small optimization for the GPU
Gain: 1% at 4x on SotC (it partially compensates recent additions)

When the color is constant and equal to 128, the MODULATE mode is
equivalent to the DECAL mode. It saves 5 instructions on the FS.
2015-07-21 08:19:36 +02:00
Gregory Hainaut 5c740ff41e gsdx-ogl: wipeout AlphaStencil & Alpha hack
Accurate options do a better jobs. Technically it can still
be useful for old gpu/driver that doesn't support the GL4.5 extension.

On Windows, you can still rely on Dx

On linux, free driver support it (except Intel)
2015-07-19 22:43:48 +02:00
Gregory Hainaut 88cd333839 gsdx-ogl: don't enable both HW&SW blending 2015-07-18 16:15:52 +02:00
Gregory Hainaut c701ab4368 glsl: don't use normalized value for color range
Globally shader uses less intruction (except blending part)

It would also allow to improve the rounding of color
2015-07-18 14:41:03 +02:00
Gregory Hainaut 698dda2310 gsdx-ogl: remove subroutine from the gui too 2015-07-17 22:28:59 +02:00
Gregory Hainaut 784822a5c2 glsl: redo blending management to use A/B/C/D directly
1/ Code is much more readable
2/ It will allow to round differently the operation in the future
2015-07-17 21:08:49 +02:00
Gregory Hainaut a4bad8fdbc gsdx-ogl: avoid a bad conflict between accurate option 2015-07-11 14:35:35 +02:00
Gregory Hainaut 15b934eb2a gsdx-ogl: remove useless colclip message 2015-07-11 14:35:35 +02:00
Gregory Hainaut 91fbe6f108 gsdx-ogl: add some code to fix black netting on some renderings
Code is not yet enabled because it requires extensive test

The idea is to replace point by a 1 pixels sprite with the help of
a geometry shader. In 4x, point will be replaced by a 4x4 sprite.
2015-07-11 14:35:35 +02:00
Gregory Hainaut 2ccf108534 gsdx-ogl: add back a selector for the Geometry Shader 2015-07-11 14:35:34 +02:00
Gregory Hainaut abec4bd760 gsdx-ogl: don't enable aout when using accurate fbmask 2015-07-03 21:21:56 +02:00
Gregory Hainaut 4dbe71cba8 gsdx-ogl: disable SW blending when running DATE GL42
// GL42 interact very badly with sw blending. GL42 uses the primitiveID to find the primitive
// that write the bad alpha value. Sw blending will force the draw to run primitive by primitive
// (therefore primitiveID will be constant to 1)
2015-07-03 20:34:52 +02:00
Gregory Hainaut 0c12f232ca gsdx-ogl: don't write depth in first step of DATE 42
Fix shadows in Fifa
2015-07-02 21:08:47 +02:00
Gregory Hainaut be1403cdc2 gsdx-ogl: support texture shuffling on !FST
Mostly fix "Finding Nemo"

It remains a shadows issue when you enable accurate_fbmask and depth
2015-07-01 09:36:54 +02:00
Gregory Hainaut 839003467e gsdx-ogl: add support of partial frame buffer masking
It might help to fix a bit the color on a couple of games

accurate_fbmask = 1

Code uses GL4.5 extensions. So far it seems the effect is ony used a couple
of time and often in non-overlapping primitive. Speed impact will likely remain small
2015-07-01 09:36:53 +02:00
Gregory Hainaut 82818dab3c gsdx-ogl: make some room in AlphaCoefficient variable
The idea will be to use the remaining int to store the FB mask
2015-07-01 09:36:53 +02:00
Gregory Hainaut 3b127f663b gsdx-tc: trace the texture format to detect texture shuffling
It fixes games that uses 16 bits RT (like snow engine games)
2015-07-01 09:36:53 +02:00
Gregory Hainaut 05c72980fc gsdx: avoid to detect PSMT8H as 16 bits 2015-07-01 09:30:20 +02:00
Gregory Hainaut 58ce7d4bb8 gsdx-ogl: emulate texture shuffle
GS doesn't supports texture shuffle/swizzle so it is emulated in a
complex way.

The idea is to read/write the 32 bits color format as a 16 bit format.
This way, RG (16 lsb bits) or BA (16 msb bits) can be read or written with
square texture that targets pixels 1-8 or pixels 8-16.
However shuffle is limited. For example you can copy the green channel
to either the alpha channel or another green channel.

Note: Partial masking of channel is not yet implemented

V2: improve logging
V3: better support of green channel in shader
V4: improve detection of destination (issue due to rounding)
2015-07-01 09:30:20 +02:00
Gregory Hainaut 35081f922a gsdx: GS kinds of support draw without framebuffer
Gow uses 24 bits buffer, so only color is updated but blending is configured as Cd
so it is a NOP

In this case, we don't lookup the target in the texture cache. It reduces the complexity
to handle depth which can be located at same address as RT

Note: please test DX renderer
2015-07-01 09:30:20 +02:00
Gregory Hainaut 8393ba56d6 gsdx-ogl: rework palette texture handling
Redirect the red channel to alpha channel for 8 bits texture.

It avoid special management in the shader
2015-06-24 19:50:09 +02:00
Gregory Hainaut c1d39a5f57 gsdx-ogl: drop UserHacks_DateGL4
Initial goal was to avoid slowdown in buggy driver
2015-06-16 09:57:45 +02:00
Gregory Hainaut 4e82073bfc gsdx-ogl: improve setting bilinear of palette
Instead to disable it, uses GS settings
2015-06-09 09:53:50 +02:00
Gregory Hainaut 2503d9698c gsdx-ogl: re enable bilinear fitlering on palette for paltex mode
However keep it disabled for StarOcean 3 use cases
2015-06-08 21:01:18 +02:00
Gregory Hainaut c3c29945b2 gsdx-ogl: only set a cst blend factor when Ad is used
Fix GT4 and potentially FFX
2015-06-07 16:10:59 +02:00
Gregory Hainaut 0518aaedc9 gsdx-ogl: unattach palette to avoid noise in debug 2015-06-07 12:39:15 +02:00
Gregory Hainaut c2e851b3a5 gsdx-ogl: don't use slow SW blend for nothing
When A == B, coeff is 0 so <= 1
When C == 2, just check the value of coeff

Barely faster in accurate_blend = 2
2015-06-04 18:18:47 +02:00
Gregory Hainaut 3dd3bf6e2b gsdx-ogl: new hidden option accurate_blend = 2
Debug option to emulate all blending draw call in the shader

Of course it is slow but it is very accurate
2015-06-03 09:56:07 +02:00
Gregory Hainaut 995ae51bf4 gsdx-ogl: disable a log message
Too verbose and code was fixed anyway
2015-06-02 09:28:35 +02:00
Gregory Hainaut 2cbde89084 Merge pull request #555 from PCSX2/real-fb-format
GSdx: better framebuffer format
2015-06-01 11:48:07 +02:00
Gregory Hainaut d301848848 gsdx-ogl: fix paltex on opengl
RT uses as palette must use the alpha channel

Palette texture uses the red channel
2015-05-30 19:01:09 +02:00
Gregory Hainaut 419dfe0544 glsl: redo color/alpha management correction
Please test it!

GS supports 3 formats for the output:

32 bits: normal case
=> no change

24 bits: like 32 bits but without alpha channel
=> mask alpha channel (ie don't write it anymore)
=> Always uses 1.0f as blending coefficient

16 bits: RGB5A1, emulated by a 32 bits openGL texture. I think it will be more correct to use
a real 16 bits GL texture. Unfortunately it would cost several (slow) target conversions.
Anyway as a current solution
=>  apply a mask of 0xF8 on color when SW blending is used (improve Castlevania shadow)
unfortunately normal blending mode still uses the full range of colors!

This commit also corrects a couple of blending factor. 128/255 is equivalent to 1.0f in PS2, whereas GPU uses 1.0f. So the blending factor must be 255/128 instead of 2

Note: disable CRC hack and enable accurate_colclip to see Castlevania shadow ^^
(issue #380).
Note2: SW renderer is darker on Castlevania. I don't know why maybe linked to the 16 bits format poorly emulated
2015-05-26 16:49:43 +02:00
Gregory Hainaut 9ee3a173d0 gsdx-ogl: use a local ALPHA register
It would allow to easy tune the parameter to support 24 bits format
2015-05-26 15:36:48 +02:00
Gregory Hainaut d31bd97d59 gsdx-ogl: add a variable to select FB output
Either 32bits/24bits/16bits
2015-05-26 14:59:07 +02:00
Gregory Hainaut 3be5a6036b gsdx-ogl: remove assertion + debug message
opengl use a different object to compute the vertex count
2015-05-26 11:03:59 +02:00
Gregory Hainaut 580d177951 gsdx: improve commit 11708486d8
This time linear filtering is disabled only for the bad draw call
(RT used as a palette texture).
2015-05-25 09:46:51 +02:00
Gregory Hainaut b0af54d33e gsdx-ogl: better support of palette
The purpose of the code is to support alpha channel
of RT uses as an index for a palette texure.

I'm afraid that code will likely break pure palette texture. Only used
if paltex is enabled

It fixes missing shadow in Star Ocean 3 (issue #374) in Native resolution
with filter = 0  (no filtering) or = 2 (normal fitering)

Rendering explanation:
The game emulates a stencil buffer with the alpha channel

The alpha channel of the RT can contains a palette texture index (format 4HH)
The idea is to have a gradient of value in the palette (16/32/48/...).
This way you can implement a +16/-16 and even wrap the alpha value every time
you hit the pixel.

Bilinear filtering breaks the rendering because it interpolates between counts
so you doesn't have the exact count

Upscaling breaks the rendering because the RT is reused as an input texture. It means
that we need to scale it down which again create some interpolations.
2015-05-24 18:07:16 +02:00
Gregory Hainaut 7f614401a6 gsdx-ogl: set accurate_blend as default
Rendering is better and I think speed impact remains small.
2015-05-23 15:10:04 +02:00
Gregory Hainaut b884e0c0c0 gsdx-debug: improve debug experience
* Dump context before the increase of s_n
  => aligned with the global call number
* Don't print colclip not supported when it is optimized away
* dettach the input texture when it is useless
  => avoid to show a wrong texture in the debugger
2015-05-23 12:23:05 +02:00
Gregory Hainaut 183af4ece6 gsdx-ogl: don't enable SW blending when there is no blending
Mostly to avoid useless message
2015-05-20 09:36:01 +02:00
Gregory Hainaut 8d3e3e6c5b gsdx-ogl: more blend rework to support accurate_colclip
So far few blending equations are implemented in PS. It is only
for test the behavior on GoW
2015-05-20 08:00:40 +02:00
Gregory Hainaut c5341a2711 gsdx-ogl: update blending management
This way it will allow to implement all blendings operartion in FS.

Of course it will be slow, but it would be nice for debug and quickly check
game error rendering.
2015-05-20 00:12:52 +02:00
Gregory Hainaut 7979dec5b0 gsdx-ogl: optimize colclip 0
Currently colclip uses 2 passes to wrap the output of blending unit

However some blending mode are only a plain copy (of 0 or Cs or Cd).
So no overflow of [0:255], no need to wrap it

Note: I saw those cases in GoW.
2015-05-17 13:05:08 +02:00
Gregory Hainaut 6ced837360 gsdx-debug: add some PERF info in trace
The idea is to detect easily effect that are known to be slow on opengl
2015-05-17 13:05:08 +02:00
Gregory Hainaut b1ea081fc3 gsdx-debug: improve tracing interface
Basically move the format and c_str() in the macro
2015-05-17 13:05:08 +02:00
Gregory Hainaut c567198967 gsdx-ogl: replaced draw_count by s_n
This way error message is aligned with everything else :)

It is not perfect (normally it must be done in start of main draw)
2015-05-16 13:59:13 +02:00
Gregory Hainaut d870188d21 gsdx: sed/o/off/ 2015-05-15 20:40:09 +02:00
Gregory Hainaut 2e34d48e97 gsdx-ogl: add a virtual GetID method for texture
Much more readable
2015-05-12 17:41:41 +02:00
Gregory Hainaut e0012811ae gsdx-debug: more debug info in gl trace 2015-05-12 12:36:34 +02:00
Gregory Hainaut 921fa3bab8 gsdx-ogl/linux: drop fba option
It is useless as DX11
2015-05-11 13:57:26 +02:00
Gregory Hainaut 1523b9534f gsdx-debug: compact the code 2015-05-11 11:19:00 +02:00
Gregory Hainaut 5565544ba6 gsdx-ogl: DATE with texture barrier
Much faster for small batch that write the alpha value. Code can
be enabled with accurate_date option.

Here a summary of all DATE possibilities:
1/ no overlap of primitive
    => texture barrier (pro no setup of stencil and single draw)

2/ alpha written
    => small batch => texture barrier (primitive by primitive). Done in N-primitive draw calls.
    (based on GL_ARB_texture_barrier)

    => bigger batch => compute the first good primitive, slow but only 2 draw calls.
    (based on GL_ARB_shader_image_load_store)

    => Otherwise there is the UserHacks_AlphaStencil but it is a hack!

3/ alpha written
    => full setup of stencil ( 2 draw calls)
2015-05-09 19:54:01 +02:00
Gregory Hainaut f029e4763f gsdx-ogl: add the alpha value in a constant buffer 2015-05-09 15:02:34 +02:00
Gregory Hainaut 472608b879 gsdx-ogl: add accurate blending implementation 2015-05-09 15:02:13 +02:00
Gregory Hainaut 1e8aea033c gsdx-ogl: add a wrapper to control the drawing of primitives
No barrier => draw all primitives
Barrier but without overlap => draw all primitives
Barrier with overlap => draw primitive by primitve

It will ease the implementation of accurate blending and why not date too
2015-05-09 15:01:39 +02:00
Gregory Hainaut 380e420cdd gsdx-ogl: add a blend parameter to shader 2015-05-08 20:28:50 +02:00
Gregory Hainaut 8e1db43431 gsdx-debug: debug stuff 2015-05-08 19:28:17 +02:00
Gregory Hainaut 3c66da4d82 gsdx: remove old code 2015-05-08 19:28:16 +02:00
Gregory Hainaut 1addae1993 gsdx-ogl: don't use extra shader for sprite hack
Atst == 2 && sprite_hack is equivalent to Atst == 1

It frees a bit in the shader selector, and reduces shader combinations.
2015-05-08 19:28:16 +02:00
Gregory Hainaut b490085214 gsdx-ogl: add 2 options in the GUI to ease testing
accurate_date => use an emulated stencil to compute destination alpha
    test (could be slow)

accurate_blend => do nothing (future)
2015-05-08 19:28:16 +02:00
Gregory Hainaut d6448183d7 gsdx-ogl-debug: insert error message in GL stream
This way, you can see your message in the GL debugger
2015-05-08 00:16:31 +02:00
Gregory Hainaut cc4713d379 gsdx-debug: extend ogl debug capabilities
Group opengl calls into a nice name.

Apitrace shows them in a tree format that support folding. Previously it
was a long flat list (10K-40K of lines by frame)

I align the call number with the internal s_n variable. This way it is
easy to map GSdx dump output with the GL debugger :)
2015-05-06 19:09:13 +02:00
Gregory Hainaut 530e4ce776 gsdx-ogl: drop hack that rescale primitive (to avoid upscale glitch)
Rendering is bad. It renders sprites at native resolution.
2015-05-06 19:09:13 +02:00
Gregory Hainaut 6d65867b26 gsdx-ogl: comment point_sampler
It is not enabled on the shader so I will reuse the bit
2015-05-06 19:09:12 +02:00
Gregory Hainaut 3fcac07120 gsdx-ogl: print some message when blending is not properly supported 2015-05-06 19:09:12 +02:00
Gregory Hainaut 8032e2c369 gsdx-ogl: separate color mask state from the blending state
Unlike DX they're uncorrelated.
2015-05-05 10:26:01 +02:00
Gregory Hainaut 73d04e33e9 gsdx ogl: clean various comment and old code 2015-05-01 20:04:23 +02:00
Gregory Hainaut 335695bd0e purge GLES from GSdx !
mobile will use vulkan (or any new API) anyway
2015-05-01 20:02:17 +02:00
Gregory Hainaut 7367b22e03 gsdx-ogl: reduce toggling of scissor state for DATE 2015-05-01 00:59:49 +02:00
Gregory Hainaut baf84b98c4 gsdx-ogl: add override_GL_ARB_texture_barrier option
To ease regression test.
2015-04-25 09:59:25 +02:00
Gregory Hainaut c207632e49 gsdx-ogl: improve date performance for GL45
If there is no overlap, it is allowed to directly read from the render target.

On SotC testcase with 6x scaling: 30fps -> 40fps

Note: it requires GL_ARB_texture_barrier extension so be sure to have a recent driver

Note2: it requires a lots of testing too

Open question: in case of complex date (written alpha)
Will it be faster to split the draw call into multiple call with no
primitive overlap
2015-04-24 21:12:33 +02:00
Gregory Hainaut 795ae50ecd gsdx-ogl: fix the recently broken advance date feature
Now it is really working with a 2 stages shaders but it is still slow.
2015-04-24 20:13:38 +02:00
Gregory Hainaut 672e3f9533 gsdx-ogl: use DSA for texture management
Yeah code is much nicer :)
2015-04-24 19:34:17 +02:00
Gregory Hainaut 6e386df535 gsdx-ogl: avoid to clean fully texture in DATE
Is is useless and it has a small impact on performance for big upscale
2015-04-24 18:32:08 +02:00
Gregory Hainaut bd6ea17bdc gsdx-ogl: speed improvement for DATE
DATE is implemented in 2 ways.
1/ with stencil
2/ purely in FS (sw)

I kept method 1 to reduce the work on method 2. It sucks for performance.
So it would be either 1 or 2.

Note: DATE has a big impact on higher upscaling
Note2: you can disable the 2nd method with this configuration parameter

override_GL_ARB_shader_image_load_store = 0
2015-04-22 00:36:34 +02:00
Gregory Hainaut 31f8c065db gsdx-ogl: implement a new hack UserHacks_UnscaleSprite for opengl
UserHacks_UnscaleSprite = 1 will unscale flat sprites
UserHacks_UnscaleSprite = 2 will unscale all  sprites (don't work well so far)

The idea of the hack is to redo the interpolation of texture coordinate
based on the non-upscaled pixel position.

It avoids various glitches but sprites aren't upscaled anymore (so no
more anti-aliasing, potentially a coefficient can be added).
2015-04-20 07:18:08 +02:00
Gregory Hainaut 6124eb844e gsdx-ogl: only compile useful VS
logz is a constant
wildhack is only compatbile with TME/FST

Compilation goes down from 64 to 20 vertex shaders.
2015-04-20 07:17:58 +02:00
Gregory Hainaut 65a84b9f64 gsdx hack: drop previous filtering hack
superseeded by UserHacks_round_sprite_offset
2015-04-15 09:26:01 +02:00
Gregory Hainaut 418f2e69a8 gsdx-ogl: implement the wildhack on the GPU
Likely much faster for opengl and much easier to implement

Note: hopefully UserHacks_round_sprite_offset will replace it
2015-04-13 22:14:36 +02:00
Gregory Hainaut 98d8ad7484 gsdx: new anti-glitches upscaling hack: UserHacks_round_sprite_offset
It is replacement of the previous hack (UserHacks_stretch_sprite). Don't enable both in the same time!

The idea of the hack is to move the sprite to the pixel boundary. It
avoids most of rounding issue. It also rescales verticaly the sprite (avoid horizontal line on ace combat).
I don't like this rescaling maybe we can limit it to only 1 pixels.

On my limited testcase, results are much better with any upscaling factor.

I still have a bad line in Kingdom heart. If you have issue with others
game please provide us a GS dump.
2015-04-07 19:18:24 +02:00
Gregory Hainaut 5269e54f02 gsdx: tune previous hack
Only disable bilinear on the sprite that were forced by the user.
If the PS2 requires a bilinear filtering, there is likely a big enough texture
2015-04-03 20:09:02 +02:00
Gregory Hainaut e1a5736583 gsdx: anti-upscale-glitch hack UserHacks_SkipDraw
2x upscaling is pixel perfects. Bigger upscaling is better but not yet perfect

Feedbacks are welcomes (note it doesn't solve all upscaling issue, only wrong texture sampling)

For the history:
If you have a texture of [0;16[ texels and draws a primitive [0;16[

The formulae to sample last pixels of texture is
0.5 + (16*s-1)/(16*s) * 16
Native (s==1): 15.5 (good)
2x     (s==2): 16 (bad, outside of the texture)
4x     (s==4): 16.25 (bad, really outside of the texure))
2015-04-03 18:33:05 +02:00
Gregory Hainaut ccc1137e12 gsdx-ogl: merge the two vertex buffer format
* Only a single VAO
    => Format is set once
    => Only a single bind at startup
    => GSVertexBufferStateOGL is nearly useless
    => barely faster but better than nothing :)
2014-10-02 20:44:22 +02:00
Gregory Hainaut 10c7be8c50 gsdx-ogl: Use 32B strides for all VBO 2014-10-02 20:44:22 +02:00
Gregory Hainaut 0d45e6d70e gsdx-ogl: avoid to send constant to the GPU
It was a waste of bandwith
2014-04-06 10:44:40 +02:00
Gregory Hainaut b020bd76c6 gsdx-ogl: restore gles build
Add the --gles build option to the linux main script
Ifdef all gl code not supported on gles3 (note some will be reenabled for gles3.1)

Note: it probably doesn't run anymore. My Nvidia driver doesn't support
yet egl/gles so I can't test it. Feel free to contribute.
2014-03-29 11:55:02 +01:00
gregory.hainaut 1922598f0f the "don't let users shoot themself in the foot" commit
UserHacks_AlphaStencil will take precedence on override_GL_ARB_shader_image_load_store


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5891 96395faa-99c1-11dd-bbfe-3dabce05a288
2014-02-08 11:22:51 +00:00
gregory.hainaut 3d7b86660a gsdx ogl: Enable by default advance DATE effect on nvidia (GL4 GPU)
Note: DATE is used for shadows (persona 3) and others effect

You can disable it when you disable gl image extension (override_GL_ARB_shader_image_load_store = 0)
You can also enable it on AMD too (set it to 1 this time) but no guarantee.

If you feel the extension is too slow, you can try disable some gl barrier (aka "damn the torpedo full steam ahead!").
It can be done with the option UserHacks_DateGL4 = 1. Otherwise just disable the extension.

Note: don't enable UserHacks_AlphaStencil in the same time. GL_ARB_shader_image_load_store is an alternative implementation.
Enabling both in the same time will lead to undefined (well surely wrong) behavior.




git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5890 96395faa-99c1-11dd-bbfe-3dabce05a288
2014-02-08 11:05:26 +00:00
gregory.hainaut 6cf75e086b gsdx ogl: Only enable advance DATE when user request it
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5864 96395faa-99c1-11dd-bbfe-3dabce05a288
2014-02-01 18:46:18 +00:00
gregory.hainaut e80b002929 gsdx ogl: Flush various pending work
* try to use more subroutine on VS&PS, unfortunately hit a driver crash!
* Call Attach/DetachContext through GSDevice so I can unmap currently mapped buffer
* Implement glsl part of GL_ARB_bindless texture, again hit another driver crash!
* various fix of GL_ARB_buffer_storage. Basic benchmark show only improvement on 'cold' case, I guess it will improve smoothness
* try to fix GL_clear_texture, no success so far. It seem the extension is limited to basic texture (aka no depth/stencil)



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5752 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-10-24 20:54:27 +00:00
gregory.hainaut e01c6cd9ce gsdx ogl: the proof of concept commit
* GL_ARB_shader_subroutine for perf
fix for nvidia => add missing shader declaration. Nvidia got +4fps on colin3 :) 

For the moment only 2 PS parameters are supported. Code need to be extended to support others games that often
switch shader program (like xenosaga).
require GL4 class hardware and the option override_GL_ARB_shader_subroutine = 1
Note: strangely on AMD linux it is slower!

* GL_ARB_shader_image_load_store for accuraccy (Date)
Use a signed integer texture and reenable color buffer writing

Current status: Amagami_transparency.gs & P3_battle_shadows.gs are now working on Nvidia with a small perf impact.
Current implementation detail:
1/ setup the standard stencil as before
2/ on remaining pixel, draw once to compute first primitive that will write a fail alpha value.
3/ final draw based on primitive id of step 2

Note: I think we would get a bad behavior if depth test&mask are enabled on step 2/3
Note2: on my limited testcase the perf impact was on CPU. It would be possible to merge step1&2 to nullifying it (could 
even be faster actually), however it would require more GPU power.

Again require GL4 class hardware. And the option UserHacks_DateGL4 = 1



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5725 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-28 08:44:16 +00:00
gregory.hainaut 07605941ef gsdx ogl:
* some preliminary work to test/benchmark bindless texture in the future (glsl was not yet updated)
Bindless texture allow to get a GPU texture pointer and then set it directly
to the shader as a basic uniform.
=> no more texture unit selection/validation
=> no more texture validation neither texture hash lookup

3rdparty: update gl header to the latest gl4.4


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5720 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-17 08:57:52 +00:00
gregory.hainaut b4084047be gsdx ogl: Used a basic flat interpolation for color interpolation (line & tri primitives)
Card that support gs:
remain only a gs to generate sprite from a line. Even dummy gs are costly for the GPU.

Card that don't support gs:
remove useless copy of color for line and triangle primitives

Note for dx: opengl 3.2 (maybe not gles) supports both flat interpolation
convention (GL_FIRST_VERTEX_CONVENTION or GL_LAST_VERTEX_CONVENTION).  It might
be possible to shuffle vertex index to put the last vertex in first position.

- buff[0] = head + 0;
- buff[1] = head + 1;
- buff[2] = head + 2;
+ buff[0] = head + 2;
+ buff[1] = head + 1;
+ buff[2] = head + 0;



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5718 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-14 10:18:38 +00:00
gregory.hainaut 0f603a98d5 gsdx ogl: Test the ARB_shader_subroutine GL4.0 extension
The idea was to replace shader program swith by pointer function calls inside
shaders.  At least parameters that are often changed between draw call. So far
I only ported atst and colclip. Unfortunately code is "slower" (on GSdx standalone).
For the moment keep the code but disabled.

If I understand well the validation of program is done in the "driver thread"
but the additional call are done in the overloaded MTGS thread. Apitrace
profiling shows faster GPU draw calls. Another possibility is that the driver still
need to validate the draw call because of others state change.

Here some stats on colin3 (90 frames):
without subroutine: UseProgram 125246
with subroutine: UseProgram 2906, subroutine 125945 => 3605 extra calls overhead (not
all parameters are ported to subroutine)



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5715 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-10 19:43:59 +00:00
gregory.hainaut c9755361ec gsdx ogl: remove useless ps.rt
Note: I think we can do the same on DX11

Perf wise: on colin mcrae 3 it reduces shader prog setup from 3005 to 2086 each frames. It saves 2 ms of CPU processing (27->29fps)



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5714 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-06 06:48:44 +00:00
gregory.hainaut a46b489a24 gsdx ogl: various minor optimization.
* move most of gl states into a separate namespace. Extend it to depth/stencil/blend micro state
=> save 10,000 opengl call by frame for colin mcrae 3
* Only setup blend state of first drawbuffer
* Don't request anymore a debug context on dev/release build



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5713 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-05 20:25:25 +00:00
gregory.hainaut 64f783410e gsdx ogl:
* preliminary work for GL4.4 extensions (ARB_clear_texture & ARB_multi_bind). Disabled until I got a 4.4 driver
Note: I plan also to use ARB_buffer_storage
* compute texture gl option in the constructor (avoid a couple of swith case)
* redo texture unit management. Unit 0-2 for shaders, Unit 3 for texture operations. MultiBind will allow to bind 
shader input without disturbing texture binding points.


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5711 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-08-02 16:38:12 +00:00
gregory.hainaut e394de86dc gsdx ogl:
* add a non-working hack: UserHacks_DateGL4, goal was to replace UserHacks_AlphaStencil
 + Detection of good/bad samples is based on primitive ID variable. However I'm not sure
 the behavior is always the same between draw call...Anyway let's keep a copy of the current
 work
* Dump integer texture into text csv
* add gl4.2 ARB_shader_image_load_store extension (needed by UserHacks_DateGL4)


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5707 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-07-28 14:40:43 +00:00
gregory.hainaut d3e1e4da0d gsdx ogl:
* allow to switch renderer with F9
* skip first frame in stat of the replayer
* drop msaa. Fxaa and internal resolution will do the job
* move texture attachment from texture object into device object (allow to keep sanely the state)
* split the write buffer and attachment setup
* completely split sampler and texture input setup
* redo GSDeviceOGL::CopyOffscreen to avoid an extra copy.



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5704 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-07-19 19:25:50 +00:00
gregory.hainaut 916a091f8a gsdx ogl: GLes
* use gles header file, disable opengl code (mainly GLX, ARB_sso, geometry shader)
* Define properly the function pointer, GLES use basic linking whereas GL must get the symbol dynamically
* cmake: properly search and set libglesv2.so
* don't use dual source blending => HW renderer work (only miss unimportant FBA)



git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5701 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-07-13 11:39:45 +00:00
gregory.hainaut a764950468 gsdx ogl:
* replace vertex interface with block interface. It avoid to depends on the ARB_sso extension.
* disable geometry shader on Nvidia & Linux. Slower but better than a black screen !
* default logz to 1, avoid some glitches.


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5699 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-07-11 17:08:42 +00:00
gregory.hainaut 3c7167be50 clean (some) gdb warning: round 1
* use svnrev.h on linux too
* replace sprintf_s with snprintf (hope it still compile on Windows)
* init integer with 0 instead of NULL
* various int -> u32/uint32/uint on for loop index
* remove a couple of unused variable
* init few variable
* disable unused warning results


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5683 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-06-28 10:43:50 +00:00
gregory.hainaut ca1edbf2cb gsdx ogl:
* Separate state and shader compilation into separate function
* replace various hash_map by basic array
* Compact VertexScale and offset into a single vec4
* add the new option "ogl_vertex_subdata": subdata is faster on FGLRX, test are welcome on Nvidia drivers
    0 => use map/unmap
    1 => use subdata

replay: add "linux_replay" option and compute some nice stat (mean, standard deviation)

cmake: recreate shader header at build time


git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5682 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-06-26 20:09:07 +00:00
gregory.hainaut 19961175c9 gsdx-ogl-wnd: port (minor) renderer update of r5649 to opengl
Now the brach is ready to be merged :)

Dears Window users. If you can test that:
1/ still compile
2/ still running on DX
3/ can run with opengl


git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5663 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-06-15 09:24:13 +00:00
gregory.hainaut 382258ec03 gsdx-ogl-wnd:
* properly set the stencil write mask
* Multiply Y coodinates by -1. Let's hope it fixes shadows gliches.


git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5656 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-06-12 20:57:16 +00:00
gregory.hainaut 58699923e0 gsdx-ogl-wnd:
* factorize sample object creation
* remember frame buffer attachment state
* Use a basic context on EGL. Allow to use Mesa 9.1 on AMD GPU.
* precompile vertex and geometry shader to avoid benchmark polution on replay
* Try harder to detect FGLRX driver on window
* various clean

Remain to fix the coordinate system for upscaling



git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5651 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-06-09 16:41:58 +00:00
gregory.hainaut 75418aba43 gsdx-ogl-wnd:
* Emulate Geometry Shader from the CPU.
* add some option to override opengl extension detection
* redo shader interface (again) to compile on the free driver

SW renderer is now working on the free driver.

To test it on your linux box use this cmake option -DEGL_API=TRUE
Note: (need opengl 3.0) I test mesa 9.2 git



git-svn-id: http://pcsx2.googlecode.com/svn/branches/gsdx-ogl-wnd@5646 96395faa-99c1-11dd-bbfe-3dabce05a288
2013-05-27 16:53:38 +00:00