This way ogl_vertex_storage must be safer to activate
And it brings a nice performance boost (game with lots of primitives and
reasonable upscaling)
SotC testcase 4x: 61fps => 78fps
If there is no overlap, it is allowed to directly read from the render target.
On SotC testcase with 6x scaling: 30fps -> 40fps
Note: it requires GL_ARB_texture_barrier extension so be sure to have a recent driver
Note2: it requires a lots of testing too
Open question: in case of complex date (written alpha)
Will it be faster to split the draw call into multiple call with no
primitive overlap
ogl_texture_storage 1 creates texture corruption.
Advance date is too slow, code need to be updated (properly) to uses 2 passes only not 3
Maybe one could be enough (sometimes)
DATE is implemented in 2 ways.
1/ with stencil
2/ purely in FS (sw)
I kept method 1 to reduce the work on method 2. It sucks for performance.
So it would be either 1 or 2.
Note: DATE has a big impact on higher upscaling
Note2: you can disable the 2nd method with this configuration parameter
override_GL_ARB_shader_image_load_store = 0
Note yet enabled because I'm afraid of data corruption but feel free to test it
The option:
ogl_vertex_storage = 1
Performance note (warm cache+gs replay on colin3)
60 fps -> 76 fps
UserHacks_UnscaleSprite = 1 will unscale flat sprites
UserHacks_UnscaleSprite = 2 will unscale all sprites (don't work well so far)
The idea of the hack is to redo the interpolation of texture coordinate
based on the non-upscaled pixel position.
It avoids various glitches but sprites aren't upscaled anymore (so no
more anti-aliasing, potentially a coefficient can be added).
* separate VS/GS and FS
* separate subroutine part of the FS
It already complex enough without subroutine stuff. Besides I'm not sure
we will keep subroutine on the future.
If someone has a more elegant solution, feel free to share it
spin_thread = 0
spin_thread = 1 // the faster but GS thread will never stop, very bad for laptop
It is faster on linux, it requires less code, and it is "portable"
It requires boost (only hpp files) + MSVC 2013 (for atomic) (seem doable by 2012 too)
Actually there are several queues that either use spinlock or full sleep
* Use tooltip for hack
* Update string of previous hack
* Remove unused hack
Note: hack_sprite_check requires 3 states (and potentially others hack too) but
I don't know how to do it. It likely requires a "scale button"
It works as bad as a "clever" implementation.
It seems to be enough for games such as venus/taisho-monoke/FFX
Note: it might creates glitches. Code will never be nice, so it is just
a trade-off
-1 is annoying because minimum value is 0. Instead to add more logic,
let's try to use 0 which seems to be good enough (fix regressions on DQ8/AT)
Unfortunately it causes a mini regression on taisho-mononoke. Rotation of sprite is done with 2 triangles.
Potentially previous value wrongly recover this section.
Tekken is broken with setting UserHacks_round_sprite_offset = 2 use 1 instead.
=> game use upscaling internaly so my rounding of coordinate break everything
Yakuza uses float coordinate but this hack only correct the integer coordinate
=> the solution will be to add a dedicated hack for this game.
Colin3 got a regression but I don't know when...
PS2 uses a -0.5 offset before sampling so texels must be rounded to half-pixel boundary
If fixes glitches on Venus and taisho-mononoke
Note: I didn't fixed yet texture render in reverse because I don't have any test for it.
UserHacks_round_sprite_offset = 1 <= enable correction of flat sprites
UserHacks_round_sprite_offset = 2 <= enable correction of all sprites (better on a couple of dump but not sure of the consequence)
I completely redo the algorithm. This time I do the projection and
interpolation of the 2 extrem vertex. This way I can compute the min/max
valid texture coordinate.
It gives stronger guarantee that texture sampling will be done inside the texture.
However it might have a performance impact, likely reasonable because it
is limited to sprite vertex.
A big thanks you to all people that provide me GS dump and test reports.
It is replacement of the previous hack (UserHacks_stretch_sprite). Don't enable both in the same time!
The idea of the hack is to move the sprite to the pixel boundary. It
avoids most of rounding issue. It also rescales verticaly the sprite (avoid horizontal line on ace combat).
I don't like this rescaling maybe we can limit it to only 1 pixels.
On my limited testcase, results are much better with any upscaling factor.
I still have a bad line in Kingdom heart. If you have issue with others
game please provide us a GS dump.
2x upscaling is pixel perfects. Bigger upscaling is better but not yet perfect
Feedbacks are welcomes (note it doesn't solve all upscaling issue, only wrong texture sampling)
For the history:
If you have a texture of [0;16[ texels and draws a primitive [0;16[
The formulae to sample last pixels of texture is
0.5 + (16*s-1)/(16*s) * 16
Native (s==1): 15.5 (good)
2x (s==2): 16 (bad, outside of the texture)
4x (s==4): 16.25 (bad, really outside of the texure))
+ This is not yet perfect. Really, this standard seems like a load of crap to me in fact...
At least it works now. Should test again when gcc 5 & new c++ libs gets out.. Until then, it will do.
On Linux, CPUs with AVX2 instruction sets that have TSX disabled (by
microcode update or otherwise) fail to build GSdx. The __RTM__ macro is
undefined, with leads to the TSX RTM instruction set (_xbegin, _xend,
_xabort, etc.) being unavailable.
Modify the preprocessor check so that the RTM instructions are only used
if available.
It conflicts with the global definition
I don't remember why this option was set on GSdx. Potentially it could be dropped (or fixed correctly)
Anyway, it will help to enable Address Sanitizer on Linux Build
+ Implement threads/mutex using std::thread/std::mutex/std::condition_variable
+ Old implementation can be used by commenting out the "define _STD_THREAD_ in GSThread.h
+ Commit should be much cleaner than previous one.
Note: of course it requires a glsl shader ;)
On windows, you can set the path on the ini file. Here an example with linux path:
shaderfx_conf = /home/gregory/playstation/emulateur/pcsx2_merge/bin/GSdx_FX_Settings.ini
shaderfx_glsl = /home/gregory/playstation/emulateur/pcsx2_merge/bin/shader.fx
I need to check carefully the consequence of ABI change. So far wx is very unhappy!
Fatal Error: Mismatch between the program and library build versions detected.
The library used 2.8 (no debug,Unicode,compiler with C++ ABI 1002,wx containers,compatible with 2.6),
and your program used 2.8 (no debug,Unicode,compiler with C++ ABI 1006,wx containers,compatible with 2.6).
Copy the full line into the pbo. Dma will only take GL_UNPACK_ROW_LENGTH
- increase memcpy size by 2 in the pbo
+ single memcpy will be faster and can use sse
Enable buffer_storage extension:
* GL_CLIENT_STORAGE_BIT was required (it is the duty of TexSubImage to copy data into the GPU mem)
* Enable the extension by default
Null is equivalent to a clear to 0.
Note: Code is not yet used because both stencil and depth are cleared.
Future note: stencil can potentially be replaced by load_store_image
* Don't use the stack
* Don't compare MinMax parameter (depends of others)
* Don't store not-compared parameter in the cache (HalfTexel/MinMax)
+0.3fps/46fps (well better than nothing)
* Only a single VAO
=> Format is set once
=> Only a single bind at startup
=> GSVertexBufferStateOGL is nearly useless
=> barely faster but better than nothing :)
A couple of fallbacks were introduced for the Mesa driver that only support 3.0
DSA will require a recent Mesa which already support GL3.3
Require at least SandyBridge for Intel GPU
* use DebugOutputToFile as a callback of gl error. Add a breakpoint to
find the culprit GL call
* use string instead of char[n]
Note: CheckDebugLog is potentially useless now
* GL_ARB_clip_control: reduce z fighting
* GL_ARB_clear_texture: no real speed gain (but improve code quality)
* GL_ARB_bindless_texture: +1fps (if you're CPU limited)
* require a 3.1 context
* unattach texture of the fbo when they're not used
(avoid to have a texture and depth_stencil with different size)
Note: except minor shader bug it works on Nvidia 340.23.01
This doesn't update the file to the latest version from mingw32 since this is already a custom header stripped from mingw32.
This also fixes the functions for x86_64(verified that it still works for both architectures) and also updates the version inside of GSdx.
Also removes a comment in the GSdx header saying that these functions are broken since they no longer are.
deassert gsopen_done before delete the gs object. Compiler and CPU will
reordonate the store (or not because it is a global variable). Probably not
perfect but better.
Got a crash on GSKeyEvent which I guess wasn't thread safe. So let's ensure
gs is open.
There is no such thing as MGS3 Substance only Subsistence. Substance is MGS2 :P
Also this new CRC is US Disc 2 of Subsistence. The blue stripes should be fixed now I guess.
1/ initReadFifo will be first called on the GS thread
(openGL can only be done on the GS thread)
2/ readFifo will be called on the EE thread. It is not safe too access eeMem from GS
because of memory is virtual
Fix "recent" regression (crash) on Kingdom heart and others game too.
v2: add a len check on GSState::InitReadFIFO
* Fix to create a 3.0 GLES context
* set a default precision to fix most of shader compilation issue
* Crash later because of GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT
=> need to test opensource driver
Remains 3 files that I don't know the source
pcsx2/windows/DwmSetup.cpp: *No copyright* UNKNOWN
pcsx2/windows/SamplProf.cpp: *No copyright* UNKNOWN
pcsx2/windows/VCprojects/IopSif.cpp: *No copyright* UNKNOWN
Remains 1 files in common that I don't know the source
common/include/comptr.h: *No copyright* UNKNOWN
Remains too much files in plugins that I don't know the source :(
Add the --gles build option to the linux main script
Ifdef all gl code not supported on gles3 (note some will be reenabled for gles3.1)
Note: it probably doesn't run anymore. My Nvidia driver doesn't support
yet egl/gles so I can't test it. Feel free to contribute.
A bit slower. Maybe because SubData does the copy in the driver thread. My memcpy is done on
the main thread. I'm not sure it would worth an extra thread to copy vertex data to the GPU
Note: testers are welcome. You need to edit the ini file.
"ogl_vertex_storage=1" <= enable the extension
"ogl_vertex_storage=0" <= disable the extension
Again you need the support of GL_arb_buffer_storage (i.e. not catalyst)
Improve arb_buffer_storage implementation
Try harder to align the texture buffer
Strangely arb_buffer_storage is 3 times slower on my PC (nvidia)
Tester are welcome! Open the ini file
"ogl_texture_storage = 1" <= enable the extension
"ogl_texture_storage = 0" <= disable the extension
Note: you need an opengl 4.4 driver or one that support arb_buffer_storage (i.e. not catalyst)
This hack was used because GSReadFifo was called from the EE thread.
Previous commit move the call to the GSThread.
Hopefully avoid flushing the full GPU contex would improve openGL
performance (at least avoid some hiccups ;) )
Note: newer GSdx ogl won't be compatible with older PCSX2
Crashfix for a weird GIF_FLG_IMAGE2 situation in Wallace and Gromit Project Zoo. Needs further investigation.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5912 96395faa-99c1-11dd-bbfe-3dabce05a288
reduce_gl_requirement_for_free_driver => set it to 1 to only required a 3.0 context (you still required the good extensions)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5910 96395faa-99c1-11dd-bbfe-3dabce05a288
Reformatted fx files that were causing issues on certain text editors. They should now display correctly in those editors.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5897 96395faa-99c1-11dd-bbfe-3dabce05a288
Also removed the fallback recovery ps, and replaced the compile fail catch to a simple console print. Which I think is safer, and faster.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5894 96395faa-99c1-11dd-bbfe-3dabce05a288
UserHacks_AlphaStencil will take precedence on override_GL_ARB_shader_image_load_store
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5891 96395faa-99c1-11dd-bbfe-3dabce05a288
Note: DATE is used for shadows (persona 3) and others effect
You can disable it when you disable gl image extension (override_GL_ARB_shader_image_load_store = 0)
You can also enable it on AMD too (set it to 1 this time) but no guarantee.
If you feel the extension is too slow, you can try disable some gl barrier (aka "damn the torpedo full steam ahead!").
It can be done with the option UserHacks_DateGL4 = 1. Otherwise just disable the extension.
Note: don't enable UserHacks_AlphaStencil in the same time. GL_ARB_shader_image_load_store is an alternative implementation.
Enabling both in the same time will lead to undefined (well surely wrong) behavior.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5890 96395faa-99c1-11dd-bbfe-3dabce05a288
* properly detect gl nv depth extension
* Always show the hack on the gui. Add a new hack option for DATE (gl4.2) only
* Save the scan mode on linux too (f7)
* hopefully fix some crash on some drivers... (ensure aligment 256 bits alignment, and if not use std memcpy)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5888 96395faa-99c1-11dd-bbfe-3dabce05a288
Also disabled the gsdx AF options for the OGL renderer (because it's not implemented for that yet).
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5881 96395faa-99c1-11dd-bbfe-3dabce05a288
Fixes a small oversight on my part. Which was setting the maximum anisotropy to (0,1,2,3,4) respectively, instead of (1,2,4,8,16). So when you selected x16 for example, you were actually only getting x4 >.>
Corrected now, and you will be getting the full x16 when selected :)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5879 96395faa-99c1-11dd-bbfe-3dabce05a288
Adds anisotropic texture filtering (1x-16x) to the hardware settings. Enhances the visual quality of textures that are at oblique viewing angles.
Anisotropic filtering is automatically disabled if: 8-bit textures are enabled.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5878 96395faa-99c1-11dd-bbfe-3dabce05a288
As before: use the hotkey(f7) to switch between them. It will now save the chosen scanline effect, instead of always defaulting to disabled, on load.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5874 96395faa-99c1-11dd-bbfe-3dabce05a288
* gui refresh
+ Use some tab to reduce heigth for small screen
+ Add logz option
+ remove broken/experimental keyword. GSdx ogl is not too bad ;)
* autodetect GL_NV_depth_buffer_float
Linux tester you are welcome!
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5862 96395faa-99c1-11dd-bbfe-3dabce05a288
Best setting if you driver support GL_NV_depth_buffer_float => GL_NV_Depth = 1 & logz = 0
Otherwise => GL_NV_Depth = 0 & logz = 1
Explanation of the bug:
Dx z position ranges from 0.0f to 1.0f (FS ranges 0.0f to 1.0f)
GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)
Why it sucks:
GS small depth value will be "mapped" to -1.0f. In others all small values will be 1.0f! Terrible lost
of accuraccy.
The GL_NV_depth_buffer_float extension allow to set the near plane as -1.0f.
So
"GL z Position ranges from -1.0f to 1.0f (FS ranges 0.0f to 1.0f)"
will become
"GL z Position ranges from -1.0f to 1.0f (FS ranges -1.0f to 1.0f)"
and therefore
"z posision [0.0f;1.0f] will map to FS [0.0f;1.0f]" as DX
Yes we just get back all precision lost previously :)
However you need hardware (intel?) and driver support (free driver?/gles?) :(
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5860 96395faa-99c1-11dd-bbfe-3dabce05a288
* use same path as game index db for cheats and cheats_ws
* install the new cheat zip file on cmake and debian installer
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5850 96395faa-99c1-11dd-bbfe-3dabce05a288
This fixes the crashes on F9 for me. We don't want to call GSgetTitleInfo2() when GSOpen() isn't finished yet.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5834 96395faa-99c1-11dd-bbfe-3dabce05a288
HTB123 on our forums came up with a hack that removes the broken main character shadow in hardware rendering
in SMT Nocturne.
This should work for all GSdx recognized versions / regions of the game (tested EU and NTSC-U).
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5819 96395faa-99c1-11dd-bbfe-3dabce05a288
* restore the old fxaa (Asmodeam will be integrated when I got time)
* port the recently added new scanline algo
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5818 96395faa-99c1-11dd-bbfe-3dabce05a288
Add another scanline algorithm, made by pseudonym. This one is pretty fancy, using a cosinus function for generating the dark lines.
It only works on unscaled output, so use software rendering or native resolution for hardware.
It's an effect that works best on 240p converted games (like the SNES titles in Mega Man Collection).
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5812 96395faa-99c1-11dd-bbfe-3dabce05a288
Finish up my scanlines attempt. Now the 2 old shaders are back in and all 3 cycle on F7.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5809 96395faa-99c1-11dd-bbfe-3dabce05a288
Slight adjustments to positions in the GUI also (OCD'd the spacing a little :P)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5796 96395faa-99c1-11dd-bbfe-3dabce05a288
- FXAA shader replaced by Asmodeans shader suite 'PCSX2 Fx 2.00 Revised'.
This is an entire enhancement suite with many configurable options like FXAA, texture sharpening
or lighting tweaks. Some of the effects are pretty advanced so kudos to him allowing it to be integrated with GSdx! :)
Note that there's no interface for the tweaks yet. Until those are done, the defaults will be used.
Release thread here: http://forums.pcsx2.net/Thread-Custom-Shaders-for-GSdx?pid=334766#pid334766
- Disabled an option from the widescreen patches for SH:Shattered Memories.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5793 96395faa-99c1-11dd-bbfe-3dabce05a288
* allow to control the gl dectection from the gui (+1 for the user-friendliness ;) )
* disable geometry shader by default on Intel driver
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5783 96395faa-99c1-11dd-bbfe-3dabce05a288
Added a check box and config code for the fxaa shader.
It's not currently hooked up since the shader setup might get replaced in a day or two.
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5777 96395faa-99c1-11dd-bbfe-3dabce05a288
Added a simple scanlines filter (2nd one activated by F7). No tweaking done yet.
It shows how to get scanlines that aren't affected by input resolution or scaling.
A good next step would be determining the exact number of dark lines to show :)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5770 96395faa-99c1-11dd-bbfe-3dabce05a288
Removed some missing headers from the vs2010 and vs2012 project files that were causing vs to always claim the projects were out of date.
Also removed some other entries for c/cpp files that were disabled but also missing (I did not search exhaustively).
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5763 96395faa-99c1-11dd-bbfe-3dabce05a288
(Did this through the google code file editing so if it's broken i apologise :P )
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5762 96395faa-99c1-11dd-bbfe-3dabce05a288
* try to use more subroutine on VS&PS, unfortunately hit a driver crash!
* Call Attach/DetachContext through GSDevice so I can unmap currently mapped buffer
* Implement glsl part of GL_ARB_bindless texture, again hit another driver crash!
* various fix of GL_ARB_buffer_storage. Basic benchmark show only improvement on 'cold' case, I guess it will improve smoothness
* try to fix GL_clear_texture, no success so far. It seem the extension is limited to basic texture (aka no depth/stencil)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5752 96395faa-99c1-11dd-bbfe-3dabce05a288
Performance note, it might be faster to replace the MODULO with an AND. Not sure on the impact
for the new time stretcher algo.
GSdx: fixed use-after-free (linux)
PCSX2:
* add a define to support address sanitizer (both rely on 0x20000000-0x3fffffff memory ranges..)
* sio_buffer out of bond (-1). Maybe we can move the flush in the 2 if previous branch. It would
avoid the extra test.
* wxGetEnv (linux) generates double free (maybe not thread safe). Cache the result so it doesn't crash
anymore when switching renderer
Comment/Review/Improvement are welcome :)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5727 96395faa-99c1-11dd-bbfe-3dabce05a288
* GL_ARB_shader_subroutine for perf
fix for nvidia => add missing shader declaration. Nvidia got +4fps on colin3 :)
For the moment only 2 PS parameters are supported. Code need to be extended to support others games that often
switch shader program (like xenosaga).
require GL4 class hardware and the option override_GL_ARB_shader_subroutine = 1
Note: strangely on AMD linux it is slower!
* GL_ARB_shader_image_load_store for accuraccy (Date)
Use a signed integer texture and reenable color buffer writing
Current status: Amagami_transparency.gs & P3_battle_shadows.gs are now working on Nvidia with a small perf impact.
Current implementation detail:
1/ setup the standard stencil as before
2/ on remaining pixel, draw once to compute first primitive that will write a fail alpha value.
3/ final draw based on primitive id of step 2
Note: I think we would get a bad behavior if depth test&mask are enabled on step 2/3
Note2: on my limited testcase the perf impact was on CPU. It would be possible to merge step1&2 to nullifying it (could
even be faster actually), however it would require more GPU power.
Again require GL4 class hardware. And the option UserHacks_DateGL4 = 1
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5725 96395faa-99c1-11dd-bbfe-3dabce05a288
zzogl: fix memory leak (fix#1431, #1432)
GSdx ogl: disable geometry shader on Nvidia/Windows (I will wait a 3rd implementation to find which one is correct)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5724 96395faa-99c1-11dd-bbfe-3dabce05a288
* redo most of the texture upload (PBO): colin3 benchmark: 32 fps now (vs 26 fps 2 weeks ago)
* use the cross vendor vsync extension on linux (previous wasn't supported by nvidia)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5721 96395faa-99c1-11dd-bbfe-3dabce05a288
* some preliminary work to test/benchmark bindless texture in the future (glsl was not yet updated)
Bindless texture allow to get a GPU texture pointer and then set it directly
to the shader as a basic uniform.
=> no more texture unit selection/validation
=> no more texture validation neither texture hash lookup
3rdparty: update gl header to the latest gl4.4
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5720 96395faa-99c1-11dd-bbfe-3dabce05a288
Card that support gs:
remain only a gs to generate sprite from a line. Even dummy gs are costly for the GPU.
Card that don't support gs:
remove useless copy of color for line and triangle primitives
Note for dx: opengl 3.2 (maybe not gles) supports both flat interpolation
convention (GL_FIRST_VERTEX_CONVENTION or GL_LAST_VERTEX_CONVENTION). It might
be possible to shuffle vertex index to put the last vertex in first position.
- buff[0] = head + 0;
- buff[1] = head + 1;
- buff[2] = head + 2;
+ buff[0] = head + 2;
+ buff[1] = head + 1;
+ buff[2] = head + 0;
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5718 96395faa-99c1-11dd-bbfe-3dabce05a288
The idea was to replace shader program swith by pointer function calls inside
shaders. At least parameters that are often changed between draw call. So far
I only ported atst and colclip. Unfortunately code is "slower" (on GSdx standalone).
For the moment keep the code but disabled.
If I understand well the validation of program is done in the "driver thread"
but the additional call are done in the overloaded MTGS thread. Apitrace
profiling shows faster GPU draw calls. Another possibility is that the driver still
need to validate the draw call because of others state change.
Here some stats on colin3 (90 frames):
without subroutine: UseProgram 125246
with subroutine: UseProgram 2906, subroutine 125945 => 3605 extra calls overhead (not
all parameters are ported to subroutine)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5715 96395faa-99c1-11dd-bbfe-3dabce05a288
Note: I think we can do the same on DX11
Perf wise: on colin mcrae 3 it reduces shader prog setup from 3005 to 2086 each frames. It saves 2 ms of CPU processing (27->29fps)
git-svn-id: http://pcsx2.googlecode.com/svn/trunk@5714 96395faa-99c1-11dd-bbfe-3dabce05a288