It's more portable.
Use _mm_shuffle_epi32 instead of _mm_movehl_ps - I think it avoids
data bypass delays going from integer to float domains on older
processors, and Agner's tables indicate that the instruction has the
same latency and occasionally has higher throughput (depending on cpu).
And switch the _mm_xor_si128 and _mm_unpacklo_epi8 around so the same
constant can be used for both C bias and alpha.
* Fast accurate DATE is always enabled, it was faster than standard DATE
* The less fast version is always enabled too. It is likely barely used
so perf impact will be small on few game that could hit this path.
Nice rendering has a higher priority
* The "slow" path will depends on the date option.
Note normally it isn't too slow (-10%) if GL_ARB_shader_image_load_store
is supported but AMD crimson is an epic fail.
Fixes games like Time Crisis 2/3, which use two half-width display
rectangles placed side by side in split screen mode by using different
DISPLAY.DX values.
Issue1: Depth buffer is wrongly invalidated only the first page is detected.
Issue2: First page seems to be partially written. Could be a GSdx transfer bug.
Anyway, invalidation only support a page granularity.
So here a quick workaround that will clear depth buffer in case of very small partial write.
Might worth to check regression on nocturne/digital saga
Also make the build noisier to prevent some build failures when download
speeds are slow from whatever repository.
Note: The 64-bit devel build currently doesn't benefit from ccache for
whatever reason.
[skip appveyor]
It would requires some texture dynamic width convert shaders.
So as a quick solution, let's add a new CRC hack.
For issue #1362 (granted the CRC is correct)
Faster :) Reduce further the cost of accurate date
The optimization will clear the stencil to 1. So all pixels will have a
single sample that pass both the depth & stencil test. No primitive
overlaps So the destination alpha test can be done directly in the
shader.
Game often uses date to allow a single pixel pass. If this
use case is detected, stencil buffer will be cleared after first pixels
that pass both depth&stencil test.
It seems to reduce the load on the GPU.
Note: with the help of texture barriere, maybe we could implement the algo
with a single pass.
If a color buffer is still attached and is smaller than depth buffer,
the latter won't be fully cleared.
As a faster alternative, use GL4.4 clear texture function. Avoid to fiddle with
framebuffer and pixel tests.
Fix #1362x Ar Tolenico 2 map clip