Each ring buffer transaction is composed of several reads/writes.
Atomic operations are only required at the start/end of the full
transaction. In the middle, you can use normal variables (an optimization
opportunity for the compiler).
Use acquire/release semantics on isBusy and vuCycles to remain 100% safe
(relaxed might be doable, but better safe than sorry).
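The idea above can be sketched as follows. This is a minimal single-producer/single-consumer ring, not the emulator's actual code: the class name, member names and buffer size are all hypothetical. Each transaction does one acquire load at the start, works on a plain local index in the middle, and publishes with one release store at the end.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical single-producer/single-consumer ring (illustrative names only).
class TransactionRing {
public:
    static constexpr std::size_t kSize = 1024; // must be a power of two

    // Producer side: write `count` words as one transaction.
    bool write(const std::uint32_t* src, std::size_t count) {
        assert(count <= kSize);
        // Start of transaction: read the shared indices once.
        const std::size_t w = m_write.load(std::memory_order_relaxed); // owned by producer
        const std::size_t r = m_read.load(std::memory_order_acquire);
        if (kSize - (w - r) < count)
            return false; // not enough free space
        // Middle of transaction: plain local variable, no atomics.
        std::size_t local = w;
        for (std::size_t i = 0; i < count; i++)
            m_data[local++ & (kSize - 1)] = src[i];
        // End of transaction: publish everything with a single release store.
        m_write.store(local, std::memory_order_release);
        return true;
    }

    // Consumer side: read `count` words as one transaction.
    bool read(std::uint32_t* dst, std::size_t count) {
        assert(count <= kSize);
        const std::size_t r = m_read.load(std::memory_order_relaxed); // owned by consumer
        const std::size_t w = m_write.load(std::memory_order_acquire);
        if (w - r < count)
            return false; // not enough data yet
        std::size_t local = r;
        for (std::size_t i = 0; i < count; i++)
            dst[i] = m_data[local++ & (kSize - 1)];
        m_read.store(local, std::memory_order_release);
        return true;
    }

private:
    std::uint32_t m_data[kSize];
    std::atomic<std::size_t> m_read{0};
    std::atomic<std::size_t> m_write{0};
};
```

Because the loop only touches `local`, the compiler is free to keep it in a register and optimize the copy.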
Use a dedicated cache line for atomic variables to avoid any conflict between CPUs.
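A minimal sketch of the cache line separation, assuming a 64-byte line (typical on x86, but an assumption here). The struct and member names are hypothetical and only loosely modelled on the isBusy/vuCycles variables mentioned earlier.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kCacheLine = 64; // assumption: 64-byte cache lines

// Each wrapper is aligned (and therefore padded) to a full cache line.
struct alignas(kCacheLine) PaddedFlag {
    std::atomic<bool> value{false};
};

struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint32_t> value{0};
};

// Hypothetical shared state: every atomic lives on its own line.
struct SharedState {
    PaddedFlag is_busy;
    PaddedCounter cycles;
};

static_assert(sizeof(PaddedFlag) == kCacheLine, "flag must fill one line");
static_assert(sizeof(PaddedCounter) == kCacheLine, "counter must fill one line");

// Returns true when the two atomics cannot share a cache line.
inline bool on_separate_lines(const SharedState& s) {
    const auto a = reinterpret_cast<std::uintptr_t>(&s.is_busy);
    const auto b = reinterpret_cast<std::uintptr_t>(&s.cycles);
    return a / kCacheLine != b / kCacheLine;
}

// Debug-time check that the layout has no false sharing.
inline void check_layout(const SharedState& s) {
    assert(on_separate_lines(s));
}
```

The cost is memory: each atomic now occupies a full line, which is usually acceptable for a handful of hot synchronization variables.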
The struct is copied into various ring buffers (hot path).
We only need the return status of the function so use a reference instead of
a state variable
Side note: if we align the struct to 16B maybe the compiler can use SSE to copy it.
Warning: it breaks save state compatibility
This reverts commit d6383e6c21.
It created a regression in Everybody's Golf 4/Hot Shots Golf 4, breaking the rendering when depth emulation is disabled or when using a Direct3D hardware renderer.
The code is now a mirror of ::add, so 1 insert == 1 erase.
This way it won't crash on a future update, and it will support future GS
memory wrapping improvements.
ZoE2:
RemoveAt overhead plummets to 0.5%. It was 17%!
However, insertion is a bit slower due to the begin() after the push_front.
v2: use std:: for lists and arrays
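The insert/erase pairing described above could look roughly like this. It is only a sketch under assumptions: `Source`, `PageMap`, `Handle` and the page count are hypothetical stand-ins, not the real GSTextureCache types. The point is that keeping the iterator returned by begin() right after push_front lets the removal erase in O(1) instead of searching the list.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <list>

struct Source { int id; }; // hypothetical stand-in for a cached texture source

class PageMap {
public:
    static constexpr std::size_t kPages = 512; // assumption: illustrative page count

    // Remembers where a source was inserted so removal needs no search.
    struct Handle {
        std::size_t page;
        std::list<Source*>::iterator it;
    };

    // Insert: push_front, then grab begin(). This extra begin() is the small
    // insertion cost mentioned in the commit message.
    Handle add(std::size_t page, Source* s) {
        assert(page < kPages);
        auto& bucket = m_pages[page];
        bucket.push_front(s);
        return Handle{page, bucket.begin()};
    }

    // Mirror of add(): exactly one erase per earlier insert, in O(1).
    void remove(const Handle& h) {
        m_pages[h.page].erase(h.it);
    }

    std::size_t count(std::size_t page) const {
        return m_pages[page].size();
    }

private:
    std::array<std::list<Source*>, kPages> m_pages;
};
```

std::list iterators stay valid when other elements are inserted or erased, which is what makes storing them safe here.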
Removes Alpine Racer 3 hack. Issue has been resolved.
Moves NanoBreaker hack. Issue has been resolved for OpenGL and hack has
been moved to DX only.
Moves Tri-Ace games hacks. Hacks are also necessary for OpenGL with "Partial" CRC Hack Level to prevent massive slowdown.
Move Tales Of Legendia hack back as it's also necessary for OpenGL with "Partial" CRC Hack Level to prevent graphical issues.
Close: https://github.com/PCSX2/pcsx2/issues/1698
Added PAL and NTSC-U CRCs for Ar tonelico II.
Unfortunately, it requires at least GCC6. It would be nice if someone could check the generated code on GCC6.
I don't know the status of clang.
Here is the only example I have found on the web:
https://developers.redhat.com/blog/2016/02/25/new-asm-flags-feature-for-x86-in-gcc-6/
Current generated code in GSTextureCache::SourceMap::Add
38b3: bsf eax,esi
38b6: add esp,0x10
38b9: test esi,esi
38bb: jne 387e <GSTextureCache::SourceMap::Add(GSTextureCache::Source*, GIFRegTEX0 const&, GSOffset*)+0x6e>
BSF already sets the Z flag when the input (esi) is 0, so it would be better
not to put a silly add before the jump and to skip the test operation.
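For reference, a hedged sketch of how the asm flag output feature can expose the Z flag left by BSF. The function name is hypothetical, and this is not necessarily how the commit implements it. The asm path is only compiled when the compiler defines `__GCC_ASM_FLAG_OUTPUTS__` on x86; otherwise the sketch falls back to `__builtin_ctz`, which assumes GCC or Clang.

```cpp
#include <cstdint>

// Returns true and stores the index of the lowest set bit when mask != 0.
// Leaves `index` untouched and returns false when mask == 0.
inline bool find_first_set(std::uint32_t mask, std::uint32_t& index) {
#if defined(__GCC_ASM_FLAG_OUTPUTS__) && (defined(__i386__) || defined(__x86_64__))
    bool zero;          // receives the Z flag directly, no extra test instruction
    std::uint32_t idx;  // only meaningful when the input was non-zero
    asm("bsf %[in], %[out]"
        : [out] "=r"(idx), "=@ccz"(zero)
        : [in] "r"(mask));
    if (zero)
        return false;
    index = idx;
    return true;
#else
    // Fallback without flag outputs (GCC/Clang builtin, assumption).
    if (mask == 0)
        return false;
    index = static_cast<std::uint32_t>(__builtin_ctz(mask));
    return true;
#endif
}
```

With the flag output, the compiler can branch on the flag produced by BSF itself instead of re-testing the input register.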
OMG, Zone of the Enders got a speed boost from 11 fps to 45 fps.
Seriously, the goal is to allow benchmarking GSdx without too much overhead of the main renderer draw call
Note: unlike the null renderer, texture/vertex uploading, 2D draw, texture conversions are still done.
* move the post-processing frame into the OSD tab
* Rename Global Settings to Renderer Settings
* put monitor and indicator check box on the same line
At least we now have a similar number of options per tab.
This reverts commit f77c1900fa.
Conflicts:
plugins/GSdx/GSTextureCache.cpp
Another fix was done later for the Jak cut scenes (or FMVs). One game got a regression (I don't remember which).
It's only ever updated after the queue is updated, so its state will
always lag slightly behind it. It's sufficient to just use empty().
This seems to fix some caching issues that were noticeable on Skylake
CPUs (#998).
In the previous code, the worker thread would notify the MTGS thread
while the mutex is still locked, which could cause the MTGS thread to
wake up and immediately go back to sleep again since it can't lock the
mutex.
Use a separate mutex for waiting, which avoids the issue.
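One way to picture the fix, as a sketch under assumptions rather than the actual MTGS code: the pending-work counter is an atomic, and a mutex dedicated to waiting is only taken briefly before the notification. The class and member names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical synchronization helper (illustrative names only).
class WorkSync {
public:
    void add_work() {
        m_pending.fetch_add(1, std::memory_order_release);
    }

    // Worker side: called when one unit of work is done.
    void finish_work() {
        const int prev = m_pending.fetch_sub(1, std::memory_order_acq_rel);
        assert(prev > 0);
        (void)prev;
        if (prev == 1) {
            // Take and immediately drop the wait mutex: a waiter that just saw
            // pending work is guaranteed to be parked in wait() afterwards.
            { std::lock_guard<std::mutex> sync(m_wait_mutex); }
            // Notify with the mutex released, so the woken thread can
            // reacquire it right away instead of going back to sleep.
            m_cv.notify_one();
        }
    }

    // Waiter side: block until all pending work is finished.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_wait_mutex);
        m_cv.wait(lock, [this] {
            return m_pending.load(std::memory_order_acquire) == 0;
        });
    }

    int pending() const {
        return m_pending.load(std::memory_order_acquire);
    }

private:
    std::atomic<int> m_pending{0};
    std::mutex m_wait_mutex; // used only for waiting, never held while working
    std::condition_variable m_cv;
};
```

Since the wait mutex is never held while work is processed or while notifying, the waiter does not contend with the worker when it wakes up.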
Some PSX games seem to store image data of the drawing results in an indeterminate area out of range of the current context buffer. In such cases, calculate the combined height of both frame memory rectangles.
What happens in "Crash Bash":
* At first draw, scissoring is limited to SCAY0 = 0 & SCAY1 = 255
* At second draw, scissoring is limited to SCAY0 = 255 & SCAY1 = 511
Previously, we limited the height to that of a single output texture. Instead, calculate the total height of both buffers combined to prevent such issues.
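As a rough illustration of the combined height, here is a sketch that treats each draw's scissor range as an inclusive [SCAY0, SCAY1] pair and measures the span covering both. The struct and function names are hypothetical, and the real GSdx calculation may differ.

```cpp
#include <algorithm>
#include <cassert>

// Inclusive vertical scissor range, mirroring SCAY0 / SCAY1.
struct ScissorY {
    int y0;
    int y1;
};

// Height of the smallest span covering both scissor ranges (assumption:
// this is what "combined height" means for two adjacent frame rectangles).
inline int combined_height(const ScissorY& a, const ScissorY& b) {
    assert(a.y0 <= a.y1 && b.y0 <= b.y1);
    const int top = std::min(a.y0, b.y0);
    const int bottom = std::max(a.y1, b.y1);
    return bottom - top + 1;
}
```

For the two draws listed above (0 to 255 and 255 to 511), this gives 512 lines instead of the 256 lines of a single output texture.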