This reverts commit f77c1900fa.
Conflicts:
plugins/GSdx/GSTextureCache.cpp
A different fix was made later for the Jak cutscenes (or FMVs). One game regressed (I don't remember which one).
It's only ever updated after the queue is updated, so its state will
always lag slightly behind it. It's sufficient to just use empty().
This seems to fix some caching issues that were noticeable on Skylake
CPUs (#998).
In the previous code, the worker thread would notify the MTGS thread
while the mutex was still locked, which could cause the MTGS thread to
wake up and immediately go back to sleep again because it couldn't lock
the mutex.
Use a separate mutex for waiting, which avoids the issue.
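
To illustrate the wake-then-block pattern described above, here is a minimal sketch. Note that it does not reproduce the commit's approach (a separate mutex for waiting); it shows a different, commonly used way to avoid the same problem, which is to release the lock before notifying. DoneSignal and its members are hypothetical names, not GSdx code.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical helper, not taken from GSdx.
struct DoneSignal {
    std::mutex mtx;
    std::condition_variable cv;
    bool done = false;

    // Called by the worker: update the state under the lock, then
    // notify only after the lock has been released, so the woken
    // thread can acquire the mutex right away.
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            done = true;
        } // lock released here
        cv.notify_one();
    }

    // Called by the waiting thread: the predicate guards against
    // spurious wakeups and against a signal that arrived earlier.
    void wait() {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, [this] { return done; });
    }
};
```

Because the state change still happens under the mutex and the waiter checks a predicate, notifying after the unlock does not lose wakeups.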
Some PSX games seem to store image data of the drawing results in an indeterminate area outside the range of the current context buffer. In such cases, calculate the combined height of both frame memory rectangles.
What happens in "Crash Bash":
* On the first draw, scissoring is limited to SCAY0 = 0 and SCAY1 = 255
* On the second draw, scissoring is limited to SCAY0 = 255 and SCAY1 = 511
Previously, we limited the height to that of a single output texture. Instead, calculate the total height of both buffers combined to prevent such issues.
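
A minimal sketch of the combined-height idea, assuming a simple rectangle type with exclusive right/bottom edges. Rect and rect_union are hypothetical names for illustration only and do not reflect the actual GSdx types; the coordinates in the usage note are loosely modeled on the two draws above.

```cpp
#include <algorithm>

// Hypothetical rectangle type; right and bottom are exclusive.
struct Rect {
    int left, top, right, bottom;
    int width() const { return right - left; }
    int height() const { return bottom - top; }
};

// Smallest rectangle that contains both inputs.
Rect rect_union(const Rect& a, const Rect& b) {
    return Rect{
        std::min(a.left, b.left),
        std::min(a.top, b.top),
        std::max(a.right, b.right),
        std::max(a.bottom, b.bottom),
    };
}
```

For two stacked 640x256 buffers, rect_union(Rect{0, 0, 640, 256}, Rect{0, 256, 640, 512}) has a height of 512 rather than the 256 of a single buffer.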
Previously, we only calculated the width of a single output circuit, which led to a single pixel from the other output circuit being missed, which in turn caused offset issues in Persona games. I have customized GetDisplayRect() to also calculate the dimensions of the merged rectangle when both output circuits are enabled through the PMODE register, so this hack is no longer needed. :)
TL;DR - The above commit of mine accurately handles the offset issues by calculating the union of the rects, so this stupid hack can be removed. (Not insulting any other developers, this stupid hack was mine :)
Passes the merged output circuit as the base size for the texture cache scaling code. Helps fix scaling issues where games use both output circuits for rendering.
Future note: the IsEnabled() check always prefers the second output circuit for some weird reason, and that behavior should be altered. I plan to replace it with a better automatic output circuit selection mechanism, but that can be done some time in the future.
* Upgrade the counter to signed 32 bits. 16 bits is too small to contain the 64K value.
* Read ThreadProc/m_count while the mutex is locked
* Use the old value returned by the fetch instead of reading back the new value
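
The counter points above can be sketched as follows. This is only an illustration under assumptions: JobCounter, add() and finish_one() are hypothetical names, and the real m_count handling may differ.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a pending-job counter.
struct JobCounter {
    // A signed 16-bit counter tops out at 32767, so it cannot hold
    // a count of 64K (65536); 32 bits leaves plenty of room.
    std::atomic<int32_t> count{0};

    void add(int32_t n) { count.fetch_add(n); }

    // Returns true if this call finished the last pending job.
    bool finish_one() {
        // fetch_sub returns the value held *before* the subtraction.
        // Using it avoids a second load of the counter, which could
        // already have been changed by another thread.
        const int32_t old_value = count.fetch_sub(1);
        return old_value == 1;
    }
};
```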
Hopefully this translates well to slower systems :)
Tekken Tag:
Before: 79-81fps
After: 82-84fps
Front Mission 4 intro (as it pans over the roofs):
Before: 158-159fps
After: 165-166fps
Previously, the seconds variable of the RTC was updated in progressive modes after every 50 Vsyncs, which was obviously wrong. The code has been adjusted to update the RTC according to the vertical frequency of each video mode.
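
A rough sketch of the idea, assuming the RTC is advanced by counting Vsyncs against the vertical frequency of the active mode. RtcTicker and on_vsync() are hypothetical names, and rounding the frequency to a whole number of Vsyncs is a simplification made here, not necessarily what the emulator does.

```cpp
#include <cmath>

// Hypothetical Vsync-driven seconds counter.
struct RtcTicker {
    int vsync_count = 0;
    int seconds = 0;

    // vertical_hz: refresh rate of the active video mode,
    // for example 50.0 for PAL or 59.94 for NTSC.
    void on_vsync(double vertical_hz) {
        const int vsyncs_per_second =
            static_cast<int>(std::lround(vertical_hz));
        if (++vsync_count >= vsyncs_per_second) {
            vsync_count = 0;
            ++seconds;
        }
    }
};
```

With a fixed threshold of 50, a mode running at roughly 60 Hz would advance the clock about 20% too fast, which is the kind of error described above.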
Avoid reading past the end of the disc.
Avoid waiting when there are prefetches remaining.
Fix the maths so that the first prefetch after a request attempts to
read the next block of sectors and not the block of sectors that was
just read (which will just be skipped anyway because the data has just
been cached).
Avoid a potential prefetch after the disc is swapped (though disc swap
doesn't work properly if you just eject and insert a different disc).
Stop prefetching on disc read failure (Suikoden hits this case - 2048
byte reads are requested, but only 2352 byte reads will succeed).
Also reduce the read retry count to 2.
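
The prefetch arithmetic described above (start with the block after the one just read, and never go past the end of the disc) can be sketched like this. The block size and all names here are assumptions for illustration, not the plugin's actual values.

```cpp
#include <cstdint>

// Hypothetical number of sectors fetched per read.
constexpr uint32_t kSectorsPerBlock = 16;

// Computes the first block to prefetch after a request for
// requested_sector. Returns false if there is nothing to prefetch
// because the next block would start at or past the end of the disc.
bool first_prefetch_block(uint32_t requested_sector,
                          uint32_t disc_sectors,
                          uint32_t& out_block_start) {
    // Start of the block that the request itself just read (and cached).
    const uint32_t current_block =
        requested_sector - requested_sector % kSectorsPerBlock;
    // Prefetch the following block, not the one that is already cached.
    const uint32_t next_block = current_block + kSectorsPerBlock;
    if (next_block >= disc_sectors)
        return false;
    out_block_start = next_block;
    return true;
}
```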
16B alignment is now useless for nVifBlock (no more SSE).
However, update the alignment of the bucket to 64B, which reduces the
probability of cache misses in the find loop.
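
As a rough illustration of the 64B alignment, assuming a 64-byte cache line (common on x86) and a hypothetical bucket layout that is not the real nVifBlock storage:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bucket: 16 x 4-byte keys fill exactly one 64-byte line.
constexpr int kKeysPerBucket = 16;

struct alignas(64) Bucket {
    uint32_t keys[kKeysPerBucket];
};

static_assert(alignof(Bucket) == 64, "bucket starts on a cache line");
static_assert(sizeof(Bucket) == 64, "bucket spans exactly one cache line");

// Linear find loop: since the bucket never straddles two cache lines,
// the whole scan touches a single line.
int find_key(const Bucket& bucket, uint32_t key) {
    for (int i = 0; i < kKeysPerBucket; ++i) {
        if (bucket.keys[i] == key)
            return i;
    }
    return -1;
}
```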
It avoids memory stalls and greatly reduces the overhead of the dVifUnpack function
Here is a VTune summary of this branch (taken during SotC init):
dVifUnpack<1> was 14.5% of effective VU thread time
dVifUnpack<1> is now 3.8% of effective VU thread time
I hope it will translate to better fps