It avoids memory stalls and greatly reduces the overhead of the dVifUnpack function
Here a vtune summary of this branch (done on SotC init)
dVifUnpack<1> was 14.5% of effective VU thread time
dVifUnpack<1> is now 3.8% of effective VU thread time
I hope it will translate to better fps
It allow to compare only 8B in the lookup so SSE could be replaced with general instruction
As a bonus, it allow to compute the hash key with a mov rather than modulo (which was an 'and')
Inline the execution part
Add a num parameter to dVifsetVUptr
Use a local variable for the nVifBlock instead of a global struct state
The goal is to ease future update of the nVifBlock struct
Previous implementation saved the both the chain pointer and the chain size
Rational: size is useful to add new element and to detect the end of the chain
Vif cache is rarely miss. So 'add' is barely called and the end of a chain is
barely reached.
New implementation will add a null cell at the end of the chain. As a
cell contains a x86 pointer, if is null you could conclude that you
reach the end of the chain.
The 'add' function will traverse the chain to get the current size. It is
a cold path besides the chain is often short (< 4).
The 'find' function only need to check the startPtr bytes to detect the end
of the loop.
Note: SizeChain was replaced with a std::array
Safety:
* check remaining space before compilation
* clear hash if recompiler is reset
Perf:
* don't research the hash after a miss
* reduce branching in Unpack/ExecuteUnpack
Note: a potential speed optimization for dVifsetVUptr
Precompute the length and store in the cache. However it need 2B on the
nVifBlock struct. Maybe we can compact cl/wl. Or merge aligned with upkType
(if some bits are useless)
I misses some early return in my first tentative. Now VTune shows me
properly the time in VU recompiler.
Note: It seem some block overlap (likely due to the branching mess). But it is still way better than no data
GS_Packet constructor calls memset which is quite slow and useless as data is overwritten
Vtune overhead of Gif_Unit::Execute goes from 5.8% to 3.0% (EE thread)
Several PSX titles lack a backslash in the elf path, which made the disc
serial contain 'cdrom:', this caused savestate issues in those ganes.
Solves: https://github.com/PCSX2/pcsx2/issues/1692
Use a reinterpret_cast instead of casting the function pointer address
to a void** and dereferencing it.
Also remove an unnecessary (void) and avoid including stdafx.h.
Don't adjust 'image' and just use an additional offset.
'success' was kinda unnecessary when true or false could just be
directly returned.
Move 'compression' clamping out to GSPng::Save instead.
And throw in a whole bunch of const for good measure.
wchar_t is 16-bits on Windows, which can't actually properly fit all
Unicode characters.
Use the wx3.0.x wxTextWrapper approach of using iterators that increment
by actual characters to fix the issue, and also switch to using the
std::string style functions in wxString.
Previously the boot menu items always displayed "Boot CDVD" regardless of the current source medium, this behavior has been fixed to properly adjust the text when source medium is changed. Now it'll display Boot CDVD/ISO/BIOS with respect to the current source medium.
v2: Some instances of "Iso" have been changed to "ISO" for consistency.
v3: Remove the unnecessary "Reboot" on menu item labels, saves some string translations.
v4: Add a new shortcut key for the primary boot menu item.
There is already a dedicated bind event to handle the gray out of the menu item, so let's just gray it out initially and let the bind event handler do it's thing.
The previous behavior would only gray out the menu item when all the plugins are in a non-active state which didn't seem ideal as the plugins were shutdown only when closing PCSX2 (or) switching plugins.
Adds a device select option that hides bindings and disables binding new
inputs from all non-selected devices on the bindings list. This also
avoids input conflict issues when one controller is recognized as
several devices through different APIs.
Updates the UI by reducing the height of the plugin window. This has
been achieved by removing some buttons below the diagnostics and
bindings list and incorporating those functions into the
lists(accessible by right-clicking in the list). The binding
configurations on the Pad tabs have been moved to a separate page, like
the Forcefeedback bindings, to separate the configuration from the
bindings.
Adds a skip deadzone option to the Pad tabs.
With the normal deadzone, if the control input value is below the
deadzone threshold, the input is ignored.
However, some controllers also benefit from shortening the input range
by skipping a deadzone.
JMMT uses a bigger display height on NTSC progressive scan mode, which is not really unusual hence adjust the saturation hack to only take effect on interlaced NTSC mode.
However, the whole double screen issue on FMV still exists. As a bit of information, this game has the second output disabled but seems to have some valid data inside of it, maybe the second output data is leaked into the first one? most likely a bug in the frambuffer data management rather than a CRTC issue (needs to be investigated)
in the top-level source directory. The build folder should NOT be
transferred between computers when PGO is used, though I don't
see why anyone would be doing so anyway.
Also adds support for PGO and LTO to the build.sh script.
The purpose is to stop vtune profiling in a predictable way. It allows
to compare multiple runs.
ERET is called every syscall/interrupt return so it is proportional to
the EE program execution.
Mapping the full buffer is killer on Vtune (either crash or requires a huge processing time).
Instead keep the same ID for code in the same buffers.
I think all buffers are correctly mapped now but I still miss the frame pointer
for VU code.
Cons:
* requires ~180MB of physical memory (virtual memory is the same so it
doesn't impact the 4GB limit)
From steam: 98.81% got at least 2GB of RAM. 83.62% got at least 4GB of RAM.
That being said, it might not really increase RAM requirements as OS could put the
new allocation in the swap.
Pro:
* code is much easier
* remove at least half of the signal listener
* last but not least, it is way easier for profiler/debugger
Previously, the height of the frame offset was also considered for the total height of the texture which was obviously wrong as the portion before the offset value isn't part of the frame memory.
In the previous code, the threads were created and destroyed in the base
class constructor and destructor, so the threads could potentially be
active while the object is in a partially constructed or destroyed state.
The thread however, relies on a virtual function to process the queue
items, and the vtable might not be in the desired state when the object
is partially constructed or destroyed.
This probably only matters during object destruction - no items are in
the queue during object construction so the virtual function won't be
called, but items may still be queued up when the destructor is called,
so the virtual function can be called. It wasn't an issue because all
uses of the thread explicitly waited for the queues to be empty before
invoking the destructor.
Adjust the constructor to take a std::function parameter, which the
thread will use instead to process queue items, and avoid inheriting
from the GSJobQueue class. This will also eliminate the need to
explicitly wait for all jobs to finish (unless there are other external
factors, of course), which would probably make future code safer.