Corrects XMM count for COP2 ops (some might be wrong, keep an eye out in the logs)
Fixes a hidden microVU bug with a SUB shortcut + some reg allocation bugs in QMFC/QMTC hidden by flushes.
Fixes Tennis Court Smash and Love Smash games which previously couldn't be fixed.
WRC no longer requires a patch, just the xgkickhack option.
Note: it's not a hack anymore, it just has to be called that :P
Some games like to write directly to VU memory once the program has finished and I have no easy way to update the kick without being super slow. so for now, we'll just flush it.
FULL clamping mode fix games where hack was used, so is no longer required.
This commit remove hack, adjust GameDB according to that change, and rename fpuFloat4 function to fpuFloat3.
Last change is because fpuFloat3 was just wrapper for fpuFloat4 with added check for compare hack.
Additionally fpuFloat2 was only function that called fpuFloat4 directly, so that one call was changed to fpuFloat3 to respect previous changes.
Sometimes (CoD Finest Hour) can somehow end up with blocks missing from a program, not sure how, but it still finds the current program, so we check if the block exists, if not, recompile new ones.
Another small piece of #3451
Moves all VTLB pointer manipulation into dedicated classes for the purpose, which should allow the algorithm to be changed much more easily in the future (only have to change the class and recVTLB.cpp assembly since it obviously can't use the class)
Also some of the functions that manipulated the VTLB previously used POINTER_SIGN_BIT (which 1 << 63 on 64-bit) while others used a sign-extended 0x80000000. Now they all use the same one (POINTER_SIGN_BIT)
Note: recVTLB.cpp was updated to keep it compiling but the rest of the x86-64 compatibility changes were left out
Also, Cache.cpp seems to assume VTLB entries are both sides of the union at the same time, which is impossible. Does anyone know how this actually worked (and if this patch breaks it) or if it never worked properly in the first place?
Allocate memory in an x86-64-compatible way
Another part of #3451
Note: While this shouldn't change how anything works, it's been the #1 source of breakage of 32-bit builds in #3451 (it was the cause for the failure of win32 to allocate memory and the failure of linux-32 afterward) so we should definitely make sure it gets tested
see #3523 for more information
This change makes the EE recompiler not hardcoded to working with 32 MB of RAM, and instead work with the amount of RAM set in Ps2MemSize::MainRam. The rest of PCSX2 seems to work fine with more than 32 MB of RAM - it is only the EE recompiler that has trouble. If the Ps2MemSize::MainRam value is not changed from the default 32 MB, there should be no change: 32 MB / 0x10000 = 0x200, the value that was there previously.
This may be helpful if anybody else in the future wants to emulate a PS2 dev kit with 128 MB or RAM, or maybe the PSX dvr thing which I think has 64 MB of RAM. I've confirmed that with the change, you could set Ps2MemSize::MainRam to 128 MB, and execute code with the recompiler that's above the first 32 MB of RAM, and do VIF and scratchpad DMA transfers from this upper memory as well.
* Modify VU addressing so it only multiplies by 8 before entering the program
Fixes issues with VU1 TPC being read multiplied by 8 (bad)
* Removed assert on SuperVU which no longer makes sense
Fixes booting issues in the following games:
Jak X, Namco 50th anniversary, Spongebob the Movie, Spongebob Battle for Bikini Bottom,
The Incredibles, The Incredibles rize of the underminer, Soukou kihei armodyne, Garfield Saving Arlene, Tales of Fandom Vol. 2.
The games will no longer require a patch to boot.
16B alignment is now useless for nVifBlock (no more SSE)
However update the alignment of bucket to 64B. It will reduce cache miss
probability in the find loop
It avoids memory stalls and greatly reduces the overhead of the dVifUnpack function
Here a vtune summary of this branch (done on SotC init)
dVifUnpack<1> was 14.5% of effective VU thread time
dVifUnpack<1> is now 3.8% of effective VU thread time
I hope it will translate to better fps
It allow to compare only 8B in the lookup so SSE could be replaced with general instruction
As a bonus, it allow to compute the hash key with a mov rather than modulo (which was an 'and')
Inline the execution part
Add a num parameter to dVifsetVUptr
Use a local variable for the nVifBlock instead of a global struct state
The goal is to ease future update of the nVifBlock struct
Previous implementation saved the both the chain pointer and the chain size
Rational: size is useful to add new element and to detect the end of the chain
Vif cache is rarely miss. So 'add' is barely called and the end of a chain is
barely reached.
New implementation will add a null cell at the end of the chain. As a
cell contains a x86 pointer, if is null you could conclude that you
reach the end of the chain.
The 'add' function will traverse the chain to get the current size. It is
a cold path besides the chain is often short (< 4).
The 'find' function only need to check the startPtr bytes to detect the end
of the loop.
Note: SizeChain was replaced with a std::array
Safety:
* check remaining space before compilation
* clear hash if recompiler is reset
Perf:
* don't research the hash after a miss
* reduce branching in Unpack/ExecuteUnpack
Note: a potential speed optimization for dVifsetVUptr
Precompute the length and store in the cache. However it need 2B on the
nVifBlock struct. Maybe we can compact cl/wl. Or merge aligned with upkType
(if some bits are useless)
I misses some early return in my first tentative. Now VTune shows me
properly the time in VU recompiler.
Note: It seem some block overlap (likely due to the branching mess). But it is still way better than no data
Cons:
* requires ~180MB of physical memory (virtual memory is the same so it
doesn't impact the 4GB limit)
From steam: 98.81% got at least 2GB of RAM. 83.62% got at least 4GB of RAM.
That being said, it might not really increase RAM requirements as OS could put the
new allocation in the swap.
Pro:
* code is much easier
* remove at least half of the signal listener
* last but not least, it is way easier for profiler/debugger
Doesn't fully work yet
* Unknown stack frame
* Outside any known module
Potential root cause:
* Nvidia driver
* VU code as ebp is required for emulation so likely no frame
Warning can be reenabled on GCC
A warning isn't fixed as potentially the code is wrong
../pcsx2/gui/MemoryCardFolder.cpp: In member function ‘void FolderMemoryCard::FlushFileEntries(u32, u32, const wxString&, MemoryCardFileMetadataReference*)’:
../pcsx2/gui/MemoryCardFolder.cpp:1027:10: warning: unused variable ‘filenameCleaned’ [-Wunused-variable]
bool filenameCleaned = FileAccessHelper::CleanMemcardFilename( cleanName );
CID 146985 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)8.
uninit_member: Non-static class member vuxy is not initialized in this
constructor nor in any functions that it calls
Templace is nicer but give a hard time to compiler.
New version compile in both gcc&clang without hack
v2: add an uptr cast too for VS2013 sigh...
v3: use an ugly function pointer cast to please VS2013
Also slightly modify the textual description of the other underclock items.
All previous values available at the slider are still there, but since
the new value is now the mildest (slider == -1), it "pushes" the previous
-1 and -2 values one notch down.
This restores the mildest value to be identical to how it behaved before
90b11b2f , which is measured as about 75% speed.
Because the "balanced" preset uses the -1 slider value, it means this
restored mild value is now also used by the balanced preset.
As a note, while the message for the mildest value was always "reduce by
about 33%", before 90b11b2f it was actually about 25% reduction (75% speed,
like with this commit now), and after that commit it was about 40% reduction
(60% speed).
Also, since we add new value to the slider only on one side, the "0"
(default) slider position is now not at the exact middle. That's fine,
but maybe we could also add a milder overclock value on the other side
to have that symetric again.
The ee cyclerate percentage values at the slider text were inaccurate,
and sometimes wildely so.
Add some code to measure the actual speed at runtime (disabled by default)
and update the (static) slider text values according to the measurements.
Also change the description from increase/reduce "by AA %" to "to BB %".
This makes it slightly easier to grasp. E.g. "reduce speed to 10%" is
easier to grasp than "reduce speed by 90%", and similarly, "increase
speed to 300%" is easier to grasp than "increase speed by 200%".
Commit 330704a added code which applies the patches before recompiling the
elf entry block, but because at that stage the patches for the current
CRC were not yet loaded, effectively it did nothing.
Now it actually loads the patches before applying them.
As a result, it should now be possible for patches (with place=0) to be
effective before the elf is executed.
This is a hack, because the emulation loads the patches while it's not
paused. It works, but it's not great. A better way would be to pause the
emulation once the entry point is detected, then make the setting get
applied normally (which also loads the patches normally), and then resume
the emulation.
This _should_ properly fix#627 (the test case works as expected now).
patches and cheats are exactly the same (pnach style patch line) but we
stored two sets of them depending on their source: "patches" for
GameIndex.dbf patches, and "cheats" for all the rest (cheats, widescreen,
etc).
Unify patches and cheats and keep only "patches", cleanup and rename the
public API at Patch.h, and add documentation.
Also: add some console messages on invalid "place" value, and when we skip
searching cheats_ws.zip because a pnach file was found at cheats_ws dir.
Also: removed checks before applying different kinds of patches/cheats
because we don't need them (we didn't have disabled patches loaded anyway).
The checks removal _shouldn't_ have any effect, except that the checks were
wrong and accidentally prevented loading widescreen hacks which have a place
value of 0. No one probably noticed it since all the widescreen patches
which I looked at have a place value of 1. So now ws patches with place=0
would load correctly too. If we'll ever have such.
This commit doesn't change any behavior, but documents the "places" value
of the patch structure ("place" is 1 in patch=1,... and 0 in patch=0,...)
and also uses an enum to make its use explicit.
x86emitter : Convert variable type from u8 to bool.
recVTLB: Cast "sign" to bool to prevent a warning.
R5900OpcodeImpl: Cast all the values in array to u64 instead of s64.
isBreakpointNeeded returns if breakpoints are needed for any combination
of the current pc and delay slot.
dynarecCheckBreakpoint checks conditions for each breakpoint slot.
memory copy will be done in SSE or X86 only. It is very unlikely that
it was used anyway (need 64 bits transfer and no XMM register available)
Remove the now useless _allocMMXreg and _getFreeMMXreg too
Code need to be enabled with a define (NO_MMX 1)
Code was tested with ps2autotest but it need more tests. I need to check
alignement issue too.
Globally code is potentially a little slower than SSE.
The trick is that we need to shift only the 64 lsb whereas SSE will
shift the full 128 bits register.
Current implementation flush the lsb and drop the full register. It is
unlikely that next intruction will be done in SSE anyway.
Note: it would be easier in x64 arch
pcsx2/Dump.cpp: In function 'void iDumpBlock(u32, u32, uptr, u32)':
pcsx2/Dump.cpp:258:4: error: cannot convert 'wxString' to 'const char*' for argument '1' to 'int system(const char*)'
pcsx2/x86/iR3000A.cpp: In function 'void iIopDumpBlock(int, u8*)':
pcsx2/x86/iR3000A.cpp:285:45: error: cannot convert 'wxString' to 'const char*' for argument '1' to 'int system(const char*)'