FULL clamping mode fix games where hack was used, so is no longer required.
This commit remove hack, adjust GameDB according to that change, and rename fpuFloat4 function to fpuFloat3.
Last change is because fpuFloat3 was just wrapper for fpuFloat4 with added check for compare hack.
Additionally fpuFloat2 was only function that called fpuFloat4 directly, so that one call was changed to fpuFloat3 to respect previous changes.
Sometimes (CoD Finest Hour) can somehow end up with blocks missing from a program, not sure how, but it still finds the current program, so we check if the block exists, if not, recompile new ones.
Another small piece of #3451
Moves all VTLB pointer manipulation into dedicated classes for the purpose, which should allow the algorithm to be changed much more easily in the future (only have to change the class and recVTLB.cpp assembly since it obviously can't use the class)
Also some of the functions that manipulated the VTLB previously used POINTER_SIGN_BIT (which 1 << 63 on 64-bit) while others used a sign-extended 0x80000000. Now they all use the same one (POINTER_SIGN_BIT)
Note: recVTLB.cpp was updated to keep it compiling but the rest of the x86-64 compatibility changes were left out
Also, Cache.cpp seems to assume VTLB entries are both sides of the union at the same time, which is impossible. Does anyone know how this actually worked (and if this patch breaks it) or if it never worked properly in the first place?
Allocate memory in an x86-64-compatible way
Another part of #3451
Note: While this shouldn't change how anything works, it's been the #1 source of breakage of 32-bit builds in #3451 (it was the cause for the failure of win32 to allocate memory and the failure of linux-32 afterward) so we should definitely make sure it gets tested
see #3523 for more information
This change makes the EE recompiler not hardcoded to working with 32 MB of RAM, and instead work with the amount of RAM set in Ps2MemSize::MainRam. The rest of PCSX2 seems to work fine with more than 32 MB of RAM - it is only the EE recompiler that has trouble. If the Ps2MemSize::MainRam value is not changed from the default 32 MB, there should be no change: 32 MB / 0x10000 = 0x200, the value that was there previously.
This may be helpful if anybody else in the future wants to emulate a PS2 dev kit with 128 MB or RAM, or maybe the PSX dvr thing which I think has 64 MB of RAM. I've confirmed that with the change, you could set Ps2MemSize::MainRam to 128 MB, and execute code with the recompiler that's above the first 32 MB of RAM, and do VIF and scratchpad DMA transfers from this upper memory as well.
* Modify VU addressing so it only multiplies by 8 before entering the program
Fixes issues with VU1 TPC being read multiplied by 8 (bad)
* Removed assert on SuperVU which no longer makes sense
Fixes booting issues in the following games:
Jak X, Namco 50th anniversary, Spongebob the Movie, Spongebob Battle for Bikini Bottom,
The Incredibles, The Incredibles rize of the underminer, Soukou kihei armodyne, Garfield Saving Arlene, Tales of Fandom Vol. 2.
The games will no longer require a patch to boot.
16B alignment is now useless for nVifBlock (no more SSE)
However update the alignment of bucket to 64B. It will reduce cache miss
probability in the find loop
It avoids memory stalls and greatly reduces the overhead of the dVifUnpack function
Here a vtune summary of this branch (done on SotC init)
dVifUnpack<1> was 14.5% of effective VU thread time
dVifUnpack<1> is now 3.8% of effective VU thread time
I hope it will translate to better fps
It allow to compare only 8B in the lookup so SSE could be replaced with general instruction
As a bonus, it allow to compute the hash key with a mov rather than modulo (which was an 'and')
Inline the execution part
Add a num parameter to dVifsetVUptr
Use a local variable for the nVifBlock instead of a global struct state
The goal is to ease future update of the nVifBlock struct
Previous implementation saved the both the chain pointer and the chain size
Rational: size is useful to add new element and to detect the end of the chain
Vif cache is rarely miss. So 'add' is barely called and the end of a chain is
barely reached.
New implementation will add a null cell at the end of the chain. As a
cell contains a x86 pointer, if is null you could conclude that you
reach the end of the chain.
The 'add' function will traverse the chain to get the current size. It is
a cold path besides the chain is often short (< 4).
The 'find' function only need to check the startPtr bytes to detect the end
of the loop.
Note: SizeChain was replaced with a std::array
Safety:
* check remaining space before compilation
* clear hash if recompiler is reset
Perf:
* don't research the hash after a miss
* reduce branching in Unpack/ExecuteUnpack
Note: a potential speed optimization for dVifsetVUptr
Precompute the length and store in the cache. However it need 2B on the
nVifBlock struct. Maybe we can compact cl/wl. Or merge aligned with upkType
(if some bits are useless)
I misses some early return in my first tentative. Now VTune shows me
properly the time in VU recompiler.
Note: It seem some block overlap (likely due to the branching mess). But it is still way better than no data
Cons:
* requires ~180MB of physical memory (virtual memory is the same so it
doesn't impact the 4GB limit)
From steam: 98.81% got at least 2GB of RAM. 83.62% got at least 4GB of RAM.
That being said, it might not really increase RAM requirements as OS could put the
new allocation in the swap.
Pro:
* code is much easier
* remove at least half of the signal listener
* last but not least, it is way easier for profiler/debugger
Doesn't fully work yet
* Unknown stack frame
* Outside any known module
Potential root cause:
* Nvidia driver
* VU code as ebp is required for emulation so likely no frame
Warning can be reenabled on GCC
A warning isn't fixed as potentially the code is wrong
../pcsx2/gui/MemoryCardFolder.cpp: In member function ‘void FolderMemoryCard::FlushFileEntries(u32, u32, const wxString&, MemoryCardFileMetadataReference*)’:
../pcsx2/gui/MemoryCardFolder.cpp:1027:10: warning: unused variable ‘filenameCleaned’ [-Wunused-variable]
bool filenameCleaned = FileAccessHelper::CleanMemcardFilename( cleanName );
CID 146985 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)8.
uninit_member: Non-static class member vuxy is not initialized in this
constructor nor in any functions that it calls
Templace is nicer but give a hard time to compiler.
New version compile in both gcc&clang without hack
v2: add an uptr cast too for VS2013 sigh...
v3: use an ugly function pointer cast to please VS2013
Also slightly modify the textual description of the other underclock items.
All previous values available at the slider are still there, but since
the new value is now the mildest (slider == -1), it "pushes" the previous
-1 and -2 values one notch down.
This restores the mildest value to be identical to how it behaved before
90b11b2f , which is measured as about 75% speed.
Because the "balanced" preset uses the -1 slider value, it means this
restored mild value is now also used by the balanced preset.
As a note, while the message for the mildest value was always "reduce by
about 33%", before 90b11b2f it was actually about 25% reduction (75% speed,
like with this commit now), and after that commit it was about 40% reduction
(60% speed).
Also, since we add new value to the slider only on one side, the "0"
(default) slider position is now not at the exact middle. That's fine,
but maybe we could also add a milder overclock value on the other side
to have that symetric again.
The ee cyclerate percentage values at the slider text were inaccurate,
and sometimes wildely so.
Add some code to measure the actual speed at runtime (disabled by default)
and update the (static) slider text values according to the measurements.
Also change the description from increase/reduce "by AA %" to "to BB %".
This makes it slightly easier to grasp. E.g. "reduce speed to 10%" is
easier to grasp than "reduce speed by 90%", and similarly, "increase
speed to 300%" is easier to grasp than "increase speed by 200%".
Commit 330704a added code which applies the patches before recompiling the
elf entry block, but because at that stage the patches for the current
CRC were not yet loaded, effectively it did nothing.
Now it actually loads the patches before applying them.
As a result, it should now be possible for patches (with place=0) to be
effective before the elf is executed.
This is a hack, because the emulation loads the patches while it's not
paused. It works, but it's not great. A better way would be to pause the
emulation once the entry point is detected, then make the setting get
applied normally (which also loads the patches normally), and then resume
the emulation.
This _should_ properly fix#627 (the test case works as expected now).
patches and cheats are exactly the same (pnach style patch line) but we
stored two sets of them depending on their source: "patches" for
GameIndex.dbf patches, and "cheats" for all the rest (cheats, widescreen,
etc).
Unify patches and cheats and keep only "patches", cleanup and rename the
public API at Patch.h, and add documentation.
Also: add some console messages on invalid "place" value, and when we skip
searching cheats_ws.zip because a pnach file was found at cheats_ws dir.
Also: removed checks before applying different kinds of patches/cheats
because we don't need them (we didn't have disabled patches loaded anyway).
The checks removal _shouldn't_ have any effect, except that the checks were
wrong and accidentally prevented loading widescreen hacks which have a place
value of 0. No one probably noticed it since all the widescreen patches
which I looked at have a place value of 1. So now ws patches with place=0
would load correctly too. If we'll ever have such.
This commit doesn't change any behavior, but documents the "places" value
of the patch structure ("place" is 1 in patch=1,... and 0 in patch=0,...)
and also uses an enum to make its use explicit.
x86emitter : Convert variable type from u8 to bool.
recVTLB: Cast "sign" to bool to prevent a warning.
R5900OpcodeImpl: Cast all the values in array to u64 instead of s64.
isBreakpointNeeded returns if breakpoints are needed for any combination
of the current pc and delay slot.
dynarecCheckBreakpoint checks conditions for each breakpoint slot.
memory copy will be done in SSE or X86 only. It is very unlikely that
it was used anyway (need 64 bits transfer and no XMM register available)
Remove the now useless _allocMMXreg and _getFreeMMXreg too
Code need to be enabled with a define (NO_MMX 1)
Code was tested with ps2autotest but it need more tests. I need to check
alignement issue too.
Globally code is potentially a little slower than SSE.
The trick is that we need to shift only the 64 lsb whereas SSE will
shift the full 128 bits register.
Current implementation flush the lsb and drop the full register. It is
unlikely that next intruction will be done in SSE anyway.
Note: it would be easier in x64 arch
pcsx2/Dump.cpp: In function 'void iDumpBlock(u32, u32, uptr, u32)':
pcsx2/Dump.cpp:258:4: error: cannot convert 'wxString' to 'const char*' for argument '1' to 'int system(const char*)'
pcsx2/x86/iR3000A.cpp: In function 'void iIopDumpBlock(int, u8*)':
pcsx2/x86/iR3000A.cpp:285:45: error: cannot convert 'wxString' to 'const char*' for argument '1' to 'int system(const char*)'
Add a nop between instruction
Dump mips instruction
Add pretty print support
Note: it would be nicer to plug pretty print in the system command directly
Note: CL = 0 behaviour is still not completely accurate, the first vector is incorrect, but will look at that another day, need a game that does it first really so we can see if it helps :)
Irx module will be loaded at the end of the ROM (limited
at 256KB)
At the execution of the boot the list of module addresses are
hacked to add the new module.
For #1130
Overclock is now positive and underclock is now negative (it used to be
the other way round), so the range check should reflect that.
Coverity CID 156245 Bad bit shift operation(BAD SHIFT)
Code needs to work with xAddressReg however the x32 inheritance doesn't
exits anymore on 64 bits.
Note: it might be possible to uses some kind of autoconversion with
xRegister32or64. Could be a future improvement.
SuperVU wasn't converted (unlikely to be ported to 64 bits)
A couple of calls weren't converted because they require extra work
but there are not mandatory (debug/MTVU/...)
* Rework a bit MVU to support xScopedStackFrame. Potentially
stack frame can be optimized (save 5 instructions)
* I removed the recompiler stack check. Address sanitizer is more efficient anyway
Give both EE and x86 code.
Don't rely on global variable. The dump still dump the content of the register.
Of course value will be wrong if you don't dump it at the start of the block.
It help to detect register/memory access
the cpu struct address is also printed to easily postprocess the x86 memory pointer
(see next commit)
It misses jump & FPU. Jump need to be handled manually.
Syntax isn't perfect yet because of various casts. But it will allow to have a
single emitter syntax and rely on type safety on the future.
Conflicts:
pcsx2/x86/iR3000Atables.cpp
pcsx2/x86/ix86-32/iR5900-32.cpp
There are 2 pages that contains a register context to handle Syscall/Interrupt.
Write protection is useless here.
The only purpose is to reduce the number of useless SIGSEGV at the start of a game
VIF recompilers: full size
EE/MVU/IOP: 1/4th of the size
Lazy allocation is still enabled but it will less triggered.
Next step is to always commit the first block
Let's the kernel manage the memory either with builtin lazy allocation or
swapped memory.
Avoid to handle SIGSEGV manually (nicer for debug) and removes 250 lines of code.
From the comment, it used to be bad to gcc alignment issue.
Recent gcc align the stack on 16B boundary. Maybe it solved the issue.
I play severals minutes without any crashes but it requires more tests.
On linux, it breaks the 16B stack alignment requirement.
2 dispatchers were added to handle the call to the function. It avoid
any performance impact and remove the extra inlined asm
Fix#506
Coverity seems to not like the assert trick and only considers the macro as a single statement, so let's rework branchAddr and branchAddrN to satisfy coverity and improve the code quality. The following patch fixes 8 coverity issues with the same problem , CID 151736 - 151743.
(#1 of 1): Misused comma operator (NO_EFFECT)extra_comma: Part !!((mVU.prog.IRinfo.curPC & 1U) == 0U) of statement (!!((mVU.prog.IRinfo.curPC & 1U) == 0U)) , ((mVU.prog.IRinfo.curPC + 2U + (s32)((mVU.code & 0x400U) ? 0xfffffc00U | (mVU.code & 0x3ffU) : (mVU.code & 0x3ffU)) * 2 & mVU.progMemMask) * 4U) has no effect due to the comma.
CID 146870 (#1 of 1): Negative array index write (NEGATIVE_RETURNS)
5. negative_returns: Using variable xmmreg as an index to array ...
Open discussion: how to handle correctly bad register allocation?
Currently negative index is returned and a message printed. It means
we need to propagate the index check everywhere in order to not use it.
I suspect that Instruction Generation is more or less corrupted so
potentially we could just fire an exception.
-This was actually a bug, may improve some games that were buggy in superVU, but these functions aren't often used.
-Coverity CID 146865 & 146864: In recVUMI_ESIN(VURegs *, int): Missing break statement between cases in switch statement (CWE-484)
-Coverity CID 146863 & 146862: In recVUMI_EEXP(VURegs *, int): Missing break statement between cases in switch statement (CWE-484)
-Coverity CID 146855 & 146854: In recVUMI_EATAN(VURegs *, int): Missing break statement between cases in switch statement (CWE-484)