Now it should be easier to merge more than 2-instruction-long sequences.
Also correct some minor inconsistencies in behavior between instruction
merging cases.
Optimistically assume used GQRs are 0 in blocks that only use one GQR, and
bail at the start of the block and recompile if that assumption fails.
Many games use almost entirely unquantized stores (e.g. Rebel Strike, Sonic
Colors), so this will likely be a big performance improvement across the board
for games with heavy use of paired singles.
Won't work with all games, but provides a nice way to spend extra CPU to make
a variable framerate game faster (e.g. Spyro or The Last Story), or to make
a game use less CPU at the cost of a lower framerate (e.g. Rogue Leader).
Retry a failed connection after a short delay -- hardware sometimes needs some
time to settle, or other Bluetooth programs are attempting to query the
device as well (e.g. blueman-manager).
An uninitialized struct member "l2_bdaddr_type" was making most connect calls
fail with "Invalid argument". The connection could succeed if the uninitialized
memory happened to have a zero byte in the appropriate location.
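A minimal sketch of the usual way to avoid this class of bug: value-initialize the address struct so every member starts at zero. The struct below is a hypothetical stand-in (only "l2_bdaddr_type" is named in the message above), and MakeAddress is an illustrative helper, not the actual connection code.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the Bluetooth L2CAP socket address struct.
// Only l2_bdaddr_type is taken from the commit message; the rest is illustrative.
struct FakeSockaddrL2
{
  std::uint16_t l2_family;
  std::uint16_t l2_psm;
  std::uint8_t l2_bdaddr[6];
  std::uint16_t l2_cid;
  std::uint8_t l2_bdaddr_type;
};

// Value-initialize first so that members we never assign (such as
// l2_bdaddr_type) are zero instead of whatever was on the stack.
FakeSockaddrL2 MakeAddress(std::uint16_t family, std::uint16_t psm,
                           const std::uint8_t (&bdaddr)[6])
{
  FakeSockaddrL2 addr{};
  addr.l2_family = family;
  addr.l2_psm = psm;
  std::memcpy(addr.l2_bdaddr, bdaddr, sizeof(addr.l2_bdaddr));
  return addr;
}
```

With this pattern, adding a new member to the struct later cannot reintroduce the "zero byte by luck" behavior.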
The update register wasn't being loaded into the cache prior to pushing the address into it.
Adds float push and pop routines around the calls that need it as well.
The old delay was probably a hack to make up for the incorrect
disc speeds. Using it with the new disc speeds made
Resident Evil Archives: Resident Evil Zero freeze when starting.
In rare cases, this can result in a violation of the JIT block cache constraint
that blocks must end in the same place. This can cause instability and lockups
due to blocks not being invalidated properly.
Instead of passing the value around constantly, just store it in the regcache,
note where it is, and restore it on the exception path.
This saves a whole bunch of pushing and popping and gives a ~5% speed boost
in Rebel Strike. It's a bit ugly, but it simplifies a lot of code and is
faster, too.
Should be slightly faster, and also lets us skip the nops on the way back.
Remove the trampoline cache, since it isn't really useful anymore with this.
Instead of jumping over update code and similar, just jump directly to the
handler.
This avoids redundant exception checks in the case where we can't do fastmem
memory operations (e.g. paired loadstore).
Small TLB lookup optimizations: this is the hot path for MMU code, so try to
make it better.
Template the TLB lookup functions based on the lookup type (opcode, data,
no exception).
Clean up the Read/Write functions and make them more consistent.
Add an early-exit path for MMU accesses to ReadFromHardware/WriteToHardware.
I'm not quite sure why the float paired stores were written how they were,
but it should be more consistent now.
Also get rid of the use of a psTemp global that wasn't really needed.
Add some comments.
When we try to JIT from a block which doesn't exist, don't JIT any code;
just update the PPC state to indicate an ISI. This is a little simpler,
and avoids abusing the JIT block cache.
These instructions are all implemented with fastmem support.
Currently loads with update are disabled due to an issue that I've yet to figure out.
I'm sure I'll figure that out later.
Currently supports only integer loadstores. Floating point loadstores will come later.
This system is semi based on the ARMv7 backpatching routine, where we need to initialize our backpatch routine sizes prior to actually using them so
we know we won't be overwriting any memory.
dcbz: just don't use GetPointer, that can't be right anyways
ppcanalyst: don't print "instruction hex 0" messages in MMU mode, where ISIs
are expected.
This hack has been there for quite a long time, and lots of games crash if it's disabled.
But it's still a hack, so it shouldn't be hard-coded on. This commit creates a new
ini option for this hack, which is enabled by default.
Some games may still run fine without this hack.
This is a one instruction optimization for integer loadstores.
Makes sure to enable nop padding in some cases where a fault can still happen and cause us to overwrite other instructions that aren't meant to be overwritten.
Align our dispatcher to a page so we can jump to it with an ADRP+BR pair instead of ADRP+ADD+BR.
Also make sure to save /all/ of our callee saved registers that we are supposed to save.
Requires PR #1705 prior to merging.
Adds the ability to flush the cache and maintain state.
Adds the BindToRegister ability.
Sorts register usage as callee saved used first, reduces dumping pressure when jumping to external routines/interpreter.
Adds a function to store a register, for use when flushing a register that won't be used during the rest of a block.
Unfortunately the map files in American Mensa Academy don't correspond to the release version.
But at least now if other games use those map files we will be able to load them.
Fixes #7917
The first memset was clearing the delicate bits of the std::string
in the struct, causing segfaults.
I also removed the rest of the memsets because they were paranoid,
unneeded and wasteful. We shouldn't be managing the SSL library's
structs for it.
I checked and the ssl library's functions were already memsetting
those structs as needed.
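To illustrate why the first memset was dangerous, here is a small sketch using a hypothetical struct (Session and ResetSession are made-up names, not the real ones). Resetting by assigning a freshly constructed value goes through std::string's own assignment, whereas memset would overwrite its internal pointers and bookkeeping.

```cpp
#include <string>

// Hypothetical stand-in for a struct that mixes plain data with a std::string.
struct Session
{
  int fd = -1;
  bool active = false;
  std::string hostname;
};

// Reset by assigning a default-constructed value. Unlike
// memset(&s, 0, sizeof(s)), this never touches the string's internals
// behind its back, so the object stays valid.
void ResetSession(Session& s)
{
  s = Session{};
}
```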
Even in games that require MMU mode, loads outside the area specified by
the BAT are rare, so fastmem is a substantial improvement.
All of the interesting changes are in the backpatch handler, to make it
generate DSI exceptions correctly.
Previously the two operand register selection bits were inverted, causing "CMPAR AC1, AX0.H" to be disassembled as "CMPAR AC0, AX1.H".
DSP RE is always fun: on the one hand Nintendo does a lot of stupid shit, so anything weird could be a legitimate bug of the UCode that is not supposed to make any sense. On the other hand, Dolphin *also* does a lot of stupid shit, so there's always that doubt.
Note: completely untested change - done with the GH text editor, just to show you how much I care :) . These operand descriptions are only used for disassembly, so no real behavior change is expected.
The libusb driver must be installed on the adapter (e.g. zadig can be used to install the driver in Windows). GameCube pad controllers are supported and will override the current input device assigned to the port. GameCube controller buttons are auto-configured and cannot be re-assigned. Rumble is supported. Hotplug is supported while playing a game. If a controller is unplugged from the adapter, Dolphin will fall back to using the host input device on that port. If a port on the adapter is unused, Dolphin will use the host input device for that port, allowing a mixture of host input devices and controllers connected to the adapter.
The adapter support can be disabled in the Controllers config if the OS driver is preferred (allowing the pad buttons to be reconfigured).
One adapter per system is supported.
Updated PTE.R bit on Write and Instruction fetch.
Added code to read the PTE from MEM2 if the PTE is stored there.
Refactored the two hash functions to reduce code duplication.
Updated save state version.
We try to keep as many registers as possible in callee saved registers, so if we have guest registers in the correct registers and the interpreter
call we are falling back to doesn't need the registers then we can dump just those ones, which means we don't have to dump 100% of our register state
when falling to the interpreter.
ComputeRC was a bit unclear by using 64bit registers for setting the immediate and then calling SXTW on a 64bit register, which is just a bit obscure.
When the source register is an immediate in cntlzwx, just use the built in GCC function instead of our own implementation for counting leading zeros.
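As a rough sketch of the constant-folding case: `__builtin_clz` is the GCC/Clang builtin, while the wrapper name below is hypothetical. The zero check matters because cntlzw is defined to return 32 for a zero input, but `__builtin_clz(0)` is undefined. The sketch assumes `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>

// Count leading zeros of a 32-bit value with PowerPC cntlzw semantics.
// __builtin_clz(0) is undefined, so zero is handled explicitly.
inline std::uint32_t CountLeadingZeros32(std::uint32_t value)
{
  if (value == 0)
    return 32;
  return static_cast<std::uint32_t>(__builtin_clz(value));
}
```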
Previously, block linking was enabled but it wasn't ever implemented.
Implements link blocks and destroy block functions and moves the downcount check in the WriteExit function so it doesn't get overwritten when linking.
Changes the dispatcher to make sure we are saving the LR(X30) to the stack. Also makes sure to keep the stack aligned.
AArch64's AAPCS64 mandates the stack to be quad-word aligned.
Fixes the dispatcher looping infinitely due to a downcount check jumping to the dispatcher. This was because checking exceptions and the state
pointer wouldn't reset the global conditional flags. So it would leave the timing/exception, jump to the start of the dispatcher and then jump back
again due to the conditional branch.
Removes the REG_AWAY nonsense I was doing. I've got to get the JIT more up to speed before thinking of insane register cache things.
Also fixes a bug in immediate setting where if the register being set to an immediate already had a host register tied to it then it wouldn't free the
register it had, resulting in register exhaustion.
lmw/stmw weren't properly setting input and output registers since they use multiple registers.
dcbz was just missing a flag in the instruction tables.
This code was obviously wrong, we were sign extending 8 bit unsigned values and loading from the wrong offset as well.
This fixes a bug in Muramasa where some colours were going insane.
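For reference, the difference between the two widenings can be shown in a few lines. This is only an illustration of the sign-extension half of the bug, with hypothetical helper names; it is not the code that was changed.

```cpp
#include <cstdint>

// Widen an unsigned 8-bit value: the upper 24 bits become zero.
inline std::uint32_t ZeroExtend8(std::uint8_t v)
{
  return v;
}

// Widen the same bit pattern as if it were signed: bit 7 is copied into
// the upper 24 bits, which corrupts values of 0x80 and above.
inline std::uint32_t SignExtend8(std::uint8_t v)
{
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int8_t>(v)));
}
```

The two only agree for inputs below 0x80, which is why a bug like this can go unnoticed until a value with the high bit set shows up.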
If the inputs are both float singles, and the top half is known to be identical
to the bottom half, we can use packed arithmetic instead of scalar to skip
the movddup.
This is slower on a few rather old CPUs, plus the Atom+Silvermont, so detect
Atom and disable it in that case.
Also avoid PPC_FP on stores if we know that the output came from a float op.
Rather than playing terrible hacks to determine the start of input
frames, just update system input periodically. Specifically, every
60th of a second.
The game can never change these, so there's no reason to make it
dynamic. Just put the constants in the code.
While we're at it, take the time to clean up the code and also
document several of the hacks we're doing inside to make the code
clearer to understand.
This is the only use for a lot of unused methods and structs, and it's a
poorly written mess that doesn't even compile. Just remove it so we can
clean up the rest of a lot of code.
This takes the giant mess of controller-like devices (dance mat and
steering wheel) down to something more manageable, similar to how
the Donkey Konga bongo controller works.
Based-on-a-patch-by: comex <comexk@gmail.com>
This doesn't seem to be necessary anymore now that FPRF is implemented in the
JIT. Technically, this isn't the same as before, since the JIT doesn't
implement the fcmp exception semantics, but as far as testing has shown, this
doesn't seem necessary.
This should make games that use FPRF a few percent faster (e.g. F-Zero GX)
since fcmpx no longer has to fall back to the interpreter.
Also, this avoids keeping the system awake if a game is not being
played.
Frankly, I don't know what the point of precisely tracking these things
is, but that's how the API works. Feel free to add analogous
functionality on other platforms.
Move the JITed function/basic-block registration logic out of the CPU
subsystem in order to add JIT registration to JITed DSP and
Video/VertexLoader code.
This is necessary in order to add /tmp/perf-$pid.map support to other
JITed code as they need to write to the same file.
'perf' is the standard builtin tool for performance analysis on recent
Linux kernels. Its source code is shipped within the kernel repository.
'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains mapping from address
range to function name in the format:
41187e2a 1a EmuCode_804a33fc
with the following entries:
1. beginning of the range (hexadecimal);
2. size of the range (hexadecimal);
3. name of the function.
We supply the PowerPC address of the basic block as function name.
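A minimal sketch of producing one such line, assuming a hypothetical helper name (FormatPerfMapLine) and the "EmuCode_" prefix shown in the sample entry above. The real registration code may be structured differently.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Format one perf map entry: "<start hex> <size hex> <name>\n".
// The name is built from the PowerPC address of the basic block.
std::string FormatPerfMapLine(std::uint64_t start, std::uint64_t size,
                              std::uint32_t ppc_address)
{
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%llx %llx EmuCode_%x\n",
                static_cast<unsigned long long>(start),
                static_cast<unsigned long long>(size),
                static_cast<unsigned int>(ppc_address));
  return std::string(buf);
}
```

Each registered block would append one line like this to the per-process map file.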
Usage:
DOLPHIN_PERF_DIR=/tmp dolphin-emu &
perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
perf script | stackcollapse-perf.pl | grep EmuCode_ | flamegraph.pl > profile.svg
Issue: perf does not have support for region invalidation. It reads
the file in postprocessing. It probably does not work very well if a
JIT region is reused for another basic block: wrong results should be
expected in this case. Currently, nothing is done to prevent this.
This is a fairly lengthy change that can't be separated out to multiple commits well due to the nature of fastmem being a bit of an intertangled mess.
This makes my life easier for maintaining fastmem on ARMv7 because I now don't have to do any terrible instruction counting and NOP padding. Really
makes my brain stop hurting when working with it.
This enables fastmem for a whole bunch of new instructions, which basically means that all instructions now have fastmem working for them. This also
rewrites the floating point loadstores again because the last implementation was pretty crap when it comes to performance, even if they were the
cleanest implementation from my point of view.
This initially started with me rewriting the fastmem routines to work just like the previous/current implementation of floating loadstores. That was
when I noticed that the performance tanked and decided to rewrite all of it.
This also happens to implement gatherpipe optimizations alongside constant address optimization.
Overall this commit brings a fairly large speedboost when using fastmem.
* Added country flags for games from Netherlands and Spain
* Added separate category for Region Free games (Uses European flag as placeholder)
* Added missing country filter options in "show regions" menu
* Rearranged country filters for readability
* Incremented CACHE_REVISION
Also fixed various country filters not showing up as options in the "Show regions" menu.
There was a longstanding hack that defined ucontext_t manually to work
around the lack of this header on the Android NDK. However, it looks
like newer NDK versions now have it like good little POSIX boys, and my
recent header reshuffle broke the build on those versions, presumably
because the real and fake definitions of ucontext_t end up included in
the same file where they weren't under the old organization.
Rather than try to revert the conflict, this commit just removes the
hack. The buildbot's NDK will need to be upgraded.
Changes the read speed of GC discs from 3 MiB/s to 2-3.3 MiB/s,
depending on the location of the data. I also attempted to change the
speeds for Wii discs, but it has very little effect right now because
Wii games use IPC_HLE instead of DVDInterface. It does affect Wii
homebrew that reads Wii discs, though.
This was interesting to implement.
Our generic QueryPerformanceCounter function on ARMv7 was so slow that profiling a block was impossible.
I waited about five minutes and I couldn't even get a single frame to output.
This instead uses ARMv7's PMU to get cycle counts, which are a relatively minor performance drop in my testing.
One disadvantage of this method is that the kernel can lock us out of using these co-processor registers, but it seems to work on my Jetson board.
Another disadvantage is that we aren't having block times in "real" time but cycles instead, not too big of a deal.
This also removes instruction run counts from profiling because that's just annoying and we don't expose an interface for even getting those results
from our UI.
This implements a new system for fastmem backpatching on ARMv7 that is less of a mindfsck to deal with.
This also implements stfs under the default loadstore path as well, not sure why it was by itself in the first place.
I'll be moving the rest of the loadstore methods over to this new way in a few days.
These are causing issues in games. In particular you get pink on the screen in Animal Crossing.
Disable until fully investigated.
This also disables fastmem on floating point loadstore instructions which are horribly broken and won't actually backpatch when an invalid read/write
is encountered.
m_strGPUDeterminismMode can be set by either the global or game
settings. Either way, it's then supposed to be parsed into an enum,
m_GPUDeterminismMode. However, the code to do this was placed right
after checking for game settings, which doesn't happen at all if there
isn't a valid title ID. Move it outside the if block.
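The shape of the fix can be sketched as follows. The enum values, the accepted strings, and the function and struct names are all assumptions made for illustration; only m_strGPUDeterminismMode and m_GPUDeterminismMode come from the message above.

```cpp
#include <string>

// Hypothetical enum and string values, for illustration only.
enum class GPUDeterminismMode { Auto, None, FakeCompletion };

GPUDeterminismMode ParseGPUDeterminismMode(const std::string& mode)
{
  if (mode == "none")
    return GPUDeterminismMode::None;
  if (mode == "fake-completion")
    return GPUDeterminismMode::FakeCompletion;
  return GPUDeterminismMode::Auto;
}

struct Settings
{
  std::string m_strGPUDeterminismMode = "auto";
  GPUDeterminismMode m_GPUDeterminismMode = GPUDeterminismMode::Auto;
};

void ApplySettings(Settings& s, bool has_valid_title_id,
                   const std::string& game_override)
{
  if (has_valid_title_id)
  {
    // Per-game settings may override the global string.
    if (!game_override.empty())
      s.m_strGPUDeterminismMode = game_override;
  }

  // Parse outside the if block so the global setting is honored even
  // when there is no valid title ID.
  s.m_GPUDeterminismMode = ParseGPUDeterminismMode(s.m_strGPUDeterminismMode);
}
```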
This caused invalidations that only affected the last portion of a JIT block
to fail, breaking Wii64's block linking. It might affect a bunch of other
games too; I haven't tested.
This code originally tried to map the "low space" for the GameCube's
memory layout, but has since expanded to mapping all of the easily
mappable memory on the system. Change the name to "GrabSHMSegment" to
indicate that we're looking for a shared memory segment we can map into
our process space.
These are effectively unused, since the memmap already maps them in one
place. For 32-bit, they might have some slight advantage, but we already
special-case the regular "high-mem" pointer for 32-bit, so just use the
one we already have...