Move the JITed function/basic-block registration logic out of the CPU
subsystem in order to add JIT registration to JITed DSP and
Video/VertexLoader code.
This is necessary in order to add /tmp/perf-$pid.map support to other
JITed code, as it needs to write to the same file.
'perf' is the standard built-in tool for performance analysis on recent
Linux kernels. Its source code is shipped within the kernel repository.
'perf' has basic support for JIT. For each process, it can read a file
named /tmp/perf-$PID.map. This file contains a mapping from address
ranges to function names in the format:
41187e2a 1a EmuCode_804a33fc
with the following entries:
1. beginning of the range (hexadecimal);
2. size of the range (hexadecimal);
3. name of the function.
We supply the PowerPC address of the basic block as the function name.
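As a rough illustration of the format above, one map entry could be
produced along these lines. This is only a sketch: the helper name
FormatPerfMapEntry and its parameters are hypothetical and not taken
from the actual change; only the line layout and the EmuCode_ prefix
come from the example entry.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper (not the real Dolphin function): formats one line of
// a perf map file as "<start hex> <size hex> <symbol name>".
std::string FormatPerfMapEntry(std::uintptr_t host_start, std::uint32_t code_size,
                               std::uint32_t ppc_address)
{
  char buffer[64];
  // The symbol name embeds the PowerPC address of the basic block.
  // Zero-padding it to 8 hex digits is an assumption of this sketch.
  std::snprintf(buffer, sizeof(buffer), "%llx %x EmuCode_%08x\n",
                static_cast<unsigned long long>(host_start),
                static_cast<unsigned int>(code_size),
                static_cast<unsigned int>(ppc_address));
  return std::string(buffer);
}
```

Each JIT (CPU, DSP, vertex loader) would append lines like this to the
same file, which is why the registration logic has to be shared.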
Usage:
DOLPHIN_PERF_DIR=/tmp dolphin-emu &
perf record -F99 -p $(pgrep dolphin-emu) --call-graph dwarf
perf script | stackcollapse-perf.pl | grep EmuCode_ | flamegraph.pl > profile.svg
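The usage above points DOLPHIN_PERF_DIR at /tmp, which is where perf
looks for perf-$PID.map. A minimal sketch of building that path,
assuming the directory and PID are passed in by the caller (the helper
name PerfMapPath is hypothetical, not part of the change):

```cpp
#include <string>

// Hypothetical helper: builds "<dir>/perf-<pid>.map", the file name perf
// reads for a given process when resolving JIT symbols.
std::string PerfMapPath(const std::string& dir, long pid)
{
  return dir + "/perf-" + std::to_string(pid) + ".map";
}
```

With dir set to "/tmp" and the emulator's own PID, this yields the file
name described earlier.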
Issue: perf does not have support for region invalidation. It reads
the file in postprocessing. It probably does not work very well if a
JIT region is reused for another basic block: wrong results should be
expected in this case. Currently, nothing is done to prevent this.
This is the same extension that we all know and love but under a different name with some different requirements.
In regular OpenGL fashion, you can't just move a desktop OpenGL extension to OpenGL ES without ratifying a new extension, which is why this falls
under an EXT extension, which in turn causes its function names to have suffixes attached.
This is the first step on our way towards conquering all mobile GPUs that don't support desktop OpenGL. Hopefully we can also add support for
buffer_storage to OpenGL ES so we can make full use of this extension.
This is a fairly lengthy change that can't be cleanly split into multiple commits because fastmem is a bit of an intertwined mess.
This makes my life easier for maintaining fastmem on ARMv7 because I now don't have to do any terrible instruction counting and NOP padding. Really
makes my brain stop hurting when working with it.
This enables fastmem for a whole bunch of new instructions, which basically means that all instructions now have fastmem working for them. This also
rewrites the floating point loadstores again because the last implementation was pretty crap when it comes to performance, even if it was the
cleanest implementation from my point of view.
This initially started with me rewriting the fastmem routines to work just like the previous/current implementation of floating loadstores. That was
when I noticed that the performance tanked and decided to rewrite all of it.
This also happens to implement gatherpipe optimizations alongside constant address optimization.
Overall this commit brings a fairly large speed boost when using fastmem.
FIFO overflow fix
The idea behind separating the CPU and GPU thread path is to prevent two threads executing the same function (SetCPStatusFromCPU) at the same time. The CPU thread gets to that function via GatherPipeBursted and the GPU thread gets there via SetCPStatusFromGPU. I wrote the original (factored) version as I like to keep code duplication to a minimum. This worked most of the time but not all of the time. It was a good move to separate it into a GPU version and a CPU version, but then the GPU version called the CPU version at the tail end. Removing the call (as in this PR) prevents the FIFO overflow in Star Fox Adventures.
The "Updating the CPStatus before FIFO events" change is simply there to keep the DC code path and the SC code path consistent in the !GPLinked scenario. The SC code path does the same thing when !GPLinked via RunGPU. Putting the SetCPStatus call before the !GPLinked state changes the DC code path to do the same thing. The more we keep the DC and SC code paths consistent, the better.
Forcing the exception check on interrupts is the change for The Last Story. It makes the emulator respond more promptly to CP interrupts.
The HiWatermark check is something that old (pre-2010) versions of Dolphin used to do. Games like Battalion Wars 2 do not expect the GPU to run so slowly and do not have the code to react to that situation, so this change makes the emulator extra careful in those situations.
* Added country flags for games from the Netherlands and Spain
* Added separate category for Region Free games (uses the European flag as a placeholder)
* Added missing country filter options in "Show regions" menu
* Rearranged country filters for readability
* Incremented CACHE_REVISION
Also fixed various country filters not showing up as options in the "Show regions" menu.