3990002250
Most of the InvalidateICache calls are for a single 32-byte block: this is the number of bytes invalidated by the PowerPC dcb*/icb* instructions. Profiling shows that a lot of CPU time is spent checking whether any JIT blocks are covered by these 32 bytes (using std::map::lower_bound).

This patch adds a bitset containing the state of every 32-byte block in RAM (JIT cached/not JIT cached). Using that, a 32-byte InvalidateICache can check in the bitset whether any JIT block might be invalidated. A bitset check is a lot faster than a std::map::lower_bound operation, improving the performance of JitCache::InvalidateICache by more than 100%.

Some practical numbers:

* Xenoblade Chronicles (PAL) 56.04FPS -> 59.28FPS (+5.78%)
* The Last Story (PAL) 30.9FPS -> 32.83FPS (+6.25%)
* Super Mario Galaxy (PAL) 59.76FPS -> 62.46FPS (+4.52%)

This function still takes more time than it should. More optimization in this area might be possible (specializing for 32-byte blocks to avoid a useless memcpy, for example).
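The bitset fast path described above can be sketched roughly as follows. This is a minimal illustration, not the code from this patch: the class name JitLineTracker, the 24 MiB RAM size, the map layout (last byte -> start address) and the assumption that blocks do not overlap are all hypothetical choices made for the example.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>

// Assumed values for illustration only.
constexpr std::uint32_t kRamSize = 0x01800000;  // 24 MiB of emulated RAM
constexpr std::uint32_t kLineSize = 32;         // bytes covered by dcb*/icb*
constexpr std::size_t kNumLines = kRamSize / kLineSize;

// Hypothetical tracker: one bit per 32-byte line, set when a JIT block
// might cover that line. Addresses are assumed to be below kRamSize.
class JitLineTracker
{
public:
  // Registers a compiled block covering [start, start + length), length > 0.
  void AddBlock(std::uint32_t start, std::uint32_t length)
  {
    const std::uint32_t last = start + length - 1;
    m_blocks[last] = start;
    for (std::uint32_t line = start / kLineSize; line <= last / kLineSize; ++line)
      m_valid->set(line);
  }

  // Invalidates the 32-byte line containing `address`.
  // Returns the number of blocks destroyed.
  std::size_t InvalidateLine(std::uint32_t address)
  {
    const std::uint32_t line = address / kLineSize;
    if (!m_valid->test(line))
      return 0;  // Fast path: no JIT block can cover this line.

    // Slow path: find blocks whose last byte is at or after the line start
    // and whose first byte is before the line end.
    const std::uint32_t first = line * kLineSize;
    const std::uint32_t end = first + kLineSize;
    std::size_t destroyed = 0;
    auto it = m_blocks.lower_bound(first);
    while (it != m_blocks.end() && it->second < end)
    {
      it = m_blocks.erase(it);
      ++destroyed;
    }

    // Only this line is cleared. Other lines of a destroyed block keep a
    // stale "maybe cached" bit, which costs one extra slow-path lookup later.
    m_valid->reset(line);
    return destroyed;
  }

  std::size_t BlockCount() const { return m_blocks.size(); }

private:
  std::map<std::uint32_t, std::uint32_t> m_blocks;  // last byte -> start
  std::unique_ptr<std::bitset<kNumLines>> m_valid =
      std::make_unique<std::bitset<kNumLines>>();
};
```

In this sketch a set bit only means a block might be present, so a stale bit is harmless: the slow path runs once, finds nothing and clears the bit. The common case of a line that was never compiled returns after a single bit test.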