PS and VS making. Untested and won't work for now.
Add in program shader cache files.
Readd NativeVertexFormat stuffs.
Add in PS and VS cache things.
SetShaders in places.
Fixed EFB cache index computations in OpenGL renderer.
The previous computation was very likely to go out of array bounds,
which could result in crashes on EFB access.
Also, the cache size was rounded down instead of up. This is a problem
since EFB_HEIGHT (528) is not a multiple of EFB_CACHE_RECT_SIZE (64).
Most of the InvalidateICache calls are for a 32 bytes block: this is the
number of bytes invalidated by PowerPC dcb*/icb* instructions. Profiling
shows that a lot of CPU time is spent checking if there are any JIT blocks
covered by these 32 bytes (using std::map::lower_bound).
This patch adds a bitset containing the state of every 32 bytes block in
RAM (JIT cached/not JIT cached). Using that, a 32 bytes InvalidateICache
can check in the bitset if any JIT block might be invalidated. A bitset
check is a lot faster than an std::map::lower_bound operation, improving
performance of JitCache::InvalidateICache by more than 100%.
Some practical numbers:
* Xenoblade Chronicles (PAL)
56.04FPS -> 59.28FPS (+5.78%)
* The Last Story (PAL)
30.9FPS -> 32.83FPS (+6.25%)
* Super Mario Galaxy (PAL)
59.76FPS -> 62.46FPS (+4.52%)
This function still takes more time than it should - more optimization in
this area might be possible (specializing for 32 bytes blocks to avoid
useless memcpy, for example).
Very useful to compare performance between two builds, check the impact of
a configuration option, etc. FPS log is stored in User/Logs/fps.txt and is
reset each time you launch a game. Only enabled if you check the "Log FPS
to file" option in your graphics settings.
Could be improved a bit: currently logs only every 1s (so you can't really
see small variations), maybe output more infos to the fps.txt like
average/stddev (but Excel/Libreoffice/Google Docs can compute that easily
too).
This was not needed for most games before because the external exception was
itself delayed. aram-dma-fixes changed that and made the external exception
happen a lot quicker, breaking games that relied on the memcard operations
delay.
Fixes issue 5583.
These merges, while in theory improving emulation accuracy, cause issues
in other parts of the emulator based on invalid assumptions. memcard-delay
fixed some of these issues in the EXI memcard code, but several other
problems still exist and I don't have the time to debug that right now.
The problem here was the logic that detects SDL in the main CMakeLists.txt
is not the same as it is in DolphinWX/CmakeLists.txt to set libraries. When
using SDL from Externals it failed at link time because -lSDL was never set.
This fixes the problem by using the same condition logic to set the libs
as used when detecting SDL in the first place.
Super Mario Sunshine is using a cool trick: To determine how much goop has been cleaned in ep. 6 of Sirena Beach, it counts the number of pixels that are input to the blending stage. For that it's using the PE performance registers ;)
Fixes issue 1498.
This was not needed for most games before because the external exception was
itself delayed. aram-dma-fixes changed that and made the external exception
happen a lot quicker, breaking games that relied on the memcard operations
delay.
Fixes issue 5583.
Reason:
- It's wrong, zcomploc can't be emulated perfectly in HW backends without severely impacting performance.
- It provides virtually no advantages over the previous hack while introducing lots of code.
- There is a better alternative: If people insist on having some sort of valid zcomploc emulation, I suggest rendering each primitive separately while using a _clean_ dual-pass approach to emulate zcomploc.
This reverts commit 0efd4e5c29.
This reverts commit b4ec836aca.
This reverts commit bb4c9e2205.
This reverts commit 146b02615c.