Now that our timings are much more accurate, it doesn't look like we
need it anymore. And the instant ARAM DMA mode + scheduling fixes
actually break ATV: Quad Power Racing 2 (causing all sorts of weird
bugs).
Fundamentally, all this does is enforce the invariant that we always
translate effective addresses based on the current BAT registers and
page table before we do anything else with them.
This change can be logically divided into three parts. The first part is
creating a table to represent the current BAT state, and keeping it up to
date (PowerPC::IBATUpdated, PowerPC::DBATUpdated, etc.). This does
nothing by itself, but it's necessary for the other parts.
The second part (mostly in MMU.cpp) is simply removing all the hardcoded
checks for specific untranslated addresses, and consistently translating
addresses using the current BAT configuration. Very straightforward, but a
lot of code changes because we hardcoded assumptions all over the place.
The third part (mostly in Memmap.cpp) is making the fastmem arena reflect
the current BAT configuration. We do this by redoing the mapping (calling
memmap()) based on the BAT table whenever it changes.
One additional minor change is that translation can fail in two ways:
either the segment is a direct store segment, or page table lookup failed.
The distinction rarely matters, but it does affect cache instructions
such as dcbz.
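As a rough illustration of the first two parts and the two failure modes, here is a minimal sketch. It is not the actual code: every name in it (BatEntry, Mmu, MapBat, Translate, TranslateStatus) is hypothetical, the table assumes one entry per 128 KiB block (the smallest BAT block size), and a plain map stands in for the real hashed page table.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical simplified model: one table entry per 128 KiB block of the
// 32-bit effective address space.
constexpr std::uint32_t kBlockShift = 17;
constexpr std::uint32_t kBlockSize = 1u << kBlockShift;
constexpr std::uint32_t kNumBlocks = 1u << (32 - kBlockShift);

struct BatEntry
{
  bool valid = false;
  std::uint32_t physical_base = 0;  // physical address of the start of the block
};

// Translation can fail in two distinct ways.
enum class TranslateStatus { Ok, DirectStoreSegment, PageFault };

struct TranslateResult
{
  TranslateStatus status;
  std::uint32_t physical;
};

struct Mmu
{
  std::vector<BatEntry> bat = std::vector<BatEntry>(kNumBlocks);
  std::array<bool, 16> direct_store_segment{};  // indexed by the top 4 address bits
  // Stand-in for the page table: effective page number -> physical page number.
  std::unordered_map<std::uint32_t, std::uint32_t> page_table;
};

// Called whenever a (hypothetical) BAT register changes, to keep the table current.
void MapBat(Mmu& mmu, std::uint32_t effective_base, std::uint32_t physical_base,
            std::uint32_t size)
{
  assert(effective_base % kBlockSize == 0 && physical_base % kBlockSize == 0 &&
         size % kBlockSize == 0);
  for (std::uint64_t offset = 0; offset < size; offset += kBlockSize)
  {
    const std::uint32_t off = static_cast<std::uint32_t>(offset);
    BatEntry& entry = mmu.bat[(effective_base + off) >> kBlockShift];
    entry.valid = true;
    entry.physical_base = physical_base + off;
  }
}

// Always translate first: BAT table, then segment check, then page table.
TranslateResult Translate(const Mmu& mmu, std::uint32_t ea)
{
  const BatEntry& entry = mmu.bat[ea >> kBlockShift];
  if (entry.valid)
    return {TranslateStatus::Ok, entry.physical_base | (ea & (kBlockSize - 1))};
  if (mmu.direct_store_segment[ea >> 28])
    return {TranslateStatus::DirectStoreSegment, 0};
  const auto it = mmu.page_table.find(ea >> 12);
  if (it == mmu.page_table.end())
    return {TranslateStatus::PageFault, 0};
  return {TranslateStatus::Ok, (it->second << 12) | (ea & 0xFFFu)};
}
```

A caller emulating a cache instruction could branch on the returned status instead of treating every failed translation the same way.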
Init cannot be called more than once because it registers the
CoreTiming callbacks; registering them a second time trips the
assertions and will cause a crash for anyone with PanicAlerts disabled.
CoreTiming gets restored before ExpansionInterface so CoreTiming
events need to already be registered before the save state loading
begins. This means that the callbacks must be registered
unconditionally instead of on-demand.
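The ordering constraint can be sketched roughly as follows. This is not CoreTiming's real API: EventRegistry, Device, and RestoreEvent are hypothetical names, and the registry only models "an event name must be known before a saved event that refers to it can be restored".

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an event-type registry.
class EventRegistry
{
public:
  // Registers a named event type once; a second registration trips the assert.
  int Register(const std::string& name)
  {
    assert(m_ids.count(name) == 0 && "event type registered twice");
    const int id = static_cast<int>(m_ids.size());
    m_ids.emplace(name, id);
    return id;
  }

  // Returns the id for a name, or -1 if it was never registered.
  int Find(const std::string& name) const
  {
    const auto it = m_ids.find(name);
    return it == m_ids.end() ? -1 : it->second;
  }

private:
  std::unordered_map<std::string, int> m_ids;
};

// Hypothetical device: registers its callback unconditionally in Init,
// not lazily the first time it needs to schedule something.
struct Device
{
  int update_event = -1;
  void Init(EventRegistry& registry) { update_event = registry.Register("DeviceUpdate"); }
};

// A saved event can only be restored if its type is already registered.
bool RestoreEvent(const EventRegistry& registry, const std::string& saved_name)
{
  return registry.Find(saved_name) != -1;
}
```

With on-demand registration, the restore step would run before the device had ever registered its event and the lookup would fail; registering in Init avoids that.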
Replace the ad hoc linked list with a priority heap. Performance
characteristics are mostly the same, but the heap is more cache friendly.
[Priority queues have O(log n) push/pop compared to the linked
list's O(n) push/O(1) pop, but the queue is not big enough for
that to matter, so linear storage is faster than linked nodes. Very
slight gains when framelimit is unlimited (Wind Waker), 1900% -> 1950%]
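For reference, a binary heap kept in a contiguous std::vector can be sketched like this. Event, Later, and EventQueue are illustrative names only, not the types used in the actual change.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical event record: fires at `time`, identified by `id`.
struct Event
{
  std::int64_t time;
  int id;
};

// std::push_heap/std::pop_heap build a max-heap, so compare with ">" to keep
// the earliest event at the front.
struct Later
{
  bool operator()(const Event& a, const Event& b) const { return a.time > b.time; }
};

class EventQueue
{
public:
  // O(log n): append, then sift up.
  void Push(Event event)
  {
    m_events.push_back(event);
    std::push_heap(m_events.begin(), m_events.end(), Later{});
  }

  // O(log n): move the earliest event to the back, then remove it.
  Event Pop()
  {
    assert(!m_events.empty());
    std::pop_heap(m_events.begin(), m_events.end(), Later{});
    const Event event = m_events.back();
    m_events.pop_back();
    return event;
  }

  bool Empty() const { return m_events.empty(); }

private:
  std::vector<Event> m_events;  // contiguous storage, hence the cache friendliness
};
```

Pushing events with times 30, 10, 20 and popping three times yields them in the order 10, 20, 30.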
Some compilers don't have an automatic abs() overload for floats.
Doesn't really matter if they use the integer variant here, but
it's better to be explicit about the fact that we're using floats.
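A small sketch of the explicit form, using std::fabs from <cmath>. NearlyEqual is a made-up helper for illustration, not a function from the change itself.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: compares two floats within a tolerance.
// std::fabs has a float overload, so no integer conversion can sneak in.
bool NearlyEqual(float a, float b, float epsilon)
{
  assert(epsilon >= 0.0f);
  return std::fabs(a - b) <= epsilon;
}
```

Spelling the call as std::fabs (or std::abs with <cmath> included) keeps the intent visible regardless of which overloads a given compiler exposes for the unqualified abs().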