Improves thread-switching performance using the following techniques:
- Interleave stores/loads
- Restore the stack pointer and link register as early as possible
The AArch64 calling convention specifies that d8-d15 (the lower 64 bits of
vector registers q8-q15) are callee-saved. However, libco was
erroneously saving and restoring general-purpose registers x8-x15
instead.
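A minimal sketch of which registers a context switch has to preserve under
the AArch64 procedure call standard. The struct layout and helper names are
hypothetical and are not libco's actual save area; only the register ranges
come from the calling convention.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical save area for an AArch64 context switch (not libco's layout).
// Only registers that must survive a call need a slot.
struct SavedContext {
  std::uint64_t x19_x28[10];  // callee-saved general-purpose registers
  std::uint64_t fp;           // x29, frame pointer
  std::uint64_t lr;           // x30, link register (where the thread resumes)
  std::uint64_t sp;           // stack pointer
  std::uint64_t d8_d15[8];    // low 64 bits of vector registers 8-15 only
};

// True if general-purpose register xN must be preserved across a call.
constexpr bool gprIsCalleeSaved(unsigned n) { return n >= 19 && n <= 29; }

// True if dN (the low 64 bits of vector register N) must be preserved.
constexpr bool fprIsCalleeSaved(unsigned n) { return n >= 8 && n <= 15; }

// Index of dN within SavedContext::d8_d15.
inline unsigned fprSlot(unsigned n) {
  assert(fprIsCalleeSaved(n));
  return n - 8;
}

// x8-x15 are caller-saved scratch registers, so saving them protects nothing.
static_assert(!gprIsCalleeSaved(8) && !gprIsCalleeSaved(15),
              "x8-x15 are not callee-saved");
static_assert(sizeof(SavedContext) == 21 * sizeof(std::uint64_t),
              "21 eight-byte slots");
```

Saving x8-x15 in place of d8-d15 takes the same number of slots, which is
likely why the mistake was easy to miss: the save area stays the same size,
but floating-point state held in d8-d15 can be clobbered across a switch.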
SFC: disable math color bleed for first pixel
(fixes green line on the left edge of Jurassic Park)
SFC/GG: attach Screen settings to Screen node, not PPU/VDP node
(fixes remembering Screen settings)
byuu says:
I finally pass blargg's dmg-sound and cgb-sound tests, but at quite a cost.
On the DMG, reads and writes don't all land on the same T-cycle (clock cycle)
point within an M-cycle (opcode cycle). Writes to trigger take effect two clocks
after writes to wave RAM, for instance. Probably going to be a lot more of this
in low-level PPU emulation, so I'm biting the bullet and slowly converting the
Game Boy bus handler to this new format. I'll use it as a test bench before
doing the same to other systems later, since Game Boy performance isn't as
critical. (It's a drop from 220fps to 200fps to poll the bus four times
per memory access and synchronize the CPU four times as often, so a lot less
bad than I'd feared, at least.)
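A rough sketch of what a T-cycle-granular bus access could look like. Every
name here is hypothetical, and the commit offsets are illustrative parameters
rather than measured hardware timings; this is not the emulator's actual bus
handler. Each memory access spans one M-cycle of four T-cycles, other
components are synchronized after every T-cycle, and the access itself lands
on one chosen T-cycle.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bus model: one memory access = one M-cycle = four T-cycles.
struct Bus {
  std::vector<std::uint8_t> memory = std::vector<std::uint8_t>(0x10000, 0);
  std::uint64_t clock = 0;            // elapsed T-cycles
  std::uint64_t syncCount = 0;        // how often other components were synced
  std::uint64_t lastCommitClock = 0;  // T-cycle at which the last access landed

  // Advance one T-cycle and let the other components (APU, PPU, ...) catch up.
  void step() {
    clock++;
    syncCount++;
  }

  // Write that takes effect on T-cycle `commitAt` (0-3) of the M-cycle.
  void write(std::uint16_t address, std::uint8_t data, unsigned commitAt) {
    assert(commitAt < 4);
    for (unsigned t = 0; t < 4; t++) {
      if (t == commitAt) {
        memory[address] = data;
        lastCommitClock = clock;
      }
      step();
    }
  }

  // Read that samples memory on T-cycle `sampleAt` (0-3) of the M-cycle.
  std::uint8_t read(std::uint16_t address, unsigned sampleAt) {
    assert(sampleAt < 4);
    std::uint8_t data = 0xff;
    for (unsigned t = 0; t < 4; t++) {
      if (t == sampleAt) {
        data = memory[address];
        lastCommitClock = clock;
      }
      step();
    }
    return data;
  }
};
```

With this shape, two writes issued in back-to-back M-cycles can land two
T-cycles apart in relative terms simply by passing different offsets, for
example `bus.write(0xff30, 0x12, 1)` followed by `bus.write(0xff14, 0x80, 3)`.
The price is the one described above: four synchronization points per memory
access instead of one.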