The compiler now can fully inline the co_switch, and with most registers being specified as clobbers and not saved explicitly, the compiler can choose to save only what it needs to (we don't have to defensively save everything).
Practically speaking, the co_switch calls are usually inlined, but the functions they're in don't seem to be that big and don't make direct use of r12..r15 too much anyway, so (push r12..r15, switch, pop r12..r15) is a common emit. But I see a miniscule FPS increase.
This replaces the old managed one. The only direct effect of this is to fix some hard to reproduce crashes in bsnes.
In the long run, we'll use this new code to help build more waterbox features.
Cores that used the .invisible section to store data were saving it; this was a regression from before, so PCFX states should be back down to the previous release size, or perhaps a bit smaller.
Add the ability to dirty track libco cothreads, as used in the bsnes core. This saves a lot of space in those states and they're now quite competitive in size.
some bsnes cothreads call callbacks that hit managed threads. We shouldn't do that, but we do, and sometimes those threads run MSVC's __stkchk which can, depending on circumstances, blow up if the thread extents aren't set.
This also means that we cannot save space on a lot of cothread stacks because __stkchck will blow up any detection guards we try
The waterbox system now uses host os facilities to track whether memory has been written to, to automatically choose what thing to savestate. This results in a large size decrease for some cores, like snes9x or gpgx (when running cartridge games). Doesn't do much for cores that were already memory efficient, or for bsnes because of libco compatibility issues; but those cores don't regress either.
Create an all new waterbox build environment:
WSL2 + Ubuntu 20.04 LTS (Other linuxes may work)
Musl libc with waterbox customizations
LLVM's libclang-rt, libunwind, libcxxabi, libcxx
Static linking to elf files
Compared with the old system, this is easier to set up a dev env for and easier to update in the future. The executables are larger but produce smaller savestates due to static linking. The modern toolchain means advanced library features and language features that sometimes appear in some upstream cores will be reusable.