The description in the previous commit is accurate, but the problem runs deeper: on the whole, it was a complete failure on my part to appreciate the difference between active and swapped in for memory blocks. Bleeecch.
This was broken by 175556529e, with two related issues: When we allowed for some operations to happen even when the block is inactive, we didn't account for the fact that in swapin, the block technically is not active yet (the lock is not on the self), and similarly in swapout, the lock has already been moved out of self. The former caused all memory areas to revert to RWX at the host OS level after a swap, so no dirty detection was done. After the former was fixed, the latter caused saved costacks to still get missed.
At the same time we ran into a perfect storm with costacks on Windows: if a stack page is not yet dirty and we hit a fault for something else, Windows will not call our VEH handler unless the TIB stack extents are satisfactory. It needs userspace to fix up the TIB extents via a VEH or SEH handler, but there's already an exception pending.
The compiler can now fully inline co_switch, and with most registers specified as clobbers rather than saved explicitly, it can choose to save only what it needs to (we don't have to defensively save everything).
Practically speaking, the co_switch calls are usually inlined, but the functions they're in don't seem to be that big and don't make much direct use of r12..r15 anyway, so (push r12..r15, switch, pop r12..r15) is a common emit. Still, I see a minuscule FPS increase.
This broke any waterbox core that called into native code in the same EnterExit() right after sealing. All nyma cores were broken, 32x was not, and I didn't check the rest. Regressed in 175556529e.
It worked fine in release mode, theoretically
Set up a second mirror of guest memory; easily accomplished because we were already using memfd_create / CreateFileMappingW.
This lets us simplify a lot of host code that has to access guest memory that may not be active right now, or might have been mprotect()ed to something weird. Activate is only needed now to run guest code, or when the C# side wants to peer into guest memory for memory domains and such (waterboxhost does not share the mirror address with the C# side).
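As a rough illustration of the mirroring idea (not waterboxhost's actual code), the sketch below maps one anonymous file descriptor twice and shows that a write through one mapping is visible through the other. The names `make_backing_fd` and `demo_mirror` are made up for this example, and the `tempfile` fallback exists only so the demo still runs where `os.memfd_create` is unavailable.

```python
import mmap
import os
import tempfile


def make_backing_fd(size: int) -> int:
    """Return a descriptor for an anonymous backing object of `size` bytes."""
    if hasattr(os, "memfd_create"):
        # Linux, Python 3.8+: memory-only file with no path on disk.
        fd = os.memfd_create("guest-memory")
    else:
        # Stand-in for platforms without memfd_create (demo purposes only).
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    os.ftruncate(fd, size)
    return fd


def demo_mirror():
    """Map the same backing object twice and write through each mapping."""
    page = mmap.PAGESIZE
    size = 4 * page
    fd = make_backing_fd(size)
    try:
        # Stands in for the mapping that guest code runs against.
        guest_view = mmap.mmap(fd, size)
        # Second mapping of the same pages, always available to the host.
        mirror_view = mmap.mmap(fd, size)
        try:
            guest_view[0:5] = b"hello"
            seen_in_mirror = bytes(mirror_view[0:5])
            mirror_view[page:page + 4] = b"\x01\x02\x03\x04"
            seen_in_guest = bytes(guest_view[page:page + 4])
            return seen_in_mirror, seen_in_guest
        finally:
            guest_view.close()
            mirror_view.close()
    finally:
        os.close(fd)


if __name__ == "__main__":
    print(demo_mirror())
```

Because both mappings share the same pages, the host-side view stays usable no matter what protection the guest-facing mapping currently has; in this sketch both views are simply left read/write.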
Waterbox guest code now runs on a stack inside the guest memory space. This removes some potential opportunities for nondeterminism and makes future porting of libco-enabled cores easier.
This replaces the old managed one. The only direct effect of this is to fix some hard-to-reproduce crashes in bsnes.
In the long run, we'll use this new code to help build more waterbox features.
Cores that used the .invisible section to store data were saving it; this was a regression from before, so PCFX states should be back down to the previous release size, or perhaps a bit smaller.
Add the ability to dirty track libco cothreads, as used in the bsnes core. This saves a lot of space in those states and they're now quite competitive in size.
Some bsnes cothreads call callbacks that hit managed threads. We shouldn't do that, but we do, and sometimes those threads run MSVC's __chkstk which can, depending on circumstances, blow up if the thread extents aren't set.
This also means that we cannot save space on a lot of cothread stacks, because __chkstk will blow up any detection guards we try.
The waterbox system now uses host OS facilities to track whether memory has been written to, and automatically chooses what to include in savestates. This results in a large size decrease for some cores, like snes9x or gpgx (when running cartridge games). It doesn't do much for cores that were already memory efficient, or for bsnes because of libco compatibility issues; but those cores don't regress either.
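The effect of dirty tracking on state size can be sketched with a small simulation. This is an assumption-laden toy, not the waterbox implementation: the real system learns about writes from the host OS, whereas the hypothetical `TrackedMemory` class below marks pages dirty explicitly inside its own `write` method. The 4096-byte page size is also an assumption.

```python
PAGE_SIZE = 4096  # assumed page size for the simulation


class TrackedMemory:
    """Toy guest memory that remembers which pages were written to."""

    def __init__(self, initial: bytes):
        if len(initial) % PAGE_SIZE != 0:
            raise ValueError("size must be a whole number of pages")
        self.baseline = bytes(initial)   # pristine contents, never saved
        self.data = bytearray(initial)
        self.dirty = set()               # indices of pages written so far

    def write(self, addr: int, payload: bytes) -> None:
        if not payload:
            return
        if addr < 0 or addr + len(payload) > len(self.data):
            raise IndexError("write outside guest memory")
        self.data[addr:addr + len(payload)] = payload
        first = addr // PAGE_SIZE
        last = (addr + len(payload) - 1) // PAGE_SIZE
        # A real implementation would learn this from a write fault.
        self.dirty.update(range(first, last + 1))

    def save_state(self) -> dict:
        """Save only the pages that differ from the baseline."""
        return {
            page: bytes(self.data[page * PAGE_SIZE:(page + 1) * PAGE_SIZE])
            for page in sorted(self.dirty)
        }

    def load_state(self, state: dict) -> None:
        """Reset to the baseline, then apply the saved dirty pages."""
        self.data = bytearray(self.baseline)
        for page, contents in state.items():
            self.data[page * PAGE_SIZE:(page + 1) * PAGE_SIZE] = contents
        self.dirty = set(state)


if __name__ == "__main__":
    mem = TrackedMemory(bytes(16 * PAGE_SIZE))
    mem.write(PAGE_SIZE - 2, b"abcd")    # straddles pages 0 and 1
    state = mem.save_state()
    print(sorted(state), "of", len(mem.data) // PAGE_SIZE, "pages saved")
```

A core that touches only a small part of its memory saves only those pages, which is where the size decrease comes from; a core that writes nearly everything gains little.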
and who said waterbox can't thread. well, it sort of can't. but it sort of can.
the speedup isn't that great, but speed is now pretty close (5%?) to snes9x in the only game that matters (final fantasy 5)
We get some small display timing fixes.
To get the build working again, this reverts part of commit 0f687ff84e.
Not sure what was up with that commit. It's clearly non-functional; some member names were changed in Blip_Buffer.h but there is no new checkin of Blip_Buffer.cpp at all. It's completely broken and I need it out so I can actually compile this again.
Nyma cores have to move some big complex structs on init and it's annoying and error prone. This solution is not fast, but these are one time transfers anyway, and it does keep code size and saved size down. Architecture yay.
This was benign, because libunwind will ignore a frame header it doesn't understand (__eh_frame was still fine). But now there's no spew in the console. And over the next 50 years it will save a combined 0.3 seconds of CPU time.
Add 'nyma' project
The goal is to eventually update all of our Mednafen cores. For now, there is a work-in-progress import of the Mednafen pce core. Basic gameplay with HuCard, TurboCD, and SuperGrafx is supported, but the core is not complete yet.