bsnes/higan/sfc/sfc.hpp


#pragma once
Update to v082r04 release. byuu says: So, here's the deal. I now have three emulators. I don't think the NES/GB ones are at all useful, but I do want them to be eventually. And having them have those pathetic little GUIs like ui-gameboy, and keeping everything in separate project folders, just doesn't work well for me. I kind of "got around" the issue with the Game Boy, by only allowing SGB mode emulation. But there is no "Super Nintendo" ... er ... wait ... uhmm ... well, you know what I mean anyway. So, my idea is to write a multi-emulator GUI, and keep the projects together. The GUI is not going to change much. The way I envision this working: At startup, you have a menubar with: "Cartridge, Settings, Tools, Help". Cartridge has "Load NES Cartridge", "Load SNES Cartridge", etc. When you load something, Cartridge is replaced with the appropriate system menu, eg "SNES". Here you have all your regular items: "power, reset, controller port selection, etc." There is also a new "Unload Cartridge" option, which is how you restore the "Cartridge" menu again. I have no plans to emulate any other systems, but if I ever do emulate something that doesn't take cartridges, I'll change the name to just "Load" or something. The cheat editor / state manager will look and act exactly the same. The settings panel will look exactly the same. I'll simply show/hide system-specific options as needed, like NES/SNES aspect ratio correction, etc. The input mapping window will just have settings for the currently loaded system. Video and audio tweaking will apply cross-system, as will hotkey mapping. The GUI stuff is mostly copy-paste, so it should only take me a week to get it 95% back to where it was, so don't worry, this isn't total GUI rewrite #80. I am, however, making all the objects pointers, so that I can destruct them all prior to main() returning, which is certainly one way of fixing that annoying Windows/Qt crash. Please only test on Linux. 
The Windows port is broken to hell, and will give you a bad impression of the idea:
- menu groups are not hiding for some reason (all groups are showing, it looks hideous)
- Timer interval(0) is taking 16ms per call, capping the FPS to ~64 tops [FWIW, bsnes/accuracy gets 130fps, bgameboy gets 450fps, bnes gets 800fps; all run at lowest possible granularity]
- the OS keeps beeping when you press keys (AGAIN)

Of course, Qt and GTK+ don't let you shrink a window from the requested geometry size, because they suck. So the video scaling stuff doesn't work all that great yet. Man, a metric fuckton of things need to be fixed in phoenix, and I really don't know how to fix any of them :/
2011-09-09 04:08:38 +00:00
//license: GPLv3
//started: 2004-10-14
#include <emulator/emulator.hpp>
#include <emulator/thread.hpp>
#include <emulator/scheduler.hpp>
#include <emulator/cheat.hpp>
#include <processor/arm/arm.hpp>
#include <processor/gsu/gsu.hpp>
#include <processor/hg51b/hg51b.hpp>
#include <processor/r65816/r65816.hpp>
#include <processor/spc700/spc700.hpp>
#include <processor/upd96050/upd96050.hpp>
#if defined(SFC_SUPERGAMEBOY)
#include <gb/gb.hpp>
#endif
namespace SuperFamicom {
#define platform Emulator::platform
namespace File = Emulator::File;
using Scheduler = Emulator::Scheduler;
using Cheat = Emulator::Cheat;
extern Scheduler scheduler;
extern Cheat cheat;
Update to v100r14 release. byuu says: (Windows: compile with -fpermissive to silence an annoying error. I'll fix it in the next WIP.)

I completely replaced the time management system in higan and overhauled the scheduler. Before, processor threads would have "int64 clock", and there would be a 1:1 relationship between two threads. When thread A ran for X cycles, it'd subtract X * B.Frequency from clock; and when thread B ran for Y cycles, it'd add Y * A.Frequency to clock. This worked well and allowed perfect precision, but it doesn't work when you have more complicated relationships: e.g. the 68K can sync to the Z80 and PSG, and the Z80 to the 68K and PSG, so the PSG needs two counters. The new system instead uses a "uint64 clock" variable that represents time in attoseconds. Every time the scheduler exits, it subtracts the smallest clock count from all threads, to prevent an overflow scenario. The only real downside is that rounding errors mean that roughly every 20 minutes, we have a rounding error of one clock cycle (one 20,000,000th of a second). However, this only applies to systems with multiple oscillators, like the SNES. And when you're in that situation ... there's no such thing as a perfect oscillator anyway. A real SNES will be thousands of times further out of spec than 1Hz per 20 minutes.

The advantages are pretty immense. First, we obviously can now support more complex relationships between threads. Second, we can build a much more abstracted scheduler. All of libco is now abstracted away completely, which may permit a state-machine / coroutine version of Thread in the future.
We've basically gone from this:

auto SMP::step(uint clocks) -> void {
  clock += clocks * (uint64)cpu.frequency;
  dsp.clock -= clocks;
  if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread);
  if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread);
}

To this:

auto SMP::step(uint clocks) -> void {
  Thread::step(clocks);
  synchronize(dsp);
  synchronize(cpu);
}

As you can see, we don't have to do multiple clock adjustments anymore. This is a huge win for the SNES CPU that had to update the SMP, DSP, all peripherals and all coprocessors. Likewise, we don't have to synchronize all coprocessors when one runs, now we can just synchronize the active one to the CPU.

Third, when changing the frequencies of threads (think SGB speed setting modes, GBC double-speed mode, etc), it no longer causes the "int64 clock" value to be erroneous.

Fourth, this results in a fairly decent speedup, mostly across the board. Aside from the GBA being mostly a wash (for unknown reasons), it's about an 8% - 12% speedup in every other emulation core.

Now, all of this said ... this was an unbelievably massive change, so ... you know what that means >_> If anyone can help test all types of SNES coprocessors, and some other system games, it'd be appreciated.

----

Lastly, we have a bitchin' new about screen. It unfortunately adds ~200KiB onto the binary size, because the PNG->C++ header file transformation doesn't compress very well, and I want to keep the original resource files in with the higan archive. I might try some things to work around this file size increase in the future, but for now ... yeah, slightly larger archive sizes, sorry. The logo's a bit busted on Windows (the Label control's background transparency and alignment settings aren't working), but works well on GTK. I'll have to fix Windows before the next official release. For now, look on my Twitter feed if you want to see what it's supposed to look like.

----

EDIT: forgot about ICD2::Enter.
It's doing some weird inverse run-to-save thing that I need to implement support for somehow. So, save states on the SGB core probably won't work with this WIP.
2016-07-30 03:56:12 +00:00
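The single-time-base model described in this commit can be sketched without libco. This is a minimal illustration under stated assumptions, not higan's Thread or Scheduler. The names `SketchThread`, `normalize` and `run` are hypothetical, and the time base is 10^18 units per second (attoseconds, as the commit describes). Instead of coroutine switches, the sketch simply runs whichever thread is furthest behind. Each step costs one multiply-add no matter how many other threads exist. Normalizing subtracts the smallest clock from every thread without changing their relative order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed time base: 10^18 units per second (attoseconds).
constexpr std::uint64_t timeBase = 1'000'000'000'000'000'000ull;

// Hypothetical stand-in for a thread: it only tracks its own absolute time.
struct SketchThread {
  std::uint64_t scalar;      // time units per clock cycle: timeBase / frequency (truncated)
  unsigned cyclesPerStep;    // clock cycles consumed by one "instruction"
  std::uint64_t clock = 0;   // this thread's absolute time, in time units
  std::uint64_t cycles = 0;  // total cycles executed, kept for inspection

  SketchThread(std::uint64_t frequency, unsigned cyclesPerStep)
  : scalar(timeBase / frequency), cyclesPerStep(cyclesPerStep) {}

  // One multiply-add per step, however many other threads exist.
  void step(unsigned clocks) {
    clock += clocks * scalar;
    cycles += clocks;
  }
};

// Subtract the smallest clock from every thread so counters never overflow;
// the differences between threads (and so their ordering) are unchanged.
void normalize(std::vector<SketchThread*>& threads) {
  std::uint64_t minimum = UINT64_MAX;
  for(auto* thread : threads) minimum = std::min(minimum, thread->clock);
  for(auto* thread : threads) thread->clock -= minimum;
}

// Simplification: rather than coroutine switches, always run the thread that
// is furthest behind in time, normalizing every 1024 steps.
void run(std::vector<SketchThread*>& threads, unsigned steps) {
  for(unsigned n = 0; n < steps; n++) {
    auto* behind = *std::min_element(threads.begin(), threads.end(),
      [](const SketchThread* a, const SketchThread* b) { return a->clock < b->clock; });
    behind->step(behind->cyclesPerStep);
    if((n & 1023) == 1023) normalize(threads);
  }
}
```

As an example, consider a 21477273Hz CPU stepping 4 clocks at a time, a 24576000Hz SMP stepping 5, and a 44100Hz MSU1 stepping 1, all passed to `run(threads, 1000000)`. Their cycle counts stay in proportion to their frequencies, and no thread's clock ever gets more than one MSU1 step ahead of the slowest thread.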
struct Thread : Emulator::Thread {
Update to v100r15 release. byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed.

[Editor's note: The "aforementioned scheduler changes" were described in a previous forum post: Unfortunately, 64 bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using Booth's algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32 bits of precision allow a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:

Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz

Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see the Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component.

The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks, then clock += X * threadB.frequency
- if clock >= 0, switch to threadB
- if thread B steps X clocks, then clock -= X * threadA.frequency
- if clock < 0, switch to threadA

But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612

Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one.

As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks

Once per second, we look at every thread and find the smallest clock value, then subtract that value from all threads. This prevents the clock counters from overflowing.

Unfortunately, these oscillator values are psychotic, unpredictable, and oftentimes repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond), we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am.

Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core.

Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values, we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock, and run them both for a full second, before an overflow event would occur.
Another hastebin includes demonstration code:

#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;

//

cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;

//

struct ThreadA {
  cothread_t handle = nullptr;
  uint64 frequency = 0;
  int64 clock = 0;

  auto create(auto (*entrypoint)() -> void, uint frequency) {
    this->handle = co_create(65536, entrypoint);
    this->frequency = frequency;
    this->clock = 0;
  }
};

struct CPUA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;

struct SMPA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;

uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;

auto EnterA() -> void {
  offsetA = 0;
  co_switch(resumeA);
}

auto QueueA(uint value) -> void {
  queueA[offsetA++] = value;
  if(offsetA >= iterations) {
    resumeA = co_active();
    co_switch(mainThread);
  }
}

auto CPUA::Enter() -> void { while(true) cpuA.main(); }

auto CPUA::main() -> void {
  QueueA(1);
  smpA.clock -= cpuStep * smpA.frequency;
  if(smpA.clock < 0) co_switch(smpA.handle);
}

auto SMPA::Enter() -> void { while(true) smpA.main(); }

auto SMPA::main() -> void {
  QueueA(2);
  smpA.clock += smpStep * cpuA.frequency;
  if(smpA.clock >= 0) co_switch(cpuA.handle);
}

//

struct ThreadB {
  cothread_t handle = nullptr;
  uint128_t scalar = 0;
  uint128_t clock = 0;

  auto print128(uint128_t value) {
    string s;
    while(value) { s.append((char)('0' + value % 10)); value /= 10; }
    s.reverse();
    print(s, "\n");
  }

  //femtosecond (10^15) = 16306
  //attosecond (10^18) = 688838
  //zeptosecond (10^21) = 13712691
  //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
  //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)

  auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
    this->handle = co_create(65536, entrypoint);
    uint128_t unitOfTime = 1;
    //for(uint n : range(29)) unitOfTime *= 10;
    unitOfTime <<= 96;  //2^96 time units ...
    this->scalar = unitOfTime / frequency;
    print128(this->scalar);
    this->clock = 0;
  }

  auto step(uint128_t clocks) -> void { clock += clocks * scalar; }

  auto synchronize(ThreadB& thread) -> void {
    if(clock >= thread.clock) co_switch(thread.handle);
  }
};

struct CPUB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;

struct SMPB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;

auto correct() -> void {
  auto minimum = min(cpuB.clock, smpB.clock);
  cpuB.clock -= minimum;
  smpB.clock -= minimum;
}

uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;

auto EnterB() -> void {
  correct();
  offsetB = 0;
  co_switch(resumeB);
}

auto QueueB(uint value) -> void {
  queueB[offsetB++] = value;
  if(offsetB >= iterations) {
    resumeB = co_active();
    co_switch(mainThread);
  }
}

auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); }

auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); }

//

#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
  mainThread = co_active();
  uint masterCounter = 0;
  while(true) {
    print(masterCounter++, " ...\n");

    auto A = clock();
    EnterA();
    auto B = clock();
    print((double)(B - A) / CLOCKS_PER_SEC, "s\n");

    auto C = clock();
    EnterB();
    auto D = clock();
    print((double)(D - C) / CLOCKS_PER_SEC, "s\n");

    for(uint n : range(iterations)) {
      if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
    }
  }
}

...and that's everything.]
2016-07-31 02:11:20 +00:00
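A small libco-free sketch can check two claims in this note. The first is that a 2^-96 time base leaves room for a full second of both a 1Hz and a 4GHz clock. The second is that the absolute-time scheme picks the same run order as the old pairwise counter. The sketch assumes GCC or Clang, because `unsigned __int128` is a compiler extension standing in for the `uint128_t` mentioned above. It replaces coroutine switches with a simple `cpuActive` flag, and the function names are hypothetical. The frequencies, step sizes and the SMP's initial clock of 1 are taken from the demonstration code. The run is kept to a million iterations, well before the first exact CPU/SMP time coincidence for these frequencies (about 13.7 million steps in). At that coincidence, tie-breaking would also depend on how each scalar was rounded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: GCC/Clang on a 64-bit target, where unsigned __int128 exists.
using u128 = unsigned __int128;

// 2^96 time units per second, the "byuusecond" base described above.
constexpr u128 timeBase96 = u128(1) << 96;

// Time units a clock at `frequency` Hz consumes in one full second.
// Equals 2^96 minus (2^96 mod frequency), so it never exceeds 2^96.
u128 unitsPerSecond(std::uint64_t frequency) {
  return u128(frequency) * (timeBase96 / frequency);
}

// Old scheme (ThreadA in the demonstration): one signed counter shared by a
// CPU/SMP pair. Records which thread ran each step: 1 = CPU, 2 = SMP.
std::vector<std::uint8_t> pairwiseOrder(std::int64_t cpuFreq, std::int64_t smpFreq, unsigned iterations) {
  std::vector<std::uint8_t> order;
  order.reserve(iterations);
  std::int64_t counter = 0;  // proportional to (SMP time - CPU time)
  bool cpuActive = true;
  for(unsigned n = 0; n < iterations; n++) {
    if(cpuActive) {
      order.push_back(1);
      counter -= 4 * smpFreq;             // CPU runs 4 clocks
      if(counter < 0) cpuActive = false;  // CPU is ahead: let the SMP run
    } else {
      order.push_back(2);
      counter += 5 * cpuFreq;             // SMP runs 5 clocks
      if(counter >= 0) cpuActive = true;  // SMP caught up: let the CPU run
    }
  }
  return order;
}

// New scheme (ThreadB): each thread keeps its own absolute time in 2^-96 s
// units; a thread yields once its clock is >= the other thread's clock.
std::vector<std::uint8_t> absoluteOrder(std::uint64_t cpuFreq, std::uint64_t smpFreq, unsigned iterations) {
  std::vector<std::uint8_t> order;
  order.reserve(iterations);
  const u128 cpuScalar = timeBase96 / cpuFreq;
  const u128 smpScalar = timeBase96 / smpFreq;
  u128 cpuClock = 0, smpClock = 1;  // the demonstration starts the SMP at 1
  bool cpuActive = true;
  for(unsigned n = 0; n < iterations; n++) {
    if(cpuActive) {
      order.push_back(1);
      cpuClock += 4 * cpuScalar;
      if(cpuClock >= smpClock) cpuActive = false;
    } else {
      order.push_back(2);
      smpClock += 5 * smpScalar;
      if(smpClock >= cpuClock) cpuActive = true;
    }
    // Normalize: subtracting the smaller clock keeps both values small
    // without changing which one is ahead.
    const u128 minimum = cpuClock < smpClock ? cpuClock : smpClock;
    cpuClock -= minimum;
    smpClock -= minimum;
  }
  return order;
}
```

In this sketch, the SMP's starting offset of 1 changes only how exact ties are broken. It makes the new `>=` comparisons favour the same thread that the old `< 0` / `>= 0` counter checks did.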
auto create(auto (*entrypoint)() -> void, double frequency) -> void {
Emulator::Thread::create(entrypoint, frequency);
scheduler.append(*this);
}
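//resume the other thread once this one has caught up to it (>= rather than >, which avoids the wobble fixed in v100r15)
inline auto synchronize(Thread& thread) -> void {
if(clock() >= thread.clock()) scheduler.resume(thread);
}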
Update to v098r01 release. byuu says: Changelog:
- SFC: balanced profile removed
- SFC: performance profile removed
- SFC: code for handling non-threaded CPU, SMP, DSP, PPU removed
- SFC: Coprocessor, Controller (and expansion port) shared Thread code merged to SFC::Cothread
  - Cothread here just means "Thread with CPU affinity" (couldn't think of a better name, sorry)
- SFC: CPU now has vector<Thread*> coprocessors, peripherals;
  - this is the beginning of work to allow expansion port devices to be dynamically changed at run-time
- ruby: all audio drivers default to 48000hz instead of 22050hz now if no frequency is assigned
  - note: the WASAPI driver can default to whatever the native frequency is; doesn't have to be 48000hz
- tomoko: removed the ability to change the frequency from the UI (but it will display the frequency used)
- tomoko: removed the timing settings panel
  - the goal is to work toward smooth video via adaptive sync
  - the model is broken by not being in control of the audio frequency anyway
  - it's further broken by PAL running at 50hz and WSC running at 75hz
  - it was always broken anyway by SNES interlace timing varying from progressive timing
- higan: audio/ stub created (for now, it's just nall/dsp/ moved here and included as a header)
- higan: video/ stub created
- higan/GNUmakefile: now includes build rules for essential components (libco, emulator, audio, video)

The audio changes are in preparation to merge wareya's awesome WASAPI work without the need for the nall/dsp resampler.
2016-04-09 03:40:12 +00:00
};
#include <sfc/memory/memory.hpp>
#include <sfc/ppu/counter/counter.hpp>
Updated to 20100813 release. byuu says: Since we're now talking about three splits, that's getting a bit out of hand. This WIP combines everything back into one project again. Added the src/fast folder that has all the speed-oriented cores. There's a slight slowdown to csnes from what it was before, because I'm using blargg's accurate DSP. I just don't like the idea of releasing a less accurate DSP core than Snes9X v1.52 has. Plus the fast DSP core doesn't serialize yet. I moved back to snes_spc 0.9.0 because I care more about Tales and Star Ocean than I do about Earthworm Jim 2. So if you try EWJ2 on csnes, expect it to sound like it does on Snes9X. In other words, don't wear headphones if you value your hearing. The middle-of-the-road bsnes core uses blargg's accurate DSP, because it's about 3% faster than mine, which removes all of blargg's optimizations. There is absolutely no accuracy loss here. bsnes v067.20 that is included should be equal to v067 official.

Performance:
asnes = 58fps
bsnes = 172fps (+2.97x over asnes)
csnes = 274fps (+1.59x over bsnes, +4.72x over asnes)

The binaries are not profiled, so they're an additional 15% slower than the previous builds. Save states only work on asnes, as I don't know how to serialize blargg's cores yet. The copy_func thing is very confusing to me for some reason. The debugger won't work anywhere. Outside of that, please go ahead and bug test. Once I get the debugger and save states working, I'll build some profiled v1.0 releases for all three, and we can test that for a bit and then release.
2010-10-20 11:20:39 +00:00
#include <sfc/cpu/cpu.hpp>
#include <sfc/smp/smp.hpp>
#include <sfc/dsp/dsp.hpp>
#include <sfc/ppu/ppu.hpp>
#include <sfc/controller/controller.hpp>
Update to v098r03 release. byuu says: It took several hours, but I've rebuilt much of the SNES' bus memory mapping architecture. The new design unifies the cartridge string-based mapping ("00-3f,80-bf:8000-ffff") and internal bus.map calls. The map() function now has an accompanying unmap() function, and instead of a fixed 256 callbacks, it'll scan to find the first available slot. unmap() will free slots up when zero addresses reference a given slot.

The controllers and expansion port are now both entirely dynamic. Instead of load/unload/power/reset, they only have the constructor (power/reset/load) and destructor (unload). What this means is you can now dynamically change even expansion port devices after the system is loaded. Note that this is incredibly dangerous and stupid, but ... oh well. The whole point of this was for 21fx. There's no way to change the expansion port device prior to loading a game, but if the 21fx isn't active, then the reset vector hijack won't work. Now you can load a 21fx game, change the expansion port device, and simply reset the system to activate the device.

The unification of design between controller port devices and expansion port devices is nice, and overall this results in a reduction of code (all of the Mapping stuff in Cartridge is gone, replaced with direct bus mapping). And there's always the potential to expand this system more in the future now. The big missing feature right now is the ability to push/pop mappings. So if you look at how the 21fx does the reset vector, you might vomit a little bit. But ... it works.

Also changed exit(0) to _exit(0) in the POSIX version of nall::execute.

[The _exit(0) thing is an attempt to make higan not crash when it tries to launch icarus and it's not on $PATH. The theory is that higan forks, then the child tries to exec icarus and fails, so it exits, all the unique_ptrs clean up their resources and tell the X server to free things the parent process is still using.
Calling _exit() prevents destructors from running, and seems to prevent the problem. -Ed.]
2016-04-09 10:21:18 +00:00
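The slot scheme described in this commit can be sketched as follows: map() scans for the first free handler slot, and a slot is released once no address references it any more. Everything in the sketch is an assumption made for illustration, not higan's actual code. That covers the `Bus` struct, its 24-bit lookup table, reserving slot 0 for open bus, and the small parser for strings like "00-3f,80-bf:8000-ffff".

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical bus: 24-bit address -> handler slot, with 256 reference-counted slots.
struct Bus {
  using Reader = std::function<std::uint8_t(std::uint32_t address)>;

  std::vector<std::uint8_t> lookup = std::vector<std::uint8_t>(1 << 24, 0);  // address -> slot
  std::array<Reader, 256> readers{};
  std::array<std::uint32_t, 256> counter{};  // how many addresses reference each slot

  Bus() {
    readers[0] = [](std::uint32_t) -> std::uint8_t { return 0x00; };  // slot 0: open bus (assumed)
    counter[0] = 1 << 24;                                              // every address starts there
  }

  // Parse "lo-hi" or a single "lo" (hexadecimal) into an inclusive range.
  static std::pair<unsigned, unsigned> parseRange(const std::string& text) {
    auto dash = text.find('-');
    unsigned lo = std::stoul(text.substr(0, dash), nullptr, 16);
    unsigned hi = dash == std::string::npos ? lo : std::stoul(text.substr(dash + 1), nullptr, 16);
    return {lo, hi};
  }

  // Visit every 24-bit address named by a string such as "00-3f,80-bf:8000-ffff".
  template<typename Visit> static void forEach(const std::string& spec, Visit&& visit) {
    auto colon = spec.find(':');
    auto addresses = parseRange(spec.substr(colon + 1));
    std::stringstream banks(spec.substr(0, colon));
    for(std::string part; std::getline(banks, part, ',');) {
      auto range = parseRange(part);
      for(unsigned bank = range.first; bank <= range.second; bank++) {
        for(unsigned addr = addresses.first; addr <= addresses.second; addr++) visit(bank << 16 | addr);
      }
    }
  }

  // Point one address at a slot; a slot that nothing references any more is freed.
  void assign(std::uint32_t address, std::uint8_t slot) {
    std::uint8_t previous = lookup[address];
    if(previous == slot) return;
    if(--counter[previous] == 0 && previous != 0) readers[previous] = nullptr;
    lookup[address] = slot;
    counter[slot]++;
  }

  // Scan for the first unused slot; returns it, or -1 if all 255 are in use.
  int map(const std::string& spec, Reader reader) {
    for(int slot = 1; slot < 256; slot++) {
      if(counter[slot] != 0) continue;
      readers[slot] = std::move(reader);
      forEach(spec, [&](std::uint32_t address) { assign(address, std::uint8_t(slot)); });
      return slot;
    }
    return -1;
  }

  // Send the addresses back to open bus; freed slots become reusable by map().
  void unmap(const std::string& spec) {
    forEach(spec, [&](std::uint32_t address) { assign(address, 0); });
  }

  std::uint8_t read(std::uint32_t address) const { return readers[lookup[address & 0xffffff]](address); }
};
```

Because map() always takes the lowest free slot, a slot released by unmap() is the first one reused. In this sketch, repeatedly attaching and detaching a device therefore does not exhaust the 255 available handlers.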
#include <sfc/expansion/expansion.hpp>
#include <sfc/system/system.hpp>
#include <sfc/coprocessor/coprocessor.hpp>
#include <sfc/slot/slot.hpp>
#include <sfc/cartridge/cartridge.hpp>
#include <sfc/memory/memory-inline.hpp>
#include <sfc/ppu/counter/counter-inline.hpp>
}
#include <sfc/interface/interface.hpp>