mirror of https://github.com/bsnes-emu/bsnes.git
6 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Tim Allen | c2975e6898 |
Update to v103r25 release.
byuu says: Changelog: - gb/cpu: force STAT mode to 0 when LCD is disabled (fixes Pokemon Pinball, etc) - gb/ppu: when LCD is disabled, require at least one-frame wait to re-enable, display white during this time - todo: should step by a scanline at a time: worst-case is an extra 99% of a frame to enable again - gba/ppu: cache tilemap lookups and attribute parsing - it's more accurate because the GBA wouldn't read this for every pixel - but unfortunately, this didn't provide any speedup at all ... sigh - ruby/audio/alsa: fixed const issue with free() - ruby/video/cgl: removed `glDisable(GL_ALPHA_TEST)` [deprecated] - ruby/video/cgl: removed `glEnable(GL_TEXTURE_2D)` [unnecessary as we use shaders] - processor/lr35902: started rewrite¹ ¹: so, the Game Boy and Game Boy Color cores will be completely broken for at least the next two or three WIPs. The old LR35902 was complete garbage, written in early 2011. So I'm rewriting it to provide a massive cleanup and consistency with other processor cores, especially the Z80 core. I've got about 85% of the main instructions implemented, and then I have to do the CB instructions. The CB instructions are easier because they're mostly just a small number of opcodes in many small variations, but it'll still be tedious. |
|
Tim Allen | 571760c747 |
Update to v103r24 release.
byuu says: Changelog: - gb/mbc6: mapper is now functional, but Net de Get has some text corruption¹ - gb/mbc7: mapper is now functional² - gb/cpu: HDMA syncs other components after each byte transfer now - gb/ppu: LY,LX forced to zero when LCDC.d7 is lowered (eg disabled), not when it's raised (eg enabled) - gb/ppu: the LCD does not run at all when LCDC.d7 is clear³ - fixes graphical corruption between scene transitions in Legend of Zelda - Oracle of Ages - thanks to Cydrak, Shonumi, gekkio for their input on the cause of this issue - md/controller: renamed "Gamepad" to "Control Pad" per official terminology - md/controller: added "Fighting Pad" (6-button controller) emulation [hex\_usr] - processor/m68k: fixed TAS to set data.d7 when EA.mode==DataRegisterDirect; fixes Asterix - hiro/windows: removed carriage returns from mouse.cpp and desktop.cpp - ruby/audio/alsa: added device driver selection [SuperMikeMan] - ruby/audio/ao: set format.matrix=nullptr to prevent a crash on some systems [SuperMikeMan] - ruby/video/cgl: rename term() to terminate() to fix a crash on macOS [Sintendo] ¹: The observation that this mapper split $4000-7fff into two banks came from MAME's implementation. But their implementation was quite broken and incomplete, so I didn't actually use any of it. The observation that this mapper split $a000-bfff into two banks came from Tauwasser, and I did directly use that information, plus the knowledge that $0400/$0800 are the RAM bank select registers. The text corruption is due to a race condition with timing. The game is transferring font letters via HDMA, but the game code ends up setting the bank# with the font a bit too late after the HDMA has already occurred. I'm not sure how to fix this ... as a whole, I assumed my Game Boy timing was pretty good, but apparently it's not that good. ²: The entire design of this mapper comes from endrift's notes. endrift gets full credit for higan being able to emulate this mapper. Note that the accelerometer implementation is still not tested, and probably won't work right until I tweak the sensitivity a lot. ³: So the fun part of this is ... it breaks the strict 60fps rate of the Game Boy. This was always inevitable: certain timing conditions can stretch frames, too. But this is pretty much an absolute deal breaker for something like Vsync timing. This pretty much requires adaptive sync to run well without audio stuttering during the transition. There's currently one very important detail missing: when the LCD is turned off, presumably the image on the screen fades to white. I do not know how long this process takes, or how to really go about emulating it. Right now as an incomplete patch, I'm simply leaving the last displayed image on the screen until the LCD is turned on again. But I will have to output white, as well as add code to break out of the emulation loop periodically when the LCD is left off eg indefinitely, or bad things would happen. I'll work something out and then implement. Another detail is I'm not sure how long it takes for the LCD to start rendering again once enabled. Right now, it's immediate. I've heard it's as long as 1/60th of a second, but that really seems incredibly excessive? I'd like to know at least a reasonably well-supported estimate before I implement that. |
|
Tim Allen | ee7662a8be |
Update to v102r04 release.
byuu says: Changelog: - Super Game Boy support is functional once again - new GameBoy::SuperGameBoyInterface class - system.(dmg,cgb,sgb) is now Model::(Super)GameBoy(Color) ala the PC Engine - merged WonderSwanInterface, WonderSwanColorInterface shared functions to WonderSwan::Interface - merged GameBoyInterface, GameBoyColorInterface shared functions to GameBoy::Interface - Interface::unload() now calls Interface::save() for Master System, Game Gear, Mega Drive, PC Engine, SuperGrafx - PCE: emulated PCE-CD backup RAM; stored per-game as save.ram (2KiB file) - this means you can now save your progress in games like Neutopia - the PCE-CD I/O registers like BRAM write protect are not emulated yet - PCE: IRQ sources now hold the IRQ line state, instead of the CPU holding it - this fixes most SuperGrafx games, which were fighting over the VDC IRQ line previously - PCE: CPU I/O $14xx should return the pending IRQ bits even if IRQs are disabled - PCE: VCE and the VDCs now synchronize to each other; fixes pixel widths in all games - PCE: greatly increased the accuracy of the VPC priority selection code (windows may be buggy still) - HuC6280: PLA, PLX, PLY should set Z, N flags; fixes many game bugs [Jonas Quinn] The big thing I wanted to do was enslave the VDC(s) to the VCE. But unfortunately, I forgot about the asynchronous DMA channels that each VDC supports, so this isn't going to be possible I'm afraid. In the most demanding case, Daimakaimura in-game, we're looking at 85fps on my Xeon E3 1276v3. So ... not great, and we don't even have sound connected yet. We are going to have to profile and optimize this code once sound emulation and save states are in. Basically, think of it like this: the VCE, VDC0, and VDC1 all have the same overhead, scheduling wise (which is the bulk of the performance loss) as the dot-renderer for the SNES core. So it's like there's three bsnes-accuracy PPU threads running just for video. ----- Oh, just a fair warning ... the hooks for the SGB are a work in progress. If anyone is working on higan or a fork and want to do something similar to it, don't use it as a template, at least not yet. Right now, higan looks like this: - Emulator::Video handles the platform→videoRefresh calls - Emulator::Audio handles the platform→audioSample calls - each core hard-codes the platform→inputPoll, inputRumble calls - each core hard-codes calls to path, open, load to process files - dipSettings and notify are specialty hacks, neither are even hooked up right now to anything With the SGB, it's an emulation core inside an emulation core, so ideally you want to hook all of those functions. Emulator::Video and Emulator::Audio aren't really abstractions over that, as the GB core calls them and we have to special case not calling them in SGB mode. The path, open, load can be implemented without hooks, thanks to the UI only using one instance of Emulator::Platform for all cores. All we have to do is override the folder path ID for the "Game Boy.sys" folder, so that it picks "Super Game Boy.sfc/" and loads its boot ROM instead. That's just a simple argument to GameBoy::System::load() and we're done. dipSettings, notify and inputRumble don't matter. But we do also have to hook inputPoll as well. The nice idea would be for SuperFamicom::ICD2 to inherit from Emulator::Platform and provide the desired functions that we need to overload. After that, we'd just need the GB core to keep an abstraction over the global Emulator::platform\* handle, to select between the UI version and the SFC::ICD2 version. However ... that doesn't work because of Emulator::Video and Emulator::Audio. They would also have to gain an abstraction over Emulator::platform\*, and even worse ... you'd have to constantly swap between the two so that the SFC core uses the UI, and the GB core uses the ICD2. And so, for right now, I'm checking Model::SuperGameBoy() -> bool everywhere, and choosing between the UI and ICD2 targets that way. And as such, the ICD2 doesn't really need Emulator::Platform inheritance, although it certainly could do that and just use the functions it needs. But the SGB is even weirder, because we need additional new signals beyond just Emulator::Platform, like joypWrite(), etc. I'd also like to work on the Emulator::Stream for the SGB core. I don't see why we can't have the GB core create its own stream, and let the ICD2 just use that instead. We just have to be careful about the ICD2's CPU soft reset function, to make sure the GB core's Stream object remains valid. What I think that needs is a way to release an Emulator::Stream individually, rather than calling Emulator::Audio::reset() to do it. They are shared\_pointer objects, so I think if I added a destructor function to remove it from Emulator::Audio::streams, then that should work. |
|
Tim Allen | c50723ef61 |
Update to v100r15 release.
byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed. [Editor's note: The "aformentioned scheduler changes" were described in a previous forum post: Unfortunately, 64-bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using the booth algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32-bits of precision allows a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster. The "longer explanation" in the linked hastebin is: Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES: - CPU/PPU clock = 21477272.727272hz - SMP/DSP clock = 24576000hz - Cartridge DSP1 clock = 8000000hz - Cartridge MSU1 clock = 44100hz - Controller Port 1 modem controller clock = 57600hz - Controller Port 2 barcode battler clock = 115200hz - Expansion Port exercise bike clock = 192000hz Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component. The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so: - if thread A steps X clocks; then clock += X * threadB.frequency - if clock >= 0; switch to threadB - if thread B steps X clocks; then clock -= X * threadA.frequency - if clock < 0; switch to threadA But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis: - the 68K has to sync with the Z80 and PSG and YM2612 and VDP - the Z80 has to sync with the 68K and PSG and YM2612 - the PSG has to sync with the 68K and Z80 and YM2612 Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one. As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so: - setup: scalar = timeBase / frequency - step: clock += scalar * clocks Once per second, we look at every thread, find the smallest clock value. Then subtract that value from all threads. This prevents the clock counters from overflowing. Unfortunately, these oscillator values are psychotic, unpredictable, and often times repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond); we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am. Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal inverse, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core. Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values; we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock; and run them both for a full second; before an overflow event would occur. Another hastebin includes demonstration code: #include <libco/libco.h> #include <nall/nall.hpp> using namespace nall; // cothread_t mainThread = nullptr; const uint iterations = 100'000'000; const uint cpuFreq = 21477272.727272 + 0.5; const uint smpFreq = 24576000.000000 + 0.5; const uint cpuStep = 4; const uint smpStep = 5; // struct ThreadA { cothread_t handle = nullptr; uint64 frequency = 0; int64 clock = 0; auto create(auto (*entrypoint)() -> void, uint frequency) { this->handle = co_create(65536, entrypoint); this->frequency = frequency; this->clock = 0; } }; struct CPUA : ThreadA { static auto Enter() -> void; auto main() -> void; CPUA() { create(&CPUA::Enter, cpuFreq); } } cpuA; struct SMPA : ThreadA { static auto Enter() -> void; auto main() -> void; SMPA() { create(&SMPA::Enter, smpFreq); } } smpA; uint8 queueA[iterations]; uint offsetA; cothread_t resumeA = cpuA.handle; auto EnterA() -> void { offsetA = 0; co_switch(resumeA); } auto QueueA(uint value) -> void { queueA[offsetA++] = value; if(offsetA >= iterations) { resumeA = co_active(); co_switch(mainThread); } } auto CPUA::Enter() -> void { while(true) cpuA.main(); } auto CPUA::main() -> void { QueueA(1); smpA.clock -= cpuStep * smpA.frequency; if(smpA.clock < 0) co_switch(smpA.handle); } auto SMPA::Enter() -> void { while(true) smpA.main(); } auto SMPA::main() -> void { QueueA(2); smpA.clock += smpStep * cpuA.frequency; if(smpA.clock >= 0) co_switch(cpuA.handle); } // struct ThreadB { cothread_t handle = nullptr; uint128_t scalar = 0; uint128_t clock = 0; auto print128(uint128_t value) { string s; while(value) { s.append((char)('0' + value % 10)); value /= 10; } s.reverse(); print(s, "\n"); } //femtosecond (10^15) = 16306 //attosecond (10^18) = 688838 //zeptosecond (10^21) = 13712691 //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble) //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond) auto create(auto (*entrypoint)() -> void, uint128_t frequency) { this->handle = co_create(65536, entrypoint); uint128_t unitOfTime = 1; //for(uint n : range(29)) unitOfTime *= 10; unitOfTime <<= 96; //2^96 time units ... this->scalar = unitOfTime / frequency; print128(this->scalar); this->clock = 0; } auto step(uint128_t clocks) -> void { clock += clocks * scalar; } auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); } }; struct CPUB : ThreadB { static auto Enter() -> void; auto main() -> void; CPUB() { create(&CPUB::Enter, cpuFreq); } } cpuB; struct SMPB : ThreadB { static auto Enter() -> void; auto main() -> void; SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; } } smpB; auto correct() -> void { auto minimum = min(cpuB.clock, smpB.clock); cpuB.clock -= minimum; smpB.clock -= minimum; } uint8 queueB[iterations]; uint offsetB; cothread_t resumeB = cpuB.handle; auto EnterB() -> void { correct(); offsetB = 0; co_switch(resumeB); } auto QueueB(uint value) -> void { queueB[offsetB++] = value; if(offsetB >= iterations) { resumeB = co_active(); co_switch(mainThread); } } auto CPUB::Enter() -> void { while(true) cpuB.main(); } auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); } auto SMPB::Enter() -> void { while(true) smpB.main(); } auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); } // #include <nall/main.hpp> auto nall::main(string_vector) -> void { mainThread = co_active(); uint masterCounter = 0; while(true) { print(masterCounter++, " ...\n"); auto A = clock(); EnterA(); auto B = clock(); print((double)(B - A) / CLOCKS_PER_SEC, "s\n"); auto C = clock(); EnterB(); auto D = clock(); print((double)(D - C) / CLOCKS_PER_SEC, "s\n"); for(uint n : range(iterations)) { if(queueA[n] != queueB[n]) return print("fail at ", n, "\n"); } } } ...and that's everything.] |
|
Tim Allen | ca277cd5e8 |
Update to v100r14 release.
byuu says: (Windows: compile with -fpermissive to silence an annoying error. I'll fix it in the next WIP.) I completely replaced the time management system in higan and overhauled the scheduler. Before, processor threads would have "int64 clock"; and there would be a 1:1 relationship between two threads. When thread A ran for X cycles, it'd subtract X * B.Frequency from clock; and when thread B ran for Y cycles, it'd add Y * A.Frequency from clock. This worked well and allowed perfect precision; but it doesn't work when you have more complicated relationships: eg the 68K can sync to the Z80 and PSG; the Z80 to the 68K and PSG; so the PSG needs two counters. The new system instead uses a "uint64 clock" variable that represents time in attoseconds. Every time the scheduler exits, it subtracts the smallest clock count from all threads, to prevent an overflow scenario. The only real downside is that rounding errors mean that roughly every 20 minutes, we have a rounding error of one clock cycle (one 20,000,000th of a second.) However, this only applies to systems with multiple oscillators, like the SNES. And when you're in that situation ... there's no such thing as a perfect oscillator anyway. A real SNES will be thousands of times less out of spec than 1hz per 20 minutes. The advantages are pretty immense. First, we obviously can now support more complex relationships between threads. Second, we can build a much more abstracted scheduler. All of libco is now abstracted away completely, which may permit a state-machine / coroutine version of Thread in the future. We've basically gone from this: auto SMP::step(uint clocks) -> void { clock += clocks * (uint64)cpu.frequency; dsp.clock -= clocks; if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread); if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread); } To this: auto SMP::step(uint clocks) -> void { Thread::step(clocks); synchronize(dsp); synchronize(cpu); } As you can see, we don't have to do multiple clock adjustments anymore. This is a huge win for the SNES CPU that had to update the SMP, DSP, all peripherals and all coprocessors. Likewise, we don't have to synchronize all coprocessors when one runs, now we can just synchronize the active one to the CPU. Third, when changing the frequencies of threads (think SGB speed setting modes, GBC double-speed mode, etc), it no longer causes the "int64 clock" value to be erroneous. Fourth, this results in a fairly decent speedup, mostly across the board. Aside from the GBA being mostly a wash (for unknown reasons), it's about an 8% - 12% speedup in every other emulation core. Now, all of this said ... this was an unbelievably massive change, so ... you know what that means >_> If anyone can help test all types of SNES coprocessors, and some other system games, it'd be appreciated. ---- Lastly, we have a bitchin' new about screen. It unfortunately adds ~200KiB onto the binary size, because the PNG->C++ header file transformation doesn't compress very well, and I want to keep the original resource files in with the higan archive. I might try some things to work around this file size increase in the future, but for now ... yeah, slightly larger archive sizes, sorry. The logo's a bit busted on Windows (the Label control's background transparency and alignment settings aren't working), but works well on GTK. I'll have to fix Windows before the next official release. For now, look on my Twitter feed if you want to see what it's supposed to look like. ---- EDIT: forgot about ICD2::Enter. It's doing some weird inverse run-to-save thing that I need to implement support for somehow. So, save states on the SGB core probably won't work with this WIP. |
|
Tim Allen | 67457fade4 |
Update to v099r13 release.
byuu says: Changelog: - GB core code cleanup completed - GBA core code cleanup completed - some more cleanup on missed processor/arm functions/variables - fixed FC loading icarus bug - "Load ROM File" icarus functionality restored - minor code unification efforts all around (not perfect yet) - MMIO->IO - mmio.cpp->io.cpp - read,write->readIO,writeIO It's been a very long work in progress ... starting all the way back with v094r09, but the major part of the higan code cleanup is now completed! Of course, it's very important to note that this is only for the basic style: - under_score functions and variables are now camelCase - return-type function-name() are now auto function-name() -> return-type - Natural<T>/Integer<T> replace (u)intT_n types where possible - signed/unsigned are now int/uint - most of the x==true,x==false tests changed to x,!x A lot of spot improvements to consistency, simplicity and quality have gone in along the way, of course. But we'll probably never fully finishing beautifying every last line of code in the entire codebase. Still, this is a really great start. Going forward, WIP diffs should start being smaller and of higher quality once again. I know the joke is, "until my coding style changes again", but ... this was way too stressful, way too time consuming, and way too risky. I'm too old and tired now for extreme upheavel like this again. The only major change I'm slowly mulling over would be renaming the using Natural<T>/Integer<T> = (u)intT; shorthand to something that isn't as easily confused with the (u)int_t types ... but we'll see. I'll definitely continue to change small things all the time, but for the larger picture, I need to just accept the style I have and live with it. |