mirror of https://github.com/bsnes-emu/bsnes.git
528 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Tim Allen | f7ddbfc462 |
Update to v101r11 release.
byuu says: Changelog: - 68K: fixed NEG/NEGX operand order - 68K: fixed bug in disassembler that was breaking trace logging - VDP: improved sprite rendering (still 100% broken) - VDP: added horizontal/vertical scrolling (90% broken) Forgot: - 68K: fix extension word sign bit on indexed modes for disassembler as well - 68K: emulate STOP properly (use r.stop flag; clear on IRQs firing) I'm really wearing out fast here. The Genesis documentation is somehow even worse than Game Boy documentation, but this is a far more complex system. It's a massive time sink to sit here banging away at every possible combination of how things could work, only to see no positive improvements. Nothing I do seems to get sprites to do a goddamn thing. squee says the sprite Y field is 10-bits, X field is 9-bits. genvdp says they're both 10-bits. BlastEm treats them like they're both 10-bits, then masks off the upper bit so it's effectively 9-bits anyway. Nothing ever bothers to tell you whether the horizontal scroll values are supposed to add or subtract from the current X position. Probably the most basic detail you could imagine for explaining horizontal scrolling and yet ... nope. Nothing. I can't even begin to understand how the VDP FIFO functionality works, or what the fuck is meant by "slots". I'm completely at a loss as how how in the holy hell the 68K works with 8-bit accesses. I don't know whether I need byte/word handlers for every device, or if I can just hook it right into the 68K core itself. This one's probably the most major design detail. I need to know this before I go and implement the PSG/YM2612/IO ports-\>gamepads/Z80/etc. Trying to debug the 68K is murder because basically every game likes to start with a 20,000,000-instruction reset phase of checksumming entire games, and clearing out the memory as agonizingly slowly as humanly possible. And like the ARM, there's too many registers so I'd need three widescreen monitors to comfortably view the entire debugger output lines onscreen. I can't get any test ROMs to debug functionality outside of full games because every **goddamned** test ROM coder thinks it's acceptable to tell people to go fetch some toolchain from a link that died in the late '90s and only works on MS-DOS 6.22 to build their fucking shit, because god forbid you include a 32KiB assembled ROM image in your fucking archives. ... I may have to take a break for a while. We'll see. |
|
Tim Allen | 0b70a01b47 |
Update to v101r10 release.
byuu says: Changelog: - 68K: MOVEQ is 8-bit signed - 68K: disassembler was print EOR for OR instructions - 68K: address/program-counter indexed mode had the signed-word/long bit backward - 68K: ADDQ/SUBQ #n,aN always works in long mode; regardless of size - 68K→VDP DMA needs to use `mode.bit(0)<<22|dmaSource`; increment by one instead of two - Z80: added registers and initial two instructions - MS: hooked up enough to load and start running games - Sonic the Hedgehog can execute exactly one instruction... whoo. |
|
Tim Allen | 4d2e17f9c0 |
Update to v101r09 release.
byuu says: Sorry, two WIPs in one day. Got excited and couldn't wait. Changelog: - ADDQ, SUBQ shouldn't update flags when targeting an address register - ADDA should sign extend effective address reads - JSR was pushing the PC too early - some improvements to 8-bit register reads on the VDP (still needs work) - added H/V counter reads to the VDP IO port region - icarus: added support for importing Master System and Game Gear ROMs - tomoko: added library sub-menus for each manufacturer - still need to sort Game Gear after Mega Drive somehow ... The sub-menu system actually isn't all that bad. It is indeed a bit more annoying, but not as annoying as I thought it was going to be. However, it looks a hell of a lot nicer now. |
|
Tim Allen | 043f6a8b33 |
Update to v101r08 release.
byuu says: Changelog: - 68K: fixed read-modify-write instructions - 68K: fixed ADDX bug (using wrong target) - 68K: fixed major bug with SUB using wrong argument ordering - 68K: fixed sign extension when reading address registers from effective addressing - 68K: fixed sign extension on CMPA, SUBA instructions - VDP: improved OAM sprite attribute table caching behavior - VDP: improved DMA fill operation behavior - added Master System / Game Gear stubs (needed for developing the Z80 core) |
|
Tim Allen | ffd150735b |
Update to v101r07 release.
byuu says: Added VDP sprite rendering. Can't get any games far enough in to see if it actually works. So in other words, it doesn't work at all and is 100% completely broken. Also added 68K exceptions and interrupts. So far only the VDP interrupt is present. It definitely seems to be firing in commercial games, so that's promising. But the implementation is almost certainly completely wrong. There is fuck all of nothing for documentation on how interrupts actually work. I had to find out the interrupt vector numbers from reading the comments from the Sonic the Hedgehog disassembly. I have literally no fucking clue what I0-I2 (3-bit integer priority value in the status register) is supposed to do. I know that Vblank=6, Hblank=4, Ext(gamepad)=2. I know that at reset, SR.I=7. I don't know if I'm supposed to block interrupts when I is >, >=, <, <= to the interrupt level. I don't know what level CPU exceptions are supposed to be. Also implemented VDP regular DMA. No idea if it works correctly since none of the commercial games run far enough to use it. So again, it's horribly broken for usre. Also improved VDP fill mode. But I don't understand how it takes byte-lengths when the bus is 16-bit. The transfer times indicate it's actually transferring at the same speed as the 68K->VDP copy, strongly suggesting it's actually doing 16-bit transfers at a time. In which case, what happens when you set an odd transfer length? Also, both DMA modes can now target VRAM, VSRAM, CRAM. Supposedly there's all kinds of weird shit going on when you target VSRAM, CRAM with VDP fill/copy modes, but whatever. Get to that later. Also implemented a very lazy preliminary wait mechanism to to stall out a processor while another processor exerts control over the bus. This one's going to be a major work in progress. For one, it totally breaks the model I use to do save states with libco. For another, I don't know if a 68K->VDP DMA instantly locks the CPU, or if it the CPU could actually keep running if it was executing out of RAM when it started the DMA transfer from ROM (eg it's a bus busy stall, not a hard chip stall.) That'll greatly change how I handle the waiting. Also, the OSS driver now supports Audio::Latency. Sound should be even lower latency now. On FreeBSD when set to 0ms, it's absolutely incredible. Cannot detect latency whatsoever. The Mario jump sound seems to happen at the very instant I hear my cherry blue keyswitch activate. |
|
Tim Allen | 427bac3011 |
Update to v101r06 release.
byuu says: I reworked the video sizing code. Ended up wasting five fucking hours fighting GTK. When you call `gtk_widget_set_size_request`, it doesn't actually happen then. This is kind of a big deal because when I then go to draw onto the viewport, the actual viewport child window is still the old size, so the image gets distorted. It recovers in a frame or so with emulation, but if we were to put a still image on there, it would stay distorted. The first thought is, `while(gtk_events_pending()) gtk_main_iteration_do(false);` right after the `set_size_request`. But nope, it tells you there's no events pending. So then you think, go deeper, use `XPending()` instead. Same thing, GTK hasn't actually issued the command to Xlib yet. So then you think, if the widget is realized, just call a blocking `gtk_main_iteration`. One call does nothing, two calls results in a deadlock on the second one ... do it before program startup, and the main window will never appear. Great. Oh, and it's not just the viewport. It's also the widget container area of the windows, as well as the window itself, as well as the fullscreen mode toggle effect. They all do this. For the latter three, I couldn't find anything that worked, so I just added 20ms loops of constantly calling `gtk_main_iteration_do(false)` after each one of those things. The downside here is toggling the status bar takes 40ms, so you'll see it and it'll feel a tiny bit sluggish. But I can't have a 20ms wait on each widget resize, that would be catastrophic to performance on windows with lots of widgets. I tried hooking configure-event and size-allocate, but they were very unreliable. So instead I ended up with a loop that waits up to a maximm of 20ms that inspects the `widget->allocation.(width,height)` values directly and waits for them to be what we asked for with `set_size_request`. There was some extreme ugliness in GTK with calling `gtk_main_iteration_do` recursively (`hiro::Widget::setGeometry` is called recursively), so I had to lock it to only happen on the top level widgets (the child ones should get resized while waiting on the top-level ones, so it should be fine in practice), and also only run it on realized widgets. Even still, I'm getting ~3 timeouts when opening the settings dialog in higan, but no other windows. But, this is the best I can do for now. And the reason for all of this pain? Yeah, updated the video code. So the Emulator::Interface now has this: struct VideoSize { uint width, height; }; //or requiem for a tuple auto videoSize() -> VideoSize; auto videoSize(uint width, uint height, bool arc) -> VideoSize; The first function, for now, is just returning the literal surface size. I may remove this ... one thing I want to allow for is cores that send different texture sizes based on interlace/hires/overscan/etc settings. The second function is more interesting. Instead of having the UI trying to figure out sizing, I figure the emulation cores can do a better job and we can customize it per-core now. So it gets the window's width and height, and whether the user asked for aspect correction, and then computes the best width/height ratio possible. For now they're all just doing multiples of a 1x scale to the UI 2x,3x,4x modes. We still need a third function, which will probably be what I repurpose videoSize() for: to return the 'effective' size for pixel shaders, to then feed into ruby, to then feed into quark, to then feed into our shaders. Since shaders use normalized coordinates for pixel fetching, this should work out just fine. The real texture size will be exposed to quark shaders as well, of course. Now for the main window ... it's just hard-coded to be 640x480, 960x720, 1280x960 for now. It works nicely for some cores on some modes, not so much for others. Work in progress I guess. I also took the opportunity to draw the about dialog box logo on the main window. Got a bit fancy and used the old spherical gradient and impose functionality of nall/image on it. Very minor highlight, nothing garish. Just something nicer than a solid black window. If you guys want to mess around with sizes, placements, and gradient styles/colors/shapes ... feel free. If you come up with something nicer, do share. That's what led to all the GTK hell ... the logo wasn't drawing right as you resized the window. But now it is, though I am not at all happy with the hacking I had to do. I also had to improve the video update code as a result of this: - when you unload a game, it blacks out the screen - if you are not quitting the emulator, it'll draw the logo; if you are, it won't - when you load a game, it black out the logo These options prevent any unsightliness from resizing the viewport with image data on it already I need to redraw the logo when toggling fullscreen with no game loaded as well for Windows, it seems. |
|
Tim Allen | ac2d0ba1cf |
Update to v101r05 release.
byuu says: Changelog: - 68K: fixed bug that affected BSR return address - VDP: added very preliminary emulation of planes A, B, W (W is entirely broken though) - VDP: added command/address stuff so you can write to VRAM, CRAM, VSRAM - VDP: added VRAM fill DMA I would be really surprised if any commercial games showed anything at all, so I'd probably recommend against wasting your time trying, unless you're really bored :P Also, I wanted to add: I am accepting patches\! So if anyone wants to look over the 68K core for bugs, that would save me untold amounts of time in the near future :D |
|
Tim Allen | 1df2549d18 |
Update to v101r04 release.
byuu says: Changelog: - pulled the (u)intN type aliases into higan instead of leaving them in nall - added 68K LINEA, LINEF hooks for illegal instructions - filled the rest of the 68K lambda table with generic instance of ILLEGAL - completed the 68K disassembler effective addressing modes - still unsure whether I should use An to decode absolute addresses or not - pro: way easier to read where accesses are taking place - con: requires An to be valid; so as a disassembler it does a poor job - making it optional: too much work; ick - added I/O decoding for the VDP command-port registers - added skeleton timing to all five processor cores - output at 1280x480 (needed for mixed 256/320 widths; and to handle interlace modes) The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and syncing; which is the pathological worst case for libco. But they also have no logic inside of them. With all the above, I'm averaging around 250fps with just the 68K core actually functional, and the VDP doing a dumb "draw white pixels" loop. Still way too early to tell how this emulator is going to perform. Also, the 320x240 mode of the Genesis means that we don't need an aspect correction ratio. But we do need to ensure the output window is a multiple 320x240 so that the scale values work correctly. I was hard-coding aspect correction to stretch the window an additional \*8/7. But that won't work anymore so ... the main higan window is now 640x480, 960x720, or 1280x960. Toggling aspect correction only changes the video width inside the window. It's a bit jarring ... the window is a lot wider, more black space now for most modes. But for now, it is what it is. |
|
Tim Allen | 9b8c3ff8c0 |
Update to v101r03 release.
byuu says: The 68K core now implements all 88 instructions. It ended up being 111 instructions in my core due to splitting up opcodes with the same name but different addressing modes or directions (removes conditions at the expense of more code.) Technically, I don't have exceptions actually implemented yet, and RESET/STOP don't do anything but set flags. So there's still more to go. But ... close enough for statistics time! The M68K core source code is 124,712 bytes in size. The next largest core is the ARM7 core at 70,203 bytes in size. The M68K object size is 942KiB; with the next largest being the V30MZ core at 173KiB. There are a total of 19,656 invalid opcodes in the 68000 revision (unless of course I've made mistakes in my mappings, which is very probably.) Now the fun part ... figuring out how to fix bugs in this core without VDP emulation :/ |
|
Tim Allen | 0a57cac70c |
Update to v101r02 release.
byuu says: Changelog: - Emulator: use `(uintmax)-1 >> 1` for the units of time - MD: implemented 13 new 68K instructions (basically all of the remaining easy ones); 21 remain - nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type (128-bit where available) - this extends to everything: atoi, string, etc. You can even print 128-bit variables if you like 22,552 opcodes still don't exist in the 68K map. Looking like quite a few entries will be blank once I finish. |
|
Tim Allen | 8bdf8f2a55 |
Update to v101r01 release.
byuu says: Changelog: - added eight more 68K instructions - split ADD(direction) into two separate ADD functions I now have 54 out of 88 instructions implemented (thus, 34 remaining.) The map is missing 25,182 entries out of 65,536. Down from 32,680 for v101.00 Aside: this version number feels really silly. r10 and r11 surely will as well ... |
|
Tim Allen | e39987a3e3 |
Update to v101 release.
byuu says (in the public announcement): Not a large changelog this time, sorry. This release is mostly to fix the SA-1 issue, and to get some real-world testing of the new scheduler model. Most of the work in the past month has gone into writing a 68000 CPU core; yet it's still only about half-way finished. Changelog (since the previous release): - fixed SNES SA-1 IRQ regression (fixes Super Mario RPG level-up screen) - new scheduler for all emulator cores (precision of 2^-127) - icarus database adds nine new SNES games - added Input/Frequency to settings file (allows simulation of latency) byuu says (in the WIP forum): Changelog: - in 32-bit mode, Thread uses uint64\_t with 2^-63 time units (10^-7 precision in the worst case) - nearly ten times the precision of an attosecond - in 64-bit mode, Thread uses uint128\_t with 2^-127 time units (10^-26 precision in the worst case) - far more accurate than yoctoseconds; almost closing in on planck time Note: a quartz crystal is accurate to 10^-4 or 10^-5. A cesium fountain atomic clock is accurate to 10^-15. So ... yeah. 2^-63 was perfectly fine; but there was no speed penalty whatsoever for using uint128\_t in 64-bit mode, so why not? |
|
Tim Allen | f5e5bf1772 |
Update to v100r16 release.
byuu says: (Windows users may need to include <sys/time.h> at the top of nall/chrono.hpp, not sure.) Unchangelog: - forgot to add the Scheduler clock=0 fix because I have the memory of a goldfish Changelog: - new icarus database with nine additional games - hiro(GTK,Qt) won't constantly write its settings.bml file to disk anymore - added latency simulator for fun (settings.bml => Input/Latency in milliseconds) So the last one ... I wanted to test out nall::chrono, and I was also thinking that by polling every emulated frame, it's pretty wasteful when you are using Fast Forward and hitting 200+fps. As I've said before, calls to ruby::input::poll are not cheap. So to get around this, I added a limiter so that if you called the hardware poll function within N milliseconds, it'll return without doing any actual work. And indeed, that increases my framerate of Zelda 3 uncapped from 133fps to 142fps. Yay. But it's not a "real" speedup, as it only helps you when you exceed 100% speed (theoretically, you'd need to crack 300% speed since the game itself will poll at 16ms at 100% speed, but yet it sped up Zelda 3, so who am I to complain?) I threw the latency value into the settings file. It should be 16, but I set it to 5 since that was the lowest before it started negatively impacting uncapped speeds. You're wasting your time and CPU cycles setting it lower than 5, but if people like placebo effects it might work. Maybe I should let it be a signed integer so people can set it to -16 and think it's actually faster :P (I'm only joking. I took out the 96000hz audio placebo effect as well. Not really into psychological tricks anymore.) But yeah seriously, I didn't do this to start this discussion again for the billionth time. Please don't go there. And please don't tell me this WIP has higher/lower latency than before. I don't want to hear it. The only reason I bring it up is for the fun part that is worth discussing: put up or shut up time on how sensitive you are to latency! You can set the value above 5 to see how games feel. I personally can't really tell a difference until about 50. And I can't be 100% confident it's worse until about 75. But ... when I set it to 150, games become "extra difficult" ... the higher it goes, the worse it gets :D For this WIP, I've left no upper limit cap. I'll probably set a cap of something like 500ms or 1000ms for the official release. Need to balance user error/trolling with enjoyability. I'll think about it. [...] Now, what I worry about is stupid people seeing it and thinking it's an "added latency" setting, as if anyone would intentionally make things worse by default. This is a limiter. So if 5ms have passed since the game last polled, and that will be the case 99.9% of the time in games, the next poll will happen just in time, immediately when the game polls the inputs. Thus, a value below 1/<framerate>ms is not only pointless, if you go too low it will ruin your fast forward max speeds. I did say I didn't want to resort to placebo tricks, but I also don't want to spark up public discussion on this again either. So it might be best to default Input/Latency to 0ms, and internally have a max(5, latency) wrapper around the value. |
|
Tim Allen | c50723ef61 |
Update to v100r15 release.
byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed. [Editor's note: The "aformentioned scheduler changes" were described in a previous forum post: Unfortunately, 64-bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using the booth algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32-bits of precision allows a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster. The "longer explanation" in the linked hastebin is: Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES: - CPU/PPU clock = 21477272.727272hz - SMP/DSP clock = 24576000hz - Cartridge DSP1 clock = 8000000hz - Cartridge MSU1 clock = 44100hz - Controller Port 1 modem controller clock = 57600hz - Controller Port 2 barcode battler clock = 115200hz - Expansion Port exercise bike clock = 192000hz Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component. The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so: - if thread A steps X clocks; then clock += X * threadB.frequency - if clock >= 0; switch to threadB - if thread B steps X clocks; then clock -= X * threadA.frequency - if clock < 0; switch to threadA But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis: - the 68K has to sync with the Z80 and PSG and YM2612 and VDP - the Z80 has to sync with the 68K and PSG and YM2612 - the PSG has to sync with the 68K and Z80 and YM2612 Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one. As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so: - setup: scalar = timeBase / frequency - step: clock += scalar * clocks Once per second, we look at every thread, find the smallest clock value. Then subtract that value from all threads. This prevents the clock counters from overflowing. Unfortunately, these oscillator values are psychotic, unpredictable, and often times repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond); we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am. Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal inverse, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core. Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values; we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock; and run them both for a full second; before an overflow event would occur. Another hastebin includes demonstration code: #include <libco/libco.h> #include <nall/nall.hpp> using namespace nall; // cothread_t mainThread = nullptr; const uint iterations = 100'000'000; const uint cpuFreq = 21477272.727272 + 0.5; const uint smpFreq = 24576000.000000 + 0.5; const uint cpuStep = 4; const uint smpStep = 5; // struct ThreadA { cothread_t handle = nullptr; uint64 frequency = 0; int64 clock = 0; auto create(auto (*entrypoint)() -> void, uint frequency) { this->handle = co_create(65536, entrypoint); this->frequency = frequency; this->clock = 0; } }; struct CPUA : ThreadA { static auto Enter() -> void; auto main() -> void; CPUA() { create(&CPUA::Enter, cpuFreq); } } cpuA; struct SMPA : ThreadA { static auto Enter() -> void; auto main() -> void; SMPA() { create(&SMPA::Enter, smpFreq); } } smpA; uint8 queueA[iterations]; uint offsetA; cothread_t resumeA = cpuA.handle; auto EnterA() -> void { offsetA = 0; co_switch(resumeA); } auto QueueA(uint value) -> void { queueA[offsetA++] = value; if(offsetA >= iterations) { resumeA = co_active(); co_switch(mainThread); } } auto CPUA::Enter() -> void { while(true) cpuA.main(); } auto CPUA::main() -> void { QueueA(1); smpA.clock -= cpuStep * smpA.frequency; if(smpA.clock < 0) co_switch(smpA.handle); } auto SMPA::Enter() -> void { while(true) smpA.main(); } auto SMPA::main() -> void { QueueA(2); smpA.clock += smpStep * cpuA.frequency; if(smpA.clock >= 0) co_switch(cpuA.handle); } // struct ThreadB { cothread_t handle = nullptr; uint128_t scalar = 0; uint128_t clock = 0; auto print128(uint128_t value) { string s; while(value) { s.append((char)('0' + value % 10)); value /= 10; } s.reverse(); print(s, "\n"); } //femtosecond (10^15) = 16306 //attosecond (10^18) = 688838 //zeptosecond (10^21) = 13712691 //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble) //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond) auto create(auto (*entrypoint)() -> void, uint128_t frequency) { this->handle = co_create(65536, entrypoint); uint128_t unitOfTime = 1; //for(uint n : range(29)) unitOfTime *= 10; unitOfTime <<= 96; //2^96 time units ... this->scalar = unitOfTime / frequency; print128(this->scalar); this->clock = 0; } auto step(uint128_t clocks) -> void { clock += clocks * scalar; } auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); } }; struct CPUB : ThreadB { static auto Enter() -> void; auto main() -> void; CPUB() { create(&CPUB::Enter, cpuFreq); } } cpuB; struct SMPB : ThreadB { static auto Enter() -> void; auto main() -> void; SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; } } smpB; auto correct() -> void { auto minimum = min(cpuB.clock, smpB.clock); cpuB.clock -= minimum; smpB.clock -= minimum; } uint8 queueB[iterations]; uint offsetB; cothread_t resumeB = cpuB.handle; auto EnterB() -> void { correct(); offsetB = 0; co_switch(resumeB); } auto QueueB(uint value) -> void { queueB[offsetB++] = value; if(offsetB >= iterations) { resumeB = co_active(); co_switch(mainThread); } } auto CPUB::Enter() -> void { while(true) cpuB.main(); } auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); } auto SMPB::Enter() -> void { while(true) smpB.main(); } auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); } // #include <nall/main.hpp> auto nall::main(string_vector) -> void { mainThread = co_active(); uint masterCounter = 0; while(true) { print(masterCounter++, " ...\n"); auto A = clock(); EnterA(); auto B = clock(); print((double)(B - A) / CLOCKS_PER_SEC, "s\n"); auto C = clock(); EnterB(); auto D = clock(); print((double)(D - C) / CLOCKS_PER_SEC, "s\n"); for(uint n : range(iterations)) { if(queueA[n] != queueB[n]) return print("fail at ", n, "\n"); } } } ...and that's everything.] |
|
Tim Allen | ca277cd5e8 |
Update to v100r14 release.
byuu says: (Windows: compile with -fpermissive to silence an annoying error. I'll fix it in the next WIP.) I completely replaced the time management system in higan and overhauled the scheduler. Before, processor threads would have "int64 clock"; and there would be a 1:1 relationship between two threads. When thread A ran for X cycles, it'd subtract X * B.Frequency from clock; and when thread B ran for Y cycles, it'd add Y * A.Frequency from clock. This worked well and allowed perfect precision; but it doesn't work when you have more complicated relationships: eg the 68K can sync to the Z80 and PSG; the Z80 to the 68K and PSG; so the PSG needs two counters. The new system instead uses a "uint64 clock" variable that represents time in attoseconds. Every time the scheduler exits, it subtracts the smallest clock count from all threads, to prevent an overflow scenario. The only real downside is that rounding errors mean that roughly every 20 minutes, we have a rounding error of one clock cycle (one 20,000,000th of a second.) However, this only applies to systems with multiple oscillators, like the SNES. And when you're in that situation ... there's no such thing as a perfect oscillator anyway. A real SNES will be thousands of times less out of spec than 1hz per 20 minutes. The advantages are pretty immense. First, we obviously can now support more complex relationships between threads. Second, we can build a much more abstracted scheduler. All of libco is now abstracted away completely, which may permit a state-machine / coroutine version of Thread in the future. We've basically gone from this: auto SMP::step(uint clocks) -> void { clock += clocks * (uint64)cpu.frequency; dsp.clock -= clocks; if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread); if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread); } To this: auto SMP::step(uint clocks) -> void { Thread::step(clocks); synchronize(dsp); synchronize(cpu); } As you can see, we don't have to do multiple clock adjustments anymore. This is a huge win for the SNES CPU that had to update the SMP, DSP, all peripherals and all coprocessors. Likewise, we don't have to synchronize all coprocessors when one runs, now we can just synchronize the active one to the CPU. Third, when changing the frequencies of threads (think SGB speed setting modes, GBC double-speed mode, etc), it no longer causes the "int64 clock" value to be erroneous. Fourth, this results in a fairly decent speedup, mostly across the board. Aside from the GBA being mostly a wash (for unknown reasons), it's about an 8% - 12% speedup in every other emulation core. Now, all of this said ... this was an unbelievably massive change, so ... you know what that means >_> If anyone can help test all types of SNES coprocessors, and some other system games, it'd be appreciated. ---- Lastly, we have a bitchin' new about screen. It unfortunately adds ~200KiB onto the binary size, because the PNG->C++ header file transformation doesn't compress very well, and I want to keep the original resource files in with the higan archive. I might try some things to work around this file size increase in the future, but for now ... yeah, slightly larger archive sizes, sorry. The logo's a bit busted on Windows (the Label control's background transparency and alignment settings aren't working), but works well on GTK. I'll have to fix Windows before the next official release. For now, look on my Twitter feed if you want to see what it's supposed to look like. ---- EDIT: forgot about ICD2::Enter. It's doing some weird inverse run-to-save thing that I need to implement support for somehow. So, save states on the SGB core probably won't work with this WIP. |
|
Tim Allen | 306cac2b54 |
Update to v100r13 release.
byuu says: Changelog: M68K improvements, new instructions added. |
|
Tim Allen | f230d144b5 |
Update to v100r12 release.
byuu says: All of the above fixes, plus I added all 24 variations on the shift opcodes, plus SUBQ, plus fixes to the BCC instruction. I can now run 851,767 instructions into Sonic the Hedgehog before hitting an unimplemented instruction (SUB). The 68K core is probably only ~35% complete, and yet it's already within 4KiB of being the largest CPU core, code size wise, in all of higan. Fuck this chip. |
|
Tim Allen | 7ccfbe0206 |
Update to v100r11 release.
byuu says: I split the Register class and read/write handlers into DataRegister and AddressRegister, given that they have different behaviors on byte/word accesses (data tends to preserve the upper bits; address tends to sign-extend things.) I expanded EA to EffectiveAddress. No sense in abbreviating things to death. I've now implemented 26 instructions. But the new ones are just all the stupid from/to ccr/sr instructions. Ryphecha confirmed that you can't set the undefined bits, so I don't think the BitField concept is appropriate for the CCR/SR. Instead, I'm just storing direct flags and have (read,write)(CCR,SR) instead. This isn't like the 65816 where you have subroutines that push and pop the flag register. It's much more common to access individual flags. Doesn't match the consistency angle of the other CPU cores, but ... I think this is the right thing to for the 68K specifically. |
|
Tim Allen | 4b897ba791 |
Update to v100r10 release.
byuu says: Redesigned the handling of reading/writing registers to be about eight times faster than the old system. More work may be needed ... it seems data registers tend to preserve their upper bits upon assignment; whereas address registers tend to sign-extend values into them. It may make sense to have DataRegister and AddressRegister classes with separate read/write handlers. I'd have to hold two Register objects inside the EffectiveAddress (EA) class if we do that. Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets the first ~530,000 instructions in Sonic the Hedgehog running (though probably wrong. But we can run a lot thanks to large initialization loops.) If I force the core to loop back to the reset vector on an invalid opcode, I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if that's good or not. I guess we'll find out. I had to stop tonight because the final opcode I execute is an RTS (return from subroutine) that's branching back to address 0; which is invalid ... meaning something went terribly wrong and the system crashed. |
|
Tim Allen | be3f6ac0d5 |
Update to v100r09 release.
byuu says: Another six hours in ... I have all of the opcodes, memory access functions, disassembler mnemonics and table building converted over to the new template<uint Size> format. Certainly, it would be quite easy for this nightmare chip to throw me another curveball, but so far I can handle: - MOVE (EA to, EA from) case - read(from) has to update register index for +/-(aN) mode - MOVEM (EA from) case - when using +/-(aN), RA can't actually be updated until the transfer is completed - LEA (EA from) case - doesn't actually perform the final read; just returns the address to be read from - ANDI (EA from-and-to) case - same EA has to be read from and written to - for -(aN), the read has to come from aN-2, but can't update aN yet; so that the write also goes to aN-2 - no opcode can ever fetch the extension words more than once - manually control the order of extension word fetching order for proper opcode decoding To do all of that without a whole lot of duplicated code (or really bloating out every single instruction with red tape), I had to bring back the "bool valid / uint32 address" variables inside the EA struct =( If weird exceptions creep in like timing constraints only on certain opcodes, I can use template flags to the EA read/write functions to handle that. |
|
Tim Allen | 92fe5b0813 |
Update to v100r08 release.
byuu says: Six and a half hours this time ... one new opcode, and all old opcodes now in a deprecated format. Hooray, progress! For building the table, I've decided to move from: for(uint opcode : range(65536)) { if(match(...)) bind(opNAME, ...); } To instead having separate for loops for each supported opcode. This lets me specialize parts I want with templates. And to this aim, I'm moving to replace all of the (read,write)(size, ...) functions with (read,write)<Size>(...) functions. This will amount to the ~70ish instructions being triplicated ot ~210ish instructions; but I think this is really important. When I was getting into flag calculations, a ton of conditionals were needed to mask sizes to byte/word/long. There was also lots of conditionals in all the memory access handlers. The template code is ugly, but we eliminate a huge amount of branch conditions this way. |
|
Tim Allen | 059347e575 |
Update to v100r07 release.
byuu says: Four and a half hours of work and ... zero new opcodes implemented. This was the best job I could do refining the effective address computations. Should have all twelve 68000 modes implemented now. Still have a billion questions about when and how I'm supposed to perform certain edge case operations, though. |
|
Tim Allen | 0d6a09f9f8 |
Update to v100r06 release.
byuu says: Up to ten 68K instructions out of somewhere between 61 and 88, depending upon which PDF you look at. Of course, some of them aren't 100% completed yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant that needs stack push/pop functions. This WIP actually took over eight hours to make, going through every possible permutation on how to design the core itself. The updated design now builds both the instruction decoder+dispatcher and the disassembler decoder into the same main loop during M68K's constructor. The special cases are also really psychotic on this processor, and I'm afraid of missing something via the fallthrough cases. So instead, I'm ordering the instructions alphabetically, and including exclusion cases to ignore binding invalid cases. If I end up remapping an existing register, then it'll throw a run-time assertion at program startup. I wanted very much to get rid of struct EA (EffectiveAddress), but it's too difficult to keep track of the internal effective address without it. So I split out the size to a separate parameter, since every opcode only has one size parameter, and otherwise it was getting duplicated in opcodes that take two EAs, and was also awkward with the flag testing. It's a bit more typing, but I feel it's more clean this way. Overall, I'm really worried this is going to be too slow. I don't want to turn the EA stuff into templates, because that will massively bloat out compilation times and object sizes, and will also need a special DSL preprocessor since C++ doesn't have a static for loop. I can definitely optimize a lot of EA's address/read/write functions away once the core is completed, but it's never going to hold a candle to a templatized 68K core. ---- Forgot to include the SA-1 regression fix. I always remember immediately after I upload and archive the WIP. Will try to get that in next time, I guess. |
|
Tim Allen | b72f35a13e |
Update to v100r05 release.
byuu says: Alright, I'm definitely going to need to find some people willing to tolerate my questions on this chip, so I'm going to go ahead and announce I'm working on this I guess. This core is way too big for a surprise like the NES and WS cores were. It'll probably even span multiple v10x releases before it's even ready. |
|
Tim Allen | 1c0ef793fe |
Update to v100r04 release.
byuu says: I now have enough of three instructions implemented to get through the first four instructions in Sonic the Hedgehog. But they're far from complete. The very first instruction uses EA addressing, which is similar to x86's ModRM in terms of how disgustingly complex it is. And it also accesses Z80 control registers, which obviously isn't going to do anything yet. The slow speed was me being stupid again. It's not 7.6MHz per frame, it's 7.67MHz per second. So yeah, speed is so far acceptable again. But we'll see how things go as I keep emulating more. The 68K decode is not pretty at all. |
|
Tim Allen | 76a8ecd32a |
Update to v100r03 release.
byuu says: Changelog: - moved Thread, Scheduler, Cheat functionality into emulator/ for all cores - start of actual Mega Drive emulation (two 68K instructions) I'm going to be rather terse on MD emulation, as it's too early for any meaningful dialogue here. |
|
Tim Allen | 3dd1aa9c1b |
Update to v100r02 release.
byuu says: Sigh ... I'm really not a good person. I'm inherently selfish. My responsibility and obligation right now is to work on loki, and then on the Tengai Makyou Zero translation, and then on improving the Famicom emulation. And yet ... it's not what I really want to do. That shouldn't matter; I should work on my responsibilities first. Instead, I'm going to be a greedy, self-centered asshole, and work on what I really want to instead. I'm really sorry, guys. I'm sure this will make a few people happy, and probably upset even more people. I'm also making zero guarantees that this ever gets finished. As always, I wish I could keep these things secret, so if I fail / give up, I could just drop it with no shame. But I would have to cut everyone out of the WIP process completely to make it happen. So, here goes ... This WIP adds the initial skeleton for Sega Mega Drive / Genesis emulation. God help us. (minor note: apparently the new extension for Mega Drive games is .md, neat. That's what I chose for the folders too. I thought it was .smd, so that'll be fixed in icarus for the next WIP.) (aside: this is why I wanted to get v100 out. I didn't want this code in a skeleton state in v100's source. Nor did I want really broken emulation, which the first release is sure to be, tarring said release.) ... So, basically, I've been ruminating on the legacy I want to leave behind with higan. 3D systems are just plain out. I'm never going to support them. They're too complex for my abilities, and they would run too slowly with my design style. I'm not willing to compromise my design ideals. And I would never want to play a 3D game system at native 240p/480i resolution ... but 1080p+ upscaling is not accurate, so that's a conflict I want to avoid entirely. It's also never going to emulate computer systems (X68K, PC-98, FM-Towns, etc) because holy shit that would completely destroy me. It's also never going emulate arcade machines. So I think of higan as a collection of 2D emulators for consoles and handhelds. I've gone over every major 2D gaming system there is, looking for ones with games I actually care about and enjoy. And I basically have five of those systems supported already. Looking at the remaining list, I see only three systems left that I have any interest in whatsoever: PC-Engine, Master System, Mega Drive. Again, I'm not in any way committing to emulating any of these, but ... if I had all of those in higan, I think I'd be content to really, truly, finally stop writing more emulators for the rest of my life. And so I decided to tackle the most difficult system first. If I'm successful, the Z80 core should cover a lot of the work on the SMS. And the HuC6280 should land somewhere between the NES and SNES in terms of difficulty ... closer to the NES. The systems that just don't appeal to me at all, which I will never touch, include, but are not limited to: * Atari 2600/5200/7800 * Lynx * Jaguar * Vectrex * Colecovision * Commodore 64 * Neo-Geo * Neo-Geo Pocket / Color * Virtual Boy * Super A'can * 32X * CD-i * etc, etc, etc. And really, even if something were mildly interesting in there ... we have to stop. I can't scale infinitely. I'm already way past my limit, but I'm doing this anyway. Too many cores bloats everything and kills quality on everything. I don't want higan to become MESS v2. I don't know what I'll do about the Famicom Disk System, PC-Engine CD, and Mega CD. I don't think I'll be able to achieve 60fps emulating the Mega CD, even if I tried to. I don't know what's going to happen here with even the Mega Drive. Maybe I'll get driven crazy with the documentation and quit. Maybe it'll end up being too complicated and I'll quit. Maybe the emulation will end up way too slow and I'll give up. Maybe it'll take me seven years to get any games playable at all. Maybe Steve Snake, AamirM and Mike Pavone will pool money to hire a hitman to come after me. Who knows. But this is what I want to do, so ... here goes nothing. |
|
Tim Allen | 88c79e56a0 |
Update to v100r01 release.
[This version, with the internal version number changed back to "v100", replaced the original v100 source archive on byuu.org soon after v100's release, because it fixes important bugs in that version. --Ed] byuu says: Changelog: - fixed default paths for Sufami Turbo slotted games - moved WonderSwan orientation controls to the port rather than the device - I do like hex_usr's idea here; but that'll need more consideration; so this is a temporary fix - added new debugger interface (see the public topic for more on that) |
|
Tim Allen | 07995c05a5 |
Update to v100 release.
byuu says: higan has finally reached v100! I feel it's important to stress right away that this is not "version 1.00", nor is it a major milestone release. Rather than arbitrary version numbers, all of my software simply bumps version numbers by one for each official release. As such, higan v100 is simply higan's 100th release. That said, the primary focus of this release has been code clean-ups. These are always somewhat dangerous in that regressions are possible. We've tested through sixteen WIP revisions, one of which was open to the public, to try and minimize any regressions. But all the same, please report any regressions if you discover any. Changelog (since v099): FC: render during pixels 1-256 instead of 0-255 [hex_usr] FC: rewrote controller emulation code SFC: 8% speedup over the previous release thanks to PPU optimizations SFC: fixed nasty DB address wrapping regression from v099 SFC: USART developer controller removed; superseded by 21fx SFC: Super Multitap option removed from controller port 1; ports renamed 2-5 SFC: hidden option to experiment with 128KB VRAM (strictly for novelty) higan: audio volume no longer divided by number of audio streams higan: updated controller polling code to fix possible future mapping issues higan: replaced nall/stream with nall/vfs for file-loading subsystem tomoko: can now load multi-slotted games via command-line tomoko: synchronize video removed from UI; still available in the settings file tomoko, icarus: can navigate to root drive selection on Windows all: major code cleanups and refactoring (~1MB diff against v099) Note 1: the audio volume change means that SGB and MSU1 games won't lose half the volume on the SNES sounds anymore. However, if one goes overboard and drives the sound all the way to max volume with the MSU1, clamping may occur. The obvious solution is not to drive volume that high (it will vastly overpower the SNES audio, which usually never exceeds 25% volume.) Another option is to lower the volume in the audio settings panel to 50%. In general, neither is likely to ever be necessary. Note 2: the synchronize video option was hidden from the UI because it is no longer useful. With the advent of compositors, the loss of the complicated timing settings panel, support for the WonderSwan and its 75hz display, the need to emulate variable refresh rate behaviors in the Game Boy, the unfortunate latency spike and audio distortion caused by long Vsync pauses, and the arrival of adaptive sync technology ... it no longer makes sense to present this option. However, as stated, you can edit settings.bml to enable this option anyway if you insist and understand the aforementioned risks. Changelog (since v099r16 open beta): - fixed MSU1 audio sign extension - fixed compilation with SGB support disabled - icarus can now navigate to root directory - fixed compilation issues with OS X port - (hopefully) fixed label height issue with hiro that affected icarus import dialog - (mostly) fixed BS Memory, Sufami Turbo slot loading Errata: - forgot to remove the " - Slot A", " - Slot B" suffixes for Sufami Turbo slot loading - this means you have to navigate up one folder and then into Sufami Turbo/ to load games for this system - moving WonderSwan orientation controls to the device slot is causing some nastiness - can now select orientation from the main menu, but it doesn't rotate the display |
|
Tim Allen | 13ad9644a2 |
Update to v099r16 release (public beta).
byuu says: Changelog: - hiro: BrowserDialog can navigate up to drive selection on Windows - nall: (file,path,dir,base,prefix,suffix)name => Location::(file,path,dir,base,prefix,suffix) - higan/tomoko: rename audio filter label from "Sinc" to "IIR - Biquad" - higan/tomoko: allow loading files via icarus on the command-line once again - higan/tomoko: (begrudging) quick hack to fix presentation window focus on startup - higan/audio: don't divide output audio volume by number of streams - processor/r65816: fix a regression in (read,write)DB; fixes Taz-Mania - fixed compilation regressions on Windows and Linux I'm happy with where we are at with code cleanups and stability, so I'd like to release v100. But even though I'm not assigning any special significance to this version, we should probably test it more thoroughly first. |
|
Tim Allen | 8d5cc0c35e |
Update to v099r15 release.
byuu says: Changelog: - nall::lstring -> nall::string_vector - added IntegerBitField<type, lo, hi> -- hopefully it works correctly... - Multitap 1-4 -> Super Multitap 2-5 - fixed SFC PPU CGRAM read regression - huge amounts of SFC PPU IO register cleanups -- .bits really is lovely - re-added the read/write(VRAM,OAM,CGRAM) helpers for the SFC PPU - but they're now optimized to the realities of the PPU (16-bit data sizes / no address parameter / where appropriate) - basically used to get the active-display overrides in a unified place; but also reduces duplicate code in (read,write)IO |
|
Tim Allen | 82293c95ae |
Update to v099r14 release.
byuu says: Changelog: - (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel like they were contributing enough to be worth it] - cleaned up nall::integer,natural,real functionality - toInteger, toNatural, toReal for parsing strings to numbers - fromInteger, fromNatural, fromReal for creating strings from numbers - (string,Markup::Node,SQL-based-classes)::(integer,natural,real) left unchanged - template<typename T> numeral(T value, long padding, char padchar) -> string for print() formatting - deduces integer,natural,real based on T ... cast the value if you want to override - there still exists binary,octal,hex,pointer for explicit print() formatting - lstring -> string_vector [but using lstring = string_vector; is declared] - would be nice to remove the using lstring eventually ... but that'd probably require 10,000 lines of changes >_> - format -> string_format [no using here; format was too ambiguous] - using integer = Integer<sizeof(int)*8>; and using natural = Natural<sizeof(uint)*8>; declared - for consistency with boolean. These three are meant for creating zero-initialized values implicitly (various uses) - R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees up struct IO {} io; naming] - SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {} (status,registers); now - still some CPU::Status status values ... they didn't really fit into IO functionality ... will have to think about this more - SFC CPU, PPU, SMP now use step() exclusively instead of addClocks() calling into step() - SFC CPU joypad1_bits, joypad2_bits were unused; killed them - SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it - SFC PPU OAM moved into PPU::Object; since nothing else uses it - the raw uint8[544] array is gone. OAM::read() constructs values from the OAM::Object[512] table now - this avoids having to determine how we want to sub-divide the two OAM memory sections - this also eliminates the OAM::synchronize() functionality - probably more I'm forgetting The FPS fluctuations are driving me insane. This WIP went from 128fps to 137fps. Settled on 133.5fps for the final build. But nothing I changed should have affected performance at all. This level of fluctuation makes it damn near impossible to know whether I'm speeding things up or slowing things down with changes. |
|
Tim Allen | 67457fade4 |
Update to v099r13 release.
byuu says: Changelog: - GB core code cleanup completed - GBA core code cleanup completed - some more cleanup on missed processor/arm functions/variables - fixed FC loading icarus bug - "Load ROM File" icarus functionality restored - minor code unification efforts all around (not perfect yet) - MMIO->IO - mmio.cpp->io.cpp - read,write->readIO,writeIO It's been a very long work in progress ... starting all the way back with v094r09, but the major part of the higan code cleanup is now completed! Of course, it's very important to note that this is only for the basic style: - under_score functions and variables are now camelCase - return-type function-name() are now auto function-name() -> return-type - Natural<T>/Integer<T> replace (u)intT_n types where possible - signed/unsigned are now int/uint - most of the x==true,x==false tests changed to x,!x A lot of spot improvements to consistency, simplicity and quality have gone in along the way, of course. But we'll probably never fully finishing beautifying every last line of code in the entire codebase. Still, this is a really great start. Going forward, WIP diffs should start being smaller and of higher quality once again. I know the joke is, "until my coding style changes again", but ... this was way too stressful, way too time consuming, and way too risky. I'm too old and tired now for extreme upheavel like this again. The only major change I'm slowly mulling over would be renaming the using Natural<T>/Integer<T> = (u)intT; shorthand to something that isn't as easily confused with the (u)int_t types ... but we'll see. I'll definitely continue to change small things all the time, but for the larger picture, I need to just accept the style I have and live with it. |
|
Tim Allen | 7a68059f78 |
Update to v099r12 release.
byuu says: Changelog: - fixed FC AxROM / VRC7 regression - BitField split to BooleanBitField/NaturalBitField (in preparation for IntegerBitField) - BitFieldReference removed - GB CPU cleaned up - GB Cartridge + Mappers cleaned up - SFC CGRAM is now emulated as uint15[256] instead of uint[512] - sfc/ppu/memory.cpp no longer needed; removed - purged SFC Debugger hooks for now (some of the operator[] calls were bypassing them anyway) Unfortunately, for reasons that defy all semblance of logic, the CGRAM change caused a slight speed hit. As have the last few changes. We're now down to around 129.5fps compared to 123.fps for v099 and 134.5fps at our peak (v099r01-r02). I really like the style I came up with for the Game Boy mappers to settle the purpose(ROM,RAM) vs (rom,ram)Purpose naming convention. If I ever get around to redoing the NES mappers, that's likely the approach I'll take. |
|
Tim Allen | 3e807946b8 |
Update to v099r11 release.
byuu says: Changelog: - NES PPU core updated to use BitFields (absolutely massive improvement in code readability) - NES APU core updated to new coding style - NES cartridge/board and cartridge/chip updated to new coding style - pushed NES PPU rendering one dot forward (doesn't fix King's Quest V yet, sadly) - fixed SNES PPU BG tilemask for 128KiB VRAM mode (doesn't fix Yoshi's Island, though) So ... I kind of went overboard with the fc/cartridge changes. This WIP diff is 185KiB >_> I didn't realize it was going to be as big a task as it was, but once I started everything broke in a chain reaction, so I had to do it all at once. There's a massive chance we've broken a bunch of NES things. Any typos in this WIP are going to be absolutely insidious to track down =( But ... supposing I pulled it off, this means the Famicom core is now fully converted to the new coding style as well. That leaves only the GB and GBA cores. Once those are finished, then we'll finally be free of these gigantic hellspawn diffs. |
|
Tim Allen | a816998122 |
Update to v099r10 release.
byuu says: Changelog: - higan/profile/ => higan/systems/ [temporary; unless we can't think of a better base folder name] - god-damn-better-have fixed the input polling bug - re-added command-line and drag-and-drop loading - command-line loading can now load multiple folders at once (SGB+GB game; Sufami Turbo+Slot A+Slot B; etc) - if you load just the base cart, it'll present you with a dialog to optionally load slotted cart(s) - MSU1 now goes through nall/vfs instead of directly accessing the filesystem - Famicom Cartridge, PPU cores updated to newer programming style - there's countless opportunity for BitField and .bits() in the PPU ... but I'm worried about breaking things If anyone has a working MSU1 game and can test the changes out, that'd be appreciated. I still don't have a test ROM on my dev box. I wouldn't worry too much about extensively testing the Famicom PPU changes just yet ... I'm still struggling with what to name the structs inside the classes between all of my emulators, and the BitField/.bits() changes will be much more important to test at a later date. The only use case left for Emulator::Interface::path(uint id) is for 21fx emulation. This peripheral loads a DLL/SO via LoadLibrary/dlopen, which do not have any official ways to open a file in RAM. I'm very hesitant to use the portable trick of writing the memory to a temporary file, loading it, and deleting the temporary file once done ... it's a real waste of disk activity. I might make something like vfs::file::isVirtual->bool,path()->string to get around this. But even once I do, the underlying LoadLibrary/dlopen call is still going to be direct disk access. |
|
Tim Allen | 3a9c7c6843 |
Update to v099r09 release.
byuu says: Changelog: - Emulator::Interface::Medium::bootable removed - Emulator::Interface::load(bool required) argument removed [File::Required makes no sense on a folder] - Super Famicom.sys now has user-configurable properties (CPU,PPU1,PPU2 version; PPU1 VRAM size, Region override) - old nall/property removed completely - volatile flags supported on coprocessor RAM files now (still not in icarus, though) - (hopefully) fixed SNES Multitap support (needs testing) - fixed an OAM tiledata range clipping limit in 128KiB VRAM mode (doesn't fix Yoshi's Island, sadly) - (hopefully, again) fixed the input polling bug hex_usr reported - re-added dialog box for when File::Required files are missing - really cool: if you're missing a boot ROM, BIOS ROM, or IPL ROM, it warns you immediately - you don't have to select a game before seeing the error message anymore - fixed cheats.bml load/save location |
|
Tim Allen | f48b332c83 |
Update to v099r08 release.
byuu says: Changelog: - nall/vfs work 100% completed; even SGB games load now - emulation cores now call load() for the base cartridges as well - updated port/device handling; portmask is gone; device ID bug should be resolved now - SNES controller port 1 multitap option was removed - added support for 128KiB SNES PPU VRAM (for now, edit sfc/ppu/ppu.hpp VRAM::size=0x10000; to enable) Overall, nall/vfs was a huge success!! We've substantially reduced the amount of boilerplate code everywhere, while still allowing (even easier than before) support for RAM-based game loading/saving. All of nall/stream is dead and buried. I am considering removing Emulator::Interface::Medium::id and/or bootable flag. Or at least, doing something different with it. The values for the non-bootable GB/BS/ST entries duplicate the ID that is supposed to be unique. They are for GB/GBC and WS/WSC. Maybe I'll use this as the hardware revision selection ID, and then gut non-bootable options. There's really no reason for that to be there. I think at one point I was using it to generate library tabs for non-bootable systems, but we don't do that anymore anyway. Emulator::Interface::load() may not need the required flag anymore ... it doesn't really do anything right now anyway. I have a few reasons for having the cores load the base cartridge. Most importantly, it is going to enable a special mode for the WonderSwan / WonderSwan Color in the future. If we ever get the IPLROMs dumped ... it's possible to boot these systems with no games inserted to set user profile information and such. There are also other systems that may accept being booted without a cartridge. To reach this state, you would load a game and then cancel the load dialog. Right now, this results in games not loading. The second reason is this prevents nasty crashes when loading fails. So if you're missing a required manifest, the emulator won't die a violent death anymore. It's able to back out at any point. The third reason is consistency: loading the base cartridge works the same as the slot cartridges. The fourth reason is Emulator::Interface::open(uint pathID) values. Before, the GB, SB, GBC modes were IDs 1,2,3 respectively. This complicated things because you had to pass the correct ID. But now instead, Emulator::Interface::load() returns maybe<uint> that is nothing when no game is selected, and a pathID for a valid game. And now open() can take this ID to access this game's folder contents. The downside, which is temporary, is that command-line loading is currently broken. But I do intend on restoring it. In fact, I want to do better than before and allow multi-cart booting from the command-line by specifying the base cartridge and then slot cartridges. The idea should be pretty simple: keep a queue of pending filenames that we fill from the command-line and/or drag-and-drop operations on the main window, and then empty out the queue or prompt for load dialogs from the UI when booting a system. This also might be a bit more unorthodox compared to the traditional emulator design of "loadGame(filename)", but ... oh well. It's easy enough still. The port/device changes are fun. We simplified things quite a bit. The portmask stuff is gone entirely. While ports and devices keep IDs, this is really just sugar-coating so UIs can use for(auto& port : emulator->ports) and access port.id; rather than having to use for(auto n : range(emulator->ports)) { auto& port = emulator->ports[n]; ... }; but they should otherwise generally be identical to the order they appear in their respective ranges. Still, don't rely on that. Input::id is gone. There was no point since we also got rid of the nasty Input::order vector. Since I was in here, I went ahead and caved on the pedantics and renamed Input::guid to Input::userData. I removed the SNES controller port 1 multitap option. Basically, the only game that uses this is N-warp Daisakusen and, no offense to d4s, it's not really a good game anyway. It's just a quick demo to show 8-players on the SNES. But in the UI, all it does is confuse people into wasting time mapping a controller they're never going to use, and they're going to wonder which port to use. If more compelling use cases for 8-players comes about, we can reconsider this. I left all the code to support this in place, so all you have to do is uncomment one line to enable it again. We now have dsnes emulation! :D If you change PPU::VRAM::size to 0x10000 (words), then you should now have 128KiB of VRAM. Even better, it serializes the used-VRAM size, so your save states shouldn't crash on you if you swap between the two (though if you try this, you're nuts.) Note that this option does break commercial software. Yoshi's Island in particular. This game is setting A15 on some PPU register writes, but not on others. The end result of this is things break horribly in-game. Also, this option is causing a very tiny speed hit for obvious reasons with the variable masking value (I'm even using size-1 for now.) Given how niche this is, I may just leave it a compile-time constant to avoid the overhead cost. Otherwise, if we keep the option, then it'll go into Super Famicom.sys/manifest.bml ... I'll flesh that out in the near-future. ---- Finally, some fun for my OCD ... my monitor suddenly cut out on me in the middle of working on this WIP, about six hours in of non-stop work. Had to hit a bunch of ctrl+alt+fN commands (among other things) and trying to log in headless on another TTY to do issue commands, trying to recover the display. Finally power cycled the monitor and it came back up. So all my typing ended up going to who knows where. Usually this sort of thing terrifies me enough that I scrap a WIP and start over to ensure I didn't screw anything up during the crashed screen when hitting keys randomly. Obviously, everything compiles and appears to work fine. And I know it's extremely paranoid, but OCD isn't logical, so ... I'm going to go over every line of the 100KiB r07->r08 diff looking for any corruption/errors/whatever. ---- Review finished. r08 diff review notes: - fc/controller/gamepad/gamepad.cpp: use uint device = ID::Device::Gamepad; not id = ...; - gb/cartridge/cartridge.hpp: remove redundant uint _pathID; (in Information::pathID already) - gb/cartridge/cartridge.hpp: pull sha256 inside Information - sfc/cartridge/load/cpp: add " - Slot (A,B)" to interface->load("Sufami Turbo"); to be more descriptive - sfc/controller/gamepad/gamepad.cpp: use uint device = ID::Device::Gamepad; not id = ...; - sfc/interface/interface.cpp: remove n variable from the Multitap device input generation loop (now unused) - sfc/interface/interface.hpp: put struct Port above struct Device like the other classes - ui-tomoko: cheats.bml is reading from/writing to mediumPaths(0) [system folder instead of game folder] - ui-tomoko: instead of mediumPaths(1) - call emulator->metadataPathID() or something like that |
|
Tim Allen | ccd8878d75 |
Update to v099r07 release.
byuu says: Changelog: - (hopefully) fixed BS Memory and Sufami Turbo slot loading - ported GB, GBA, WS cores to use nall/vfs - completely removed loadRequest, saveRequest functionality from Emulator::Interface and ui-tomoko - loadRequest(folder) is now load(folder) - save states now use a shared Emulator::SerializerVersion string - whenever this is bumped, all older states will break; but this makes bumping state versions way easier - also, the version string makes it a lot easier to identify compatibility windows for save states - SNES PPU now uses uint16 vram[32768] for memory accesses [hex_usr] NOTE: Super Game Boy loading is currently broken, and I'm not entirely sure how to fix it :/ The file loading handoff was -really- complicated, and so I'm kind of at a loss ... so for now, don't try it. Everything else should theoretically work, so please report any bugs you find. So, this is pretty much it. I'd be very curious to hear feedback from people who objected to the old nall/stream design, whether they are happy with the new file loading system or think it could use further improvements. The 16-bit VRAM turned out to be a wash on performance (roughly the same as before. 1fps slower on Zelda 3, 1fps faster on Yoshi's Island.) The main reason for this was because Yoshi's Island was breaking horribly until I changed the vramRead, vramWrite functions to take uint15 instead of uint16. I suspect the issue is we're using uint16s in some areas now that need to be uint15, and this game is setting the VRAM address to 0x8000+, causing us to go out of bounds on memory accesses. But ... I want to go ahead and do something cute for fun, and just because we can ... and this new interface is so incredibly perfect for it!! I want to support an SNES unit with 128KiB of VRAM. Not out of the box, but as a fun little tweakable thing. The SNES was clearly designed to support that, they just didn't use big enough VRAM chips, and left one of the lines disconnected. So ... let's connect it anyway! In the end, if we design it right, the only code difference should be one area where we mask by 15-bits instead of by 16-bits. |
|
Tim Allen | 875f031182 |
Update to v099r06 release.
byuu says: Changelog: - Super Famicom core converted to use nall/vfs - excludes Super Game Boy; since that's invoked from inside the GB core This was definitely the major obstacle to test nall/vfs' applicability. Things worked out pretty great in the end. We went from 22.0KiB (cartridge) + 18.6KiB (interface) to 24.5KiB (cartridge) + 11.4KiB (interface). Or 40.7KiB to 36.0KiB. This removes a very large source of indirection. Before it was: "coprocessor <=> cartridge <=> interface" for loading and saving data, and now it's just "coprocessor <=> cartridge". And it may make sense to eventually turn this into just "cartridge -> coprocessor" by making each coprocessor class handle its own markup parsing. It's nice to have all the manifest parsing in one location (well, sans MSU1); but it's also nice for loading/unloading to be handled by each coprocessor itself. So I'll have to think longer about that one. I've also started handling Interface::save() differently. Instead of keeping track of memory IDs and filenames, and iterating through that vector of objects ... instead I now have a system that mirrors the markup parsing on loading, but handles saving instead. This was actually the reason the code size savings weren't more significant, but I like this style more. As before, it removes an extra level of indirection. So ... next up, I need to port over the GB, then GBA, then WS cores. These shouldn't take too long since they're all very simple with just ROM+RAM(+RTC) right now. Then get the SGB callbacks using vfs. Then after that, gut all the old stream stuff from nall and higan. Kill the (load,save)Request stuff, rename the load(Gamepak)Request to something simpler, and then we should be good. Anyway ... these are some huge changes. |
|
Tim Allen | f04d9d58f5 |
Update to v099r05 release.
byuu says: Changelog: - added nall/vfs - converted Famicom core to use nall/vfs interface instead of nall/stream interface |
|
Tim Allen | 40abcfc4a5 |
Update to v099r04 release.
byuu says: Changelog: - lots of code cleanups to processor/r6502 (the switch.cpp file is only halfway done ...) - lots of code cleanups to fc/cpu - removed fc/input - implemented fc/controller hex_usr, you may not like this, but I want to keep the controller port and expansion port interface separate, like I do with the SNES. I realize the NES' is used more for controllers, and the SNES' more for hardware expansions, but ... they're not compatible pinouts and you can't really connect one to the other. Right now, I've only implemented the controller portion. I'll have to get to the peripheral portion later. Also, the gamepad implementation there now may be wrong. It's based off the Super Famicom version obviously. I'm not sure if the Famicom has different behavior with latching $4016 writes, or not. But, it works in Mega Man II, so it's a start. Everyone, be sure to remap your controls, and then set port 1 -> gamepad after loading your first Famicom game with the new WIP. |
|
Tim Allen | 44a8c5a2b4 |
Update to v099r03 release.
byuu says: Changelog: - finished cleaning up the SFC core to my new coding conventions - removed sfc/controller/usart (superseded by 21fx) - hid Synchronize Video option from the menu (still in the configuration file) Pretty much the only minor detail left is some variable names in the SA-1 core that really won't look good at all if I move to camelCase, so I'll have to rethink how I handle those. It's probably a good area to attempt using BitFields, to see how it impacts performance. But I'll do that in a test branch first. But for the most part, this should be the end of the gigantic diffs (this one was 174KiB), at least for the SFC/WS cores. Still have the FC/GB/GBA cores to clean up more fully. Assuming we don't spot any new regressions, we should be ~95% out of the woods on code cleanups breaking things. |
|
Tim Allen | f1a80075fa |
Update to v099r02 release.
byuu says: Changelog: - renamed sfc/ppu/sprite (OAM oam;) to sfc/ppu/object (Object obj;) [hex_usr] - renamed sfc/ppu's memory {vram, oam, cgram} to just vram, oam, cgram - fixed addr&=~1 regression [hex_usr] - fixed 8bpp tiledata regression [hex_usr] |
|
Tim Allen | ae5b4c3bb3 |
Update to v099r01 release.
byuu says: Changelog: - massive cleanups and optimizations on the PPU core - ~9% speedup over v099 official This is pretty much it for the low-hanging fruit of speeding up higan. Any more gains from this point will be extremely hard-fought, unfortunately. |
|
Tim Allen | c074c6e064 |
Update to v099 release.
byuu says: Time for a new release. There are a few important emulation improvements and a few new features; but for the most part, this release focuses on major code refactoring, the details of which I will mostly spare you. The major change is that, as of v099, the SNES balanced and performance cores have been removed from higan. Basically, in addition to my five other emulation cores, these were too much of a burden to maintain. And they've come along as far as I was able to develop them. If you need to use these cores, please use these two from the v098 release. I'm very well aware that ~80% of the people using higan for SNES emulation were using the two removed profiles. But they simply had to go. Hopefully in the future, we can compensate for their loss by increasing the performance of the accuracy core. Changelog (since v098): SFC: balanced profile removed SFC: performance profile removed SFC: expansion port devices can now be changed during gameplay (atlhough you shouldn't) SFC: fixed bug in SharpRTC leap year calculations SFC: emulated new research findings for the S-DD1 coprocessor SFC: fixed CPU emulation-mode wrapping bug with pei, [dp], [dp]+y instructions [AWJ] SFC: fixed Super Game Boy bug that caused the bottom tile-row to flicker in games GB: added MBC1M (multi-cart) mapper; icarus can't detect these so manual manifests are needed for now GB: corrected return value when HuC3 unmapped RAM is read; fixes Robopon [endrift] GB: improved STAT IRQ emulation; fixes Altered Space, etc [endrift, gekkio] GB: partial emulation of DMG STAT write IRQ bug; fixes Legend of Zerd, Road Rash, etc nall: execute() fix, for some Linux platforms that had trouble detecting icarus nall: new BitField class; which allows for simplifying flag/register emulation in various cores ruby: added Windows WASAPI audio driver (experimental) ruby: remove attempts to call glSwapIntervalEXT (fixes crashing on some Linux systems) ui: timing settings panel removed video: restored saturation, gamma, luminance settings video: added new post-emulation sprite system; light gun cursors are now higher-resolution audio: new resampler (6th-order Butterworth biquad IIR); quite a bit faster than the old one audio: added optional basic reverb filter (for fun) higan: refresh video outside cooperative threads (workaround for shoddy code in AMD graphics drivers) higan: individual emulation cores no longer have unique names higan: really substantial code refactoring; 43% reduction in binary size Off the bat, here are the known bugs: hiro/Windows: focus stealing bug on startup. Needs to be fixed in hiro, not with a cheap hack to tomoko. higan/SFC: some of the coprocessors are saving some volatile memory to disk. Completely harmless, but still needs to be fixed. ruby/WASAPI: some sound cards have a lot of issues with the current driver (eg FitzRoy's). We need to find a clean way to fix this before it can be made the default driver. Which would be a huge win because the latency improvements are substantial, and in exclusive mode, WASAPI allows G-sync to work very well. [From the v099 WIP thread, here's the changelog since v098r19: - GB: don't force mode 1 during force-blank; fixes v098r16 regression with many Game Boy games - GB: only perform the STAT write IRQ bug during vblank, not hblank (still not hardware accurate, though) -Ed.] |
|
Tim Allen | 50420e3dd2 |
Update to v098r19 release.
byuu says: Changelog: - added nall/bit-field.hpp - updated all CPU cores (sans LR35902 due to some complexities) to use BitFields instead of bools - updated as many CPU cores as I could to use BitFields instead of union { struct { uint8_t ... }; }; pairs The speed changes are mostly a wash for this. In some instances, I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3% slowdown (eg Famicom emulation.) It's within the margin of error, so it's safe to say it has no impact. This does give us a lot of new useful things, however: - no more manual reconstruction of flag values from lots of left shifts and ORs - no more manual deconstruction of flag values from lots of ANDs - ability to get completely free aliases to flag groups (eg GSU can provide alt2, alt1 and also alt (which is alt2,alt1 combined) - removes the need for the nasty order_lsbN macro hack (eventually will make higan 100% endian independent) - saves us from insane compilers that try and do nasty things with alignment on union-structs - saves us from insane compilers that try to store bit-field bits in reverse order - will allow some really novel new use cases (I'm planning an instant-decode ARM opcode function, for instance.) - reduces code size (we can serialize flag registers in one line instead of one for each flag) However, I probably won't use it for super critical code that's constantly reading out register values (eg PPU MMIO registers.) I think there we would end up with a performance penalty. |
|
Tim Allen | b08449215a |
Update to v098r18 release.
byuu says: Changelog: - hiro: fixed the BrowserDialog column resizing when navigating to new folders (prevents clipping of filenames) - note: this is kind of a quick-fix; but I have a good idea how to do the proper fix now - nall: added BitField<T, Lo, Hi> class - note: not yet working on the SFC CPU class; need to go at it with a debugger to find out what's happening - GB: emulated DMG/SGB STAT IRQ bug; fixes Zerd no Densetsu and Road Rash (won't fix anything else; don't get hopes up) |
|
Tim Allen | 9b452c9f5f |
Update to v098r17 release.
byuu says: Changelog: - fixed Super Game Boy regression from v096r04 with bottom tile row flickering - fixed GB STAT IRQ regression from previous WIP - Altered Space is now playable - GBVideoPlayer isn't; but nobody seems to know exactly what weird hardware quirk that one relies on to work - ~3-4% speed improvement in SuperFX games by eliminating function<> callback on register assignments - most noticeable in Doom in-game; least noticeable on Yoshi's Island title screen (darn) - finished GSU core and SuperFX coprocessor code cleanups - did some more work cleaning up the LR35902 core and GB CPU code Just a fair warning: don't get your hopes up on these GB fixes. Cliffhanger now hangs completely (har har), and none of the other bugs are fixed. We pretty much did all this work just for Altered Space. So, I hope you like playing Altered Space. |
|
Tim Allen | 3681961ca5 |
Update to v098r16 release.
byuu says: Changelog: - GNUmakefile: reverted $(call unique,) to $(strip) - processor/r6502: removed templates; reduces object size from 146.5kb to 107.6kb - processor/lr35902: removed templates; reduces object size from 386.2kb to 197.4kb - processor/spc700: merged op macros for switch table declarations - sfc/coprocessor/sa1: partial cleanups; flattened directory structure - sfc/coprocessor/superfx: partial cleanups; flattened directory structure - sfc/coprocessor/icd2: flattened directory structure - gb/ppu: changed behavior of STAT IRQs Major caveat! The GB/GBC STAT IRQ changes has a major bug in it somewhere that's seriously breaking most games. I'm pushing the WIP anyway, because I believe the changes to be mostly correct. I'd like to get more people looking at these changes, and also try more heavy-handed hacking and diff comparison logging between the previous WIP and this one. |