mirror of https://github.com/bsnes-emu/bsnes.git
28 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Tim Allen | 9a13863adb |
Update to v104r17 release.
byuu says: Changelog: - processor/m68k: fix error in disassembler [Sintendo] - processor/m68k: work around Clang compiler bug [Cydrak, Sintendo] This is one of the shortest WIPs I've done, but I'm trying not to change anything before v105. |
|
Tim Allen | 9c25f128f9 |
Update to v104r07 release.
byuu says: Changelog: - md/vdp: added VIP bit to status register; fixes Cliffhanger - processor/m68k/disassembler: added modes 7 and 8 to LEA address disassembly - processor/m68k/disassembler: enhanced ILLEGAL to display LINEA/LINEF $xxx variants - processor/m68k: ILLEGAL/LINEA/LINEF do not modify the stack register; fixes Caeser no Yabou II - icarus/sfc: request sgb1.boot.rom and sgb2.boot.rom separately; as they are different - icarus/sfc: removed support for external firmware when loading ROM images The hack to run Mega Drive Ballz 3D isn't in place, as I don't know if it's correct, and the graphics were corrupted anyway. The SGB boot ROM change is going to require updating the icarus database as well. I will add that in when I start dumping more cartridges here soon. Finally ... I explained this already, but I'll do so here as well: I removed icarus' support for loading SNES coprocessor firmware games with external firmware files (eg dsp1.program.rom + dsp1.data.rom in the same path as supermariokart.sfc, for example.) I realize most are going to see this as an antagonizing/stubborn move given the recent No-Intro discussion, and I won't deny that said thread is why this came to the forefront of my mind. But on my word, I honestly believe this was an ineffective solution for many reasons not related to our disagreements: 1. No-Intro distributes SNES coprocessor firmware as a merged file, eg "DSP1 (World).zip/DSP1 (World).bin" -- icarus can't possibly know about every ROM distribution set's naming conventions for firmware. (Right now, it appears GoodSNES and NSRT are mostly dead; but there may be more DATs in the future -- including my own.) 2. Even if the user obtains the firmware and tries to rename it, it won't work: icarus parses manifests generated by the heuristics module and sees two ROM files: dsp1.program.rom and dsp1.data.rom. icarus cannot identify a file named dsp1.rom as containing both of these sub-files. Users are going to have to know how to split files, which there is no way to do on stock Windows. Merging files, however, can be done via `copy /b supermariokart.sfc+dsp1.rom supermariokartdsp.sfc`; - and dsp1.rom can be named whatever now. I am not saying this will be easy for the average user, but it's easier than splitting files. 3. Separate firmware breaks icarus' database lookup. If you have pilotwings.sfc but without firmware, icarus will not find a match for it in the database lookup phase. It will then fall back on heuristics. The heuristics will pick DSP1B for compatibility with Ballz 3D which requires it. And so it will try to pull in the wrong firmware, and the game's intro will not work correctly. Furthermore, the database information will be unavailable, resulting in inaccurate mirroring. So for these reasons, I have removed said support. You must now load SNES coprocessor games into higan in one of two ways: 1) game paks with split files; or 2) SFC images with merged firmware. If and when No-Intro deploys a method I can actually use, I give you all my word I will give it a fair shot and if it's reasonable, I'll support it in icarus. |
|
Tim Allen | 571760c747 |
Update to v103r24 release.
byuu says: Changelog: - gb/mbc6: mapper is now functional, but Net de Get has some text corruption¹ - gb/mbc7: mapper is now functional² - gb/cpu: HDMA syncs other components after each byte transfer now - gb/ppu: LY,LX forced to zero when LCDC.d7 is lowered (eg disabled), not when it's raised (eg enabled) - gb/ppu: the LCD does not run at all when LCDC.d7 is clear³ - fixes graphical corruption between scene transitions in Legend of Zelda - Oracle of Ages - thanks to Cydrak, Shonumi, gekkio for their input on the cause of this issue - md/controller: renamed "Gamepad" to "Control Pad" per official terminology - md/controller: added "Fighting Pad" (6-button controller) emulation [hex\_usr] - processor/m68k: fixed TAS to set data.d7 when EA.mode==DataRegisterDirect; fixes Asterix - hiro/windows: removed carriage returns from mouse.cpp and desktop.cpp - ruby/audio/alsa: added device driver selection [SuperMikeMan] - ruby/audio/ao: set format.matrix=nullptr to prevent a crash on some systems [SuperMikeMan] - ruby/video/cgl: rename term() to terminate() to fix a crash on macOS [Sintendo] ¹: The observation that this mapper split $4000-7fff into two banks came from MAME's implementation. But their implementation was quite broken and incomplete, so I didn't actually use any of it. The observation that this mapper split $a000-bfff into two banks came from Tauwasser, and I did directly use that information, plus the knowledge that $0400/$0800 are the RAM bank select registers. The text corruption is due to a race condition with timing. The game is transferring font letters via HDMA, but the game code ends up setting the bank# with the font a bit too late after the HDMA has already occurred. I'm not sure how to fix this ... as a whole, I assumed my Game Boy timing was pretty good, but apparently it's not that good. ²: The entire design of this mapper comes from endrift's notes. endrift gets full credit for higan being able to emulate this mapper. Note that the accelerometer implementation is still not tested, and probably won't work right until I tweak the sensitivity a lot. ³: So the fun part of this is ... it breaks the strict 60fps rate of the Game Boy. This was always inevitable: certain timing conditions can stretch frames, too. But this is pretty much an absolute deal breaker for something like Vsync timing. This pretty much requires adaptive sync to run well without audio stuttering during the transition. There's currently one very important detail missing: when the LCD is turned off, presumably the image on the screen fades to white. I do not know how long this process takes, or how to really go about emulating it. Right now as an incomplete patch, I'm simply leaving the last displayed image on the screen until the LCD is turned on again. But I will have to output white, as well as add code to break out of the emulation loop periodically when the LCD is left off eg indefinitely, or bad things would happen. I'll work something out and then implement. Another detail is I'm not sure how long it takes for the LCD to start rendering again once enabled. Right now, it's immediate. I've heard it's as long as 1/60th of a second, but that really seems incredibly excessive? I'd like to know at least a reasonably well-supported estimate before I implement that. |
|
Tim Allen | 78f341489e |
Update to v103r03 release.
byuu says: Changelog: - md/psg: fixed output frequency rate regression from v103r02 - processor/m68k: fixed calculations for ABCD, NBCD, SBCD [hex\_usr, SuperMikeMan] - processor/spc700: renamed abbreviated instructions to functional descriptions (eg `XCN` → `ExchangeNibble`) - processor/spc700: removed memory.cpp shorthand functions (fetch, load, store, pull, push) - processor/spc700: updated all instructions to follow cycle behavior as documented by Overload with a logic analyzer Once again, the changes to the SPC700 core are really quite massive. And this time it's not just cosmetic: the idle cycles have been updated to pull from various memory addresses. This is why I removed the shorthand functions -- so that I could handle the at-times very bizarre addresses the SPC700 has on its address bus during its idle cycles. There is one behavior Overload mentioned that I don't emulate ... one of the cycles of the (X) transfer functions seems to not actually access the $f0-ff internal SMP registers? I don't fully understand what Overload is getting at, so I haven't tried to support it just yet. Also, there are limits to logic analyzers. In many cases the same address is read from twice consecutively. It is unclear which of the two reads the SPC700 actually utilizes. I tried to choose the most logical values (usually the first one), but ... I don't know that we'll be able to figure this one out. It's going to be virtually impossible to test this through software, because the PC can't really execute out of registers that have side effects on reads. |
|
Tim Allen | 82c58527c3 |
Update to v102r17 release.
byuu says: Changelog: - GBA: process audio at 2MHz instead of 32KHz¹ - MD: do not allow the 68K to stop the Z80, unless it has been granted bus access first - MD: do not reset bus requested/granted signals when the 68K resets the Z80 - the above two fix The Lost Vikings - MD: clean up the bus address decoding to be more readable - MD: add support for a13000-a130ff (#TIME) region; pass to cartridge I/O² - MD: emulate SRAM mapping used by >16mbit games; bank mapping used by >32mbit games³ - MD: add 'reset pending' flag so that loading save states won't reload 68K PC, SP registers - this fixes save state support ... mostly⁴ - MD: if DMA is not enabled, do not allow CD5 to be set [Cydrak] - this fixes in-game graphics for Ristar. Title screen still corrupted on first run - MD: detect and break sprite lists that form an infinite loop [Cydrak] - this fixes the emulator from dead-locking on certain games - MD: add DC offset to sign DAC PCM samples [Cydrak] - this improves audio in Sonic 3 - MD: 68K TAS has a hardware bug that prevents writing the result back to RAM - this fixes Gargoyles - MD: 68K TRAP should not change CPU interrupt level - this fixes Shining Force II, Shining in the Darkness, etc - icarus: better SRAM heuristics for Mega Drive games Todo: - need to serialize the new cartridge ramEnable, ramWritable, bank variables ¹: so technically, the GBA has its FIFO queue (raw PCM), plus a GB chipset. The GB audio runs at 2MHz. However, I was being lazy and running the sequencer 64 times in a row, thus decimating the audio to 32KHz. But simply discarding 63 out of every 64 samples resorts in muddier sound with more static in it. However ... increasing the audio thread processing intensity 64-fold, and requiring heavy-duty three-chain lowpass and highpass filters is not cheap. For this bump in sound quality, we're eating a loss of about 30% of previous performance. Also note that the GB audio emulation in the GBA core still lacks many of the improvements made to the GB core. I was hoping to complete the GB enhancements, but it seems like I'm never going to pass blargg's psychotic edge case tests. So, first I want to clean up the GB audio to my current coding standards, and then I'll port that over to the GBA, which should further increase sound quality. At that point, it sound exceed mGBA's audio quality (due to the ridiculously high sampling rate and strong-attenuation audio filtering.) ²: word writes are probably not handled correctly ... but games are only supposed to do byte writes here. ³: the SRAM mapping is used by games like "Story of Thor" and "Phantasy Star IV." Unfortunately, the former wasn't released in the US and is region protected. So you'll need to change the NTSU to NTSCJ in md/system/system.cpp in order to boot it. But it does work nicely now. The write protection bit is cleared in the game, and then it fails to write to SRAM (soooooooo many games with SRAM write protection do this), so for now I've had to disable checking that bit. Phantasy Star IV has a US release, but sadly the game doesn't boot yet. Hitting some other bug. The bank mapping is pretty much just for the 40mbit Super Street Fighter game. It shows the Sega and Capcom logos now, but is hitting yet another bug and deadlocking. For now, I emulate the SRAM/bank mapping registers on all cartridges, and set sane defaults. So long as games don't write to $a130XX, they should all continue to work. But obviously, we need to get to a point where higan/icarus can selectively enable these registers on a per-game basis. ⁴: so, the Mega Drive has various ways to lock a chip until another chip releases it. The VDP can lock the 68K, the 68K can lock the Z80, etc. If this happens when you save a state, it'll dead-lock the emulator. So that's obviously a problem that needs to be fixed. The fix will be nasty ... basically, bypassing the dead-lock, creating a miniature, one-instruction-long race condition. Extremely unlikely to cause any issues in practice (it's only a little worse than the SNES CPU/SMP desync), but ... there's nothing I can do about it. So you'll have to take it or leave it. But yeah, for now, save states may lock up the emulator. I need to add code to break the loops when in the process of creating a save state still. |
|
Tim Allen | 4c3f9b93e7 |
Update to v102r12 release.
byuu says: Changelog: - MD/PSG: fixed 68K bus Z80 status read address location - MS, GG, MD/PSG: channels post-decrement their counters, not pre-decrement [Cydrak]¹ - MD/VDP: cache screen width registers once per scanline; screen height registers once per frame - MD/VDP: support 256-width display mode (used in Shining Force, etc) - MD/YM2612: implemented timers² - MD/YM2612: implemented 8-bit PCM DAC² - 68000: TRAP instruction should index the vector location by 32 (eg by 128 bytes), fixes Shining Force - nall: updated hex(), octal(), binary() functions to take uintmax instead of template<typename T> parameter³ ¹: this one makes an incredible difference. Sie noticed that lots of games set a period of 0, which would end up being a really long period with pre-decrement. By fixing this, noise shows up in many more games, and sounds way better in games even where it did before. You can hear extra sound on Lunar - Sanposuru Gakuen's title screen, the noise in Sonic The Hedgehog (Mega Drive) sounds better, etc. ²: this also really helps sound. The timers allow PSG music to play back at the correct speed instead of playing back way too quickly. And the PCM DAC lets you hear a lot of drum effects, as well as the "Sega!!" sound at the start of Sonic the Hedgehog, and the infamous, "Rise from your grave!" line from Altered Beast. Still, most music on the Mega Drive comes from the FM channels, so there's still not a whole lot to listen to. I didn't implement Cydrak's $02c test register just yet. Sie wasn't 100% certain on how the extended DAC bit worked, so I'd like to play it a little conservative and get sound working, then I'll go back and add a toggle or something to enable undocumented registers, that way we can use that to detect any potential problems they might be causing. ³: unfortunately we lose support for using hex() on nall/arithmetic types. If I have a const Pair& version of the function, then the compiler gets confused on whether Natural<32> should use uintmax or const Pair&, because compilers are stupid, and you can't have explicit arguments in overloaded functions. So even though either function would work, it just decides to error out instead >_> This is actually really annoying, because I want hex() to be useful for printing out nall/crypto keys and hashes directly. But ... this change had to be made. Negative signed integers would crash programs, and that was taking out my 68000 disassembler. |
|
Tim Allen | d76c0c7e82 |
Update to v102r08 release.
byuu says: Changelog: - PCE: restructured VCE, VDCs to run one scanline at a time - PCE: bound VDCs to 1365x262 timing (in order to decouple the VDCs from the VCE) - PCE: the two changes above allow save states to function; also grants a minor speed boost - PCE: added cheat code support (uses 21-bit bus addressing; compare byte will be useful here) - 68K: fixed `mov *,ccr` to read two bytes instead of one [Cydrak] - Z80: emulated /BUSREQ, /BUSACK; allows 68K to suspend the Z80 [Cydrak] - MD: emulated the Z80 executing instructions [Cydrak] - MD: emulated Z80 interrupts (triggered during each Vblank period) [Cydrak] - MD: emulated Z80 memory map [Cydrak] - MD: added stubs for PSG, YM2612 accesses [Cydrak] - MD: improved bus emulation [Cydrak] The PCE core is pretty much ready to go. The only major feature missing is FM modulation. The Mega Drive improvements let us start to see the splash screens for Langrisser II, Shining Force, Shining in the Darkness. I was hoping I could get them in-game, but no such luck. My Z80 implementation is probably flawed in some way ... now that I think about it, I believe I missed the BusAPU::reset() check for having been granted access to the Z80 first. But I doubt that's the problem. Next step is to implement Cydrak's PSG core into the Master System emulator. Once that's in, I'm going to add save states and cheat code support to the Master System core. Next, I'll add the PSG core into the Mega Drive. Then I'll add the 'easy' PCM part of the YM2612. Then the rest of the beastly YM2612 core. Then finally, cap things off with save state and cheat code support. Should be nearing a new release at that point. |
|
Tim Allen | 5df717ff2a |
Update to v101r12 release.
byuu says: Changelog: - new md/bus/ module for bus reads/writes - abstracts byte/word accesses wherever possible (everything but RAM; forces all but I/O to word, I/O to byte) - holds the system RAM since that's technically not part of the CPU anyway - added md/controller and md/system/peripherals - added emulation of gamepads - added stub PSG audio output (silent) to cap the framerate at 60fps with audio sync enabled - fixed VSRAM reads for plane vertical scrolling (two bugs here: add instead of sub; interlave plane A/B) - mask nametable read offsets (can't exceed 8192-byte nametables apparently) - emulated VRAM/VSRAM/CRAM reads from VDP data port - fixed sprite width/height size calculations - added partial emulation of 40-tile per scanline limitation (enough to fix Sonic's title screen) - fixed off-by-one sprite range testing - fixed sprite tile indexing - Vblank happens at Y=224 with overscan disabled - unsure what happens when you toggle it between Y=224 and Y=240 ... probably bad things - fixed reading of address register for ADDA, CMPA, SUBA - fixed sign extension for MOVEA effect address reads - updated MOVEM to increment the read addresses (but not writeback) for (aN) mode With all of that out of the way, we finally have Sonic the Hedgehog (fully?) playable. I played to stage 1-2 and through the special stage, at least. EDIT: yeah, we probably need HIRQs for Labyrinth Zone. Not much else works, of course. Most games hang waiting on the Z80, and those that don't (like Altered Beast) are still royally screwed. Tons of features still missing; including all of the Z80/PSG/YM2612. A note on the perihperals this time around: the Mega Drive EXT port is basically identical to the regular controller ports. So unlike with the Famicom and Super Famicom, I'm inheriting the exension port from the controller class. |
|
Tim Allen | f7ddbfc462 |
Update to v101r11 release.
byuu says: Changelog: - 68K: fixed NEG/NEGX operand order - 68K: fixed bug in disassembler that was breaking trace logging - VDP: improved sprite rendering (still 100% broken) - VDP: added horizontal/vertical scrolling (90% broken) Forgot: - 68K: fix extension word sign bit on indexed modes for disassembler as well - 68K: emulate STOP properly (use r.stop flag; clear on IRQs firing) I'm really wearing out fast here. The Genesis documentation is somehow even worse than Game Boy documentation, but this is a far more complex system. It's a massive time sink to sit here banging away at every possible combination of how things could work, only to see no positive improvements. Nothing I do seems to get sprites to do a goddamn thing. squee says the sprite Y field is 10-bits, X field is 9-bits. genvdp says they're both 10-bits. BlastEm treats them like they're both 10-bits, then masks off the upper bit so it's effectively 9-bits anyway. Nothing ever bothers to tell you whether the horizontal scroll values are supposed to add or subtract from the current X position. Probably the most basic detail you could imagine for explaining horizontal scrolling and yet ... nope. Nothing. I can't even begin to understand how the VDP FIFO functionality works, or what the fuck is meant by "slots". I'm completely at a loss as how how in the holy hell the 68K works with 8-bit accesses. I don't know whether I need byte/word handlers for every device, or if I can just hook it right into the 68K core itself. This one's probably the most major design detail. I need to know this before I go and implement the PSG/YM2612/IO ports-\>gamepads/Z80/etc. Trying to debug the 68K is murder because basically every game likes to start with a 20,000,000-instruction reset phase of checksumming entire games, and clearing out the memory as agonizingly slowly as humanly possible. And like the ARM, there's too many registers so I'd need three widescreen monitors to comfortably view the entire debugger output lines onscreen. I can't get any test ROMs to debug functionality outside of full games because every **goddamned** test ROM coder thinks it's acceptable to tell people to go fetch some toolchain from a link that died in the late '90s and only works on MS-DOS 6.22 to build their fucking shit, because god forbid you include a 32KiB assembled ROM image in your fucking archives. ... I may have to take a break for a while. We'll see. |
|
Tim Allen | 0b70a01b47 |
Update to v101r10 release.
byuu says: Changelog: - 68K: MOVEQ is 8-bit signed - 68K: disassembler was print EOR for OR instructions - 68K: address/program-counter indexed mode had the signed-word/long bit backward - 68K: ADDQ/SUBQ #n,aN always works in long mode; regardless of size - 68K→VDP DMA needs to use `mode.bit(0)<<22|dmaSource`; increment by one instead of two - Z80: added registers and initial two instructions - MS: hooked up enough to load and start running games - Sonic the Hedgehog can execute exactly one instruction... whoo. |
|
Tim Allen | 4d2e17f9c0 |
Update to v101r09 release.
byuu says: Sorry, two WIPs in one day. Got excited and couldn't wait. Changelog: - ADDQ, SUBQ shouldn't update flags when targeting an address register - ADDA should sign extend effective address reads - JSR was pushing the PC too early - some improvements to 8-bit register reads on the VDP (still needs work) - added H/V counter reads to the VDP IO port region - icarus: added support for importing Master System and Game Gear ROMs - tomoko: added library sub-menus for each manufacturer - still need to sort Game Gear after Mega Drive somehow ... The sub-menu system actually isn't all that bad. It is indeed a bit more annoying, but not as annoying as I thought it was going to be. However, it looks a hell of a lot nicer now. |
|
Tim Allen | 043f6a8b33 |
Update to v101r08 release.
byuu says: Changelog: - 68K: fixed read-modify-write instructions - 68K: fixed ADDX bug (using wrong target) - 68K: fixed major bug with SUB using wrong argument ordering - 68K: fixed sign extension when reading address registers from effective addressing - 68K: fixed sign extension on CMPA, SUBA instructions - VDP: improved OAM sprite attribute table caching behavior - VDP: improved DMA fill operation behavior - added Master System / Game Gear stubs (needed for developing the Z80 core) |
|
Tim Allen | ac2d0ba1cf |
Update to v101r05 release.
byuu says: Changelog: - 68K: fixed bug that affected BSR return address - VDP: added very preliminary emulation of planes A, B, W (W is entirely broken though) - VDP: added command/address stuff so you can write to VRAM, CRAM, VSRAM - VDP: added VRAM fill DMA I would be really surprised if any commercial games showed anything at all, so I'd probably recommend against wasting your time trying, unless you're really bored :P Also, I wanted to add: I am accepting patches\! So if anyone wants to look over the 68K core for bugs, that would save me untold amounts of time in the near future :D |
|
Tim Allen | 1df2549d18 |
Update to v101r04 release.
byuu says: Changelog: - pulled the (u)intN type aliases into higan instead of leaving them in nall - added 68K LINEA, LINEF hooks for illegal instructions - filled the rest of the 68K lambda table with generic instance of ILLEGAL - completed the 68K disassembler effective addressing modes - still unsure whether I should use An to decode absolute addresses or not - pro: way easier to read where accesses are taking place - con: requires An to be valid; so as a disassembler it does a poor job - making it optional: too much work; ick - added I/O decoding for the VDP command-port registers - added skeleton timing to all five processor cores - output at 1280x480 (needed for mixed 256/320 widths; and to handle interlace modes) The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and syncing; which is the pathological worst case for libco. But they also have no logic inside of them. With all the above, I'm averaging around 250fps with just the 68K core actually functional, and the VDP doing a dumb "draw white pixels" loop. Still way too early to tell how this emulator is going to perform. Also, the 320x240 mode of the Genesis means that we don't need an aspect correction ratio. But we do need to ensure the output window is a multiple 320x240 so that the scale values work correctly. I was hard-coding aspect correction to stretch the window an additional \*8/7. But that won't work anymore so ... the main higan window is now 640x480, 960x720, or 1280x960. Toggling aspect correction only changes the video width inside the window. It's a bit jarring ... the window is a lot wider, more black space now for most modes. But for now, it is what it is. |
|
Tim Allen | 9b8c3ff8c0 |
Update to v101r03 release.
byuu says: The 68K core now implements all 88 instructions. It ended up being 111 instructions in my core due to splitting up opcodes with the same name but different addressing modes or directions (removes conditions at the expense of more code.) Technically, I don't have exceptions actually implemented yet, and RESET/STOP don't do anything but set flags. So there's still more to go. But ... close enough for statistics time! The M68K core source code is 124,712 bytes in size. The next largest core is the ARM7 core at 70,203 bytes in size. The M68K object size is 942KiB; with the next largest being the V30MZ core at 173KiB. There are a total of 19,656 invalid opcodes in the 68000 revision (unless of course I've made mistakes in my mappings, which is very probably.) Now the fun part ... figuring out how to fix bugs in this core without VDP emulation :/ |
|
Tim Allen | 0a57cac70c |
Update to v101r02 release.
byuu says: Changelog: - Emulator: use `(uintmax)-1 >> 1` for the units of time - MD: implemented 13 new 68K instructions (basically all of the remaining easy ones); 21 remain - nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type (128-bit where available) - this extends to everything: atoi, string, etc. You can even print 128-bit variables if you like 22,552 opcodes still don't exist in the 68K map. Looking like quite a few entries will be blank once I finish. |
|
Tim Allen | 8bdf8f2a55 |
Update to v101r01 release.
byuu says: Changelog: - added eight more 68K instructions - split ADD(direction) into two separate ADD functions I now have 54 out of 88 instructions implemented (thus, 34 remaining.) The map is missing 25,182 entries out of 65,536. Down from 32,680 for v101.00 Aside: this version number feels really silly. r10 and r11 surely will as well ... |
|
Tim Allen | c50723ef61 |
Update to v100r15 release.
byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed. [Editor's note: The "aformentioned scheduler changes" were described in a previous forum post: Unfortunately, 64-bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using the booth algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32-bits of precision allows a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster. The "longer explanation" in the linked hastebin is: Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES: - CPU/PPU clock = 21477272.727272hz - SMP/DSP clock = 24576000hz - Cartridge DSP1 clock = 8000000hz - Cartridge MSU1 clock = 44100hz - Controller Port 1 modem controller clock = 57600hz - Controller Port 2 barcode battler clock = 115200hz - Expansion Port exercise bike clock = 192000hz Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component. The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so: - if thread A steps X clocks; then clock += X * threadB.frequency - if clock >= 0; switch to threadB - if thread B steps X clocks; then clock -= X * threadA.frequency - if clock < 0; switch to threadA But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis: - the 68K has to sync with the Z80 and PSG and YM2612 and VDP - the Z80 has to sync with the 68K and PSG and YM2612 - the PSG has to sync with the 68K and Z80 and YM2612 Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one. As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so: - setup: scalar = timeBase / frequency - step: clock += scalar * clocks Once per second, we look at every thread, find the smallest clock value. Then subtract that value from all threads. This prevents the clock counters from overflowing. Unfortunately, these oscillator values are psychotic, unpredictable, and often times repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond); we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am. Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal inverse, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core. Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values; we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock; and run them both for a full second; before an overflow event would occur. Another hastebin includes demonstration code: #include <libco/libco.h> #include <nall/nall.hpp> using namespace nall; // cothread_t mainThread = nullptr; const uint iterations = 100'000'000; const uint cpuFreq = 21477272.727272 + 0.5; const uint smpFreq = 24576000.000000 + 0.5; const uint cpuStep = 4; const uint smpStep = 5; // struct ThreadA { cothread_t handle = nullptr; uint64 frequency = 0; int64 clock = 0; auto create(auto (*entrypoint)() -> void, uint frequency) { this->handle = co_create(65536, entrypoint); this->frequency = frequency; this->clock = 0; } }; struct CPUA : ThreadA { static auto Enter() -> void; auto main() -> void; CPUA() { create(&CPUA::Enter, cpuFreq); } } cpuA; struct SMPA : ThreadA { static auto Enter() -> void; auto main() -> void; SMPA() { create(&SMPA::Enter, smpFreq); } } smpA; uint8 queueA[iterations]; uint offsetA; cothread_t resumeA = cpuA.handle; auto EnterA() -> void { offsetA = 0; co_switch(resumeA); } auto QueueA(uint value) -> void { queueA[offsetA++] = value; if(offsetA >= iterations) { resumeA = co_active(); co_switch(mainThread); } } auto CPUA::Enter() -> void { while(true) cpuA.main(); } auto CPUA::main() -> void { QueueA(1); smpA.clock -= cpuStep * smpA.frequency; if(smpA.clock < 0) co_switch(smpA.handle); } auto SMPA::Enter() -> void { while(true) smpA.main(); } auto SMPA::main() -> void { QueueA(2); smpA.clock += smpStep * cpuA.frequency; if(smpA.clock >= 0) co_switch(cpuA.handle); } // struct ThreadB { cothread_t handle = nullptr; uint128_t scalar = 0; uint128_t clock = 0; auto print128(uint128_t value) { string s; while(value) { s.append((char)('0' + value % 10)); value /= 10; } s.reverse(); print(s, "\n"); } //femtosecond (10^15) = 16306 //attosecond (10^18) = 688838 //zeptosecond (10^21) = 13712691 //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble) //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond) auto create(auto (*entrypoint)() -> void, uint128_t frequency) { this->handle = co_create(65536, entrypoint); uint128_t unitOfTime = 1; //for(uint n : range(29)) unitOfTime *= 10; unitOfTime <<= 96; //2^96 time units ... this->scalar = unitOfTime / frequency; print128(this->scalar); this->clock = 0; } auto step(uint128_t clocks) -> void { clock += clocks * scalar; } auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); } }; struct CPUB : ThreadB { static auto Enter() -> void; auto main() -> void; CPUB() { create(&CPUB::Enter, cpuFreq); } } cpuB; struct SMPB : ThreadB { static auto Enter() -> void; auto main() -> void; SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; } } smpB; auto correct() -> void { auto minimum = min(cpuB.clock, smpB.clock); cpuB.clock -= minimum; smpB.clock -= minimum; } uint8 queueB[iterations]; uint offsetB; cothread_t resumeB = cpuB.handle; auto EnterB() -> void { correct(); offsetB = 0; co_switch(resumeB); } auto QueueB(uint value) -> void { queueB[offsetB++] = value; if(offsetB >= iterations) { resumeB = co_active(); co_switch(mainThread); } } auto CPUB::Enter() -> void { while(true) cpuB.main(); } auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); } auto SMPB::Enter() -> void { while(true) smpB.main(); } auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); } // #include <nall/main.hpp> auto nall::main(string_vector) -> void { mainThread = co_active(); uint masterCounter = 0; while(true) { print(masterCounter++, " ...\n"); auto A = clock(); EnterA(); auto B = clock(); print((double)(B - A) / CLOCKS_PER_SEC, "s\n"); auto C = clock(); EnterB(); auto D = clock(); print((double)(D - C) / CLOCKS_PER_SEC, "s\n"); for(uint n : range(iterations)) { if(queueA[n] != queueB[n]) return print("fail at ", n, "\n"); } } } ...and that's everything.] |
|
Tim Allen | 306cac2b54 |
Update to v100r13 release.
byuu says: Changelog: M68K improvements, new instructions added. |
|
Tim Allen | f230d144b5 |
Update to v100r12 release.
byuu says: All of the above fixes, plus I added all 24 variations on the shift opcodes, plus SUBQ, plus fixes to the BCC instruction. I can now run 851,767 instructions into Sonic the Hedgehog before hitting an unimplemented instruction (SUB). The 68K core is probably only ~35% complete, and yet it's already within 4KiB of being the largest CPU core, code size wise, in all of higan. Fuck this chip. |
|
Tim Allen | 7ccfbe0206 |
Update to v100r11 release.
byuu says: I split the Register class and read/write handlers into DataRegister and AddressRegister, given that they have different behaviors on byte/word accesses (data tends to preserve the upper bits; address tends to sign-extend things.) I expanded EA to EffectiveAddress. No sense in abbreviating things to death. I've now implemented 26 instructions. But the new ones are just all the stupid from/to ccr/sr instructions. Ryphecha confirmed that you can't set the undefined bits, so I don't think the BitField concept is appropriate for the CCR/SR. Instead, I'm just storing direct flags and have (read,write)(CCR,SR) instead. This isn't like the 65816 where you have subroutines that push and pop the flag register. It's much more common to access individual flags. Doesn't match the consistency angle of the other CPU cores, but ... I think this is the right thing to for the 68K specifically. |
|
Tim Allen | 4b897ba791 |
Update to v100r10 release.
byuu says: Redesigned the handling of reading/writing registers to be about eight times faster than the old system. More work may be needed ... it seems data registers tend to preserve their upper bits upon assignment; whereas address registers tend to sign-extend values into them. It may make sense to have DataRegister and AddressRegister classes with separate read/write handlers. I'd have to hold two Register objects inside the EffectiveAddress (EA) class if we do that. Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets the first ~530,000 instructions in Sonic the Hedgehog running (though probably wrong. But we can run a lot thanks to large initialization loops.) If I force the core to loop back to the reset vector on an invalid opcode, I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if that's good or not. I guess we'll find out. I had to stop tonight because the final opcode I execute is an RTS (return from subroutine) that's branching back to address 0; which is invalid ... meaning something went terribly wrong and the system crashed. |
|
Tim Allen | be3f6ac0d5 |
Update to v100r09 release.
byuu says: Another six hours in ... I have all of the opcodes, memory access functions, disassembler mnemonics and table building converted over to the new template<uint Size> format. Certainly, it would be quite easy for this nightmare chip to throw me another curveball, but so far I can handle: - MOVE (EA to, EA from) case - read(from) has to update register index for +/-(aN) mode - MOVEM (EA from) case - when using +/-(aN), RA can't actually be updated until the transfer is completed - LEA (EA from) case - doesn't actually perform the final read; just returns the address to be read from - ANDI (EA from-and-to) case - same EA has to be read from and written to - for -(aN), the read has to come from aN-2, but can't update aN yet; so that the write also goes to aN-2 - no opcode can ever fetch the extension words more than once - manually control the order of extension word fetching order for proper opcode decoding To do all of that without a whole lot of duplicated code (or really bloating out every single instruction with red tape), I had to bring back the "bool valid / uint32 address" variables inside the EA struct =( If weird exceptions creep in like timing constraints only on certain opcodes, I can use template flags to the EA read/write functions to handle that. |
|
Tim Allen | 92fe5b0813 |
Update to v100r08 release.
byuu says: Six and a half hours this time ... one new opcode, and all old opcodes now in a deprecated format. Hooray, progress! For building the table, I've decided to move from: for(uint opcode : range(65536)) { if(match(...)) bind(opNAME, ...); } To instead having separate for loops for each supported opcode. This lets me specialize parts I want with templates. And to this aim, I'm moving to replace all of the (read,write)(size, ...) functions with (read,write)<Size>(...) functions. This will amount to the ~70ish instructions being triplicated ot ~210ish instructions; but I think this is really important. When I was getting into flag calculations, a ton of conditionals were needed to mask sizes to byte/word/long. There was also lots of conditionals in all the memory access handlers. The template code is ugly, but we eliminate a huge amount of branch conditions this way. |
|
Tim Allen | 059347e575 |
Update to v100r07 release.
byuu says: Four and a half hours of work and ... zero new opcodes implemented. This was the best job I could do refining the effective address computations. Should have all twelve 68000 modes implemented now. Still have a billion questions about when and how I'm supposed to perform certain edge case operations, though. |
|
Tim Allen | 0d6a09f9f8 |
Update to v100r06 release.
byuu says: Up to ten 68K instructions out of somewhere between 61 and 88, depending upon which PDF you look at. Of course, some of them aren't 100% completed yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant that needs stack push/pop functions. This WIP actually took over eight hours to make, going through every possible permutation on how to design the core itself. The updated design now builds both the instruction decoder+dispatcher and the disassembler decoder into the same main loop during M68K's constructor. The special cases are also really psychotic on this processor, and I'm afraid of missing something via the fallthrough cases. So instead, I'm ordering the instructions alphabetically, and including exclusion cases to ignore binding invalid cases. If I end up remapping an existing register, then it'll throw a run-time assertion at program startup. I wanted very much to get rid of struct EA (EffectiveAddress), but it's too difficult to keep track of the internal effective address without it. So I split out the size to a separate parameter, since every opcode only has one size parameter, and otherwise it was getting duplicated in opcodes that take two EAs, and was also awkward with the flag testing. It's a bit more typing, but I feel it's more clean this way. Overall, I'm really worried this is going to be too slow. I don't want to turn the EA stuff into templates, because that will massively bloat out compilation times and object sizes, and will also need a special DSL preprocessor since C++ doesn't have a static for loop. I can definitely optimize a lot of EA's address/read/write functions away once the core is completed, but it's never going to hold a candle to a templatized 68K core. ---- Forgot to include the SA-1 regression fix. I always remember immediately after I upload and archive the WIP. Will try to get that in next time, I guess. |
|
Tim Allen | b72f35a13e |
Update to v100r05 release.
byuu says: Alright, I'm definitely going to need to find some people willing to tolerate my questions on this chip, so I'm going to go ahead and announce I'm working on this I guess. This core is way too big for a surprise like the NES and WS cores were. It'll probably even span multiple v10x releases before it's even ready. |
|
Tim Allen | 1c0ef793fe |
Update to v100r04 release.
byuu says: I now have enough of three instructions implemented to get through the first four instructions in Sonic the Hedgehog. But they're far from complete. The very first instruction uses EA addressing, which is similar to x86's ModRM in terms of how disgustingly complex it is. And it also accesses Z80 control registers, which obviously isn't going to do anything yet. The slow speed was me being stupid again. It's not 7.6MHz per frame, it's 7.67MHz per second. So yeah, speed is so far acceptable again. But we'll see how things go as I keep emulating more. The 68K decode is not pretty at all. |