mirror of https://github.com/bsnes-emu/bsnes.git
9 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Tim Allen | 78f341489e |
Update to v103r03 release.
byuu says: Changelog: - md/psg: fixed output frequency rate regression from v103r02 - processor/m68k: fixed calculations for ABCD, NBCD, SBCD [hex\_usr, SuperMikeMan] - processor/spc700: renamed abbreviated instructions to functional descriptions (eg `XCN` → `ExchangeNibble`) - processor/spc700: removed memory.cpp shorthand functions (fetch, load, store, pull, push) - processor/spc700: updated all instructions to follow cycle behavior as documented by Overload with a logic analyzer Once again, the changes to the SPC700 core are really quite massive. And this time it's not just cosmetic: the idle cycles have been updated to pull from various memory addresses. This is why I removed the shorthand functions -- so that I could handle the at-times very bizarre addresses the SPC700 has on its address bus during its idle cycles. There is one behavior Overload mentioned that I don't emulate ... one of the cycles of the (X) transfer functions seems to not actually access the $f0-ff internal SMP registers? I don't fully understand what Overload is getting at, so I haven't tried to support it just yet. Also, there are limits to logic analyzers. In many cases the same address is read from twice consecutively. It is unclear which of the two reads the SPC700 actually utilizes. I tried to choose the most logical values (usually the first one), but ... I don't know that we'll be able to figure this one out. It's going to be virtually impossible to test this through software, because the PC can't really execute out of registers that have side effects on reads. |
|
Tim Allen | e7806dd6e8 |
Update to v102r27 release.
byuu says: Changelog: - processor/gsu: minor code cleanup - processor/hg51b: renamed reg(Read,Write) to register(Read,Write) - processor/lr35902: minor code cleanup - processor/spc700: completed code cleanup (sans disassembler) - no longer uses internal global state inside instructions - processor/spc700: will no longer hang the emulator if stuck in a WAI (SLEEP) or STP (STOP) instruction - processor/spc700: fixed bug in handling of OR1 and AND1 instructions - processor/z80: minor code cleanup - sfc/dsp: revert to initializing registers to 0x00; save for ENDX=random(), FLG=0xe0 [Jonas Quinn] Major testing of the SNES game library would be appreciated, now that its CPU cores have all been revised. We know the DSP registers read back as randomized data ... mostly, but there are apparently internal latches, which we can't emulate with the current DSP design. So until we know which registers have separate internal state that actually *is* initialized, I'm going to play it safe and not break more games. Thanks again to Jonas Quinn for the continued research into this issue. EDIT: that said ... `MD works if((ENDX&0x30) > 0)` is only a 3:4 chance that the game will work. That seems pretty unlikely that the odds of it working are that low, given hardware testing by others in the past :/ I thought if worked if `PITCH != 0` before, which would have been way more likely. The two remaining CPU cores that need major cleanup efforts are the LR35902 and ARM cores. Both are very large, complicated, annoying cores that will probably be better off as full rewrites from scratch. I don't think I want to delay v103 in trying to accomplish that, however. So I think it'll be best to focus on allowing the Mega Drive core to not lock when processors are frozen waiting on a response from other processors during a save state operation. Then we should be good for a new release. |
|
Tim Allen | 50411a17d1 |
Update to v102r26 release.
byuu says: Changelog: - md/ym2612: initialize DAC sample to center volume [Cydrak] - processor/arm: add accumulate mode extra cycle to mlal [Jonas Quinn] - processor/huc6280: split off algorithms, improve naming of functions - processor/mos6502: split off algorithms - processor/spc700: major revamp of entire core (~50% completed) - processor/wdc65816: fixed several bugs introduced by rewrite For the SPC700, this turns out to be very old code as well, with global object state variables, those annoying `{Boolean,Natural}BitField` types, `under_case` naming conventions, heavily abbreviated function names, etc. I'm working to get the code to be in the same design as the MOS6502, HuC6280, WDC65816 cores, since they're all extremely similar in terms of architectural design (the SPC700 is more of an off-label reimplementation of a 6502 core, but still.) The main thing left is that about 90% of the actual instructions still need to be adapted to not use the internal state (`aa`, `rd`, `dp`, `sp`, `bit` variables.) I wanted to finish this today, but ran out of time before work. I wouldn't suggest too much testing just yet. We should wait until the SPC700 core is finished for that. However, if some does want to and spots regressions, please let me know. |
|
Tim Allen | c50723ef61 |
Update to v100r15 release.
byuu wrote: Aforementioned scheduler changes added. Longer explanation of why here: http://hastebin.com/raw/toxedenece Again, we really need to test this as thoroughly as possible for regressions :/ This is a really major change that affects absolutely everything: all emulation cores, all coprocessors, etc. Also added ADDX and SUB to the 68K core, which brings us just barely above 50% of the instruction encoding space completed. [Editor's note: The "aformentioned scheduler changes" were described in a previous forum post: Unfortunately, 64-bits just wasn't enough precision (we were getting misalignments ~230 times a second on 21/24MHz clocks), so I had to move to 128-bit counters. This of course doesn't exist on 32-bit architectures (and probably not on all 64-bit ones either), so for now ... higan's only going to compile on 64-bit machines until we figure something out. Maybe we offer a "lower precision" fallback for machines that lack uint128_t or something. Using the booth algorithm would be way too slow. Anyway, the precision is now 2^-96, which is roughly 10^-29. That puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly referring to it as the byuusecond. The other 32-bits of precision allows a 1Hz clock to run up to one full second before all clocks need to be normalized to prevent overflow. I fixed a serious wobbling issue where I was using clock > other.clock for synchronization instead of clock >= other.clock; and also another aliasing issue when two threads share a common frequency, but don't run in lock-step. The latter I don't even fully understand, but I did observe it in testing. nall/serialization.hpp has been extended to support 128-bit integers, but without explicitly naming them (yay generic code), so nall will still compile on 32-bit platforms for all other applications. Speed is basically a wash now. FC's a bit slower, SFC's a bit faster. The "longer explanation" in the linked hastebin is: Okay, so the idea is that we can have an arbitrary number of oscillators. Take the SNES: - CPU/PPU clock = 21477272.727272hz - SMP/DSP clock = 24576000hz - Cartridge DSP1 clock = 8000000hz - Cartridge MSU1 clock = 44100hz - Controller Port 1 modem controller clock = 57600hz - Controller Port 2 barcode battler clock = 115200hz - Expansion Port exercise bike clock = 192000hz Is this a pathological case? Of course it is, but it's possible. The first four do exist in the wild already: see Rockman X2 MSU1 patch. Manifest files with higan let you specify any frequency you want for any component. The old trick higan used was to hold an int64 counter for each thread:thread synchronization, and adjust it like so: - if thread A steps X clocks; then clock += X * threadB.frequency - if clock >= 0; switch to threadB - if thread B steps X clocks; then clock -= X * threadA.frequency - if clock < 0; switch to threadA But there are also system configurations where one processor has to synchronize with more than one other processor. Take the Genesis: - the 68K has to sync with the Z80 and PSG and YM2612 and VDP - the Z80 has to sync with the 68K and PSG and YM2612 - the PSG has to sync with the 68K and Z80 and YM2612 Now I could do this by having an int64 clock value for every association. But these clock values would have to be outside the individual Thread class objects, and we would have to update every relationship's clock value. So the 68K would have to update the Z80, PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds per clock step event instead of one. As such, we have to account for both possibilities. The only way to do this is with a single time base. We do this like so: - setup: scalar = timeBase / frequency - step: clock += scalar * clocks Once per second, we look at every thread, find the smallest clock value. Then subtract that value from all threads. This prevents the clock counters from overflowing. Unfortunately, these oscillator values are psychotic, unpredictable, and often times repeating fractions. Even with a timeBase of 1,000,000,000,000,000,000 (one attosecond); we get rounding errors every ~16,300 synchronizations. Specifically, this happens with a CPU running at 21477273hz (rounded) and SMP running at 24576000hz. That may be good enough for most emulators, but ... you know how I am. Plus, even at the attosecond level, we're really pushing against the limits of 64-bit integers. Given the reciprocal inverse, a frequency of 1Hz (which does exist in higan!) would have a scalar that consumes 1/18th of the entire range of a uint64 on every single step. Yes, I could raise the frequency, and then step by that amount, I know. But I don't want to have weird gotchas like that in the scheduler core. Until I increase the accuracy to about 100 times greater than a yoctosecond, the rounding errors are too great. And since the only choice above 64-bit values is 128-bit values; we might as well use all the extra headroom. 2^-96 as a timebase gives me the ability to have both a 1Hz and 4GHz clock; and run them both for a full second; before an overflow event would occur. Another hastebin includes demonstration code: #include <libco/libco.h> #include <nall/nall.hpp> using namespace nall; // cothread_t mainThread = nullptr; const uint iterations = 100'000'000; const uint cpuFreq = 21477272.727272 + 0.5; const uint smpFreq = 24576000.000000 + 0.5; const uint cpuStep = 4; const uint smpStep = 5; // struct ThreadA { cothread_t handle = nullptr; uint64 frequency = 0; int64 clock = 0; auto create(auto (*entrypoint)() -> void, uint frequency) { this->handle = co_create(65536, entrypoint); this->frequency = frequency; this->clock = 0; } }; struct CPUA : ThreadA { static auto Enter() -> void; auto main() -> void; CPUA() { create(&CPUA::Enter, cpuFreq); } } cpuA; struct SMPA : ThreadA { static auto Enter() -> void; auto main() -> void; SMPA() { create(&SMPA::Enter, smpFreq); } } smpA; uint8 queueA[iterations]; uint offsetA; cothread_t resumeA = cpuA.handle; auto EnterA() -> void { offsetA = 0; co_switch(resumeA); } auto QueueA(uint value) -> void { queueA[offsetA++] = value; if(offsetA >= iterations) { resumeA = co_active(); co_switch(mainThread); } } auto CPUA::Enter() -> void { while(true) cpuA.main(); } auto CPUA::main() -> void { QueueA(1); smpA.clock -= cpuStep * smpA.frequency; if(smpA.clock < 0) co_switch(smpA.handle); } auto SMPA::Enter() -> void { while(true) smpA.main(); } auto SMPA::main() -> void { QueueA(2); smpA.clock += smpStep * cpuA.frequency; if(smpA.clock >= 0) co_switch(cpuA.handle); } // struct ThreadB { cothread_t handle = nullptr; uint128_t scalar = 0; uint128_t clock = 0; auto print128(uint128_t value) { string s; while(value) { s.append((char)('0' + value % 10)); value /= 10; } s.reverse(); print(s, "\n"); } //femtosecond (10^15) = 16306 //attosecond (10^18) = 688838 //zeptosecond (10^21) = 13712691 //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble) //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond) auto create(auto (*entrypoint)() -> void, uint128_t frequency) { this->handle = co_create(65536, entrypoint); uint128_t unitOfTime = 1; //for(uint n : range(29)) unitOfTime *= 10; unitOfTime <<= 96; //2^96 time units ... this->scalar = unitOfTime / frequency; print128(this->scalar); this->clock = 0; } auto step(uint128_t clocks) -> void { clock += clocks * scalar; } auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); } }; struct CPUB : ThreadB { static auto Enter() -> void; auto main() -> void; CPUB() { create(&CPUB::Enter, cpuFreq); } } cpuB; struct SMPB : ThreadB { static auto Enter() -> void; auto main() -> void; SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; } } smpB; auto correct() -> void { auto minimum = min(cpuB.clock, smpB.clock); cpuB.clock -= minimum; smpB.clock -= minimum; } uint8 queueB[iterations]; uint offsetB; cothread_t resumeB = cpuB.handle; auto EnterB() -> void { correct(); offsetB = 0; co_switch(resumeB); } auto QueueB(uint value) -> void { queueB[offsetB++] = value; if(offsetB >= iterations) { resumeB = co_active(); co_switch(mainThread); } } auto CPUB::Enter() -> void { while(true) cpuB.main(); } auto CPUB::main() -> void { QueueB(1); step(cpuStep); synchronize(smpB); } auto SMPB::Enter() -> void { while(true) smpB.main(); } auto SMPB::main() -> void { QueueB(2); step(smpStep); synchronize(cpuB); } // #include <nall/main.hpp> auto nall::main(string_vector) -> void { mainThread = co_active(); uint masterCounter = 0; while(true) { print(masterCounter++, " ...\n"); auto A = clock(); EnterA(); auto B = clock(); print((double)(B - A) / CLOCKS_PER_SEC, "s\n"); auto C = clock(); EnterB(); auto D = clock(); print((double)(D - C) / CLOCKS_PER_SEC, "s\n"); for(uint n : range(iterations)) { if(queueA[n] != queueB[n]) return print("fail at ", n, "\n"); } } } ...and that's everything.] |
|
Tim Allen | 50420e3dd2 |
Update to v098r19 release.
byuu says: Changelog: - added nall/bit-field.hpp - updated all CPU cores (sans LR35902 due to some complexities) to use BitFields instead of bools - updated as many CPU cores as I could to use BitFields instead of union { struct { uint8_t ... }; }; pairs The speed changes are mostly a wash for this. In some instances, I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3% slowdown (eg Famicom emulation.) It's within the margin of error, so it's safe to say it has no impact. This does give us a lot of new useful things, however: - no more manual reconstruction of flag values from lots of left shifts and ORs - no more manual deconstruction of flag values from lots of ANDs - ability to get completely free aliases to flag groups (eg GSU can provide alt2, alt1 and also alt (which is alt2,alt1 combined) - removes the need for the nasty order_lsbN macro hack (eventually will make higan 100% endian independent) - saves us from insane compilers that try and do nasty things with alignment on union-structs - saves us from insane compilers that try to store bit-field bits in reverse order - will allow some really novel new use cases (I'm planning an instant-decode ARM opcode function, for instance.) - reduces code size (we can serialize flag registers in one line instead of one for each flag) However, I probably won't use it for super critical code that's constantly reading out register values (eg PPU MMIO registers.) I think there we would end up with a performance penalty. |
|
Tim Allen | 47d4bd4d81 |
Update to v096r01 release.
byuu says: Changelog: - restructured the project and removed a whole bunch of old/dead directives from higan/GNUmakefile - huge amounts of work on hiro/cocoa (compiles but ~70% of the functionality is commented out) - fixed a masking error in my ARM CPU disassembler [Lioncash] - SFC: decided to change board cic=(411,413) back to board region=(ntsc,pal) ... the former was too obtuse If you rename Boolean (it's a problem with an include from ruby, not from hiro) and disable all the ruby drivers, you can compile an OS X binary, but obviously it's not going to do anything. It's a boring WIP, I just wanted to push out the project structure change now at the start of this WIP cycle. |
|
Tim Allen | 4e2eb23835 |
Update to v093 release.
byuu says: Changelog: - added Cocoa target: higan can now be compiled for OS X Lion [Cydrak, byuu] - SNES/accuracy profile hires color blending improvements - fixes Marvelous text [AWJ] - fixed a slight bug in SNES/SA-1 VBR support caused by a typo - added support for multi-pass shaders that can load external textures (requires OpenGL 3.2+) - added game library path (used by ananke->Import Game) to Settings->Advanced - system profiles, shaders and cheats database can be stored in "all users" shared folders now (eg /usr/share on Linux) - all configuration files are in BML format now, instead of XML (much easier to read and edit this way) - main window supports drag-and-drop of game folders (but not game files / ZIP archives) - audio buffer clears when entering a modal loop on Windows (prevents audio repetition with DirectSound driver) - a substantial amount of code clean-up (probably the biggest refactoring to date) One highly desired target for this release was to default to the optimal drivers instead of the safest drivers, but because AMD drivers don't seem to like my OpenGL 3.2 driver, I've decided to postpone that. AMD has too big a market share. Hopefully with v093 officially released, we can get some public input on what AMD doesn't like. |
|
Tim Allen | 29ea5bd599 |
Update to v092r09 release.
byuu says: This will be another massive diff from the previous version. All of higan was updated to use the new foo& bar syntax, and I also updated switch statements to be consistent as well (but not in the disassemblers, was starting to get an RSI just from what I already did.) phoenix/{windows, cocoa, qt} need to be updated to use "string foo" instead of "const string& foo", and after that, the major diffs should be finished. This archive is the first time I'm posting my copy-on-write, size+capacity nall::string class, so any feedback on that is welcome as well. |
|
Tim Allen | 94b2538af5 |
Update to higan v091 release.
byuu says: Basically just a project rename, with s/bsnes/higan and the new icon from lowkee added in. It won't compile on Windows because I forgot to update the resource.rc file, and a path transform command isn't working on Windows. It was really just meant as a starting point, so that v091 WIPs can flow starting from .00 with the new name (it overshadows bsnes v091, so publicly speaking this "shouldn't exist" and will probably be deleted from Google Code when v092 is ready.) |