Commit Graph

20 Commits

Author SHA1 Message Date
Tim Allen 9c25f128f9 Update to v104r07 release.
byuu says:

Changelog:

  - md/vdp: added VIP bit to status register; fixes Cliffhanger
  - processor/m68k/disassembler: added modes 7 and 8 to LEA address
    disassembly
  - processor/m68k/disassembler: enhanced ILLEGAL to display LINEA/LINEF
    $xxx variants
  - processor/m68k: ILLEGAL/LINEA/LINEF do not modify the stack
    register; fixes Caeser no Yabou II
  - icarus/sfc: request sgb1.boot.rom and sgb2.boot.rom separately; as
    they are different
  - icarus/sfc: removed support for external firmware when loading ROM
    images

The hack to run Mega Drive Ballz 3D isn't in place, as I don't know if
it's correct, and the graphics were corrupted anyway.

The SGB boot ROM change is going to require updating the icarus database
as well. I will add that in when I start dumping more cartridges here
soon.

Finally ... I explained this already, but I'll do so here as well: I
removed icarus' support for loading SNES coprocessor firmware games with
external firmware files (eg dsp1.program.rom + dsp1.data.rom in the same
path as supermariokart.sfc, for example.)

I realize most are going to see this as an antagonizing/stubborn move
given the recent No-Intro discussion, and I won't deny that said thread
is why this came to the forefront of my mind. But on my word, I honestly
believe this was an ineffective solution for many reasons not related to
our disagreements:

 1. No-Intro distributes SNES coprocessor firmware as a merged file, eg
    "DSP1 (World).zip/DSP1 (World).bin" -- icarus can't possibly know
    about every ROM distribution set's naming conventions for firmware.
    (Right now, it appears GoodSNES and NSRT are mostly dead; but there
    may be more DATs in the future -- including my own.)
 2. Even if the user obtains the firmware and tries to rename it, it
    won't work: icarus parses manifests generated by the heuristics
    module and sees two ROM files: dsp1.program.rom and dsp1.data.rom.
    icarus cannot identify a file named dsp1.rom as containing both
    of these sub-files. Users are going to have to know how to split
    files, which there is no way to do on stock Windows. Merging files,
    however, can be done via `copy /b supermariokart.sfc+dsp1.rom
    supermariokartdsp.sfc`; - and dsp1.rom can be named whatever now.
    I am not saying this will be easy for the average user, but it's
    easier than splitting files.
 3. Separate firmware breaks icarus' database lookup. If you have
    pilotwings.sfc but without firmware, icarus will not find a match
    for it in the database lookup phase. It will then fall back on
    heuristics. The heuristics will pick DSP1B for compatibility with
    Ballz 3D which requires it. And so it will try to pull in the
    wrong firmware, and the game's intro will not work correctly.
    Furthermore, the database information will be unavailable, resulting
    in inaccurate mirroring.

So for these reasons, I have removed said support. You must now load
SNES coprocessor games into higan in one of two ways: 1) game paks with
split files; or 2) SFC images with merged firmware.

If and when No-Intro deploys a method I can actually use, I give you all
my word I will give it a fair shot and if it's reasonable, I'll support
it in icarus.
2017-08-28 22:46:14 +10:00
Tim Allen bdc100e123 Update to v102r02 release.
byuu says:

Changelog:

  - I caved on the `samples[] = {0.0}` thing, but I'm very unhappy about it
      - if it's really invalid C++, then GCC needs to stop accepting it
        in strict `-std=c++14` mode
  - Emulator::Interface::Information::resettable is gone
  - Emulator::Interface::reset() is gone
  - FC, SFC, MD cores updated to remove soft reset behavior
  - split GameBoy::Interface into GameBoyInterface,
    GameBoyColorInterface
  - split WonderSwan::Interface into WonderSwanInterface,
    WonderSwanColorInterface
  - PCE: fixed off-by-one scanline error [hex_usr]
  - PCE: temporary hack to prevent crashing when VDS is set to < 2
  - hiro: Cocoa: removed (u)int(#) constants; converted (u)int(#)
    types to (u)int_(#)t types
  - icarus: replaced usage of unique with strip instead (so we don't
    mess up frameworks on macOS)
  - libco: added macOS-specific section marker [Ryphecha]

So ... the major news this time is the removal of the soft reset
behavior. This is a major!! change that results in a 100KiB diff file,
and it's very prone to accidental mistakes!! If anyone is up for
testing, or even better -- looking over the code changes between v102r01
and v102r02 and looking for any issues, please do so. Ideally we'll want
to test every NES mapper type and every SNES coprocessor type by loading
said games and power cycling to make sure the games are all cleanly
resetting. It's too big of a change for me to cover there not being any
issues on my own, but this is truly critical code, so yeah ... please
help if you can.

We technically lose a bit of hardware documentation here. The soft reset
events do all kinds of interesting things in all kinds of different
chips -- or at least they do on the SNES. This is obviously not ideal.
But in the process of removing these portions of code, I found a few
mistakes I had made previously. It simplifies resetting the system state
a lot when not trying to have all the power() functions call the reset()
functions to share partial functionality.

In the future, the goal will be to come up with a way to add back in the
soft reset behavior via keyboard binding as with the Master System core.
What's going to have to happen is that the key binding will have to send
a "reset pulse" to every emulated chip, and those chips are going to
have to act independently to power() instead of reusing functionality.
We'll get there eventually, but there's many things of vastly greater
importance to work on right now, so it'll be a while. The information
isn't lost ... we'll just have to pull it out of v102 when we are ready.

Note that I left the SNES reset vector simulation code in, even though
it's not possible to trigger, for the time being.

Also ... the Super Game Boy core is still disconnected. To be honest, it
totally slipped my mind when I released v102 that it wasn't connected
again yet. This one's going to be pretty tricky to be honest. I'm
thinking about making a third GameBoy::Interface class just for SGB, and
coming up with some way of bypassing platform-> calls when in this
mode.
2017-01-23 08:04:26 +11:00
Tim Allen 4d2e17f9c0 Update to v101r09 release.
byuu says:

Sorry, two WIPs in one day. Got excited and couldn't wait.

Changelog:

  - ADDQ, SUBQ shouldn't update flags when targeting an address register
  - ADDA should sign extend effective address reads
  - JSR was pushing the PC too early
  - some improvements to 8-bit register reads on the VDP (still needs
    work)
  - added H/V counter reads to the VDP IO port region
  - icarus: added support for importing Master System and Game Gear ROMs
  - tomoko: added library sub-menus for each manufacturer
      - still need to sort Game Gear after Mega Drive somehow ...

The sub-menu system actually isn't all that bad. It is indeed a bit more
annoying, but not as annoying as I thought it was going to be. However,
it looks a hell of a lot nicer now.
2016-08-18 08:05:50 +10:00
Tim Allen 043f6a8b33 Update to v101r08 release.
byuu says:

Changelog:

  - 68K: fixed read-modify-write instructions
  - 68K: fixed ADDX bug (using wrong target)
  - 68K: fixed major bug with SUB using wrong argument ordering
  - 68K: fixed sign extension when reading address registers from
    effective addressing
  - 68K: fixed sign extension on CMPA, SUBA instructions
  - VDP: improved OAM sprite attribute table caching behavior
  - VDP: improved DMA fill operation behavior
  - added Master System / Game Gear stubs (needed for developing the Z80
    core)
2016-08-17 22:31:22 +10:00
Tim Allen ac2d0ba1cf Update to v101r05 release.
byuu says:

Changelog:

  - 68K: fixed bug that affected BSR return address
  - VDP: added very preliminary emulation of planes A, B, W (W is
    entirely broken though)
  - VDP: added command/address stuff so you can write to VRAM, CRAM,
    VSRAM
  - VDP: added VRAM fill DMA

I would be really surprised if any commercial games showed anything at
all, so I'd probably recommend against wasting your time trying, unless
you're really bored :P

Also, I wanted to add: I am accepting patches\! So if anyone wants to
look over the 68K core for bugs, that would save me untold amounts of
time in the near future :D
2016-08-13 09:47:30 +10:00
Tim Allen 1df2549d18 Update to v101r04 release.
byuu says:

Changelog:

  - pulled the (u)intN type aliases into higan instead of leaving them
    in nall
  - added 68K LINEA, LINEF hooks for illegal instructions
  - filled the rest of the 68K lambda table with generic instance of
    ILLEGAL
  - completed the 68K disassembler effective addressing modes
      - still unsure whether I should use An to decode absolute
        addresses or not
      - pro: way easier to read where accesses are taking place
      - con: requires An to be valid; so as a disassembler it does a
        poor job
      - making it optional: too much work; ick
  - added I/O decoding for the VDP command-port registers
  - added skeleton timing to all five processor cores
  - output at 1280x480 (needed for mixed 256/320 widths; and to handle
    interlace modes)

The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.

Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.

It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
2016-08-12 11:07:04 +10:00
Tim Allen 9b8c3ff8c0 Update to v101r03 release.
byuu says:

The 68K core now implements all 88 instructions. It ended up being 111
instructions in my core due to splitting up opcodes with the same name
but different addressing modes or directions (removes conditions at the
expense of more code.)

Technically, I don't have exceptions actually implemented yet, and
RESET/STOP don't do anything but set flags. So there's still more to
go. But ... close enough for statistics time!

The M68K core source code is 124,712 bytes in size. The next largest
core is the ARM7 core at 70,203 bytes in size.

The M68K object size is 942KiB; with the next largest being the V30MZ
core at 173KiB.

There are a total of 19,656 invalid opcodes in the 68000 revision (unless
of course I've made mistakes in my mappings, which is very probably.)

Now the fun part ... figuring out how to fix bugs in this core without
VDP emulation :/
2016-08-11 08:02:02 +10:00
Tim Allen 0a57cac70c Update to v101r02 release.
byuu says:

Changelog:

  - Emulator: use `(uintmax)-1 >> 1` for the units of time
  - MD: implemented 13 new 68K instructions (basically all of the
    remaining easy ones); 21 remain
  - nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type
    (128-bit where available)
      - this extends to everything: atoi, string, etc. You can even
        print 128-bit variables if you like

22,552 opcodes still don't exist in the 68K map. Looking like quite a
few entries will be blank once I finish.
2016-08-09 21:07:18 +10:00
Tim Allen 8bdf8f2a55 Update to v101r01 release.
byuu says:

Changelog:

  - added eight more 68K instructions
  - split ADD(direction) into two separate ADD functions

I now have 54 out of 88 instructions implemented (thus, 34 remaining.)
The map is missing 25,182 entries out of 65,536. Down from 32,680 for
v101.00

Aside: this version number feels really silly. r10 and r11 surely will
as well ...
2016-08-08 20:12:03 +10:00
Tim Allen c50723ef61 Update to v100r15 release.
byuu wrote:

Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece

Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.

Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.

[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:

    Unfortunately, 64-bits just wasn't enough precision (we were
    getting misalignments ~230 times a second on 21/24MHz clocks), so
    I had to move to 128-bit counters. This of course doesn't exist on
    32-bit architectures (and probably not on all 64-bit ones either),
    so for now ... higan's only going to compile on 64-bit machines
    until we figure something out. Maybe we offer a "lower precision"
    fallback for machines that lack uint128_t or something. Using the
    booth algorithm would be way too slow.

    Anyway, the precision is now 2^-96, which is roughly 10^-29. That
    puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
    referring to it as the byuusecond. The other 32-bits of precision
    allows a 1Hz clock to run up to one full second before all clocks
    need to be normalized to prevent overflow.

    I fixed a serious wobbling issue where I was using clock > other.clock
    for synchronization instead of clock >= other.clock; and also another
    aliasing issue when two threads share a common frequency, but don't
    run in lock-step. The latter I don't even fully understand, but I
    did observe it in testing.

    nall/serialization.hpp has been extended to support 128-bit integers,
    but without explicitly naming them (yay generic code), so nall will
    still compile on 32-bit platforms for all other applications.

    Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.

The "longer explanation" in the linked hastebin is:

    Okay, so the idea is that we can have an arbitrary number of
    oscillators. Take the SNES:

    - CPU/PPU clock = 21477272.727272hz
    - SMP/DSP clock = 24576000hz
    - Cartridge DSP1 clock = 8000000hz
    - Cartridge MSU1 clock = 44100hz
    - Controller Port 1 modem controller clock = 57600hz
    - Controller Port 2 barcode battler clock = 115200hz
    - Expansion Port exercise bike clock = 192000hz

    Is this a pathological case? Of course it is, but it's possible. The
    first four do exist in the wild already: see Rockman X2 MSU1
    patch. Manifest files with higan let you specify any frequency you
    want for any component.

    The old trick higan used was to hold an int64 counter for each
    thread:thread synchronization, and adjust it like so:

    - if thread A steps X clocks; then clock += X * threadB.frequency
      - if clock >= 0; switch to threadB
    - if thread B steps X clocks; then clock -= X * threadA.frequency
      - if clock <  0; switch to threadA

    But there are also system configurations where one processor has to
    synchronize with more than one other processor. Take the Genesis:

    - the 68K has to sync with the Z80 and PSG and YM2612 and VDP
    - the Z80 has to sync with the 68K and PSG and YM2612
    - the PSG has to sync with the 68K and Z80 and YM2612

    Now I could do this by having an int64 clock value for every
    association. But these clock values would have to be outside the
    individual Thread class objects, and we would have to update every
    relationship's clock value. So the 68K would have to update the Z80,
    PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
    per clock step event instead of one.

    As such, we have to account for both possibilities. The only way to
    do this is with a single time base. We do this like so:

    - setup: scalar = timeBase / frequency
    - step: clock += scalar * clocks

    Once per second, we look at every thread, find the smallest clock
    value. Then subtract that value from all threads. This prevents the
    clock counters from overflowing.

    Unfortunately, these oscillator values are psychotic, unpredictable,
    and often times repeating fractions. Even with a timeBase of
    1,000,000,000,000,000,000 (one attosecond); we get rounding errors
    every ~16,300 synchronizations. Specifically, this happens with a CPU
    running at 21477273hz (rounded) and SMP running at 24576000hz. That
    may be good enough for most emulators, but ... you know how I am.

    Plus, even at the attosecond level, we're really pushing against the
    limits of 64-bit integers. Given the reciprocal inverse, a frequency
    of 1Hz (which does exist in higan!) would have a scalar that consumes
    1/18th of the entire range of a uint64 on every single step. Yes, I
    could raise the frequency, and then step by that amount, I know. But
    I don't want to have weird gotchas like that in the scheduler core.

    Until I increase the accuracy to about 100 times greater than a
    yoctosecond, the rounding errors are too great. And since the only
    choice above 64-bit values is 128-bit values; we might as well use
    all the extra headroom. 2^-96 as a timebase gives me the ability to
    have both a 1Hz and 4GHz clock; and run them both for a full second;
    before an overflow event would occur.

Another hastebin includes demonstration code:

    #include <libco/libco.h>

    #include <nall/nall.hpp>
    using namespace nall;

    //

    cothread_t mainThread = nullptr;
    const uint iterations = 100'000'000;
    const uint cpuFreq = 21477272.727272 + 0.5;
    const uint smpFreq = 24576000.000000 + 0.5;
    const uint cpuStep = 4;
    const uint smpStep = 5;

    //

    struct ThreadA {
      cothread_t handle = nullptr;
      uint64 frequency = 0;
      int64 clock = 0;

      auto create(auto (*entrypoint)() -> void, uint frequency) {
        this->handle = co_create(65536, entrypoint);
        this->frequency = frequency;
        this->clock = 0;
      }
    };

    struct CPUA : ThreadA {
      static auto Enter() -> void;
      auto main() -> void;
      CPUA() { create(&CPUA::Enter, cpuFreq); }
    } cpuA;

    struct SMPA : ThreadA {
      static auto Enter() -> void;
      auto main() -> void;
      SMPA() { create(&SMPA::Enter, smpFreq); }
    } smpA;

    uint8 queueA[iterations];
    uint offsetA;
    cothread_t resumeA = cpuA.handle;

    auto EnterA() -> void {
      offsetA = 0;
      co_switch(resumeA);
    }

    auto QueueA(uint value) -> void {
      queueA[offsetA++] = value;
      if(offsetA >= iterations) {
        resumeA = co_active();
        co_switch(mainThread);
      }
    }

    auto CPUA::Enter() -> void { while(true) cpuA.main(); }

    auto CPUA::main() -> void {
      QueueA(1);
      smpA.clock -= cpuStep * smpA.frequency;
      if(smpA.clock < 0) co_switch(smpA.handle);
    }

    auto SMPA::Enter() -> void { while(true) smpA.main(); }

    auto SMPA::main() -> void {
      QueueA(2);
      smpA.clock += smpStep * cpuA.frequency;
      if(smpA.clock >= 0) co_switch(cpuA.handle);
    }

    //

    struct ThreadB {
      cothread_t handle = nullptr;
      uint128_t scalar = 0;
      uint128_t clock = 0;

      auto print128(uint128_t value) {
        string s;
        while(value) {
          s.append((char)('0' + value % 10));
          value /= 10;
        }
        s.reverse();
        print(s, "\n");
      }

      //femtosecond (10^15) =    16306
      //attosecond  (10^18) =   688838
      //zeptosecond (10^21) = 13712691
      //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
      //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)

      auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
        this->handle = co_create(65536, entrypoint);

        uint128_t unitOfTime = 1;
      //for(uint n : range(29)) unitOfTime *= 10;
        unitOfTime <<= 96;  //2^96 time units ...

        this->scalar = unitOfTime / frequency;
        print128(this->scalar);
        this->clock = 0;
      }

      auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
      auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
    };

    struct CPUB : ThreadB {
      static auto Enter() -> void;
      auto main() -> void;
      CPUB() { create(&CPUB::Enter, cpuFreq); }
    } cpuB;

    struct SMPB : ThreadB {
      static auto Enter() -> void;
      auto main() -> void;
      SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
    } smpB;

    auto correct() -> void {
      auto minimum = min(cpuB.clock, smpB.clock);
      cpuB.clock -= minimum;
      smpB.clock -= minimum;
    }

    uint8 queueB[iterations];
    uint offsetB;
    cothread_t resumeB = cpuB.handle;

    auto EnterB() -> void {
      correct();
      offsetB = 0;
      co_switch(resumeB);
    }

    auto QueueB(uint value) -> void {
      queueB[offsetB++] = value;
      if(offsetB >= iterations) {
        resumeB = co_active();
        co_switch(mainThread);
      }
    }

    auto CPUB::Enter() -> void { while(true) cpuB.main(); }

    auto CPUB::main() -> void {
      QueueB(1);
      step(cpuStep);
      synchronize(smpB);
    }

    auto SMPB::Enter() -> void { while(true) smpB.main(); }

    auto SMPB::main() -> void {
      QueueB(2);
      step(smpStep);
      synchronize(cpuB);
    }

    //

    #include <nall/main.hpp>
    auto nall::main(string_vector) -> void {
      mainThread = co_active();

      uint masterCounter = 0;
      while(true) {
        print(masterCounter++, " ...\n");

        auto A = clock();
        EnterA();
        auto B = clock();
        print((double)(B - A) / CLOCKS_PER_SEC, "s\n");

        auto C = clock();
        EnterB();
        auto D = clock();
        print((double)(D - C) / CLOCKS_PER_SEC, "s\n");

        for(uint n : range(iterations)) {
          if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
        }
      }
    }

...and that's everything.]
2016-07-31 12:11:20 +10:00
Tim Allen 306cac2b54 Update to v100r13 release.
byuu says:

Changelog: M68K improvements, new instructions added.
2016-07-26 20:46:43 +10:00
Tim Allen f230d144b5 Update to v100r12 release.
byuu says:

All of the above fixes, plus I added all 24 variations on the shift
opcodes, plus SUBQ, plus fixes to the BCC instruction.

I can now run 851,767 instructions into Sonic the Hedgehog before hitting
an unimplemented instruction (SUB).

The 68K core is probably only ~35% complete, and yet it's already within
4KiB of being the largest CPU core, code size wise, in all of higan. Fuck
this chip.
2016-07-25 23:15:54 +10:00
Tim Allen 7ccfbe0206 Update to v100r11 release.
byuu says:

I split the Register class and read/write handlers into DataRegister and
AddressRegister, given that they have different behaviors on byte/word
accesses (data tends to preserve the upper bits; address tends to
sign-extend things.)

I expanded EA to EffectiveAddress. No sense in abbreviating things
to death.

I've now implemented 26 instructions. But the new ones are just all the
stupid from/to ccr/sr instructions.

Ryphecha confirmed that you can't set the undefined bits, so I don't
think the BitField concept is appropriate for the CCR/SR. Instead, I'm
just storing direct flags and have (read,write)(CCR,SR) instead. This
isn't like the 65816 where you have subroutines that push and pop the
flag register. It's much more common to access individual flags. Doesn't
match the consistency angle of the other CPU cores, but ... I think this
is the right thing to for the 68K specifically.
2016-07-23 12:32:35 +10:00
Tim Allen 4b897ba791 Update to v100r10 release.
byuu says:

Redesigned the handling of reading/writing registers to be about eight
times faster than the old system. More work may be needed ... it seems
data registers tend to preserve their upper bits upon assignment; whereas
address registers tend to sign-extend values into them. It may make
sense to have DataRegister and AddressRegister classes with separate
read/write handlers. I'd have to hold two Register objects inside the
EffectiveAddress (EA) class if we do that.

Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets
the first ~530,000 instructions in Sonic the Hedgehog running (though
probably wrong. But we can run a lot thanks to large initialization
loops.)

If I force the core to loop back to the reset vector on an invalid opcode,
I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and
just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if
that's good or not. I guess we'll find out.

I had to stop tonight because the final opcode I execute is an RTS
(return from subroutine) that's branching back to address 0; which is
invalid ... meaning something went terribly wrong and the system crashed.
2016-07-22 22:03:25 +10:00
Tim Allen be3f6ac0d5 Update to v100r09 release.
byuu says:

Another six hours in ...

I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.

Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:

- MOVE (EA to, EA from) case
  - read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
  - when using +/-(aN), RA can't actually be updated until the transfer
    is completed
- LEA (EA from) case
  - doesn't actually perform the final read; just returns the address
    to be read from
- ANDI (EA from-and-to) case
  - same EA has to be read from and written to
  - for -(aN), the read has to come from aN-2, but can't update aN yet;
    so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching order for proper
  opcode decoding

To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(

If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
2016-07-19 19:12:05 +10:00
Tim Allen 92fe5b0813 Update to v100r08 release.
byuu says:

Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!

For building the table, I've decided to move from:

    for(uint opcode : range(65536)) {
      if(match(...)) bind(opNAME, ...);
    }

To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.

And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.

This will amount to the ~70ish instructions being triplicated ot ~210ish
instructions; but I think this is really important.

When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There was also lots of
conditionals in all the memory access handlers.

The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
2016-07-18 08:11:29 +10:00
Tim Allen 059347e575 Update to v100r07 release.
byuu says:

Four and a half hours of work and ... zero new opcodes implemented.

This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
2016-07-17 13:24:28 +10:00
Tim Allen 0d6a09f9f8 Update to v100r06 release.
byuu says:

Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.

This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.

The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.

I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.

Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.

----

Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
2016-07-16 18:39:44 +10:00
Tim Allen b72f35a13e Update to v100r05 release.
byuu says:

Alright, I'm definitely going to need to find some people willing to
tolerate my questions on this chip, so I'm going to go ahead and announce
I'm working on this I guess.

This core is way too big for a surprise like the NES and WS cores
were. It'll probably even span multiple v10x releases before it's
even ready.
2016-07-13 08:47:04 +10:00
Tim Allen 1c0ef793fe Update to v100r04 release.
byuu says:

I now have enough of three instructions implemented to get through the
first four instructions in Sonic the Hedgehog.

But they're far from complete. The very first instruction uses EA
addressing, which is similar to x86's ModRM in terms of how disgustingly
complex it is. And it also accesses Z80 control registers, which obviously
isn't going to do anything yet.

The slow speed was me being stupid again. It's not 7.6MHz per frame,
it's 7.67MHz per second. So yeah, speed is so far acceptable again. But
we'll see how things go as I keep emulating more. The 68K decode is not
pretty at all.
2016-07-12 20:19:31 +10:00