byuu says:
Changelog:
- added (poorly-named) castable<To, With> template
- Z80 debugger rewritten to make declaring instructions much simpler
- Z80 has more instructions implemented; supports displacement on
(IX), (IY) now
- added `Processor::M68K::Bus` to mirror `Processor::Z80::Bus`
- it does add a pointer indirection; so I'm not sure if I want to
do this for all of my emulator cores ...
byuu says:
Changelog:
- pulled the (u)intN type aliases into higan instead of leaving them
in nall
- added 68K LINEA, LINEF hooks for illegal instructions
- filled the rest of the 68K lambda table with generic instance of
ILLEGAL
- completed the 68K disassembler effective addressing modes
- still unsure whether I should use An to decode absolute
addresses or not
- pro: way easier to read where accesses are taking place
- con: requires An to be valid; so as a disassembler it does a
poor job
- making it optional: too much work; ick
- added I/O decoding for the VDP command-port registers
- added skeleton timing to all five processor cores
- output at 1280x480 (needed for mixed 256/320 widths; and to handle
interlace modes)
The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.
Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.
It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
byuu says:
All of the above fixes, plus I added all 24 variations on the shift
opcodes, plus SUBQ, plus fixes to the BCC instruction.
I can now run 851,767 instructions into Sonic the Hedgehog before hitting
an unimplemented instruction (SUB).
The 68K core is probably only ~35% complete, and yet it's already within
4KiB of being the largest CPU core, code size wise, in all of higan. Fuck
this chip.
byuu says:
I split the Register class and read/write handlers into DataRegister and
AddressRegister, given that they have different behaviors on byte/word
accesses (data tends to preserve the upper bits; address tends to
sign-extend things.)
I expanded EA to EffectiveAddress. No sense in abbreviating things
to death.
I've now implemented 26 instructions. But the new ones are just all the
stupid from/to ccr/sr instructions.
Ryphecha confirmed that you can't set the undefined bits, so I don't
think the BitField concept is appropriate for the CCR/SR. Instead, I'm
just storing direct flags and have (read,write)(CCR,SR) instead. This
isn't like the 65816 where you have subroutines that push and pop the
flag register. It's much more common to access individual flags. Doesn't
match the consistency angle of the other CPU cores, but ... I think this
is the right thing to for the 68K specifically.
byuu says:
Redesigned the handling of reading/writing registers to be about eight
times faster than the old system. More work may be needed ... it seems
data registers tend to preserve their upper bits upon assignment; whereas
address registers tend to sign-extend values into them. It may make
sense to have DataRegister and AddressRegister classes with separate
read/write handlers. I'd have to hold two Register objects inside the
EffectiveAddress (EA) class if we do that.
Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets
the first ~530,000 instructions in Sonic the Hedgehog running (though
probably wrong. But we can run a lot thanks to large initialization
loops.)
If I force the core to loop back to the reset vector on an invalid opcode,
I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and
just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if
that's good or not. I guess we'll find out.
I had to stop tonight because the final opcode I execute is an RTS
(return from subroutine) that's branching back to address 0; which is
invalid ... meaning something went terribly wrong and the system crashed.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated ot ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There was also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
byuu says:
Four and a half hours of work and ... zero new opcodes implemented.
This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
byuu says:
I now have enough of three instructions implemented to get through the
first four instructions in Sonic the Hedgehog.
But they're far from complete. The very first instruction uses EA
addressing, which is similar to x86's ModRM in terms of how disgustingly
complex it is. And it also accesses Z80 control registers, which obviously
isn't going to do anything yet.
The slow speed was me being stupid again. It's not 7.6MHz per frame,
it's 7.67MHz per second. So yeah, speed is so far acceptable again. But
we'll see how things go as I keep emulating more. The 68K decode is not
pretty at all.
byuu says:
Changelog:
- moved Thread, Scheduler, Cheat functionality into emulator/ for
all cores
- start of actual Mega Drive emulation (two 68K instructions)
I'm going to be rather terse on MD emulation, as it's too early for any
meaningful dialogue here.