byuu says:
Added VDP sprite rendering. Can't get any games far enough in to see if
it actually works. So in other words, it doesn't work at all and is 100%
completely broken.
Also added 68K exceptions and interrupts. So far only the VDP interrupt
is present. It definitely seems to be firing in commercial games, so
that's promising. But the implementation is almost certainly completely
wrong. There is fuck all of nothing for documentation on how interrupts
actually work. I had to find out the interrupt vector numbers from
reading the comments from the Sonic the Hedgehog disassembly. I have
literally no fucking clue what I0-I2 (3-bit integer priority value in
the status register) is supposed to do. I know that Vblank=6, Hblank=4,
Ext(gamepad)=2. I know that at reset, SR.I=7. I don't know if I'm
supposed to block interrupts when I is >, >=, <, <= to the interrupt
level. I don't know what level CPU exceptions are supposed to be.
Also implemented VDP regular DMA. No idea if it works correctly since
none of the commercial games run far enough to use it. So again, it's
horribly broken for usre.
Also improved VDP fill mode. But I don't understand how it takes
byte-lengths when the bus is 16-bit. The transfer times indicate it's
actually transferring at the same speed as the 68K->VDP copy, strongly
suggesting it's actually doing 16-bit transfers at a time. In which case,
what happens when you set an odd transfer length?
Also, both DMA modes can now target VRAM, VSRAM, CRAM. Supposedly there's
all kinds of weird shit going on when you target VSRAM, CRAM with VDP
fill/copy modes, but whatever. Get to that later.
Also implemented a very lazy preliminary wait mechanism to to stall out
a processor while another processor exerts control over the bus. This
one's going to be a major work in progress. For one, it totally breaks
the model I use to do save states with libco. For another, I don't
know if a 68K->VDP DMA instantly locks the CPU, or if it the CPU could
actually keep running if it was executing out of RAM when it started
the DMA transfer from ROM (eg it's a bus busy stall, not a hard chip
stall.) That'll greatly change how I handle the waiting.
Also, the OSS driver now supports Audio::Latency. Sound should be
even lower latency now. On FreeBSD when set to 0ms, it's absolutely
incredible. Cannot detect latency whatsoever. The Mario jump sound seems
to happen at the very instant I hear my cherry blue keyswitch activate.
byuu says:
Changelog:
- 68K: fixed bug that affected BSR return address
- VDP: added very preliminary emulation of planes A, B, W (W is
entirely broken though)
- VDP: added command/address stuff so you can write to VRAM, CRAM,
VSRAM
- VDP: added VRAM fill DMA
I would be really surprised if any commercial games showed anything at
all, so I'd probably recommend against wasting your time trying, unless
you're really bored :P
Also, I wanted to add: I am accepting patches\! So if anyone wants to
look over the 68K core for bugs, that would save me untold amounts of
time in the near future :D
byuu says:
Changelog:
- pulled the (u)intN type aliases into higan instead of leaving them
in nall
- added 68K LINEA, LINEF hooks for illegal instructions
- filled the rest of the 68K lambda table with generic instance of
ILLEGAL
- completed the 68K disassembler effective addressing modes
- still unsure whether I should use An to decode absolute
addresses or not
- pro: way easier to read where accesses are taking place
- con: requires An to be valid; so as a disassembler it does a
poor job
- making it optional: too much work; ick
- added I/O decoding for the VDP command-port registers
- added skeleton timing to all five processor cores
- output at 1280x480 (needed for mixed 256/320 widths; and to handle
interlace modes)
The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.
Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.
It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
byuu says:
(Windows: compile with -fpermissive to silence an annoying error. I'll
fix it in the next WIP.)
I completely replaced the time management system in higan and overhauled
the scheduler.
Before, processor threads would have "int64 clock"; and there would
be a 1:1 relationship between two threads. When thread A ran for X
cycles, it'd subtract X * B.Frequency from clock; and when thread B ran
for Y cycles, it'd add Y * A.Frequency from clock. This worked well
and allowed perfect precision; but it doesn't work when you have more
complicated relationships: eg the 68K can sync to the Z80 and PSG; the
Z80 to the 68K and PSG; so the PSG needs two counters.
The new system instead uses a "uint64 clock" variable that represents
time in attoseconds. Every time the scheduler exits, it subtracts
the smallest clock count from all threads, to prevent an overflow
scenario. The only real downside is that rounding errors mean that
roughly every 20 minutes, we have a rounding error of one clock cycle
(one 20,000,000th of a second.) However, this only applies to systems
with multiple oscillators, like the SNES. And when you're in that
situation ... there's no such thing as a perfect oscillator anyway. A
real SNES will be thousands of times less out of spec than 1hz per 20
minutes.
The advantages are pretty immense. First, we obviously can now support
more complex relationships between threads. Second, we can build a
much more abstracted scheduler. All of libco is now abstracted away
completely, which may permit a state-machine / coroutine version of
Thread in the future. We've basically gone from this:
auto SMP::step(uint clocks) -> void {
clock += clocks * (uint64)cpu.frequency;
dsp.clock -= clocks;
if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread);
if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread);
}
To this:
auto SMP::step(uint clocks) -> void {
Thread::step(clocks);
synchronize(dsp);
synchronize(cpu);
}
As you can see, we don't have to do multiple clock adjustments anymore.
This is a huge win for the SNES CPU that had to update the SMP, DSP, all
peripherals and all coprocessors. Likewise, we don't have to synchronize
all coprocessors when one runs, now we can just synchronize the active
one to the CPU.
Third, when changing the frequencies of threads (think SGB speed setting
modes, GBC double-speed mode, etc), it no longer causes the "int64
clock" value to be erroneous.
Fourth, this results in a fairly decent speedup, mostly across the
board. Aside from the GBA being mostly a wash (for unknown reasons),
it's about an 8% - 12% speedup in every other emulation core.
Now, all of this said ... this was an unbelievably massive change, so
... you know what that means >_> If anyone can help test all types of
SNES coprocessors, and some other system games, it'd be appreciated.
----
Lastly, we have a bitchin' new about screen. It unfortunately adds
~200KiB onto the binary size, because the PNG->C++ header file
transformation doesn't compress very well, and I want to keep the
original resource files in with the higan archive. I might try some
things to work around this file size increase in the future, but for now
... yeah, slightly larger archive sizes, sorry.
The logo's a bit busted on Windows (the Label control's background
transparency and alignment settings aren't working), but works well on
GTK. I'll have to fix Windows before the next official release. For now,
look on my Twitter feed if you want to see what it's supposed to look
like.
----
EDIT: forgot about ICD2::Enter. It's doing some weird inverse
run-to-save thing that I need to implement support for somehow. So, save
states on the SGB core probably won't work with this WIP.
byuu says:
All of the above fixes, plus I added all 24 variations on the shift
opcodes, plus SUBQ, plus fixes to the BCC instruction.
I can now run 851,767 instructions into Sonic the Hedgehog before hitting
an unimplemented instruction (SUB).
The 68K core is probably only ~35% complete, and yet it's already within
4KiB of being the largest CPU core, code size wise, in all of higan. Fuck
this chip.
byuu says:
I split the Register class and read/write handlers into DataRegister and
AddressRegister, given that they have different behaviors on byte/word
accesses (data tends to preserve the upper bits; address tends to
sign-extend things.)
I expanded EA to EffectiveAddress. No sense in abbreviating things
to death.
I've now implemented 26 instructions. But the new ones are just all the
stupid from/to ccr/sr instructions.
Ryphecha confirmed that you can't set the undefined bits, so I don't
think the BitField concept is appropriate for the CCR/SR. Instead, I'm
just storing direct flags and have (read,write)(CCR,SR) instead. This
isn't like the 65816 where you have subroutines that push and pop the
flag register. It's much more common to access individual flags. Doesn't
match the consistency angle of the other CPU cores, but ... I think this
is the right thing to for the 68K specifically.
byuu says:
Redesigned the handling of reading/writing registers to be about eight
times faster than the old system. More work may be needed ... it seems
data registers tend to preserve their upper bits upon assignment; whereas
address registers tend to sign-extend values into them. It may make
sense to have DataRegister and AddressRegister classes with separate
read/write handlers. I'd have to hold two Register objects inside the
EffectiveAddress (EA) class if we do that.
Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets
the first ~530,000 instructions in Sonic the Hedgehog running (though
probably wrong. But we can run a lot thanks to large initialization
loops.)
If I force the core to loop back to the reset vector on an invalid opcode,
I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and
just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if
that's good or not. I guess we'll find out.
I had to stop tonight because the final opcode I execute is an RTS
(return from subroutine) that's branching back to address 0; which is
invalid ... meaning something went terribly wrong and the system crashed.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
byuu says:
Four and a half hours of work and ... zero new opcodes implemented.
This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
byuu says:
I now have enough of three instructions implemented to get through the
first four instructions in Sonic the Hedgehog.
But they're far from complete. The very first instruction uses EA
addressing, which is similar to x86's ModRM in terms of how disgustingly
complex it is. And it also accesses Z80 control registers, which obviously
isn't going to do anything yet.
The slow speed was me being stupid again. It's not 7.6MHz per frame,
it's 7.67MHz per second. So yeah, speed is so far acceptable again. But
we'll see how things go as I keep emulating more. The 68K decode is not
pretty at all.
byuu says:
Changelog:
- moved Thread, Scheduler, Cheat functionality into emulator/ for
all cores
- start of actual Mega Drive emulation (two 68K instructions)
I'm going to be rather terse on MD emulation, as it's too early for any
meaningful dialogue here.
byuu says:
Sigh ... I'm really not a good person. I'm inherently selfish.
My responsibility and obligation right now is to work on loki, and
then on the Tengai Makyou Zero translation, and then on improving the
Famicom emulation.
And yet ... it's not what I really want to do. That shouldn't matter;
I should work on my responsibilities first.
Instead, I'm going to be a greedy, self-centered asshole, and work on
what I really want to instead.
I'm really sorry, guys. I'm sure this will make a few people happy,
and probably upset even more people.
I'm also making zero guarantees that this ever gets finished. As always,
I wish I could keep these things secret, so if I fail / give up, I could
just drop it with no shame. But I would have to cut everyone out of the
WIP process completely to make it happen. So, here goes ...
This WIP adds the initial skeleton for Sega Mega Drive / Genesis
emulation. God help us.
(minor note: apparently the new extension for Mega Drive games is .md,
neat. That's what I chose for the folders too. I thought it was .smd,
so that'll be fixed in icarus for the next WIP.)
(aside: this is why I wanted to get v100 out. I didn't want this code in
a skeleton state in v100's source. Nor did I want really broken emulation,
which the first release is sure to be, tarring said release.)
...
So, basically, I've been ruminating on the legacy I want to leave behind
with higan. 3D systems are just plain out. I'm never going to support
them. They're too complex for my abilities, and they would run too slowly
with my design style. I'm not willing to compromise my design ideals. And
I would never want to play a 3D game system at native 240p/480i resolution
... but 1080p+ upscaling is not accurate, so that's a conflict I want
to avoid entirely. It's also never going to emulate computer systems
(X68K, PC-98, FM-Towns, etc) because holy shit that would completely
destroy me. It's also never going emulate arcade machines.
So I think of higan as a collection of 2D emulators for consoles
and handhelds. I've gone over every major 2D gaming system there is,
looking for ones with games I actually care about and enjoy. And I
basically have five of those systems supported already. Looking at the
remaining list, I see only three systems left that I have any interest
in whatsoever: PC-Engine, Master System, Mega Drive. Again, I'm not in
any way committing to emulating any of these, but ... if I had all of
those in higan, I think I'd be content to really, truly, finally stop
writing more emulators for the rest of my life.
And so I decided to tackle the most difficult system first. If I'm
successful, the Z80 core should cover a lot of the work on the SMS. And
the HuC6280 should land somewhere between the NES and SNES in terms of
difficulty ... closer to the NES.
The systems that just don't appeal to me at all, which I will never touch,
include, but are not limited to:
* Atari 2600/5200/7800
* Lynx
* Jaguar
* Vectrex
* Colecovision
* Commodore 64
* Neo-Geo
* Neo-Geo Pocket / Color
* Virtual Boy
* Super A'can
* 32X
* CD-i
* etc, etc, etc.
And really, even if something were mildly interesting in there ... we
have to stop. I can't scale infinitely. I'm already way past my limit,
but I'm doing this anyway. Too many cores bloats everything and kills
quality on everything. I don't want higan to become MESS v2.
I don't know what I'll do about the Famicom Disk System, PC-Engine CD,
and Mega CD. I don't think I'll be able to achieve 60fps emulating the
Mega CD, even if I tried to.
I don't know what's going to happen here with even the Mega Drive. Maybe
I'll get driven crazy with the documentation and quit. Maybe it'll end
up being too complicated and I'll quit. Maybe the emulation will end up
way too slow and I'll give up. Maybe it'll take me seven years to get
any games playable at all. Maybe Steve Snake, AamirM and Mike Pavone
will pool money to hire a hitman to come after me. Who knows.
But this is what I want to do, so ... here goes nothing.