byuu says:
Changelog:
- SMS: background VDP clips partial tiles on the left (math may not be
right ... it's hard to reason about)
- SMS: fix background VDP scroll locks
- SMS: fix VDP sprite coordinates
- SMS: paint black after the end of the visible display
- todo: shouldn't be a brute force at the end of the main VDP
loop, should happen in each rendering unit
- higan: removed emulator/debugger.hpp
- higan: removed privileged: access specifier
- SFC: removed debugger hooks
- todo: remove sfc/debugger.hpp
- Z80: fixed disassembly of (fd,dd) cb (displacement) (opcode)
instructions
- Z80: fix to prevent interrupts from firing between ix/iy prefixes
and opcodes
- todo: this is a rather hacky fix that could, if exploited, crash
the stack frame
- Z80: fix BIT flags
- Z80: fix ADD hl,reg flags
- Z80: fix CPD, CPI flags
- Z80: fix IND, INI flags
- Z80: fix INDR, INIT loop flag check
- Z80: fix OUTD, OUTI flags
- Z80: fix OTDR, OTIR loop flag check
byuu says:
Changelog:
- SMS: emulated the remaining 240 instructions in the (0xfd, 0xdd)
0xcb (displacement) (opcode) set
- 1/8th of these were "legal" instructions, and apparently games
use them a lot
- SMS: emulated the standard gamepad controllers
- reset button not emulated yet
The reset button is tricky. In every other case, reset is a hardware
thing that instantly reboots the entire machine.
But on the SMS, it's more like a gamepad button that's attached to the
front of the device. When you press it, it fires off a reset vector
interrupt and the gamepad polling routine lets you query the status of
the button.
Just having a reset option in the "Master System" hardware menu is not
sufficient to fully emulate the behavior. Even more annoying is that the
Game Gear doesn't have such a button, yet the core information structs
aren't flexible enough for the Master System to have it, and the Game
Gear to not have it, in the main menu. But that doesn't matter anyway,
since it won't work having it in the menu for the Master System.
So as a result, I'm going to have to have a new "input device" called
"Hardware" that has the "Reset" button listed under there. And for the
sake of consistency, I'm not sure if we should treat the other systems
the same way or not :/
byuu says:
Changelog:
- SMS: emulated the generic Sega memory mapper (none of the more
limited forms of it yet)
- (missing ROM shift, ROM write enable emulation -- no commercial
games use either, though)
- SMS: bus I/O returns 0xff instead of 0x00 so games don't think every
key is being pressed at once
- (this is a hack until I implement proper controller pad reading)
- SMS: very limited protection against reading/writing past the end of
ROM/RAM (todo: should mirror)
- SMS: VDP background HSCROLL subtracts, rather than adds, to the
offset (unlike VSCROLL)
- SMS: VDP VSCROLL is 9-bit, modulates voffset+vscroll to 224 in
192-line mode (32x28 tilemap)
- SMS: VDP tiledata for backgrounds and sprites use `7-(x&7)` rather
than `(x&7)`
- SMS: fix output color to be 6-bit rather than 5-bit
- SMS: left clip uses register `#7`, not palette color `#7`
- (todo: do we want `color[reg7]` or `color[16 + reg7]`?)
- SMS: refined handling of 0xcb, 0xed prefixes in the Z80 core and its
disassembler
- SMS: emulated (0xfd, 0xdd) 0xcb opcodes 0x00-0x0f (still missing
0x10-0xff)
- SMS: fixed 0xcb 0b-----110 opcodes to use direct HL and never allow
(IX,IY)+d
- SMS: fixed major logic bug in (IX,IY)+d displacement
- (was using `read(x)` instead of `operand()` for the displacement
byte fetch before)
- icarus: fake there always being 32KiB of RAM in all SMS cartridges
for the time being
- (not sure how to detect this stuff yet; although I've read it's
not even really possible `>_>`)
TODO: remove processor/z80/dissassembler.cpp code block at line 396 (as it's unnecessary.)
Lots of commercial games are starting to show trashed graphical output now.
byuu says:
Changelog:
- Makefile: added $(windres), -lpthread to Windows port
- GBA: WAITCNT.prefetch is not writable (should fix Donkey Kong: King
of Swing) \[endrift\]
- SMS: fixed hcounter shift value \[hex\_usr\]
- SMS: emulated interrupts (reset button isn't hooked up anywhere, not
sure where to put it yet)
This WIP actually took a really long time because the documentation on
SMS interrupts was all over the place. I'm hoping I've emulated them
correctly, but I honestly have no idea. It's based off my best
understanding from four or five different sources. So it's probably
quite buggy.
However, a few interrupts fire in Sonic the Hedgehog, so that's
something to start with. Now I just have to hope I've gotten some games
far enough in that I can start seeing some data in the VDP VRAM. I need
that before I can start emulating graphics mode 4 to get some actual
screen output.
Or I can just say to hell with it and use a "Hello World" test ROM.
That'd probably be smarter.
byuu says:
Changelog:
- SMS: extended bus mapping of in/out ports: now decoding them fully
inside ms/bus
- SMS: moved Z80 disassembly code from processor/z80 to ms/cpu
(cosmetic)
- SMS: hooked up non-functional silent PSG sample generation, so I can
cap the framerate at 60fps
- SMS: hooked up the VDP main loop: 684 clocks/scanline, 262
scanlines/frame (no PAL support yet)
- SMS: emulated the VDP Vcounter and Hcounter polling ... hopefully
it's right, as it's very bizarre
- SMS: emulated VDP in/out ports (data read, data write, status read,
control write, register write)
- SMS: decoding and caching all VDP register flags (variable names
will probably change)
- nall: \#undef IN on Windows port (prevent compilation warning on
processor/z80)
Watching Sonic the Hedgehog, I can definitely see some VDP register
writes going through, which is a good sign.
Probably the big thing that's needed before I can get enough into the
VDP to start showing graphics is interrupt support. And interrupts are
never fun to figure out :/
What really sucks on this front is I'm flying blind on the Z80 CPU core.
Without a working VDP, I can't run any Z80 test ROMs to look for CPU
bugs. And the CPU is certainly too buggy still to run said test ROM
anyway. I can't find any SMS emulators with trace logging from reset.
Such logs vastly accelerate tracking down CPU logic bugs, so without
them, it's going to take a lot longer.
byuu says:
This is a really tiny WIP. Just wanted to add the known fixes before I start debugging it against Mednafen in a fork.
Changelog:
- Z80: fixed flag calculations on 8-bit ADC, ADD, SBC, SUB
- Z80: fixed flag calculations on 16-bit ADD
- Z80: simplified DAA logic \[AWJ\]
- Z80: RETI sets IFF1=IFF2 (same as RETN)
byuu says:
Changelog:
- Z80: emulated 83 new instructions
- Z80: timing improvements
DAA is a skeleton implementation to complete the normal opcode set. Also
worth noting that I don't know exactly what the hell RETI is doing,
so for now it acts like RET. RETN probably needs some special handling
besides just setting IFF1=IFF2 as well.
I'm now missing 24 ED-prefix instructions, plus DAA, for a total of 25
opcodes remaining. And then, of course, several weeks worth of debugging
all of the inevitable bugs in the core.
byuu says:
Changelog:
- added \~130 new PAL games to icarus (courtesy of Smarthuman
and aquaman)
- added all three Korean-localized games to icarus
- sfc: removed SuperDisc emulation (it was going nowhere)
- sfc: fixed MSU1 regression where the play/repeat flags were not
being cleared on track select
- nall: cryptography support added; will be used to sign future
databases (validation will always be optional)
- minor shims to fix compilation issues due to nall changes
The real magic is that we now have 25-30% of the PAL SNES library in
icarus!
Signing will be tricky. Obviously if I put the public key inside the
higan archive, then all anyone has to do is change that public key for
their own releases. And if you download from my site (which is now over
HTTPS), then you don't need the signing to verify integrity. I may just
put the public key on my site on my site and leave it at that, we'll
see.
byuu says:
Changelog:
- Z80: added most opcodes between 0x00 and 0x3f (two or three hard
ones missing still)
- Z80: redid register declaration *again* to handle AF', BC', DE',
HL' (ugggggh, the fuck? Alternate registers??)
- basically, using `#define <register name>` values to get around
horrendously awful naming syntax
- Z80: improved handling of displace() so that it won't ever trigger
on (BC) or (DE)
byuu says:
Changelog:
- Z80: implemented 113 new instructions (all the easy
LD/ADC/ADD/AND/OR/SBC/SUB/XOR ones)
- Z80: used alternative to castable<To, With> type (manual cast inside
instruction() register macros)
- Z80: debugger: used register macros to reduce typing and increase
readability
- Z80: debugger: smarter way of handling multiple DD/FD prefixes
(using gotos, yay!)
- ruby: fixed crash with Windows input driver on exit (from SuperMikeMan)
I have no idea how the P/V flag is supposed to work on AND/OR/XOR, so
that's probably wrong for now. HALT is also mostly a dummy function for
now. But I typically implement those inside instruction(), so it
probably won't need to be changed? We'll see.
byuu says:
Changelog:
- added (poorly-named) castable<To, With> template
- Z80 debugger rewritten to make declaring instructions much simpler
- Z80 has more instructions implemented; supports displacement on
(IX), (IY) now
- added `Processor::M68K::Bus` to mirror `Processor::Z80::Bus`
- it does add a pointer indirection; so I'm not sure if I want to
do this for all of my emulator cores ...
byuu says:
Changelog:
- rewrote the Z80 core to properly handle 0xDD (IX0 and 0xFD (IY)
prefixes
- added Processor::Z80::Bus as a new type of abstraction
- all of the instructions implemented have their proper T-cycle counts
now
- added nall/certificates for my public keys
The goal of `Processor::Z80::Bus` is to simulate the opcode fetches being
2-read + 2-wait states; operand+regular reads/writes being 3-read. For
now, this puts the cycle counts inside the CPU core. At the moment, I
can't think of any CPU core where this wouldn't be appropriate. But it's
certainly possible that such a case exists. So this may not be the
perfect solution.
The reason for having it be a subclass of Processor::Z80 instead of
virtual functions for the MasterSystem::CPU core to define is due to
naming conflicts. I wanted the core to say `in(addr)` and have it take
the four clocks. But I also wanted a version of the function that didn't
consume time when called. One way to do that would be for the core to
call `Z80::in(addr)`, which then calls the regular `in(addr)` that goes to
`MasterSystem::CPU::in(addr)`. But I don't want to put the `Z80::`
prefix on all of the opcodes. Very easy to forget it, and then end up not
consuming any time. Another is to use uglier names in the
`MasterSystem::CPU` core, like `read_`, `write_`, `in_`, `out_`, etc. But,
yuck.
So ... yeah, this is an experiment. We'll see how it goes.
byuu says:
Changelog:
- MS: added ms/bus
- Z80: implemented JP/JR/CP/DI/IM/IN instructions
- MD/VDP: added window layer emulation
- MD/controller/gamepad: fixed d2,d3 bits (Altered Beast requires
this)
The Z80 is definitely a lot nastier than the LR35902. There's a lot of
table duplication with HL→IX→IY; and two of them nest two levels deep
(eg FD CB xx xx), so the design may change as I implement more.
byuu says:
Changelog:
- new md/bus/ module for bus reads/writes
- abstracts byte/word accesses wherever possible (everything but
RAM; forces all but I/O to word, I/O to byte)
- holds the system RAM since that's technically not part of the
CPU anyway
- added md/controller and md/system/peripherals
- added emulation of gamepads
- added stub PSG audio output (silent) to cap the framerate at 60fps
with audio sync enabled
- fixed VSRAM reads for plane vertical scrolling (two bugs here: add
instead of sub; interlave plane A/B)
- mask nametable read offsets (can't exceed 8192-byte nametables
apparently)
- emulated VRAM/VSRAM/CRAM reads from VDP data port
- fixed sprite width/height size calculations
- added partial emulation of 40-tile per scanline limitation (enough
to fix Sonic's title screen)
- fixed off-by-one sprite range testing
- fixed sprite tile indexing
- Vblank happens at Y=224 with overscan disabled
- unsure what happens when you toggle it between Y=224 and Y=240
... probably bad things
- fixed reading of address register for ADDA, CMPA, SUBA
- fixed sign extension for MOVEA effect address reads
- updated MOVEM to increment the read addresses (but not writeback)
for (aN) mode
With all of that out of the way, we finally have Sonic the Hedgehog
(fully?) playable. I played to stage 1-2 and through the special stage,
at least. EDIT: yeah, we probably need HIRQs for Labyrinth Zone.
Not much else works, of course. Most games hang waiting on the Z80, and
those that don't (like Altered Beast) are still royally screwed. Tons of
features still missing; including all of the Z80/PSG/YM2612.
A note on the perihperals this time around: the Mega Drive EXT port is
basically identical to the regular controller ports. So unlike with the
Famicom and Super Famicom, I'm inheriting the exension port from the
controller class.
byuu says:
Changelog:
- 68K: fixed NEG/NEGX operand order
- 68K: fixed bug in disassembler that was breaking trace logging
- VDP: improved sprite rendering (still 100% broken)
- VDP: added horizontal/vertical scrolling (90% broken)
Forgot:
- 68K: fix extension word sign bit on indexed modes for disassembler
as well
- 68K: emulate STOP properly (use r.stop flag; clear on IRQs firing)
I'm really wearing out fast here. The Genesis documentation is somehow
even worse than Game Boy documentation, but this is a far more complex
system.
It's a massive time sink to sit here banging away at every possible
combination of how things could work, only to see no positive
improvements. Nothing I do seems to get sprites to do a goddamn thing.
squee says the sprite Y field is 10-bits, X field is 9-bits. genvdp says
they're both 10-bits. BlastEm treats them like they're both 10-bits,
then masks off the upper bit so it's effectively 9-bits anyway.
Nothing ever bothers to tell you whether the horizontal scroll values
are supposed to add or subtract from the current X position. Probably
the most basic detail you could imagine for explaining horizontal
scrolling and yet ... nope. Nothing.
I can't even begin to understand how the VDP FIFO functionality works,
or what the fuck is meant by "slots".
I'm completely at a loss as how how in the holy hell the 68K works with
8-bit accesses. I don't know whether I need byte/word handlers for every
device, or if I can just hook it right into the 68K core itself. This
one's probably the most major design detail. I need to know this before
I go and implement the PSG/YM2612/IO ports-\>gamepads/Z80/etc.
Trying to debug the 68K is murder because basically every game likes to
start with a 20,000,000-instruction reset phase of checksumming entire
games, and clearing out the memory as agonizingly slowly as humanly
possible. And like the ARM, there's too many registers so I'd need three
widescreen monitors to comfortably view the entire debugger output lines
onscreen.
I can't get any test ROMs to debug functionality outside of full games
because every **goddamned** test ROM coder thinks it's acceptable to tell
people to go fetch some toolchain from a link that died in the late '90s
and only works on MS-DOS 6.22 to build their fucking shit, because god
forbid you include a 32KiB assembled ROM image in your fucking archives.
... I may have to take a break for a while. We'll see.
byuu says:
Changelog:
- 68K: MOVEQ is 8-bit signed
- 68K: disassembler was print EOR for OR instructions
- 68K: address/program-counter indexed mode had the signed-word/long
bit backward
- 68K: ADDQ/SUBQ #n,aN always works in long mode; regardless of size
- 68K→VDP DMA needs to use `mode.bit(0)<<22|dmaSource`; increment by
one instead of two
- Z80: added registers and initial two instructions
- MS: hooked up enough to load and start running games
- Sonic the Hedgehog can execute exactly one instruction... whoo.
byuu says:
Sorry, two WIPs in one day. Got excited and couldn't wait.
Changelog:
- ADDQ, SUBQ shouldn't update flags when targeting an address register
- ADDA should sign extend effective address reads
- JSR was pushing the PC too early
- some improvements to 8-bit register reads on the VDP (still needs
work)
- added H/V counter reads to the VDP IO port region
- icarus: added support for importing Master System and Game Gear ROMs
- tomoko: added library sub-menus for each manufacturer
- still need to sort Game Gear after Mega Drive somehow ...
The sub-menu system actually isn't all that bad. It is indeed a bit more
annoying, but not as annoying as I thought it was going to be. However,
it looks a hell of a lot nicer now.
byuu says:
Added VDP sprite rendering. Can't get any games far enough in to see if
it actually works. So in other words, it doesn't work at all and is 100%
completely broken.
Also added 68K exceptions and interrupts. So far only the VDP interrupt
is present. It definitely seems to be firing in commercial games, so
that's promising. But the implementation is almost certainly completely
wrong. There is fuck all of nothing for documentation on how interrupts
actually work. I had to find out the interrupt vector numbers from
reading the comments from the Sonic the Hedgehog disassembly. I have
literally no fucking clue what I0-I2 (3-bit integer priority value in
the status register) is supposed to do. I know that Vblank=6, Hblank=4,
Ext(gamepad)=2. I know that at reset, SR.I=7. I don't know if I'm
supposed to block interrupts when I is >, >=, <, <= to the interrupt
level. I don't know what level CPU exceptions are supposed to be.
Also implemented VDP regular DMA. No idea if it works correctly since
none of the commercial games run far enough to use it. So again, it's
horribly broken for usre.
Also improved VDP fill mode. But I don't understand how it takes
byte-lengths when the bus is 16-bit. The transfer times indicate it's
actually transferring at the same speed as the 68K->VDP copy, strongly
suggesting it's actually doing 16-bit transfers at a time. In which case,
what happens when you set an odd transfer length?
Also, both DMA modes can now target VRAM, VSRAM, CRAM. Supposedly there's
all kinds of weird shit going on when you target VSRAM, CRAM with VDP
fill/copy modes, but whatever. Get to that later.
Also implemented a very lazy preliminary wait mechanism to to stall out
a processor while another processor exerts control over the bus. This
one's going to be a major work in progress. For one, it totally breaks
the model I use to do save states with libco. For another, I don't
know if a 68K->VDP DMA instantly locks the CPU, or if it the CPU could
actually keep running if it was executing out of RAM when it started
the DMA transfer from ROM (eg it's a bus busy stall, not a hard chip
stall.) That'll greatly change how I handle the waiting.
Also, the OSS driver now supports Audio::Latency. Sound should be
even lower latency now. On FreeBSD when set to 0ms, it's absolutely
incredible. Cannot detect latency whatsoever. The Mario jump sound seems
to happen at the very instant I hear my cherry blue keyswitch activate.
byuu says:
Changelog:
- 68K: fixed bug that affected BSR return address
- VDP: added very preliminary emulation of planes A, B, W (W is
entirely broken though)
- VDP: added command/address stuff so you can write to VRAM, CRAM,
VSRAM
- VDP: added VRAM fill DMA
I would be really surprised if any commercial games showed anything at
all, so I'd probably recommend against wasting your time trying, unless
you're really bored :P
Also, I wanted to add: I am accepting patches\! So if anyone wants to
look over the 68K core for bugs, that would save me untold amounts of
time in the near future :D
byuu says:
Changelog:
- pulled the (u)intN type aliases into higan instead of leaving them
in nall
- added 68K LINEA, LINEF hooks for illegal instructions
- filled the rest of the 68K lambda table with generic instance of
ILLEGAL
- completed the 68K disassembler effective addressing modes
- still unsure whether I should use An to decode absolute
addresses or not
- pro: way easier to read where accesses are taking place
- con: requires An to be valid; so as a disassembler it does a
poor job
- making it optional: too much work; ick
- added I/O decoding for the VDP command-port registers
- added skeleton timing to all five processor cores
- output at 1280x480 (needed for mixed 256/320 widths; and to handle
interlace modes)
The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.
Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.
It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
byuu says:
The 68K core now implements all 88 instructions. It ended up being 111
instructions in my core due to splitting up opcodes with the same name
but different addressing modes or directions (removes conditions at the
expense of more code.)
Technically, I don't have exceptions actually implemented yet, and
RESET/STOP don't do anything but set flags. So there's still more to
go. But ... close enough for statistics time!
The M68K core source code is 124,712 bytes in size. The next largest
core is the ARM7 core at 70,203 bytes in size.
The M68K object size is 942KiB; with the next largest being the V30MZ
core at 173KiB.
There are a total of 19,656 invalid opcodes in the 68000 revision (unless
of course I've made mistakes in my mappings, which is very probably.)
Now the fun part ... figuring out how to fix bugs in this core without
VDP emulation :/
byuu says:
Changelog:
- Emulator: use `(uintmax)-1 >> 1` for the units of time
- MD: implemented 13 new 68K instructions (basically all of the
remaining easy ones); 21 remain
- nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type
(128-bit where available)
- this extends to everything: atoi, string, etc. You can even
print 128-bit variables if you like
22,552 opcodes still don't exist in the 68K map. Looking like quite a
few entries will be blank once I finish.
byuu says:
Changelog:
- added eight more 68K instructions
- split ADD(direction) into two separate ADD functions
I now have 54 out of 88 instructions implemented (thus, 34 remaining.)
The map is missing 25,182 entries out of 65,536. Down from 32,680 for
v101.00
Aside: this version number feels really silly. r10 and r11 surely will
as well ...
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
byuu says:
All of the above fixes, plus I added all 24 variations on the shift
opcodes, plus SUBQ, plus fixes to the BCC instruction.
I can now run 851,767 instructions into Sonic the Hedgehog before hitting
an unimplemented instruction (SUB).
The 68K core is probably only ~35% complete, and yet it's already within
4KiB of being the largest CPU core, code size wise, in all of higan. Fuck
this chip.
byuu says:
I split the Register class and read/write handlers into DataRegister and
AddressRegister, given that they have different behaviors on byte/word
accesses (data tends to preserve the upper bits; address tends to
sign-extend things.)
I expanded EA to EffectiveAddress. No sense in abbreviating things
to death.
I've now implemented 26 instructions. But the new ones are just all the
stupid from/to ccr/sr instructions.
Ryphecha confirmed that you can't set the undefined bits, so I don't
think the BitField concept is appropriate for the CCR/SR. Instead, I'm
just storing direct flags and have (read,write)(CCR,SR) instead. This
isn't like the 65816 where you have subroutines that push and pop the
flag register. It's much more common to access individual flags. Doesn't
match the consistency angle of the other CPU cores, but ... I think this
is the right thing to for the 68K specifically.
byuu says:
Redesigned the handling of reading/writing registers to be about eight
times faster than the old system. More work may be needed ... it seems
data registers tend to preserve their upper bits upon assignment; whereas
address registers tend to sign-extend values into them. It may make
sense to have DataRegister and AddressRegister classes with separate
read/write handlers. I'd have to hold two Register objects inside the
EffectiveAddress (EA) class if we do that.
Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets
the first ~530,000 instructions in Sonic the Hedgehog running (though
probably wrong. But we can run a lot thanks to large initialization
loops.)
If I force the core to loop back to the reset vector on an invalid opcode,
I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and
just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if
that's good or not. I guess we'll find out.
I had to stop tonight because the final opcode I execute is an RTS
(return from subroutine) that's branching back to address 0; which is
invalid ... meaning something went terribly wrong and the system crashed.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated ot ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There was also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
byuu says:
Four and a half hours of work and ... zero new opcodes implemented.
This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
byuu says:
Alright, I'm definitely going to need to find some people willing to
tolerate my questions on this chip, so I'm going to go ahead and announce
I'm working on this I guess.
This core is way too big for a surprise like the NES and WS cores
were. It'll probably even span multiple v10x releases before it's
even ready.
byuu says:
I now have enough of three instructions implemented to get through the
first four instructions in Sonic the Hedgehog.
But they're far from complete. The very first instruction uses EA
addressing, which is similar to x86's ModRM in terms of how disgustingly
complex it is. And it also accesses Z80 control registers, which obviously
isn't going to do anything yet.
The slow speed was me being stupid again. It's not 7.6MHz per frame,
it's 7.67MHz per second. So yeah, speed is so far acceptable again. But
we'll see how things go as I keep emulating more. The 68K decode is not
pretty at all.
byuu says:
Changelog:
- moved Thread, Scheduler, Cheat functionality into emulator/ for
all cores
- start of actual Mega Drive emulation (two 68K instructions)
I'm going to be rather terse on MD emulation, as it's too early for any
meaningful dialogue here.
byuu says:
Sigh ... I'm really not a good person. I'm inherently selfish.
My responsibility and obligation right now is to work on loki, and
then on the Tengai Makyou Zero translation, and then on improving the
Famicom emulation.
And yet ... it's not what I really want to do. That shouldn't matter;
I should work on my responsibilities first.
Instead, I'm going to be a greedy, self-centered asshole, and work on
what I really want to instead.
I'm really sorry, guys. I'm sure this will make a few people happy,
and probably upset even more people.
I'm also making zero guarantees that this ever gets finished. As always,
I wish I could keep these things secret, so if I fail / give up, I could
just drop it with no shame. But I would have to cut everyone out of the
WIP process completely to make it happen. So, here goes ...
This WIP adds the initial skeleton for Sega Mega Drive / Genesis
emulation. God help us.
(minor note: apparently the new extension for Mega Drive games is .md,
neat. That's what I chose for the folders too. I thought it was .smd,
so that'll be fixed in icarus for the next WIP.)
(aside: this is why I wanted to get v100 out. I didn't want this code in
a skeleton state in v100's source. Nor did I want really broken emulation,
which the first release is sure to be, tarring said release.)
...
So, basically, I've been ruminating on the legacy I want to leave behind
with higan. 3D systems are just plain out. I'm never going to support
them. They're too complex for my abilities, and they would run too slowly
with my design style. I'm not willing to compromise my design ideals. And
I would never want to play a 3D game system at native 240p/480i resolution
... but 1080p+ upscaling is not accurate, so that's a conflict I want
to avoid entirely. It's also never going to emulate computer systems
(X68K, PC-98, FM-Towns, etc) because holy shit that would completely
destroy me. It's also never going emulate arcade machines.
So I think of higan as a collection of 2D emulators for consoles
and handhelds. I've gone over every major 2D gaming system there is,
looking for ones with games I actually care about and enjoy. And I
basically have five of those systems supported already. Looking at the
remaining list, I see only three systems left that I have any interest
in whatsoever: PC-Engine, Master System, Mega Drive. Again, I'm not in
any way committing to emulating any of these, but ... if I had all of
those in higan, I think I'd be content to really, truly, finally stop
writing more emulators for the rest of my life.
And so I decided to tackle the most difficult system first. If I'm
successful, the Z80 core should cover a lot of the work on the SMS. And
the HuC6280 should land somewhere between the NES and SNES in terms of
difficulty ... closer to the NES.
The systems that just don't appeal to me at all, which I will never touch,
include, but are not limited to:
* Atari 2600/5200/7800
* Lynx
* Jaguar
* Vectrex
* Colecovision
* Commodore 64
* Neo-Geo
* Neo-Geo Pocket / Color
* Virtual Boy
* Super A'can
* 32X
* CD-i
* etc, etc, etc.
And really, even if something were mildly interesting in there ... we
have to stop. I can't scale infinitely. I'm already way past my limit,
but I'm doing this anyway. Too many cores bloats everything and kills
quality on everything. I don't want higan to become MESS v2.
I don't know what I'll do about the Famicom Disk System, PC-Engine CD,
and Mega CD. I don't think I'll be able to achieve 60fps emulating the
Mega CD, even if I tried to.
I don't know what's going to happen here with even the Mega Drive. Maybe
I'll get driven crazy with the documentation and quit. Maybe it'll end
up being too complicated and I'll quit. Maybe the emulation will end up
way too slow and I'll give up. Maybe it'll take me seven years to get
any games playable at all. Maybe Steve Snake, AamirM and Mike Pavone
will pool money to hire a hitman to come after me. Who knows.
But this is what I want to do, so ... here goes nothing.
[This version, with the internal version number changed back to "v100",
replaced the original v100 source archive on byuu.org soon after v100's
release, because it fixes important bugs in that version. --Ed]
byuu says:
Changelog:
- fixed default paths for Sufami Turbo slotted games
- moved WonderSwan orientation controls to the port rather than the device
- I do like hex_usr's idea here; but that'll need more consideration;
so this is a temporary fix
- added new debugger interface (see the public topic for more on that)
byuu says:
Changelog:
- hiro: BrowserDialog can navigate up to drive selection on Windows
- nall: (file,path,dir,base,prefix,suffix)name =>
Location::(file,path,dir,base,prefix,suffix)
- higan/tomoko: rename audio filter label from "Sinc" to "IIR - Biquad"
- higan/tomoko: allow loading files via icarus on the command-line
once again
- higan/tomoko: (begrudging) quick hack to fix presentation window focus
on startup
- higan/audio: don't divide output audio volume by number of streams
- processor/r65816: fix a regression in (read,write)DB; fixes Taz-Mania
- fixed compilation regressions on Windows and Linux
I'm happy with where we are at with code cleanups and stability, so I'd
like to release v100. But even though I'm not assigning any special
significance to this version, we should probably test it more thoroughly
first.
byuu says:
Changelog:
- (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel
like they were contributing enough to be worth it]
- cleaned up nall::integer,natural,real functionality
- toInteger, toNatural, toReal for parsing strings to numbers
- fromInteger, fromNatural, fromReal for creating strings from numbers
- (string,Markup::Node,SQL-based-classes)::(integer,natural,real)
left unchanged
- template<typename T> numeral(T value, long padding, char padchar)
-> string for print() formatting
- deduces integer,natural,real based on T ... cast the value if you
want to override
- there still exists binary,octal,hex,pointer for explicit print()
formatting
- lstring -> string_vector [but using lstring = string_vector; is
declared]
- would be nice to remove the using lstring eventually ... but that'd
probably require 10,000 lines of changes >_>
- format -> string_format [no using here; format was too ambiguous]
- using integer = Integer<sizeof(int)*8>; and using natural =
Natural<sizeof(uint)*8>; declared
- for consistency with boolean. These three are meant for creating
zero-initialized values implicitly (various uses)
- R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees
up struct IO {} io; naming]
- SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {}
(status,registers); now
- still some CPU::Status status values ... they didn't really fit into
IO functionality ... will have to think about this more
- SFC CPU, PPU, SMP now use step() exclusively instead of addClocks()
calling into step()
- SFC CPU joypad1_bits, joypad2_bits were unused; killed them
- SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it
- SFC PPU OAM moved into PPU::Object; since nothing else uses it
- the raw uint8[544] array is gone. OAM::read() constructs values from
the OAM::Object[512] table now
- this avoids having to determine how we want to sub-divide the two
OAM memory sections
- this also eliminates the OAM::synchronize() functionality
- probably more I'm forgetting
The FPS fluctuations are driving me insane. This WIP went from 128fps to
137fps. Settled on 133.5fps for the final build. But nothing I changed
should have affected performance at all. This level of fluctuation makes
it damn near impossible to know whether I'm speeding things up or slowing
things down with changes.
byuu says:
Changelog:
- GB core code cleanup completed
- GBA core code cleanup completed
- some more cleanup on missed processor/arm functions/variables
- fixed FC loading icarus bug
- "Load ROM File" icarus functionality restored
- minor code unification efforts all around (not perfect yet)
- MMIO->IO
- mmio.cpp->io.cpp
- read,write->readIO,writeIO
It's been a very long work in progress ... starting all the way back with
v094r09, but the major part of the higan code cleanup is now completed! Of
course, it's very important to note that this is only for the basic style:
- under_score functions and variables are now camelCase
- return-type function-name() are now auto function-name() -> return-type
- Natural<T>/Integer<T> replace (u)intT_n types where possible
- signed/unsigned are now int/uint
- most of the x==true,x==false tests changed to x,!x
A lot of spot improvements to consistency, simplicity and quality have
gone in along the way, of course. But we'll probably never fully finishing
beautifying every last line of code in the entire codebase. Still,
this is a really great start. Going forward, WIP diffs should start
being smaller and of higher quality once again.
I know the joke is, "until my coding style changes again", but ... this
was way too stressful, way too time consuming, and way too risky. I'm
too old and tired now for extreme upheavel like this again. The only
major change I'm slowly mulling over would be renaming the using
Natural<T>/Integer<T> = (u)intT; shorthand to something that isn't as
easily confused with the (u)int_t types ... but we'll see. I'll definitely
continue to change small things all the time, but for the larger picture,
I need to just accept the style I have and live with it.
byuu says:
Changelog:
- fixed FC AxROM / VRC7 regression
- BitField split to BooleanBitField/NaturalBitField (in preparation
for IntegerBitField)
- BitFieldReference removed
- GB CPU cleaned up
- GB Cartridge + Mappers cleaned up
- SFC CGRAM is now emulated as uint15[256] instead of uint[512]
- sfc/ppu/memory.cpp no longer needed; removed
- purged SFC Debugger hooks for now (some of the operator[] calls were
bypassing them anyway)
Unfortunately, for reasons that defy all semblance of logic, the CGRAM
change caused a slight speed hit. As have the last few changes. We're
now down to around 129.5fps compared to 123.fps for v099 and 134.5fps
at our peak (v099r01-r02).
I really like the style I came up with for the Game Boy mappers to settle
the purpose(ROM,RAM) vs (rom,ram)Purpose naming convention. If I ever get
around to redoing the NES mappers, that's likely the approach I'll take.
byuu says:
Changelog:
- lots of code cleanups to processor/r6502 (the switch.cpp file is only
halfway done ...)
- lots of code cleanups to fc/cpu
- removed fc/input
- implemented fc/controller
hex_usr, you may not like this, but I want to keep the controller port
and expansion port interface separate, like I do with the SNES. I realize
the NES' is used more for controllers, and the SNES' more for hardware
expansions, but ... they're not compatible pinouts and you can't really
connect one to the other.
Right now, I've only implemented the controller portion. I'll have to
get to the peripheral portion later.
Also, the gamepad implementation there now may be wrong. It's based off
the Super Famicom version obviously. I'm not sure if the Famicom has
different behavior with latching $4016 writes, or not. But, it works in
Mega Man II, so it's a start.
Everyone, be sure to remap your controls, and then set port 1 -> gamepad
after loading your first Famicom game with the new WIP.
byuu says:
Changelog:
- added nall/bit-field.hpp
- updated all CPU cores (sans LR35902 due to some complexities) to use
BitFields instead of bools
- updated as many CPU cores as I could to use BitFields instead of union {
struct { uint8_t ... }; }; pairs
The speed changes are mostly a wash for this. In some instances,
I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3%
slowdown (eg Famicom emulation.) It's within the margin of error, so
it's safe to say it has no impact.
This does give us a lot of new useful things, however:
- no more manual reconstruction of flag values from lots of left shifts
and ORs
- no more manual deconstruction of flag values from lots of ANDs
- ability to get completely free aliases to flag groups (eg GSU can
provide alt2, alt1 and also alt (which is alt2,alt1 combined)
- removes the need for the nasty order_lsbN macro hack (eventually will
make higan 100% endian independent)
- saves us from insane compilers that try and do nasty things with
alignment on union-structs
- saves us from insane compilers that try to store bit-field bits in
reverse order
- will allow some really novel new use cases (I'm planning an
instant-decode ARM opcode function, for instance.)
- reduces code size (we can serialize flag registers in one line instead
of one for each flag)
However, I probably won't use it for super critical code that's constantly
reading out register values (eg PPU MMIO registers.) I think there we
would end up with a performance penalty.
byuu says:
Changelog:
- hiro: fixed the BrowserDialog column resizing when navigating to new
folders (prevents clipping of filenames)
- note: this is kind of a quick-fix; but I have a good idea how to do
the proper fix now
- nall: added BitField<T, Lo, Hi> class
- note: not yet working on the SFC CPU class; need to go at it with
a debugger to find out what's happening
- GB: emulated DMG/SGB STAT IRQ bug; fixes Zerd no Densetsu and Road Rash
(won't fix anything else; don't get hopes up)
byuu says:
Changelog:
- fixed Super Game Boy regression from v096r04 with bottom tile row
flickering
- fixed GB STAT IRQ regression from previous WIP
- Altered Space is now playable
- GBVideoPlayer isn't; but nobody seems to know exactly what weird
hardware quirk that one relies on to work
- ~3-4% speed improvement in SuperFX games by eliminating function<>
callback on register assignments
- most noticeable in Doom in-game; least noticeable on Yoshi's Island
title screen (darn)
- finished GSU core and SuperFX coprocessor code cleanups
- did some more work cleaning up the LR35902 core and GB CPU code
Just a fair warning: don't get your hopes up on these GB
fixes. Cliffhanger now hangs completely (har har), and none of the
other bugs are fixed. We pretty much did all this work just for Altered
Space. So, I hope you like playing Altered Space.
byuu says:
Changelog:
- GNUmakefile: reverted $(call unique,) to $(strip)
- processor/r6502: removed templates; reduces object size from 146.5kb
to 107.6kb
- processor/lr35902: removed templates; reduces object size from 386.2kb
to 197.4kb
- processor/spc700: merged op macros for switch table declarations
- sfc/coprocessor/sa1: partial cleanups; flattened directory structure
- sfc/coprocessor/superfx: partial cleanups; flattened directory structure
- sfc/coprocessor/icd2: flattened directory structure
- gb/ppu: changed behavior of STAT IRQs
Major caveat! The GB/GBC STAT IRQ changes has a major bug in it somewhere
that's seriously breaking most games. I'm pushing the WIP anyway, because
I believe the changes to be mostly correct. I'd like to get more people
looking at these changes, and also try more heavy-handed hacking and
diff comparison logging between the previous WIP and this one.
byuu says:
Changelog:
- removed template usage from processor/spc700; cleaned up many function
names and the switch table
- object size: 176.8kb => 127.3kb
- source code size: 43.5kb => 37.0kb
- fixed processor/r65816 BRK/COP vector regression [hex_usr]
- corrected HuC3 unmapped RAM read value; fixes Robopon [endrift]
- cosmetic: simplified the butterworth constant calculation
[Wolfram|Alpha]
The SPC700 core changes took forever, about three hours of work.
Only the LR35902 and R6502 still need their template functions
removed. The point of this is that it doesn't cause any speed penalty
to do so, and it results in smaller binary sizes and faster compilation
times.