byuu says:
Changelog:
- emulator/interface: removed unused Region struct
- gba/cpu: optimized CPU::step() as much as I could for a slight
speedup¹
- gba/cpu: synchronize the APU better during FIFO updates
- higan/md, icarus: add automatic region detection; make it the
default option [hex\_usr]
- picks NTSC-J if there's more than one match ... eventually, this
will be a setting
- higan/md, icarus: support all three combinations of SRAM (8-bit low,
8-bit high, 16-bit)
- processor/arm7tdmi: fix bug when changing to THUMB mode via MSR
[MerryMage]
- tomoko: redesigned crash detector to only occur once for all three
ruby drivers
- this will reduce disk thrashing since the configuration file
only needs to be written out one extra time
- technically, it's twice ... but we should've always been writing
one out on first run in case it crashes then
- tomoko: defaulted back to the safest ruby drivers, given the optimal
drivers have some stability concerns
¹: minor errata: spotted a typo saying `synchronize(cpu)` when the CPU
is stopped, instead of `synchronize(ppu)`. This will be fixed in the v104
official 7zip archives.
I'm kind of rushing here but, it's really good timing for me to push out
a new official release. The blocking issues are resolved or close to it,
and we need lots of testing of the new major changes.
I'm going to consider this a semi-stable testing release and leave links
to v103 just in case.
byuu says:
Changelog:
- Master System: merged Bus into CPU
- Mega Drive: merged BusCPU into CPU; BusAPU into AU
- Mega Drive: added TMSS emulation; disabled by default [hex\_usr]
- VDP lockout not yet emulated
- processor/arm7tdmi: renamed interrupt() to exception()
- processor/arm7tdmi: CPSR.F (FIQ disable) flag is set on reset
- processor/arm7tdmi: pipeline decode stage caches CPSR.T (THUMB mode)
[MerryMage]
- fixes `msr_tests.gba` test F
- processor/arm7tdmi/disassembler: add PC address to left of currently
executing instruction
- processor/arm7tdmi: stop forcing CPSR.M (mode flags) bit 4 high (I
don't know what really happens here)
- processor/arm7tdmi: undefined instructions now generate Undefined
0x4 exception
- processor/arm7tdmi: thumbInstructionAddRegister masks PC by &~3
instead of &~2
- hopefully this is correct; &~2 felt very wrong
- processor/arm7tdmi: thumbInstructionStackMultiple can use sequential
timing for PC/LR PUSH/POP [Cydrak]
- systems/Mega Drive.sys: added tmss.rom; enable with cpu version=1
- tomoko: detect when a ruby video/audio/input driver crashes higan;
disable it on next program startup
v104 blockers:
- Mega Drive: support 8-bit SRAM (even if we don't support 16-bit;
don't force 8-bit to 16-bit)
- Mega Drive: add region detection support to icarus
- ruby: add default audio device information so certain drivers won't
default to silence out of the box
byuu says:
Changelog:
- gba/cpu: slight speedup to CPU::step()
- processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are
now playable once again
- processor/arm: removed core from codebase
- processor/v30mz: code cleanup (renamed functions; updated
instruction() for consistency with other cores)
It turns out on my much faster system, the new ARM7TDMI core is very
slightly slower than the old one (by about 2% or so FPS.) But the
CPU::step() improvement basically made it a wash.
So yeah, I'm in really serious trouble with how slow my GBA core is now.
Sigh.
As for higan/processor ... this concludes the first phase of major
cleanups and rewrites.
There will always be work to do, and I have two more phases in mind.
One is that a lot of the instruction disassemblers are very old. One
even uses sprintf still. I'd like to modernize them all. Also, the
ARM7TDMI core (and the ARM core before it) can't really disassemble
because the PC address used for instruction execution is not known prior
to calling instruction(), due to pipeline reload fetches that may occur
inside of said function. I had a nasty hack for debugging the new core,
but I'd like to come up with a clean way to allow tracing the new
ARM7TDMI core.
Another is that I'd still like to rename a lot of instruction function
names in various cores to be more descriptive. I really liked how the
LR35902 core came out there, and would like to get that level of detail
in with the other cores as well.
byuu says:
Changelog:
- processor/arm7tdmi: completed implemented
- gba/cpu, sfc/coprocessor/armdsp: use arm7tdmi instead of arm
- sfc/cpu: experimental fix for newly discovered HDMA emulation issue
Notes:
The ARM7TDMI core crashes pretty quickly when trying to run GBA games,
and I'm certain the same will be the case with the ST018. It was never
all that likely I could rewrite 70KiB of code in 20 hours and have it
work perfectly on the first try. So, now it's time for lots and lots of
debugging. Any help would *really* be appreciated, if anyone were up for
comparing the two implementations for regressions =^-^= I often have a
really hard time spotting simple typos that I make.
Also, the SNES HDMA fix is temporary. I would like it if testers could
run through a bunch of games that are known for being tricky with HDMA
(or if these aren't known to said tester, any games are fine then.) If
we can confirm regressions, then we'll know the fix is either incorrect
or incomplete. But if we don't find any, then it's a good sign that
we're on the right path.
byuu says:
Changelog:
- processor/arm7tdmi: implementation all nine remaining ARM
instructions
- processor/arm7tdmi: implemented five more THUMB instructions
(sixteen remain)
byuu says:
Changelog:
- processor/arm7tdmi: implemented 10 of 19 ARM instructions
- processor/arm7tdmi: implemented 1 of 22 THUMB instructions
Today's WIP was 6 hours of work, and yesterday's was 5 hours.
Half of today was just trying to come up with the design to use a
lambda-based dispatcher to map both instructions and disassembly,
similar to the 68K core. The problem is that the ARM core has 28 unique
bits, which is just far too many bits to have a full lookup table like
the 16-bit 68K core.
The thing I wanted more than anything else was to perform the opcode
bitfield decoding once, and have it decoded for both instructions and
the disassembler. It took three hours to come up with a design that
worked for the ARM half ... relying on #defines being able to pull in
other #defines that were declared and changed later after the first
one. But, I'm happy with it. The decoding is in the table building, as
it is with the 68K core. The decoding does happen at run-time on each
instruction invocation, but it has to be done.
As to the THUMB core, I can create a 64K-entry lambda table to cover all
possible encodings, and ... even though it's a cache killer, I've
decided to go for it, given the outstanding performance it obtained in
the M68K core, as well as considering that THUMB mode is far more common
in GBA games.
As to both cores ... I'm a little torn between two extremes:
On the one hand, I can condense the number of ARM/THUMB instructions
further to eliminate more redundant code. On the other, I can split them
apart to reduce the number of conditional tests needed to execute each
instruction. It's really the disassembler that makes me not want to
split them up further ... as I have to split the disassembler functions
up equally to the instruction functions. But it may be worth it if it's
a speed improvement.
byuu says:
Changelog:
- hiro/windows: set dpiAware=false, fixes icarus window sizes relative
to higan window sizes
- higan, icarus, hiro, ruby: add support for high resolution displays
on macOS [ncbncb]
- processor/lr35902-legacy: removed
- processor/arm7tdmi: new processor core started; intended to one day
be a replacement for processor/arm
It will probably take several WIPs to get the new ARM core up and
running. It's the last processor rewrite. After this, all processor
cores will be up to date with all my current programming conventions.
byuu says:
Changelog:
- processor/lr35902: completed rewrite
I'd appreciate regression testing of the Game Boy and Game Boy Color
emulation between v103r24 and v103r26 (skip r25) if anyone wouldn't
mind.
I fixed up processor/lr35902-legacy to compile and run, so that trace
logs can be created between the two cores to find errors. I'm going to
kill processor/lr35902-legacy with the next WIP release, as well as make
changes to the trace format (add flags externally from AF; much easier
to read them that way), which will make it more difficult to do these
comparisons in the future, hence r26 may prove important later on if we
miss regressions this time.
As for the speed of the new CPU core, not too much to report ... at
least it's not slower :)
Mega Man II: 212.5 to 214.5fps
Shiro no Sho: 191.5 to 191.5fps
Oracle of Ages: 182.5 to 190.5fps
byuu says:
Changelog:
- gb/cpu: force STAT mode to 0 when LCD is disabled (fixes Pokemon
Pinball, etc)
- gb/ppu: when LCD is disabled, require at least one-frame wait to
re-enable, display white during this time
- todo: should step by a scanline at a time: worst-case is an
extra 99% of a frame to enable again
- gba/ppu: cache tilemap lookups and attribute parsing
- it's more accurate because the GBA wouldn't read this for every
pixel
- but unfortunately, this didn't provide any speedup at all ...
sigh
- ruby/audio/alsa: fixed const issue with free()
- ruby/video/cgl: removed `glDisable(GL_ALPHA_TEST)` [deprecated]
- ruby/video/cgl: removed `glEnable(GL_TEXTURE_2D)` [unnecessary as
we use shaders]
- processor/lr35902: started rewrite¹
¹: so, the Game Boy and Game Boy Color cores will be completely
broken for at least the next two or three WIPs.
The old LR35902 was complete garbage, written in early 2011. So I'm
rewriting it to provide a massive cleanup and consistency with other
processor cores, especially the Z80 core.
I've got about 85% of the main instructions implemented, and then I have
to do the CB instructions. The CB instructions are easier because
they're mostly just a small number of opcodes in many small variations,
but it'll still be tedious.
byuu says:
Changelog:
- gb/mbc6: mapper is now functional, but Net de Get has some text
corruption¹
- gb/mbc7: mapper is now functional²
- gb/cpu: HDMA syncs other components after each byte transfer now
- gb/ppu: LY,LX forced to zero when LCDC.d7 is lowered (eg disabled),
not when it's raised (eg enabled)
- gb/ppu: the LCD does not run at all when LCDC.d7 is clear³
- fixes graphical corruption between scene transitions in Legend
of Zelda - Oracle of Ages
- thanks to Cydrak, Shonumi, gekkio for their input on the cause
of this issue
- md/controller: renamed "Gamepad" to "Control Pad" per official
terminology
- md/controller: added "Fighting Pad" (6-button controller) emulation
[hex\_usr]
- processor/m68k: fixed TAS to set data.d7 when
EA.mode==DataRegisterDirect; fixes Asterix
- hiro/windows: removed carriage returns from mouse.cpp and
desktop.cpp
- ruby/audio/alsa: added device driver selection [SuperMikeMan]
- ruby/audio/ao: set format.matrix=nullptr to prevent a crash on some
systems [SuperMikeMan]
- ruby/video/cgl: rename term() to terminate() to fix a crash on macOS
[Sintendo]
¹: The observation that this mapper split $4000-7fff into two banks
came from MAME's implementation. But their implementation was quite
broken and incomplete, so I didn't actually use any of it. The
observation that this mapper split $a000-bfff into two banks came from
Tauwasser, and I did directly use that information, plus the knowledge
that $0400/$0800 are the RAM bank select registers.
The text corruption is due to a race condition with timing. The game is
transferring font letters via HDMA, but the game code ends up setting
the bank# with the font a bit too late after the HDMA has already
occurred. I'm not sure how to fix this ... as a whole, I assumed my Game
Boy timing was pretty good, but apparently it's not that good.
²: The entire design of this mapper comes from endrift's notes.
endrift gets full credit for higan being able to emulate this mapper.
Note that the accelerometer implementation is still not tested, and
probably won't work right until I tweak the sensitivity a lot.
³: So the fun part of this is ... it breaks the strict 60fps rate of
the Game Boy. This was always inevitable: certain timing conditions can
stretch frames, too. But this is pretty much an absolute deal breaker
for something like Vsync timing. This pretty much requires adaptive sync
to run well without audio stuttering during the transition.
There's currently one very important detail missing: when the LCD is
turned off, presumably the image on the screen fades to white. I do not
know how long this process takes, or how to really go about emulating
it. Right now as an incomplete patch, I'm simply leaving the last
displayed image on the screen until the LCD is turned on again. But I
will have to output white, as well as add code to break out of the
emulation loop periodically when the LCD is left off eg indefinitely, or
bad things would happen. I'll work something out and then implement.
Another detail is I'm not sure how long it takes for the LCD to start
rendering again once enabled. Right now, it's immediate. I've heard it's
as long as 1/60th of a second, but that really seems incredibly
excessive? I'd like to know at least a reasonably well-supported
estimate before I implement that.