byuu says:
Changelog:
- 68K: fixed bug that affected BSR return address
- VDP: added very preliminary emulation of planes A, B, W (W is
entirely broken though)
- VDP: added command/address stuff so you can write to VRAM, CRAM,
VSRAM
- VDP: added VRAM fill DMA
I would be really surprised if any commercial games showed anything at
all, so I'd probably recommend against wasting your time trying, unless
you're really bored :P
Also, I wanted to add: I am accepting patches\! So if anyone wants to
look over the 68K core for bugs, that would save me untold amounts of
time in the near future :D
byuu says:
Changelog:
- pulled the (u)intN type aliases into higan instead of leaving them
in nall
- added 68K LINEA, LINEF hooks for illegal instructions
- filled the rest of the 68K lambda table with generic instance of
ILLEGAL
- completed the 68K disassembler effective addressing modes
- still unsure whether I should use An to decode absolute
addresses or not
- pro: way easier to read where accesses are taking place
- con: requires An to be valid; so as a disassembler it does a
poor job
- making it optional: too much work; ick
- added I/O decoding for the VDP command-port registers
- added skeleton timing to all five processor cores
- output at 1280x480 (needed for mixed 256/320 widths; and to handle
interlace modes)
The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.
Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.
It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
byuu says:
The 68K core now implements all 88 instructions. It ended up being 111
instructions in my core due to splitting up opcodes with the same name
but different addressing modes or directions (removes conditions at the
expense of more code.)
Technically, I don't have exceptions actually implemented yet, and
RESET/STOP don't do anything but set flags. So there's still more to
go. But ... close enough for statistics time!
The M68K core source code is 124,712 bytes in size. The next largest
core is the ARM7 core at 70,203 bytes in size.
The M68K object size is 942KiB; with the next largest being the V30MZ
core at 173KiB.
There are a total of 19,656 invalid opcodes in the 68000 revision (unless
of course I've made mistakes in my mappings, which is very probably.)
Now the fun part ... figuring out how to fix bugs in this core without
VDP emulation :/
byuu says:
Changelog:
- Emulator: use `(uintmax)-1 >> 1` for the units of time
- MD: implemented 13 new 68K instructions (basically all of the
remaining easy ones); 21 remain
- nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type
(128-bit where available)
- this extends to everything: atoi, string, etc. You can even
print 128-bit variables if you like
22,552 opcodes still don't exist in the 68K map. Looking like quite a
few entries will be blank once I finish.
byuu says:
Changelog:
- added eight more 68K instructions
- split ADD(direction) into two separate ADD functions
I now have 54 out of 88 instructions implemented (thus, 34 remaining.)
The map is missing 25,182 entries out of 65,536. Down from 32,680 for
v101.00
Aside: this version number feels really silly. r10 and r11 surely will
as well ...
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
byuu says:
All of the above fixes, plus I added all 24 variations on the shift
opcodes, plus SUBQ, plus fixes to the BCC instruction.
I can now run 851,767 instructions into Sonic the Hedgehog before hitting
an unimplemented instruction (SUB).
The 68K core is probably only ~35% complete, and yet it's already within
4KiB of being the largest CPU core, code size wise, in all of higan. Fuck
this chip.
byuu says:
I split the Register class and read/write handlers into DataRegister and
AddressRegister, given that they have different behaviors on byte/word
accesses (data tends to preserve the upper bits; address tends to
sign-extend things.)
I expanded EA to EffectiveAddress. No sense in abbreviating things
to death.
I've now implemented 26 instructions. But the new ones are just all the
stupid from/to ccr/sr instructions.
Ryphecha confirmed that you can't set the undefined bits, so I don't
think the BitField concept is appropriate for the CCR/SR. Instead, I'm
just storing direct flags and have (read,write)(CCR,SR) instead. This
isn't like the 65816 where you have subroutines that push and pop the
flag register. It's much more common to access individual flags. Doesn't
match the consistency angle of the other CPU cores, but ... I think this
is the right thing to for the 68K specifically.
byuu says:
Redesigned the handling of reading/writing registers to be about eight
times faster than the old system. More work may be needed ... it seems
data registers tend to preserve their upper bits upon assignment; whereas
address registers tend to sign-extend values into them. It may make
sense to have DataRegister and AddressRegister classes with separate
read/write handlers. I'd have to hold two Register objects inside the
EffectiveAddress (EA) class if we do that.
Implemented 19 opcodes now (out of somewhere between 60 and 90.) That gets
the first ~530,000 instructions in Sonic the Hedgehog running (though
probably wrong. But we can run a lot thanks to large initialization
loops.)
If I force the core to loop back to the reset vector on an invalid opcode,
I'm getting about 1500fps with a dumb 320x240 blit 60 times a second and
just the 68K running alone (no Z80, PSG, VDP, YM2612.) I don't know if
that's good or not. I guess we'll find out.
I had to stop tonight because the final opcode I execute is an RTS
(return from subroutine) that's branching back to address 0; which is
invalid ... meaning something went terribly wrong and the system crashed.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated ot ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There was also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
byuu says:
Four and a half hours of work and ... zero new opcodes implemented.
This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
byuu says:
Alright, I'm definitely going to need to find some people willing to
tolerate my questions on this chip, so I'm going to go ahead and announce
I'm working on this I guess.
This core is way too big for a surprise like the NES and WS cores
were. It'll probably even span multiple v10x releases before it's
even ready.
byuu says:
I now have enough of three instructions implemented to get through the
first four instructions in Sonic the Hedgehog.
But they're far from complete. The very first instruction uses EA
addressing, which is similar to x86's ModRM in terms of how disgustingly
complex it is. And it also accesses Z80 control registers, which obviously
isn't going to do anything yet.
The slow speed was me being stupid again. It's not 7.6MHz per frame,
it's 7.67MHz per second. So yeah, speed is so far acceptable again. But
we'll see how things go as I keep emulating more. The 68K decode is not
pretty at all.
byuu says:
Changelog:
- moved Thread, Scheduler, Cheat functionality into emulator/ for
all cores
- start of actual Mega Drive emulation (two 68K instructions)
I'm going to be rather terse on MD emulation, as it's too early for any
meaningful dialogue here.
byuu says:
Sigh ... I'm really not a good person. I'm inherently selfish.
My responsibility and obligation right now is to work on loki, and
then on the Tengai Makyou Zero translation, and then on improving the
Famicom emulation.
And yet ... it's not what I really want to do. That shouldn't matter;
I should work on my responsibilities first.
Instead, I'm going to be a greedy, self-centered asshole, and work on
what I really want to instead.
I'm really sorry, guys. I'm sure this will make a few people happy,
and probably upset even more people.
I'm also making zero guarantees that this ever gets finished. As always,
I wish I could keep these things secret, so if I fail / give up, I could
just drop it with no shame. But I would have to cut everyone out of the
WIP process completely to make it happen. So, here goes ...
This WIP adds the initial skeleton for Sega Mega Drive / Genesis
emulation. God help us.
(minor note: apparently the new extension for Mega Drive games is .md,
neat. That's what I chose for the folders too. I thought it was .smd,
so that'll be fixed in icarus for the next WIP.)
(aside: this is why I wanted to get v100 out. I didn't want this code in
a skeleton state in v100's source. Nor did I want really broken emulation,
which the first release is sure to be, tarring said release.)
...
So, basically, I've been ruminating on the legacy I want to leave behind
with higan. 3D systems are just plain out. I'm never going to support
them. They're too complex for my abilities, and they would run too slowly
with my design style. I'm not willing to compromise my design ideals. And
I would never want to play a 3D game system at native 240p/480i resolution
... but 1080p+ upscaling is not accurate, so that's a conflict I want
to avoid entirely. It's also never going to emulate computer systems
(X68K, PC-98, FM-Towns, etc) because holy shit that would completely
destroy me. It's also never going emulate arcade machines.
So I think of higan as a collection of 2D emulators for consoles
and handhelds. I've gone over every major 2D gaming system there is,
looking for ones with games I actually care about and enjoy. And I
basically have five of those systems supported already. Looking at the
remaining list, I see only three systems left that I have any interest
in whatsoever: PC-Engine, Master System, Mega Drive. Again, I'm not in
any way committing to emulating any of these, but ... if I had all of
those in higan, I think I'd be content to really, truly, finally stop
writing more emulators for the rest of my life.
And so I decided to tackle the most difficult system first. If I'm
successful, the Z80 core should cover a lot of the work on the SMS. And
the HuC6280 should land somewhere between the NES and SNES in terms of
difficulty ... closer to the NES.
The systems that just don't appeal to me at all, which I will never touch,
include, but are not limited to:
* Atari 2600/5200/7800
* Lynx
* Jaguar
* Vectrex
* Colecovision
* Commodore 64
* Neo-Geo
* Neo-Geo Pocket / Color
* Virtual Boy
* Super A'can
* 32X
* CD-i
* etc, etc, etc.
And really, even if something were mildly interesting in there ... we
have to stop. I can't scale infinitely. I'm already way past my limit,
but I'm doing this anyway. Too many cores bloats everything and kills
quality on everything. I don't want higan to become MESS v2.
I don't know what I'll do about the Famicom Disk System, PC-Engine CD,
and Mega CD. I don't think I'll be able to achieve 60fps emulating the
Mega CD, even if I tried to.
I don't know what's going to happen here with even the Mega Drive. Maybe
I'll get driven crazy with the documentation and quit. Maybe it'll end
up being too complicated and I'll quit. Maybe the emulation will end up
way too slow and I'll give up. Maybe it'll take me seven years to get
any games playable at all. Maybe Steve Snake, AamirM and Mike Pavone
will pool money to hire a hitman to come after me. Who knows.
But this is what I want to do, so ... here goes nothing.
[This version, with the internal version number changed back to "v100",
replaced the original v100 source archive on byuu.org soon after v100's
release, because it fixes important bugs in that version. --Ed]
byuu says:
Changelog:
- fixed default paths for Sufami Turbo slotted games
- moved WonderSwan orientation controls to the port rather than the device
- I do like hex_usr's idea here; but that'll need more consideration;
so this is a temporary fix
- added new debugger interface (see the public topic for more on that)
byuu says:
Changelog:
- hiro: BrowserDialog can navigate up to drive selection on Windows
- nall: (file,path,dir,base,prefix,suffix)name =>
Location::(file,path,dir,base,prefix,suffix)
- higan/tomoko: rename audio filter label from "Sinc" to "IIR - Biquad"
- higan/tomoko: allow loading files via icarus on the command-line
once again
- higan/tomoko: (begrudging) quick hack to fix presentation window focus
on startup
- higan/audio: don't divide output audio volume by number of streams
- processor/r65816: fix a regression in (read,write)DB; fixes Taz-Mania
- fixed compilation regressions on Windows and Linux
I'm happy with where we are at with code cleanups and stability, so I'd
like to release v100. But even though I'm not assigning any special
significance to this version, we should probably test it more thoroughly
first.
byuu says:
Changelog:
- (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel
like they were contributing enough to be worth it]
- cleaned up nall::integer,natural,real functionality
- toInteger, toNatural, toReal for parsing strings to numbers
- fromInteger, fromNatural, fromReal for creating strings from numbers
- (string,Markup::Node,SQL-based-classes)::(integer,natural,real)
left unchanged
- template<typename T> numeral(T value, long padding, char padchar)
-> string for print() formatting
- deduces integer,natural,real based on T ... cast the value if you
want to override
- there still exists binary,octal,hex,pointer for explicit print()
formatting
- lstring -> string_vector [but using lstring = string_vector; is
declared]
- would be nice to remove the using lstring eventually ... but that'd
probably require 10,000 lines of changes >_>
- format -> string_format [no using here; format was too ambiguous]
- using integer = Integer<sizeof(int)*8>; and using natural =
Natural<sizeof(uint)*8>; declared
- for consistency with boolean. These three are meant for creating
zero-initialized values implicitly (various uses)
- R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees
up struct IO {} io; naming]
- SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {}
(status,registers); now
- still some CPU::Status status values ... they didn't really fit into
IO functionality ... will have to think about this more
- SFC CPU, PPU, SMP now use step() exclusively instead of addClocks()
calling into step()
- SFC CPU joypad1_bits, joypad2_bits were unused; killed them
- SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it
- SFC PPU OAM moved into PPU::Object; since nothing else uses it
- the raw uint8[544] array is gone. OAM::read() constructs values from
the OAM::Object[512] table now
- this avoids having to determine how we want to sub-divide the two
OAM memory sections
- this also eliminates the OAM::synchronize() functionality
- probably more I'm forgetting
The FPS fluctuations are driving me insane. This WIP went from 128fps to
137fps. Settled on 133.5fps for the final build. But nothing I changed
should have affected performance at all. This level of fluctuation makes
it damn near impossible to know whether I'm speeding things up or slowing
things down with changes.
byuu says:
Changelog:
- GB core code cleanup completed
- GBA core code cleanup completed
- some more cleanup on missed processor/arm functions/variables
- fixed FC loading icarus bug
- "Load ROM File" icarus functionality restored
- minor code unification efforts all around (not perfect yet)
- MMIO->IO
- mmio.cpp->io.cpp
- read,write->readIO,writeIO
It's been a very long work in progress ... starting all the way back with
v094r09, but the major part of the higan code cleanup is now completed! Of
course, it's very important to note that this is only for the basic style:
- under_score functions and variables are now camelCase
- return-type function-name() are now auto function-name() -> return-type
- Natural<T>/Integer<T> replace (u)intT_n types where possible
- signed/unsigned are now int/uint
- most of the x==true,x==false tests changed to x,!x
A lot of spot improvements to consistency, simplicity and quality have
gone in along the way, of course. But we'll probably never fully finishing
beautifying every last line of code in the entire codebase. Still,
this is a really great start. Going forward, WIP diffs should start
being smaller and of higher quality once again.
I know the joke is, "until my coding style changes again", but ... this
was way too stressful, way too time consuming, and way too risky. I'm
too old and tired now for extreme upheavel like this again. The only
major change I'm slowly mulling over would be renaming the using
Natural<T>/Integer<T> = (u)intT; shorthand to something that isn't as
easily confused with the (u)int_t types ... but we'll see. I'll definitely
continue to change small things all the time, but for the larger picture,
I need to just accept the style I have and live with it.
byuu says:
Changelog:
- fixed FC AxROM / VRC7 regression
- BitField split to BooleanBitField/NaturalBitField (in preparation
for IntegerBitField)
- BitFieldReference removed
- GB CPU cleaned up
- GB Cartridge + Mappers cleaned up
- SFC CGRAM is now emulated as uint15[256] instead of uint[512]
- sfc/ppu/memory.cpp no longer needed; removed
- purged SFC Debugger hooks for now (some of the operator[] calls were
bypassing them anyway)
Unfortunately, for reasons that defy all semblance of logic, the CGRAM
change caused a slight speed hit. As have the last few changes. We're
now down to around 129.5fps compared to 123.fps for v099 and 134.5fps
at our peak (v099r01-r02).
I really like the style I came up with for the Game Boy mappers to settle
the purpose(ROM,RAM) vs (rom,ram)Purpose naming convention. If I ever get
around to redoing the NES mappers, that's likely the approach I'll take.
byuu says:
Changelog:
- lots of code cleanups to processor/r6502 (the switch.cpp file is only
halfway done ...)
- lots of code cleanups to fc/cpu
- removed fc/input
- implemented fc/controller
hex_usr, you may not like this, but I want to keep the controller port
and expansion port interface separate, like I do with the SNES. I realize
the NES' is used more for controllers, and the SNES' more for hardware
expansions, but ... they're not compatible pinouts and you can't really
connect one to the other.
Right now, I've only implemented the controller portion. I'll have to
get to the peripheral portion later.
Also, the gamepad implementation there now may be wrong. It's based off
the Super Famicom version obviously. I'm not sure if the Famicom has
different behavior with latching $4016 writes, or not. But, it works in
Mega Man II, so it's a start.
Everyone, be sure to remap your controls, and then set port 1 -> gamepad
after loading your first Famicom game with the new WIP.
byuu says:
Changelog:
- added nall/bit-field.hpp
- updated all CPU cores (sans LR35902 due to some complexities) to use
BitFields instead of bools
- updated as many CPU cores as I could to use BitFields instead of union {
struct { uint8_t ... }; }; pairs
The speed changes are mostly a wash for this. In some instances,
I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3%
slowdown (eg Famicom emulation.) It's within the margin of error, so
it's safe to say it has no impact.
This does give us a lot of new useful things, however:
- no more manual reconstruction of flag values from lots of left shifts
and ORs
- no more manual deconstruction of flag values from lots of ANDs
- ability to get completely free aliases to flag groups (eg GSU can
provide alt2, alt1 and also alt (which is alt2,alt1 combined)
- removes the need for the nasty order_lsbN macro hack (eventually will
make higan 100% endian independent)
- saves us from insane compilers that try and do nasty things with
alignment on union-structs
- saves us from insane compilers that try to store bit-field bits in
reverse order
- will allow some really novel new use cases (I'm planning an
instant-decode ARM opcode function, for instance.)
- reduces code size (we can serialize flag registers in one line instead
of one for each flag)
However, I probably won't use it for super critical code that's constantly
reading out register values (eg PPU MMIO registers.) I think there we
would end up with a performance penalty.
byuu says:
Changelog:
- hiro: fixed the BrowserDialog column resizing when navigating to new
folders (prevents clipping of filenames)
- note: this is kind of a quick-fix; but I have a good idea how to do
the proper fix now
- nall: added BitField<T, Lo, Hi> class
- note: not yet working on the SFC CPU class; need to go at it with
a debugger to find out what's happening
- GB: emulated DMG/SGB STAT IRQ bug; fixes Zerd no Densetsu and Road Rash
(won't fix anything else; don't get hopes up)
byuu says:
Changelog:
- fixed Super Game Boy regression from v096r04 with bottom tile row
flickering
- fixed GB STAT IRQ regression from previous WIP
- Altered Space is now playable
- GBVideoPlayer isn't; but nobody seems to know exactly what weird
hardware quirk that one relies on to work
- ~3-4% speed improvement in SuperFX games by eliminating function<>
callback on register assignments
- most noticeable in Doom in-game; least noticeable on Yoshi's Island
title screen (darn)
- finished GSU core and SuperFX coprocessor code cleanups
- did some more work cleaning up the LR35902 core and GB CPU code
Just a fair warning: don't get your hopes up on these GB
fixes. Cliffhanger now hangs completely (har har), and none of the
other bugs are fixed. We pretty much did all this work just for Altered
Space. So, I hope you like playing Altered Space.
byuu says:
Changelog:
- GNUmakefile: reverted $(call unique,) to $(strip)
- processor/r6502: removed templates; reduces object size from 146.5kb
to 107.6kb
- processor/lr35902: removed templates; reduces object size from 386.2kb
to 197.4kb
- processor/spc700: merged op macros for switch table declarations
- sfc/coprocessor/sa1: partial cleanups; flattened directory structure
- sfc/coprocessor/superfx: partial cleanups; flattened directory structure
- sfc/coprocessor/icd2: flattened directory structure
- gb/ppu: changed behavior of STAT IRQs
Major caveat! The GB/GBC STAT IRQ changes has a major bug in it somewhere
that's seriously breaking most games. I'm pushing the WIP anyway, because
I believe the changes to be mostly correct. I'd like to get more people
looking at these changes, and also try more heavy-handed hacking and
diff comparison logging between the previous WIP and this one.
byuu says:
Changelog:
- removed template usage from processor/spc700; cleaned up many function
names and the switch table
- object size: 176.8kb => 127.3kb
- source code size: 43.5kb => 37.0kb
- fixed processor/r65816 BRK/COP vector regression [hex_usr]
- corrected HuC3 unmapped RAM read value; fixes Robopon [endrift]
- cosmetic: simplified the butterworth constant calculation
[Wolfram|Alpha]
The SPC700 core changes took forever, about three hours of work.
Only the LR35902 and R6502 still need their template functions
removed. The point of this is that it doesn't cause any speed penalty
to do so, and it results in smaller binary sizes and faster compilation
times.
byuu says:
Changelog:
- nall/dsp returns with new iir/biquad.hpp and resampler/cubic.hpp files
- nall/queue.hpp added (simple ring buffer ... nall/vector wouldn't
cause too many moves with FIFO)
- audio streams now only buffer 20ms; so even if multiple audio streams
desync, latency can never exceed 20ms
- replaced blackman windwed sinc FIR hermite audio filter with transposed
direct form II biquadratic sixth-order IIR butterworth filter (better
attenuation of frequencies above 20KHz, faster, no need for decimation,
less code)
- put in experimental eight-tap echo filter (a lot better than what I
had before, but still rather weak)
- substantial cleanups to the SuperFX GSU processor core (slightly
faster, 479KB->100KB object file, 42.7KB->33.4KB source code size,
way less code duplication)
We'll definitely want to test the whole SuperFX library (not many games)
just to make sure there's no regressions caused by this one.
Not sure what I want to do with audio processing effects yet. I've always
really wanted lots of fun controls to customize audio, and now finally
with this new biquad filter, I can finally start implementing real
effects. For instance, an equalizer wouldn't be too complicated anymore.
The new reverb effect is still a poor man's version. I need to find human
readable source for implementing a comb-filter properly. I'm pretty sure
I can already treat nall::queue as an all-pass filter since all that
does is phase shift (fancy audio term for "delay audio"). What's really
going to be hard is figuring out how to expose user-friendly settings for
controlling it. It looks like you need a bunch of coprime coefficients,
and I don't think casual users are going to be able to hand-enter coprime
values to get the echo effect they want. I uh ... don't even know how
to calculate coprime values dynamically right now >_> But we're going
to have to, as they are correlated to the output sampling rate.
We'll definitely want to make some audio profiles so that users can
quickly select pre-configured themes that sound nice, but expose the
underlying coefficients so that they can tweak stuff to their liking. This
isn't just about higan, this is about me trying to learn digital signal
processing, so please don't be too upset about feature creep or anything
on this.
Anyway ... I'm having some difficulties with my audio right now. When
the reverb effect is enabled, there's a bunch of static on system
reset for just a moment. But this should not be possible. nall::queue
is initializing all previous reverb sample elements to 0.0. I don't
understand where static is coming in from. Further, we have the same
issue with both the windowed sinc and the biquad filters ... a bit of
a popping sound when starting a game. Any help tracking this down would
be appreciated.
There's also one really annoying issue ... I can't seem to do reverb
or volume adjustments with normalized samples. If I say "volume *= 0.5"
in higan/audio/audio.cpp line 68, it doesn't just halve the volume, it
adds a whole bunch of distortion. This makes absolutely zero sense to
me. The sample values are between 0.0 (mute) and 1.0 (full volume) here,
so multiplying a double by 0.5 shouldn't cause distortion. So right now,
I'm doing these adjustments with less precision after denormalizing back
to int16. Anyone ever see something like that? :/
byuu says:
Changelog:
- fixed nall/path.hpp compilation issue
- fixed ruby/audio/xaudio header declaration compilation issue (again)
- cleaned up xaudio2.hpp file to match my coding syntax (12.5% of the
file was whitespace overkill)
- added null terminator entry to nall/windows/utf8.hpp argc[] array
- nall/windows/guid.hpp uses the Windows API for generating the GUID
- this should stop all the bug reports where two nall users were
generating GUIDs at the exact same second
- fixed hiro/cocoa compilation issue with uint# types
- fixed major higan/sfc Super Game Boy audio latency issue
- fixed higan/sfc CPU core bug with pei, [dp], [dp]+y instructions
- major cleanups to higan/processor/r65816 core
- merged emulation/native-mode opcodes
- use camel-case naming on memory.hpp functions
- simplify address masking code for memory.hpp functions
- simplify a few opcodes themselves (avoid redundant copies, etc)
- rename regs.* to r.* to match modern convention of other CPU cores
- removed device.order<> concept from Emulator::Interface
- cores will now do the translation to make the job of the UI easier
- fixed plurality naming of arrays in Emulator::Interface
- example: emulator.ports[p].devices[d].inputs[i]
- example: vector<Medium> media
- probably more surprises
Major show-stoppers to the next official release:
- we need to work on GB core improvements: LY=153/0 case, multiple STAT
IRQs case, GBC audio output regs, etc.
- we need to re-add software cursors for light guns (Super Scope,
Justifier)
- after the above, we need to fix the turbo button for the Super Scope
I really have no idea how I want to implement the light guns. Ideally,
we'd want it in higan/video, so we can support the NES Zapper with the
same code. But this isn't going to be easy, because only the SNES knows
when its output is interlaced, and its resolutions can vary as
{256,512}x{224,240,448,480} which requires pixel doubling that was
hard-coded to the SNES-specific behavior, but isn't appropriate to be
exposed in higan/video.
byuu says:
Changelog:
- nall/vector rewritten from scratch
- higan/audio uses nall/vector instead of raw pointers
- higan/sfc/coprocessor/sdd1 updated with new research information
- ruby/video/glx and ruby/video/glx2: fuck salt glXSwapIntervalEXT!
The big change here is definitely nall/vector. The Windows, OS X and Qt
ports won't compile until you change some first/last strings to
left/right, but GTK will compile.
I'd be really grateful if anyone could stress-test nall/vector. Pretty
much everything I do relies on this class. If we introduce a bug, the
worst case scenario is my entire SFC game dump database gets corrupted,
or the byuu.org server gets compromised. So it's really critical that we
test the hell out of this right now.
The S-DD1 changes mean you need to update your installation of icarus
again. Also, even though the Lunar FMV never really worked on the
accuracy core anyway (it didn't initialize the PPU properly), it really
won't work now that we emulate the hard-limit of 16MiB for S-DD1 games.
byuu says:
Changelog:
- emulation cores now refresh video from host thread instead of
cothreads (fix AMD crash)
- SFC: fixed another bug with leap year months in SharpRTC emulation
- SFC: cleaned up camelCase on function names for
armdsp,epsonrtc,hitachidsp,mcc,nss,sharprtc classes
- GB: added MBC1M emulation (requires manually setting mapper=MBC1M in
manifest.bml for now, sorry)
- audio: implemented Emulator::Audio mixer and effects processor
- audio: implemented Emulator::Stream interface
- it is now possible to have more than two audio streams: eg SNES
+ SGB + MSU1 + Voicer-Kun (eventually)
- audio: added reverb delay + reverb level settings; exposed balance
configuration in UI
- video: reworked palette generation to re-enable saturation, gamma,
luminance adjustments
- higan/emulator.cpp is gone since there was nothing left in it
I know you guys are going to say the color adjust/balance/reverb stuff
is pointless. And indeed it mostly is. But I like the idea of allowing
some fun special effects and configurability that isn't system-wide.
Note: there seems to be some kind of added audio lag in the SGB
emulation now, and I don't really understand why. The code should be
effectively identical to what I had before. The only main thing is that
I'm sampling things to 48000hz instead of 32040hz before mixing. There's
no point where I'm intentionally introducing added latency though. I'm
kind of stumped, so if anyone wouldn't mind taking a look at it, it'd be
much appreciated :/
I don't have an MSU1 test ROM, but the latency issue may affect MSU1 as
well, and that would be very bad.
byuu says:
Changelog:
- fixed DAS instruction (Judgment Silversword score)
- fixed [VH]TMR_FREQ writes (Judgement Silversword audio after area 20)
- fixed initialization of SP (fixes seven games that were hanging on
startup)
- added SER_STATUS and SER_DATA stubs (fixes four games that were
hanging on startup)
- initialized IEEP data (fixes Super Robot Taisen Compact 2 series)
- note: you'll need to delete your internal.com in WonderSwan
(Color).sys folders
- fixed CMPS and SCAS termination condition (fixes serious bugs in four
games)
- set read/writeCompleted flags for EEPROM status (fixes Tetsujin 28
Gou)
- major code cleanups to SFC/R65816 and SFC/CPU
- mostly refactored disassembler to output strings instead of using
char* buffer
- unrolled all the subfolders on sfc/cpu to a single directory
- corrected casing for all of sfc/cpu and a large portion of
processor/r65816
I kind of went overboard on the code cleanup with this WIP. Hopefully
nothing broke. Any testing one can do with the SFC accuracy core would
be greatly appreciated.
There's still an absolutely huge amount of work left to go, but I do
want to eventually refresh the entire codebase to my current coding
style, which is extremely different from stuff that's been in higan
mostly untouched since ~2006 or so. It's dangerous and fickle work, but
if I don't do it, then the code will be a jumbled mess of several
different styles.
byuu says:
Changelog: (all WSC unless otherwise noted)
- fixed LINECMP=0 interrupt case (fixes FF4 world map during airship
sequence)
- improved CPU timing (fixes Magical Drop flickering and FF1 battle
music)
- added per-frame OAM caching (fixes sprite glitchiness in Magical Drop,
Riviera, etc.)
- added RTC emulation (fixes Dicing Knight and Judgement Silversword)
- added save state support
- added cheat code support (untested because I don't know of any cheat
codes that exist for this system)
- icarus: can now detect games with RTC chips
- SFC: bugfix to SharpRTC emulation (Dai Kaijuu Monogatari II)
- ( I was adding the extra leap year day to all 12 months instead of
just February ... >_< )
Note that the RTC emulation is very incomplete. It's not really
documented at all, and the two games I've tried that use it never even
ask you to set the date/time (so they're probably just using it to count
seconds.) I'm not even sure if I've implement the level-sensitive
behavior correctly (actually, now that I think about it, I need to mask
the clear bit in INT_ACK for the level-sensitive interrupts ...)
A bit worried about the RTC alarm, because it seems like it'll fire
continuously for a full minute. Or even if you turn it off after it
fires, then that doesn't seem to be lowering the line until the next
second ticks on the RTC, so that likely needs to happen when changing
the alarm flag.
Also not sure on this RTC's weekday byte. On the SharpRTC, it actually
computes this for you. Because it's not at all an easy thing to
calculate yourself in 65816 or V30MZ assembler. About 40 lines of code
to do it in C. For now, I'm requiring the program to calculate the value
itself.
Also note that there's some gibberish tiles in Judgement Silversword,
sadly. Not sure what's up there, but the game's still fully playable at
least.
Finally, no surprise: Beat-Mania doesn't run :P
byuu says:
Changelog:
- WS: fixed 8-bit sign-extended imul (fixes Star Hearts completely,
Final Fantasy world map)
- WS: fixed rcl/rcr carry shifting (fixes Crazy Climber, others)
- WS: added sound DMA emulation (Star Hearts rain sound for one example)
- WS: added OAM caching, but it's forced every line for now because
otherwise there are too many sprite glitches
- WS: use headphoneEnable bit instead of speakerEnable bit (fixes muted
audio in games)
- WS: various code cleanups (I/O mapping, audio channel naming, etc)
The hypervoice channel doesn't sound all that great just yet. But I'm
not sure how it's supposed to sound. I need a better example of some
more complex music.
What's left are some unknown register status bits (especially in the
sound area), keypad interrupts, RTC emulation, CPU prefetch emulation.
And then it's all just bugs. Lots and lots of bugs that need to be
fixed.
EDIT: oops, bad typo in the code.
ws/ppu/ppu.cpp line 20: change range(256) to range(224).
Also, delete the r.speed stuff from channel5.cpp to make the rain sound
a lot better in Star Hearts. Apparently that's outdated and not what the
bits really do.
byuu says:
Changelog:
- WS: added HblankTimer and VblankTimer IRQs; although they don't appear
to have any effect on any games that use them :/
- WS: added sound emulation; works perfectly in some games (eg Riviera);
is completely silent in most games (eg GunPey)
The sound emulation only partially supports the hypervoice (headphone
only) channel. I need to implement the SDMA before it'll actually do
anything useful. I'm a bit confused about how exactly things work. It
looks like the speaker volume shift and clamp only applies to speaker
mode and not headphone mode, which is very weird. Then there's the
software possibility of muting the headphones and/or the speaker.
Preferably, I want to leave the emulator always in headphone mode for
the extra audio channel. If there are games that force-mute the
headphones, but not speakers, then I may need to force headphones back
on but with the hypervoice channel disabled. I guess we'll see how
things go.
Rough guess is probably that the channels default to enabled after the
IPLROM, and games aren't bothering to manually enable them or something.
byuu says:
Changelog:
- WS: fixed bug when IRQs triggered during a rep string instruction
- WS: added sprite attribute caching (per-scanline); absolutely massive
speed-up
- WS: emulated limit of 32 sprites per scanline
- WS: emulated the extended PPU register bit behavior based on the
DISP_CTRL tile bit-depth setting
- WS: added "Rotate" key binding; can be used to flip the WS display
between horizontal and vertical in real-time
The prefix emulation may not be 100% hardware-accurate, but the edge
cases should be extreme enough to not come up in the WS library. No way
to get the emulation 100% down without intensive hardware testing.
trap15 pointed me at a workflow diagram for it, but that diagram is
impossible without a magic internal stack frame that grows with every
IRQ, and can thus grow infinitely large.
The rotation thing isn't exactly the most friendly set-up, but oh well.
I'll see about adding a default rotation setting to manifests, so that
games like GunPey can start in the correct orientation. After that, if
the LCD orientation icon turns out to be reliable, then I'll start using
that. But if there are cases where it's not reliable, then I'll leave it
to manual button presses.
Speaking of icons, I'll need a set of icons to render on the screen.
Going to put them to the top right on vertical orientation, and on the
bottom left for horizontal orientation. Just outside of the video
output, of course.
Overall, WS is getting pretty far along, but still some major bugs in
various games. I really need sound emulation, though. Nobody's going to
use this at all without that.
byuu says:
Changelog:
- emulated SuperDisc $21e1 basic interface (NEC 4-bit MCU); all hardware
tests pass now (but they don't test much)
- WS/V30MZ: fixed inc/dec reg flag calculation
- WS/V30MZ: fixed lds/les instructions
WS/C compatibility should be way up now. SuperDisc BIOS passes all tests
now (but they only test for the presence of the interface, nothing
more.)
byuu says:
Changelog:
- WS: fixed lods, scas instructions
- WS: implemented missing GRP4 instructions
- WS: fixed transparency for screen one
- WSC: added color-mode PPU rendering
- WS+WSC: added packed pixel mode support
- WS+WSC: added dummy sound register reads/writes
- SFC: added threading to SuperDisc (it's hanging for right now; need to
clear IRQ on $21e2 writes)
SuperDisc Timer and Sound Check were failing before due to not turning
off IRQs on $21e4 clear, so I'm happy that's fixed now.
Riviera starts now, and displays the first intro screen before crashing.
Huge, huge amounts of corrupted graphics, though. This game's really
making me work for it :(
No color games seem fully playable yet, but a lot of monochrome and
color games are now at least showing more intro screen graphics before
dying.
This build defaults to horizontal orientation, but I left the inputs
bound to vertical orientation. Whoops. I still need to implement
a screen flip key binding.
byuu says:
Changelog:
- icarus: WS/C detects RAM type/size heuristically now
- icarus: WS/C uses ram type=$type instead of $type
- WS: use back color instead of white for backdrop
- WS: fixed sprite count limit; removes all the garbled sprites from
GunPey
- WS: hopefully fixed sprite priority with screen 2
- WS: implemented keypad polling; GunPey is now fully playable
- SNES: added Super Disc expansion port device (doesn't do anything,
just for testing)
Note: WS is hard-coded to vertical orientation right now. But there's
basic code in there for all the horizontal stuff.
byuu says:
Changelog:
- WS: fixed a major CPU bug where I was using the wrong bits for
ModR/M's memory mode
- WS: added grayscale PPU emulation (exceptionally buggy)
GunPey now runs, as long as you add:
eeprom name=save.ram size=0x800
to the manifest after importing with icarus.
Right now, you can't control the game due to missing keypad polling.
There's also a lot of glitchiness with the sprites. Seems like they're
not getting properly cleared sometimes or something.
Also, the PPU emulation is totally unrealistic bullshit. I decode and
evaluate every single tile and sprite on every single pixel of output.
No way in hell the hardware could ever come close to that. The speed's
around 500fps without the insane sprite evaluations, and around 90fps
with it. Obviously, I'll fix this in time.
Nothing else seems to run that I've tried. Not even far enough to
display any output whatsoever. Tried Langrisser Millenium, Rockman
& Forte and Riviera. I really need to update icarus to try and encode
eeprom/sram sizes, because that's going to break a lot of stuff if it's
missing.
byuu says:
Changelog:
- fixed nall/windows/guard.hpp
- fixed hiro/(windows,gtk)/header.hpp
- fixed Famicom PPU OAM reads (mask the correct bits when writing)
[hex_usr]
- removed the need for (system := system) lines from higan/GNUmakefile
- added "All" option to filetype dropdown for ROM loading
- allows loading GBC games in SGB mode (and technically non-GB(C)
games, which will obviously fail to do anything)
- loki can load and play game folders now (command-line only) (extremely
unimpressive; don't waste your time :P)
- the input is extremely hacked in as a quick placeholder; not sure
how I'm going to do mapping yet for it
byuu says:
Changelog:
- ruby: if DirectSoundCreate fails (no sound device present), return
false from init instead of crashing
- nall: improved edge case return values for
(basename,pathname,dirname,...)
- nall: renamed file_system_object class to inode
- nall: varuint_t replaced with VariadicNatural; which contains
.bit,.bits,.byte ala Natural/Integer
- nall: fixed boolean compilation error on Windows
- WS: popa should not restore SP
- GBA: rewrote the CPU/APU cores to use the .bit,.bits functions;
removed registers.cpp from each
Note that the GBA changes are extremely major. This is about five hours
worth of extremely delicate work. Any slight errors could break
emulation in extremely bad ways. Let's hold off on extensive testing
until the next WIP, after I do the same to the PPU.
So far ... endrift's SOUNDCNT_X I/O test is failing, although that code
didn't change, so clearly I messed up SOUNDCNT_H somehow ...
To compile on Windows:
1. change nall/string/platform.hpp line 47 to
return slice(result, 0, 3);
2. change ruby/video.wgl.cpp line 72 to
auto lock(uint32_t*& data, uint& pitch, uint width, uint height) -> bool {
3. add this line to the very top of hiro/windows/header.cpp:
#define boolean FuckYouMicrosoft
byuu says:
Changelog:
- higan now uses Natural<Size>/Integer<Size> for its internal types
- Super Famicom emulation now uses uint24 instead of uint for bus
addresses (it's a 24-bit bus)
- cleaned up gb/apu MMIO writes
- cleaned up sfc/coprocessor/msu1 MMIO writes
- ~3% speed penalty
I've wanted to do that 24-bit bus thing for so long, but have always
been afraid of the speed impact. It's probably going to hurt
balanced/performance once they compile again, but it wasn't significant
enough to harm the accuracy core's frame rate, thankfully. Only lost one
frame per second.
The GBA core handlers are clearly going to take a lot more work. The
bit-ranges will make it substantially easier to handle, though. Lots of
32-bit registers where certain values span multiple bytes, but we have
to be able to read/write at byte-granularity.
byuu says:
Got it. Wow, that didn't hurt nearly as much as I thought it was going
to.
Dropped from 127.5fps to 123.5fps to use Natural/Integer for
(u)int(8,16,32,64).
That's totally worth the cost.
byuu says:
This is a few days old, but oh well.
This WIP changes nall,hiro,ruby,icarus back to (u)int(8,16,32,64)_t.
I'm slowly pushing for (u)int(8,16,32,64) to use my custom
Integer<Size>/Natural<Size> classes instead. But it's going to be one
hell of a struggle to get that into higan.
byuu says:
Alright, well interrupts are in. At least Vblank is.
I also fixed a bug in vector() indexing, MoDRM mod!=3&®==6 using SS
instead of DS, opcodes a0-a3 allowing segment override, and added the
"irq_disable" stuff to the relevant opcodes to suppress IRQs after
certain instructions.
But unfortunately ... still no go on Riviera. It's not reading any
unmapped ports, and although it enables Vblank IRQs and they set port
$b4's status bit, the game never sets the IE flag, so no interrupts ever
actually fire. The game does indeed appear to be sitting in a rather
huge loop, which is probably dependent upon some RAM variable being set
from the Vblank IRQ, but I don't know how I'm supposed to be triggering
it.
... I'm really quite stumped here >_>
byuu says:
All 256 instructions implemented fully. Fixed a major bug with
instructions that both read and write to ModRM with displacement.
Riviera now runs into an infinite loop ... possibly crashed, possibly
waiting on interrupts or in to return something. Added a bunch of PPU
settings registers, but nothing's actually rendering with them yet.