auto SA1::readIO(uint24 addr, uint8) -> uint8 {

Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aforementioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64 bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. These of course don't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
Booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32 bits of precision
allow a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
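To illustrate the serialization point above, here is a sketch of the generic-code
idea only; the Serializer type and integer() method below are hypothetical and are
not nall's actual serializer API:
  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <type_traits>
  #include <vector>

  struct Serializer {
    std::vector<uint8_t> data;
    std::size_t offset = 0;
    bool saving = true;

    //works for any trivially copyable integer type, whatever its width:
    //the width comes from sizeof(T), so a 128-bit type is never named here
    template<typename T> auto integer(T& value) -> void {
      static_assert(std::is_trivially_copyable<T>::value, "plain integer types only");
      if(saving) {
        auto p = reinterpret_cast<const uint8_t*>(&value);
        data.insert(data.end(), p, p + sizeof(T));
      } else {
        std::memcpy(&value, data.data() + offset, sizeof(T));
        offset += sizeof(T);
      }
    }
  };
Because the helper only looks at sizeof(T), a 128-bit counter round-trips the same
way a uint64 does, and builds that never instantiate the template never have to
name a 128-bit type at all.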
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
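As a stand-alone sketch of that scheme (illustrative only: it uses the compiler's
unsigned __int128 rather than nall's uint128_t, and Thread/normalize are stand-in
names, not higan's actual scheduler API):
  #include <cstdint>
  #include <vector>

  using uint128 = unsigned __int128;  //GCC/Clang only, hence the 64-bit-hosts caveat above

  struct Thread {
    uint128 scalar = 0;  //setup: scalar = timeBase / frequency
    uint128 clock = 0;   //elapsed time, in units of 2^-96 seconds

    auto setFrequency(uint64_t frequency) -> void {
      uint128 timeBase = uint128(1) << 96;  //one second = 2^96 units
      scalar = timeBase / frequency;
    }

    //step: clock += scalar * clocks
    auto step(uint64_t clocks) -> void { clock += uint128(clocks) * scalar; }
  };

  //run periodically (about once per second): subtract the smallest clock from
  //every thread, which keeps relative ordering intact but prevents overflow
  auto normalize(std::vector<Thread*>& threads) -> void {
    if(threads.empty()) return;
    uint128 minimum = threads[0]->clock;
    for(auto t : threads) if(t->clock < minimum) minimum = t->clock;
    for(auto t : threads) t->clock -= minimum;
  }
Two threads can then be kept in step simply by comparing their clock values, as the
demonstration code further below does.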
Unfortunately, these oscillator values are psychotic, unpredictable,
and oftentimes repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (attosecond units), we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
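For concreteness, a quick check of those figures (editor's arithmetic, using
scalar = timeBase / frequency):
    1Hz at a timeBase of 10^18:  scalar = 10^18 / 1 = 10^18
    uint64 range:                2^64 ≈ 1.845 x 10^19
    so a single step of that thread uses about 10^18 / 1.845x10^19 ≈ 1/18 of the counter.

    timeBase of 2^96:  one unit = 2^-96 s ≈ 1.26 x 10^-29 s
                       2^96 / 10^24 ≈ 79,228, i.e. about 79,228 times finer than
                       a yoctosecond (the figure quoted in the code comments below).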
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;

//
cothread_t mainThread = nullptr;

const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;

//
struct ThreadA {
  cothread_t handle = nullptr;
  uint64 frequency = 0;
  int64 clock = 0;

  auto create(auto (*entrypoint)() -> void, uint frequency) {
    this->handle = co_create(65536, entrypoint);
    this->frequency = frequency;
    this->clock = 0;
  }
};

struct CPUA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;

struct SMPA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;

uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;

auto EnterA() -> void {
  offsetA = 0;
  co_switch(resumeA);
}

auto QueueA(uint value) -> void {
  queueA[offsetA++] = value;
  if(offsetA >= iterations) {
    resumeA = co_active();
    co_switch(mainThread);
  }
}

auto CPUA::Enter() -> void { while(true) cpuA.main(); }

auto CPUA::main() -> void {
  QueueA(1);
  smpA.clock -= cpuStep * smpA.frequency;
  if(smpA.clock < 0) co_switch(smpA.handle);
}

auto SMPA::Enter() -> void { while(true) smpA.main(); }

auto SMPA::main() -> void {
  QueueA(2);
  smpA.clock += smpStep * cpuA.frequency;
  if(smpA.clock >= 0) co_switch(cpuA.handle);
}

//
struct ThreadB {
  cothread_t handle = nullptr;
  uint128_t scalar = 0;
  uint128_t clock = 0;

  auto print128(uint128_t value) {
    string s;
    while(value) {
      s.append((char)('0' + value % 10));
      value /= 10;
    }
    s.reverse();
    print(s, "\n");
  }

  //femtosecond (10^15) = 16306
  //attosecond  (10^18) = 688838
  //zeptosecond (10^21) = 13712691
  //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
  //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
  auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
    this->handle = co_create(65536, entrypoint);
    uint128_t unitOfTime = 1;
    //for(uint n : range(29)) unitOfTime *= 10;
    unitOfTime <<= 96;  //2^96 time units ...
    this->scalar = unitOfTime / frequency;
    print128(this->scalar);
    this->clock = 0;
  }

  auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
  auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};

struct CPUB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;

struct SMPB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;

auto correct() -> void {
  auto minimum = min(cpuB.clock, smpB.clock);
  cpuB.clock -= minimum;
  smpB.clock -= minimum;
}

uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;

auto EnterB() -> void {
  correct();
  offsetB = 0;
  co_switch(resumeB);
}

auto QueueB(uint value) -> void {
  queueB[offsetB++] = value;
  if(offsetB >= iterations) {
    resumeB = co_active();
    co_switch(mainThread);
  }
}

auto CPUB::Enter() -> void { while(true) cpuB.main(); }

auto CPUB::main() -> void {
  QueueB(1);
  step(cpuStep);
  synchronize(smpB);
}

auto SMPB::Enter() -> void { while(true) smpB.main(); }

auto SMPB::main() -> void {
  QueueB(2);
  step(smpStep);
  synchronize(cpuB);
}

//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
  mainThread = co_active();

  uint masterCounter = 0;
  while(true) {
    print(masterCounter++, " ...\n");

    auto A = clock();
    EnterA();
    auto B = clock();
    print((double)(B - A) / CLOCKS_PER_SEC, "s\n");

    auto C = clock();
    EnterB();
    auto D = clock();
    print((double)(D - C) / CLOCKS_PER_SEC, "s\n");

    for(uint n : range(iterations)) {
      if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
    }
  }
}
...and that's everything.]

  cpu.active() ? cpu.synchronize(sa1) : synchronize(cpu);

  switch(0x2300 | addr.bits(0,7)) {

  //(SFR) S-CPU flag read
  case 0x2300: {
    uint8 data;
    data  = mmio.cpu_irqfl << 7;
    data |= mmio.cpu_ivsw << 6;
    data |= mmio.chdma_irqfl << 5;
    data |= mmio.cpu_nvsw << 4;
    data |= mmio.cmeg;
    return data;
  }

  //(CFR) SA-1 flag read
  case 0x2301: {
    uint8 data;
    data  = mmio.sa1_irqfl << 7;
    data |= mmio.timer_irqfl << 6;
    data |= mmio.dma_irqfl << 5;
    data |= mmio.sa1_nmifl << 4;
    data |= mmio.smeg;
    return data;
  }

  //(HCR) hcounter read
  case 0x2302: {
    //latch counters
    mmio.hcr = status.hcounter >> 2;
    mmio.vcr = status.vcounter;
    return mmio.hcr >> 0;
  }

  case 0x2303: {
    return mmio.hcr >> 8;
  }

  //(VCR) vcounter read
  case 0x2304: return mmio.vcr >> 0;
  case 0x2305: return mmio.vcr >> 8;

  //(MR) arithmetic result
  case 0x2306: return mmio.mr >> 0;
  case 0x2307: return mmio.mr >> 8;
  case 0x2308: return mmio.mr >> 16;
  case 0x2309: return mmio.mr >> 24;
  case 0x230a: return mmio.mr >> 32;

  //(OF) arithmetic overflow flag
  case 0x230b: return mmio.overflow << 7;

  //(VDPL) variable-length data read port low
  case 0x230c: {
    uint24 data;
    data.byte(0) = vbrRead(mmio.va + 0);
    data.byte(1) = vbrRead(mmio.va + 1);
    data.byte(2) = vbrRead(mmio.va + 2);
    data >>= mmio.vbit;

    return data >> 0;
  }

  //(VDPH) variable-length data read port high
  case 0x230d: {
    uint24 data;
    data.byte(0) = vbrRead(mmio.va + 0);
    data.byte(1) = vbrRead(mmio.va + 1);
    data.byte(2) = vbrRead(mmio.va + 2);
    data >>= mmio.vbit;

    if(mmio.hl == 1) {
      //auto-increment mode
      mmio.vbit += mmio.vb;
      mmio.va += (mmio.vbit >> 3);
      mmio.vbit &= 7;
    }

    return data >> 8;
  }

  //(VC) version code register
  case 0x230e: {
    return 0x01;  //true value unknown
  }

  }

  return 0x00;
}

auto SA1::writeIO(uint24 addr, uint8 data) -> void {
  cpu.active() ? cpu.synchronize(sa1) : synchronize(cpu);

  switch(0x2200 | addr.bits(0,7)) {

  //(CCNT) SA-1 control
  case 0x2200: {
    if(mmio.sa1_resb && !(data & 0x80)) {

Update to v102r25 release.
byuu says:
Changelog:
- processor/arm: corrected MUL instruction timings [Jonas Quinn]
- processor/wdc65816: finished phase two of the rewrite
I'm really pleased with the visual results of the wdc65816 core rewrite.
I was able to eliminate all of the weird `{Boolean,Natural}BitRange`
templates, as well as the need to use unions/structs. Registers are now
just simple `uint24` or `uint16` types (technically they're `Natural<T>`
types, but then all of higan uses those), and flags are now just bool types.
I also eliminated all of the implicit object state inside of the core
(aa, rd, dp, sp) and instead do all computations on the stack frame with
local variables. Through using macros to reference the registers and
individual parts of them, I was able to reduce the visual tensity of all
of the instructions. And by using normal types without implicit states,
I was able to eliminate about 15% of the instructions necessary, instead
reusing existing ones.
The third and final phase of the rewrite will be to recode the disassembler.
That code is probably the oldest code in all of higan right now, still
using sprintf to generate the output. So it is very long overdue for a
cleanup.
And now for the bad news ... as with any large code cleanup, regression
errors have seeped in. Currently, no games are running at all. I've left
the old disassembler in for this reason: we can compare trace logs of
v102r23 against trace logs of v102r25. The second there's any
difference, we've spotted a buggy instruction and can correct it.
With any luck, this will be the last time I ever rewrite the wdc65816
core. My style has changed wildly over the ~10 years since I wrote this
core, but it's really solidified in recent years.

      //reset SA-1 CPU (PC bank set to 0x00)
      r.pc = mmio.crv;
    }

    mmio.sa1_irq = (data & 0x80);
    mmio.sa1_rdyb = (data & 0x40);
    mmio.sa1_resb = (data & 0x20);
    mmio.sa1_nmi = (data & 0x10);
    mmio.smeg = (data & 0x0f);

    if(mmio.sa1_irq) {
      mmio.sa1_irqfl = true;
      if(mmio.sa1_irqen) mmio.sa1_irqcl = 0;
    }

    if(mmio.sa1_nmi) {
      mmio.sa1_nmifl = true;
      if(mmio.sa1_nmien) mmio.sa1_nmicl = 0;
    }

    return;
  }

  //(SIE) S-CPU interrupt enable
  case 0x2201: {
    if(!mmio.cpu_irqen && (data & 0x80)) {
      if(mmio.cpu_irqfl) {
        mmio.cpu_irqcl = 0;
        cpu.r.irq = 1;
      }
    }

    if(!mmio.chdma_irqen && (data & 0x20)) {
      if(mmio.chdma_irqfl) {
        mmio.chdma_irqcl = 0;
        cpu.r.irq = 1;
      }
    }

    mmio.cpu_irqen = (data & 0x80);
    mmio.chdma_irqen = (data & 0x20);
    return;
  }

  //(SIC) S-CPU interrupt clear
  case 0x2202: {
    mmio.cpu_irqcl = (data & 0x80);
    mmio.chdma_irqcl = (data & 0x20);

    if(mmio.cpu_irqcl ) mmio.cpu_irqfl = false;
    if(mmio.chdma_irqcl) mmio.chdma_irqfl = false;

    if(!mmio.cpu_irqfl && !mmio.chdma_irqfl) cpu.r.irq = 0;
    return;
  }

  //(CRV) SA-1 reset vector
  case 0x2203: { mmio.crv = (mmio.crv & 0xff00) | data; return; }
  case 0x2204: { mmio.crv = (data << 8) | (mmio.crv & 0xff); return; }

  //(CNV) SA-1 NMI vector
  case 0x2205: { mmio.cnv = (mmio.cnv & 0xff00) | data; return; }
  case 0x2206: { mmio.cnv = (data << 8) | (mmio.cnv & 0xff); return; }

  //(CIV) SA-1 IRQ vector
  case 0x2207: { mmio.civ = (mmio.civ & 0xff00) | data; return; }
  case 0x2208: { mmio.civ = (data << 8) | (mmio.civ & 0xff); return; }

  //(SCNT) S-CPU control
  case 0x2209: {
    mmio.cpu_irq = (data & 0x80);
    mmio.cpu_ivsw = (data & 0x40);
    mmio.cpu_nvsw = (data & 0x10);
    mmio.cmeg = (data & 0x0f);

    if(mmio.cpu_irq) {
      mmio.cpu_irqfl = true;
      if(mmio.cpu_irqen) {
        mmio.cpu_irqcl = 0;
        cpu.r.irq = 1;
      }
    }

    return;
  }

  //(CIE) SA-1 interrupt enable
  case 0x220a: {
    if(!mmio.sa1_irqen && (data & 0x80) && mmio.sa1_irqfl ) mmio.sa1_irqcl = 0;
    if(!mmio.timer_irqen && (data & 0x40) && mmio.timer_irqfl) mmio.timer_irqcl = 0;
    if(!mmio.dma_irqen && (data & 0x20) && mmio.dma_irqfl ) mmio.dma_irqcl = 0;
    if(!mmio.sa1_nmien && (data & 0x10) && mmio.sa1_nmifl ) mmio.sa1_nmicl = 0;

    mmio.sa1_irqen = (data & 0x80);
    mmio.timer_irqen = (data & 0x40);
    mmio.dma_irqen = (data & 0x20);
    mmio.sa1_nmien = (data & 0x10);
    return;
  }

  //(CIC) SA-1 interrupt clear
  case 0x220b: {
    mmio.sa1_irqcl = (data & 0x80);
    mmio.timer_irqcl = (data & 0x40);
    mmio.dma_irqcl = (data & 0x20);
    mmio.sa1_nmicl = (data & 0x10);

    if(mmio.sa1_irqcl) mmio.sa1_irqfl = false;
    if(mmio.timer_irqcl) mmio.timer_irqfl = false;
    if(mmio.dma_irqcl) mmio.dma_irqfl = false;
    if(mmio.sa1_nmicl) mmio.sa1_nmifl = false;
    return;
  }

  //(SNV) S-CPU NMI vector
  case 0x220c: { mmio.snv = (mmio.snv & 0xff00) | data; return; }
  case 0x220d: { mmio.snv = (data << 8) | (mmio.snv & 0xff); return; }

  //(SIV) S-CPU IRQ vector
  case 0x220e: { mmio.siv = (mmio.siv & 0xff00) | data; return; }
  case 0x220f: { mmio.siv = (data << 8) | (mmio.siv & 0xff); return; }

  //(TMC) H/V timer control
  case 0x2210: {
    mmio.hvselb = (data & 0x80);
    mmio.ven = (data & 0x02);
    mmio.hen = (data & 0x01);
    return;
  }

  //(CTR) SA-1 timer restart
  case 0x2211: {
    status.vcounter = 0;
    status.hcounter = 0;
    return;
  }

  //(HCNT) H-count
  case 0x2212: { mmio.hcnt = (mmio.hcnt & 0xff00) | (data << 0); return; }
  case 0x2213: { mmio.hcnt = (mmio.hcnt & 0x00ff) | (data << 8); return; }

  //(VCNT) V-count
  case 0x2214: { mmio.vcnt = (mmio.vcnt & 0xff00) | (data << 0); return; }
  case 0x2215: { mmio.vcnt = (mmio.vcnt & 0x00ff) | (data << 8); return; }

  //(CXB) Super MMC bank C
  case 0x2220: {
    mmio.cbmode = (data & 0x80);
    mmio.cb = (data & 0x07);
    return;
  }

  //(DXB) Super MMC bank D
  case 0x2221: {
    mmio.dbmode = (data & 0x80);
    mmio.db = (data & 0x07);
    return;
  }

  //(EXB) Super MMC bank E
  case 0x2222: {
    mmio.ebmode = (data & 0x80);
    mmio.eb = (data & 0x07);
    return;
  }

  //(FXB) Super MMC bank F
  case 0x2223: {
    mmio.fbmode = (data & 0x80);
    mmio.fb = (data & 0x07);
    return;
  }

  //(BMAPS) S-CPU BW-RAM address mapping
  case 0x2224: {
    mmio.sbm = (data & 0x1f);
    return;
  }

  //(BMAP) SA-1 BW-RAM address mapping
  case 0x2225: {
    mmio.sw46 = (data & 0x80);
    mmio.cbm = (data & 0x7f);
    return;
  }

  //(SWBE) S-CPU BW-RAM write enable
  case 0x2226: {
    mmio.swen = (data & 0x80);
    return;
  }

  //(CWBE) SA-1 BW-RAM write enable
  case 0x2227: {
    mmio.cwen = (data & 0x80);
    return;
  }

  //(BWPA) BW-RAM write-protected area
  case 0x2228: {
    mmio.bwp = (data & 0x0f);
    return;
  }

  //(SIWP) S-CPU I-RAM write protection
  case 0x2229: {
    mmio.siwp = data;
    return;
  }

  //(CIWP) SA-1 I-RAM write protection
  case 0x222a: {
    mmio.ciwp = data;
    return;
  }

  //(DCNT) DMA control
  case 0x2230: {
    mmio.dmaen = (data & 0x80);
    mmio.dprio = (data & 0x40);
    mmio.cden = (data & 0x20);
    mmio.cdsel = (data & 0x10);
    mmio.dd = (data & 0x04);
    mmio.sd = (data & 0x03);

    if(mmio.dmaen == 0) dma.line = 0;
    return;
  }

  //(CDMA) character conversion DMA parameters
  case 0x2231: {
    mmio.chdend = (data & 0x80);
    mmio.dmasize = (data >> 2) & 7;
    mmio.dmacb = (data & 0x03);

    if(mmio.chdend) cpubwram.dma = false;
    if(mmio.dmasize > 5) mmio.dmasize = 5;
    if(mmio.dmacb > 2) mmio.dmacb = 2;
    return;
  }

  //(SDA) DMA source device start address
  case 0x2232: { mmio.dsa = (mmio.dsa & 0xffff00) | (data << 0); return; }
  case 0x2233: { mmio.dsa = (mmio.dsa & 0xff00ff) | (data << 8); return; }
  case 0x2234: { mmio.dsa = (mmio.dsa & 0x00ffff) | (data << 16); return; }

  //(DDA) DMA destination start address
  case 0x2235: { mmio.dda = (mmio.dda & 0xffff00) | (data << 0); return; }
  case 0x2236: { mmio.dda = (mmio.dda & 0xff00ff) | (data << 8);
    if(mmio.dmaen) {
      if(mmio.cden == 0 && mmio.dd == DMA::DestIRAM) {
        dmaNormal();
      } else if(mmio.cden == 1 && mmio.cdsel == 1) {
        dmaCC1();
      }
    }
    return;
  }
  case 0x2237: { mmio.dda = (mmio.dda & 0x00ffff) | (data << 16);
    if(mmio.dmaen) {
      if(mmio.cden == 0 && mmio.dd == DMA::DestBWRAM) {
        dmaNormal();
      }
    }
    return;
  }

  //(DTC) DMA terminal counter
  case 0x2238: { mmio.dtc = (mmio.dtc & 0xff00) | (data << 0); return; }
  case 0x2239: { mmio.dtc = (mmio.dtc & 0x00ff) | (data << 8); return; }

  //(BBF) BW-RAM bitmap format
  case 0x223f: { mmio.bbf = (data & 0x80); return; }

  //(BRF) bitmap register files
  case 0x2240: { mmio.brf[ 0] = data; return; }
  case 0x2241: { mmio.brf[ 1] = data; return; }
  case 0x2242: { mmio.brf[ 2] = data; return; }
  case 0x2243: { mmio.brf[ 3] = data; return; }
  case 0x2244: { mmio.brf[ 4] = data; return; }
  case 0x2245: { mmio.brf[ 5] = data; return; }
  case 0x2246: { mmio.brf[ 6] = data; return; }
  case 0x2247: { mmio.brf[ 7] = data;
    if(mmio.dmaen) {
      if(mmio.cden == 1 && mmio.cdsel == 0) {
        dmaCC2();
      }
    }
    return;
  }
  case 0x2248: { mmio.brf[ 8] = data; return; }
  case 0x2249: { mmio.brf[ 9] = data; return; }
  case 0x224a: { mmio.brf[10] = data; return; }
  case 0x224b: { mmio.brf[11] = data; return; }
  case 0x224c: { mmio.brf[12] = data; return; }
  case 0x224d: { mmio.brf[13] = data; return; }
  case 0x224e: { mmio.brf[14] = data; return; }
  case 0x224f: { mmio.brf[15] = data;
    if(mmio.dmaen) {
      if(mmio.cden == 1 && mmio.cdsel == 0) {
        dmaCC2();
      }
    }
    return;
  }

  //(MCNT) arithmetic control
  case 0x2250: {
    mmio.acm = (data & 0x02);
    mmio.md = (data & 0x01);

    if(mmio.acm) mmio.mr = 0;
    return;
  }

  //(MAL) multiplicand / dividend low
  case 0x2251: {
    mmio.ma = (mmio.ma & 0xff00) | data;
    return;
  }

  //(MAH) multiplicand / dividend high
  case 0x2252: {
    mmio.ma = (data << 8) | (mmio.ma & 0x00ff);
    return;
  }

  //(MBL) multiplier / divisor low
  case 0x2253: {
    mmio.mb = (mmio.mb & 0xff00) | data;
    return;
  }

  //(MBH) multiplier / divisor high
  //multiplication / cumulative sum only resets MB
  //division resets both MA and MB
  case 0x2254: {
    mmio.mb = (data << 8) | (mmio.mb & 0x00ff);

    if(mmio.acm == 0) {
      if(mmio.md == 0) {
        //signed multiplication
        mmio.mr = (int16)mmio.ma * (int16)mmio.mb;
        mmio.mb = 0;
      } else {
        //unsigned division
        if(mmio.mb == 0) {
          mmio.mr = 0;
        } else {
          int16 quotient = (int16)mmio.ma / (uint16)mmio.mb;
          uint16 remainder = (int16)mmio.ma % (uint16)mmio.mb;
          mmio.mr = (remainder << 16) | quotient;
        }
        mmio.ma = 0;
        mmio.mb = 0;
      }
    } else {
      //sigma (accumulative multiplication)
      mmio.mr += (int16)mmio.ma * (int16)mmio.mb;
      mmio.overflow = (mmio.mr >= (1ULL << 40));
      mmio.mr &= (1ULL << 40) - 1;
      mmio.mb = 0;
    }
    return;
  }

  //(VBD) variable-length bit processing
  case 0x2258: {
    mmio.hl = (data & 0x80);
    mmio.vb = (data & 0x0f);
    if(mmio.vb == 0) mmio.vb = 16;

    if(mmio.hl == 0) {
      //fixed mode
      mmio.vbit += mmio.vb;
      mmio.va += (mmio.vbit >> 3);
      mmio.vbit &= 7;
    }
    return;
  }

  //(VDA) variable-length bit game pak ROM start address
  case 0x2259: { mmio.va = (mmio.va & 0xffff00) | (data << 0); return; }
  case 0x225a: { mmio.va = (mmio.va & 0xff00ff) | (data << 8); return; }
  case 0x225b: { mmio.va = (mmio.va & 0x00ffff) | (data << 16); mmio.vbit = 0; return; }

  }
}