2016-07-12 10:19:31 +00:00
auto M68K::testCondition(uint4 condition) -> bool {
  switch(condition) {
2016-07-22 12:03:25 +00:00
  case 0: return true; //T
  case 1: return false; //F
2016-07-12 10:19:31 +00:00
  case 2: return !r.c && !r.z; //HI
  case 3: return r.c || r.z; //LS
  case 4: return !r.c; //CC,HS
  case 5: return r.c; //CS,LO
  case 6: return !r.z; //NE
  case 7: return r.z; //EQ
  case 8: return !r.v; //VC
  case 9: return r.v; //VS
  case 10: return !r.n; //PL
  case 11: return r.n; //MI
  case 12: return r.n == r.v; //GE
  case 13: return r.n != r.v; //LT
  case 14: return r.n == r.v && !r.z; //GT
  case 15: return r.n != r.v || r.z; //LE
  }
  unreachable;
}

//

2016-07-25 13:15:54 +00:00
template<> auto M68K::bytes<Byte>() -> uint { return 1; }
template<> auto M68K::bytes<Word>() -> uint { return 2; }
template<> auto M68K::bytes<Long>() -> uint { return 4; }

2016-07-22 12:03:25 +00:00
template<> auto M68K::bits<Byte>() -> uint { return 8; }
template<> auto M68K::bits<Word>() -> uint { return 16; }
template<> auto M68K::bits<Long>() -> uint { return 32; }

2016-07-25 13:15:54 +00:00
template<uint Size> auto M68K::lsb() -> uint32 { return 1; }

template<> auto M68K::msb<Byte>() -> uint32 { return 0x80; }
template<> auto M68K::msb<Word>() -> uint32 { return 0x8000; }
template<> auto M68K::msb<Long>() -> uint32 { return 0x80000000; }

2016-07-22 12:03:25 +00:00
template<> auto M68K::mask<Byte>() -> uint32 { return 0xff; }
template<> auto M68K::mask<Word>() -> uint32 { return 0xffff; }
template<> auto M68K::mask<Long>() -> uint32 { return 0xffffffff; }

Update to v100r08 release.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated to ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There were also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
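As a rough illustration of the trade-off described above, here is a minimal sketch contrasting a runtime-size helper with a size-templated one. The names `clipRuntime` and `addClipped`, and the numeric values given to `Byte`, `Word` and `Long`, are assumptions made for this example only; they are not taken from the higan source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size tags; the values the real core assigns are not shown in this message.
enum : unsigned { Byte = 1, Word = 2, Long = 4 };

// Runtime-size style: every call has to branch on the size argument.
inline uint32_t clipRuntime(unsigned size, uint32_t data) {
  if(size == Byte) return data & 0xff;
  if(size == Word) return data & 0xffff;
  return data;
}

// Template style: one specialization per size, selected at compile time,
// so no size test remains in the generated code.
template<unsigned Size> uint32_t clip(uint32_t data);
template<> inline uint32_t clip<Byte>(uint32_t data) { return data & 0xff; }
template<> inline uint32_t clip<Word>(uint32_t data) { return data & 0xffff; }
template<> inline uint32_t clip<Long>(uint32_t data) { return data; }

// A handler written once and instantiated per size: this is where the
// "triplication" of instructions comes from.
template<unsigned Size> uint32_t addClipped(uint32_t a, uint32_t b) {
  return clip<Size>(a + b);
}
```

Calling `addClipped<Byte>(0xff, 1)` yields 0, the same as `clipRuntime(Byte, 0xff + 1)`, but the size decision is made by the compiler rather than at run time.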
2016-07-17 22:11:29 +00:00
template<> auto M68K::clip<Byte>(uint32 data) -> uint32 { return data & 0xff; }
template<> auto M68K::clip<Word>(uint32 data) -> uint32 { return data & 0xffff; }
template<> auto M68K::clip<Long>(uint32 data) -> uint32 { return data & 0xffffffff; }

template<> auto M68K::sign<Byte>(uint32 data) -> int32 { return (int8)data; }
template<> auto M68K::sign<Word>(uint32 data) -> int32 { return (int16)data; }
template<> auto M68K::sign<Long>(uint32 data) -> int32 { return (int32)data; }

2016-08-10 22:02:02 +00:00
//
2016-07-17 22:11:29 +00:00

2016-08-10 22:02:02 +00:00
auto M68K::instructionABCD(EffectiveAddress with, EffectiveAddress from) -> void {
  auto source = read<Byte>(from);
2016-08-17 12:31:22 +00:00
  auto target = read<Byte, Hold>(with);
2016-08-10 22:02:02 +00:00
  auto result = source + target + r.x;
Update to v103r03 release.
byuu says:
Changelog:
- md/psg: fixed output frequency rate regression from v103r02
- processor/m68k: fixed calculations for ABCD, NBCD, SBCD [hex_usr,
SuperMikeMan]
- processor/spc700: renamed abbreviated instructions to functional
descriptions (eg `XCN` → `ExchangeNibble`)
- processor/spc700: removed memory.cpp shorthand functions (fetch,
load, store, pull, push)
- processor/spc700: updated all instructions to follow cycle behavior
as documented by Overload with a logic analyzer
Once again, the changes to the SPC700 core are really quite massive. And
this time it's not just cosmetic: the idle cycles have been updated to
pull from various memory addresses. This is why I removed the shorthand
functions -- so that I could handle the at-times very bizarre addresses
the SPC700 has on its address bus during its idle cycles.
There is one behavior Overload mentioned that I don't emulate ... one of
the cycles of the (X) transfer functions seems to not actually access
the $f0-ff internal SMP registers? I don't fully understand what
Overload is getting at, so I haven't tried to support it just yet.
Also, there are limits to logic analyzers. In many cases the same
address is read from twice consecutively. It is unclear which of the two
reads the SPC700 actually utilizes. I tried to choose the most logical
values (usually the first one), but ... I don't know that we'll be able
to figure this one out. It's going to be virtually impossible to test
this through software, because the PC can't really execute out of
registers that have side effects on reads.
2017-06-28 07:24:46 +00:00
  bool c = false;
2016-08-10 22:02:02 +00:00
  bool v = false;
2016-07-17 22:11:29 +00:00

2016-08-10 22:02:02 +00:00
  if(((target ^ source ^ result) & 0x10) || (result & 0x0f) >= 0x0a) {
    auto previous = result;
    result += 0x06;
    v |= ((~previous & 0x80) & (result & 0x80));
  }

  if(result >= 0xa0) {
    auto previous = result;
    result += 0x60;
2017-06-28 07:24:46 +00:00
    c = true;
2016-08-10 22:02:02 +00:00
    v |= ((~previous & 0x80) & (result & 0x80));
  }

  write<Byte>(with, result);

2017-06-28 07:24:46 +00:00
  r.c = c;
2016-08-10 22:02:02 +00:00
  r.v = v;
2016-08-17 12:31:22 +00:00
  r.z = clip<Byte>(result) ? 0 : r.z;
2016-08-10 22:02:02 +00:00
  r.n = sign<Byte>(result) < 0;
  r.x = r.c;
}
2016-07-17 22:11:29 +00:00

2017-10-05 06:13:03 +00:00
template<uint Size, bool extend> auto M68K::ADD(uint32 source, uint32 target) -> uint32 {
2016-08-17 12:31:22 +00:00
  auto result = (uint64)source + target;
2017-10-05 06:13:03 +00:00
  if(extend) result += r.x;
2016-07-26 10:46:43 +00:00

  r.c = sign<Size>(result >> 1) < 0;
  r.v = sign<Size>(~(target ^ source) & (target ^ result)) < 0;
2017-10-05 06:13:03 +00:00
  r.z = clip<Size>(result) ? 0 : (extend ? r.z : 1);
2016-07-26 10:46:43 +00:00
  r.n = sign<Size>(result) < 0;
  r.x = r.c;

  return clip<Size>(result);
}
2016-07-17 22:11:29 +00:00

2016-08-08 10:12:03 +00:00
template<uint Size> auto M68K::instructionADD(EffectiveAddress from, DataRegister with) -> void {
  auto source = read<Size>(from);
  auto target = read<Size>(with);
  auto result = ADD<Size>(source, target);
  write<Size>(with, result);
2016-07-26 10:46:43 +00:00
}
2016-07-17 22:11:29 +00:00

2016-08-08 10:12:03 +00:00
template<uint Size> auto M68K::instructionADD(DataRegister from, EffectiveAddress with) -> void {
  auto source = read<Size>(from);
2016-08-17 12:31:22 +00:00
  auto target = read<Size, Hold>(with);
2016-07-26 10:46:43 +00:00
  auto result = ADD<Size>(source, target);
2016-08-08 10:12:03 +00:00
  write<Size>(with, result);
2016-07-26 10:46:43 +00:00
}

template<uint Size> auto M68K::instructionADDA(AddressRegister ar, EffectiveAddress ea) -> void {
2016-08-17 22:04:50 +00:00
  auto source = sign<Size>(read<Size>(ea));
2016-08-21 22:11:24 +00:00
  auto target = read<Long>(ar);
2016-07-26 10:46:43 +00:00
  write<Long>(ar, source + target);
}

2016-08-08 10:12:03 +00:00
template<uint Size> auto M68K::instructionADDI(EffectiveAddress modify) -> void {
  auto source = readPC<Size>();
2016-08-17 12:31:22 +00:00
  auto target = read<Size, Hold>(modify);
2016-08-08 10:12:03 +00:00
  auto result = ADD<Size>(source, target);
  write<Size>(modify, result);
}

2016-08-17 22:04:50 +00:00
template<uint Size> auto M68K::instructionADDQ(uint4 immediate, EffectiveAddress with) -> void {
  auto source = immediate;
  auto target = read<Size, Hold>(with);
2016-07-26 10:46:43 +00:00
  auto result = ADD<Size>(source, target);
2016-08-17 22:04:50 +00:00
  write<Size>(with, result);
}

2016-08-19 14:11:26 +00:00
//Size is ignored: always uses Long
2016-08-17 22:04:50 +00:00
template<uint Size> auto M68K::instructionADDQ(uint4 immediate, AddressRegister with) -> void {
2016-08-19 14:11:26 +00:00
  auto result = read<Long>(with) + immediate;
2016-08-17 22:04:50 +00:00
  write<Long>(with, result);
2016-07-17 22:11:29 +00:00
}

2016-08-17 12:31:22 +00:00
template<uint Size> auto M68K::instructionADDX(EffectiveAddress with, EffectiveAddress from) -> void {
  auto source = read<Size>(from);
  auto target = read<Size, Hold>(with);
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aforementioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
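The four rules above can be tried out without real cothreads. The following is only a sketch: `relativeSchedule` is a hypothetical helper that records which thread would run at each step (0 for thread A, 1 for thread B) instead of calling `co_switch`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for two cothreads sharing one relative int64 counter.
// Returns the order in which the two threads would run: 0 = A, 1 = B.
inline std::vector<int> relativeSchedule(uint64_t freqA, uint64_t freqB,
                                         uint64_t stepA, uint64_t stepB,
                                         unsigned events) {
  assert(freqA > 0 && freqB > 0);
  std::vector<int> order;
  int64_t clock = 0;      // single counter describing A relative to B
  bool runningA = true;   // assumption: thread A runs first
  for(unsigned n = 0; n < events; n++) {
    if(runningA) {
      order.push_back(0);
      clock += (int64_t)(stepA * freqB);  // A steps: scale by B's frequency
      if(clock >= 0) runningA = false;    // A has caught up: switch to B
    } else {
      order.push_back(1);
      clock -= (int64_t)(stepB * freqA);  // B steps: scale by A's frequency
      if(clock < 0) runningA = true;      // B is ahead: switch back to A
    }
  }
  return order;
}
```

With `freqA` twice `freqB` and equal step sizes, thread A ends up running about twice as often as thread B, which is the expected behaviour. The counter only describes one pair of threads, which is the limitation the following paragraphs address.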
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
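A minimal sketch of the single-time-base scheme, outside of higan: the `Thread` struct and `normalize` function here are illustrative names, the 2^96 time base is the value settled on later in this message, and `unsigned __int128` is a GCC/Clang extension that is assumed to be available. Frequencies are assumed to be non-zero.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using uint128 = unsigned __int128;  // compiler extension, assumed available

struct Thread {
  uint128 scalar = 0;  // time units per clock tick
  uint128 clock = 0;   // elapsed time in units of 2^-96 seconds

  // setup: scalar = timeBase / frequency (integer division truncates)
  explicit Thread(uint64_t frequency) : scalar(((uint128)1 << 96) / frequency) {}

  // step: clock += scalar * clocks
  void step(uint64_t clocks) { clock += scalar * clocks; }
};

// Periodic normalization: subtract the smallest clock from every thread so
// that the counters never grow without bound.
inline void normalize(std::vector<Thread>& threads) {
  if(threads.empty()) return;
  uint128 minimum = threads[0].clock;
  for(auto& thread : threads) minimum = std::min(minimum, thread.clock);
  for(auto& thread : threads) thread.clock -= minimum;
}
```

Because every thread counts in the same unit, any thread can be compared against any other directly, and a step costs one multiply-add no matter how many other threads exist.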
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
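The "1/18th" figure can be checked with a few lines of integer arithmetic. This is only a sketch; `timeBase`, `scalarFor` and `stepsBeforeWrap` are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// One second expressed in attoseconds (10^18), the time base discussed above.
constexpr uint64_t timeBase = 1000000000000000000ull;

// Per-clock increment for a given oscillator frequency (integer division truncates).
constexpr uint64_t scalarFor(uint64_t frequency) { return timeBase / frequency; }

// How many single-clock steps fit into a uint64 counter before it wraps.
constexpr uint64_t stepsBeforeWrap(uint64_t frequency) {
  return UINT64_MAX / scalarFor(frequency);
}

// A 1Hz clock adds 10^18 per step, and 2^64 is roughly 18.4 * 10^18,
// so only 18 steps fit: each step uses about 1/18th of the range.
static_assert(stepsBeforeWrap(1) == 18, "1Hz scalar consumes about 1/18th of uint64 per step");
```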
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;

//

cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;

//

struct ThreadA {
  cothread_t handle = nullptr;
  uint64 frequency = 0;
  int64 clock = 0;

  auto create(auto (*entrypoint)() -> void, uint frequency) {
    this->handle = co_create(65536, entrypoint);
    this->frequency = frequency;
    this->clock = 0;
  }
};

struct CPUA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;

struct SMPA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;

uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;

auto EnterA() -> void {
  offsetA = 0;
  co_switch(resumeA);
}

auto QueueA(uint value) -> void {
  queueA[offsetA++] = value;
  if(offsetA >= iterations) {
    resumeA = co_active();
    co_switch(mainThread);
  }
}

auto CPUA::Enter() -> void { while(true) cpuA.main(); }

auto CPUA::main() -> void {
  QueueA(1);
  smpA.clock -= cpuStep * smpA.frequency;
  if(smpA.clock < 0) co_switch(smpA.handle);
}

auto SMPA::Enter() -> void { while(true) smpA.main(); }

auto SMPA::main() -> void {
  QueueA(2);
  smpA.clock += smpStep * cpuA.frequency;
  if(smpA.clock >= 0) co_switch(cpuA.handle);
}

//

struct ThreadB {
  cothread_t handle = nullptr;
  uint128_t scalar = 0;
  uint128_t clock = 0;

  auto print128(uint128_t value) {
    string s;
    while(value) {
      s.append((char)('0' + value % 10));
      value /= 10;
    }
    s.reverse();
    print(s, "\n");
  }

  //femtosecond (10^15) = 16306
  //attosecond (10^18) = 688838
  //zeptosecond (10^21) = 13712691
  //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
  //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)

  auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
    this->handle = co_create(65536, entrypoint);

    uint128_t unitOfTime = 1;
    //for(uint n : range(29)) unitOfTime *= 10;
    unitOfTime <<= 96; //2^96 time units ...

    this->scalar = unitOfTime / frequency;
    print128(this->scalar);
    this->clock = 0;
  }

  auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
  auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};

struct CPUB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;

struct SMPB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;

auto correct() -> void {
  auto minimum = min(cpuB.clock, smpB.clock);
  cpuB.clock -= minimum;
  smpB.clock -= minimum;
}

uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;

auto EnterB() -> void {
  correct();
  offsetB = 0;
  co_switch(resumeB);
}

auto QueueB(uint value) -> void {
  queueB[offsetB++] = value;
  if(offsetB >= iterations) {
    resumeB = co_active();
    co_switch(mainThread);
  }
}

auto CPUB::Enter() -> void { while(true) cpuB.main(); }

auto CPUB::main() -> void {
  QueueB(1);
  step(cpuStep);
  synchronize(smpB);
}

auto SMPB::Enter() -> void { while(true) smpB.main(); }

auto SMPB::main() -> void {
  QueueB(2);
  step(smpStep);
  synchronize(cpuB);
}

//

#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
  mainThread = co_active();

  uint masterCounter = 0;
  while(true) {
    print(masterCounter++, " ...\n");

    auto A = clock();
    EnterA();
    auto B = clock();
    print((double)(B - A) / CLOCKS_PER_SEC, "s\n");

    auto C = clock();
    EnterB();
    auto D = clock();
    print((double)(D - C) / CLOCKS_PER_SEC, "s\n");

    for(uint n : range(iterations)) {
      if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
    }
  }
}
...and that's everything.]
2016-07-31 02:11:20 +00:00
  auto result = ADD<Size, Extend>(source, target);
2016-08-17 12:31:22 +00:00
  write<Size>(with, result);
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values, we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and a 4GHz clock, and to run them both for a full second
before an overflow event would occur.
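A rough check of the headroom claim, assuming the GCC/Clang `unsigned __int128` extension as a stand-in for nall's uint128_t; scalarFor and clockAfterOneSecond are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: unsigned __int128 is a compiler extension, used here in place
// of nall's uint128_t.
using uint128 = unsigned __int128;

constexpr uint128 timeBase = uint128(1) << 96;  //2^96 time units per second

//setup: scalar = timeBase / frequency
constexpr auto scalarFor(uint64_t frequency) -> uint128 {
  assert(frequency != 0);
  return timeBase / frequency;
}

// Hypothetical helper: the clock value after one emulated second of stepping.
constexpr auto clockAfterOneSecond(uint64_t frequency) -> uint128 {
  return scalarFor(frequency) * frequency;
}
```

Under these assumptions, one second at either 1Hz or 4GHz advances the clock by at most 2^96, which leaves the top 32 bits of the 128-bit counter free until the next once-per-second correction.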
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
2016-07-31 02:11:20 +00:00
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
template<uint Size> auto M68K::AND(uint32 source, uint32 target) -> uint32 {
|
|
|
|
uint32 result = target & source;
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionAND(EffectiveAddress from, DataRegister with) -> void {
|
|
|
|
auto source = read<Size>(from);
|
|
|
|
auto target = read<Size>(with);
|
|
|
|
auto result = AND<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionAND(DataRegister from, EffectiveAddress with) -> void {
|
|
|
|
auto source = read<Size>(from);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = AND<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-08-17 12:31:22 +00:00
|
|
|
template<uint Size> auto M68K::instructionANDI(EffectiveAddress with) -> void {
|
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
    - read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
    - when using +/-(aN), RA can't actually be updated until the transfer
      is completed
- LEA (EA from) case
    - doesn't actually perform the final read; just returns the address
      to be read from
- ANDI (EA from-and-to) case
    - same EA has to be read from and written to
    - for -(aN), the read has to come from aN-2, but can't update aN yet;
      so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching for proper
  opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
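To make the ANDI -(aN) case concrete, here is a toy model of the "bool valid / uint32 address" idea: the effective address is computed once, cached in the EA, and the register is only updated when the write completes. Every name in it (ToyEA, ToyCPU, the bool hold parameter) is hypothetical and only word-sized -(aN) access is covered; it is a sketch of the technique, not higan's actual read/write code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the EA struct described above.
struct ToyEA {
  unsigned reg = 0;      //which address register aN the operand uses
  bool valid = false;    //has the effective address been computed yet?
  uint32_t address = 0;  //cached effective address
};

struct ToyCPU {
  uint32_t a[8] = {};
  uint16_t memory[64] = {};  //word-sized cells, indexed by address / 2

  //compute the -(aN) address exactly once; later calls reuse the cached value
  auto fetch(ToyEA& ea) -> uint32_t {
    if(!ea.valid) {
      ea.address = a[ea.reg] - 2;
      ea.valid = true;
    }
    return ea.address;
  }

  //hold = true leaves aN untouched, so a following write hits the same address
  auto read(ToyEA& ea, bool hold) -> uint16_t {
    uint32_t address = fetch(ea);
    if(!hold) a[ea.reg] = address;
    return memory[address / 2];
  }

  //the write finishes the access, so aN is updated here
  auto write(ToyEA& ea, uint16_t data) -> void {
    uint32_t address = fetch(ea);
    memory[address / 2] = data;
    a[ea.reg] = address;
  }

  //ANDI.W #immediate,-(aN): read-modify-write through one EA
  auto andi(ToyEA ea, uint16_t immediate) -> void {
    uint16_t target = read(ea, true);
    write(ea, target & immediate);
  }
};
```

In this sketch, starting with a0 = 0x10 and running andi on -(a0) reads from 0x0e, writes back to 0x0e, and leaves a0 at 0x0e, which is the behaviour the list above asks for.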
2016-07-19 09:12:05 +00:00
|
|
|
auto source = readPC<Size>();
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = AND<Size>(source, target);
|
2016-08-17 12:31:22 +00:00
|
|
|
write<Size>(with, result);
|
Update to v100r06 release.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
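The "bind everything in one loop, and complain if a slot is bound twice" idea can be sketched as follows. ToyDecoder and its members are hypothetical, and the sketch throws a std::logic_error where the text describes a run-time assertion, purely so the failure can be observed in a test:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoder: one slot per 16-bit opcode, filled for execution and
// disassembly by the same bind() call.
struct ToyDecoder {
  std::vector<std::function<void ()>> execute;
  std::vector<std::string> disassemble;

  ToyDecoder() : execute(65536), disassemble(65536) {}

  auto bind(uint16_t opcode, std::string mnemonic, std::function<void ()> handler) -> void {
    //refuse to silently overwrite a slot that another instruction already claimed
    if(execute[opcode]) throw std::logic_error("opcode bound twice: " + mnemonic);
    execute[opcode] = std::move(handler);
    disassemble[opcode] = std::move(mnemonic);
  }
};
```

Because both tables are filled by the same call, the executor and the disassembler cannot disagree about what an opcode means, and an overlapping bit pattern shows up at startup rather than as a mis-decoded instruction later.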
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionANDI_TO_CCR() -> void {
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeCCR(readCCR() & data);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionANDI_TO_SR() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeSR(readSR() & data);
|
|
|
|
}
|
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
template<uint Size> auto M68K::ASL(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
uint32 overflow = 0;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & msb<Size>();
|
|
|
|
uint32 before = result;
|
|
|
|
result <<= 1;
|
|
|
|
overflow |= before ^ result;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = sign<Size>(overflow) < 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
if(shift) r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionASL(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ASL<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionASL(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ASL<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionASL(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ASL<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::ASR(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
uint32 overflow = 0;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & lsb<Size>();
|
|
|
|
uint32 before = result;
|
|
|
|
result = sign<Size>(result) >> 1;
|
|
|
|
overflow |= before ^ result;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = sign<Size>(overflow) < 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
if(shift) r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionASR(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ASR<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionASR(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ASR<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionASR(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ASR<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
2016-07-17 03:24:28 +00:00
|
|
|
auto M68K::instructionBCC(uint4 condition, uint8 displacement) -> void {
|
2016-07-25 13:15:54 +00:00
|
|
|
auto extension = readPC<Word>();
|
2016-08-12 23:47:30 +00:00
|
|
|
if(displacement) r.pc -= 2;
|
|
|
|
if(condition >= 2 && !testCondition(condition)) return;
|
2016-07-22 12:03:25 +00:00
|
|
|
if(condition == 1) push<Long>(r.pc);
|
2016-08-12 23:47:30 +00:00
|
|
|
r.pc += displacement ? (int8_t)displacement : (int16_t)extension - 2;
|
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionBCHG(DataRegister bit, EffectiveAddress with) -> void {
|
|
|
|
auto index = read<Size>(bit) & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) ^= 1;
|
|
|
|
write<Size>(with, test);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionBCHG(EffectiveAddress with) -> void {
|
|
|
|
auto index = readPC<Word>() & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) ^= 1;
|
|
|
|
write<Size>(with, test);
|
|
|
|
}
|
2016-07-22 12:03:25 +00:00
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionBCLR(DataRegister bit, EffectiveAddress with) -> void {
|
|
|
|
auto index = read<Size>(bit) & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) = 0;
|
|
|
|
write<Size>(with, test);
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionBCLR(EffectiveAddress with) -> void {
|
|
|
|
auto index = readPC<Word>() & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) = 0;
|
|
|
|
write<Size>(with, test);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionBSET(DataRegister bit, EffectiveAddress with) -> void {
|
|
|
|
auto index = read<Size>(bit) & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) = 1;
|
|
|
|
write<Size>(with, test);
|
|
|
|
}
|
2016-07-22 12:03:25 +00:00
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionBSET(EffectiveAddress with) -> void {
|
|
|
|
auto index = readPC<Word>() & bits<Size>() - 1;
|
2016-08-17 12:31:22 +00:00
|
|
|
auto test = read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
test.bit(index) = 1;
|
|
|
|
write<Size>(with, test);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionBTST(DataRegister bit, EffectiveAddress with) -> void {
|
|
|
|
auto index = read<Size>(bit) & bits<Size>() - 1;
|
|
|
|
auto test = read<Size>(with);
|
|
|
|
r.z = test.bit(index) == 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionBTST(EffectiveAddress with) -> void {
|
|
|
|
auto index = readPC<Word>() & bits<Size>() - 1;
|
|
|
|
auto test = read<Size>(with);
|
|
|
|
r.z = test.bit(index) == 0;
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionCHK(DataRegister compare, EffectiveAddress maximum) -> void {
|
|
|
|
auto source = read<Word>(maximum);
|
|
|
|
auto target = read<Word>(compare);
|
|
|
|
|
|
|
|
r.z = clip<Word>(target) == 0;
|
|
|
|
r.n = sign<Word>(target) < 0;
|
|
|
|
if(r.n) return exception(Exception::BoundsCheck, Vector::BoundsCheck);
|
|
|
|
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = (uint64)target - source;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.c = sign<Word>(result >> 1) < 0;
|
|
|
|
r.v = sign<Word>((target ^ source) & (target ^ result)) < 0;
|
|
|
|
r.z = clip<Word>(result) == 0;
|
|
|
|
r.n = sign<Word>(result) < 0;
|
|
|
|
if(r.n == r.v && !r.z) return exception(Exception::BoundsCheck, Vector::BoundsCheck);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
template<uint Size> auto M68K::instructionCLR(EffectiveAddress ea) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
read<Size, Hold>(ea);
|
2016-07-22 12:03:25 +00:00
|
|
|
write<Size>(ea, 0);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = 1;
|
|
|
|
r.n = 0;
|
|
|
|
}
|
|
|
|
|
2016-07-26 10:46:43 +00:00
|
|
|
template<uint Size> auto M68K::CMP(uint32 source, uint32 target) -> uint32 {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = (uint64)target - source;
|
2016-07-25 13:15:54 +00:00
|
|
|
|
|
|
|
r.c = sign<Size>(result >> 1) < 0;
|
|
|
|
r.v = sign<Size>((target ^ source) & (target ^ result)) < 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
2016-07-26 10:46:43 +00:00
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionCMP(DataRegister dr, EffectiveAddress ea) -> void {
|
|
|
|
auto source = read<Size>(ea);
|
|
|
|
auto target = read<Size>(dr);
|
|
|
|
CMP<Size>(source, target);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionCMPA(AddressRegister ar, EffectiveAddress ea) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto source = sign<Size>(read<Size>(ea));
|
2016-08-21 22:11:24 +00:00
|
|
|
auto target = read<Long>(ar);
|
|
|
|
CMP<Long>(source, target);
|
2016-07-26 10:46:43 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionCMPI(EffectiveAddress ea) -> void {
|
|
|
|
auto source = readPC<Size>();
|
|
|
|
auto target = read<Size>(ea);
|
|
|
|
CMP<Size>(source, target);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionCMPM(EffectiveAddress ax, EffectiveAddress ay) -> void {
|
|
|
|
auto source = read<Size>(ay);
|
|
|
|
auto target = read<Size>(ax);
|
|
|
|
CMP<Size>(source, target);
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionDBCC(uint4 condition, DataRegister dr) -> void {
|
2016-07-25 13:15:54 +00:00
|
|
|
auto displacement = readPC<Word>();
|
2016-07-22 12:03:25 +00:00
|
|
|
if(!testCondition(condition)) {
|
2016-07-23 02:32:35 +00:00
|
|
|
uint16 result = read<Word>(dr);
|
|
|
|
write<Word>(dr, result - 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
if(result) r.pc -= 2, r.pc += sign<Word>(displacement);
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
template<bool Sign> auto M68K::DIV(uint16 divisor, DataRegister with) -> void {
|
|
|
|
auto dividend = read<Long>(with);
|
|
|
|
bool negativeQuotient = false;
|
|
|
|
bool negativeRemainder = false;
|
|
|
|
bool overflow = false;
|
|
|
|
|
|
|
|
if(divisor == 0) return exception(Exception::DivisionByZero, Vector::DivisionByZero);
|
|
|
|
|
|
|
|
if(Sign) {
|
|
|
|
negativeQuotient = (dividend >> 31) ^ (divisor >> 15);
|
|
|
|
if(dividend >> 31) dividend = -dividend, negativeRemainder = true;
|
|
|
|
if(divisor >> 15) divisor = -divisor;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto result = dividend;
|
|
|
|
|
|
|
|
for(auto _ : range(16)) {
|
|
|
|
bool lb = false;
|
|
|
|
if(result >= (uint32)divisor << 15) result -= divisor << 15, lb = true;
|
|
|
|
|
|
|
|
bool ob = result >> 31;
|
|
|
|
result = result << 1 | lb;
|
|
|
|
|
|
|
|
if(ob) overflow = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if(Sign) {
|
|
|
|
if((uint16)result > 0x7fff + negativeQuotient) overflow = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
if(result >> 16 >= divisor) overflow = true;
|
|
|
|
|
|
|
|
if(Sign && !overflow) {
|
|
|
|
if(negativeQuotient) result = ((-result) & 0xffff) | (result & 0xffff0000);
|
|
|
|
if(negativeRemainder) result = (((-(result >> 16)) << 16) & 0xffff0000) | (result & 0xffff);
|
|
|
|
}
|
|
|
|
|
|
|
|
if(!overflow) write<Long>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = overflow;
|
|
|
|
r.z = clip<Word>(result) == 0;
|
|
|
|
r.n = sign<Word>(result) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionDIVS(DataRegister with, EffectiveAddress from) -> void {
|
|
|
|
auto divisor = read<Word>(from);
|
|
|
|
DIV<1>(divisor, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionDIVU(DataRegister with, EffectiveAddress from) -> void {
|
|
|
|
auto divisor = read<Word>(from);
|
|
|
|
DIV<0>(divisor, with);
|
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
template<uint Size> auto M68K::EOR(uint32 source, uint32 target) -> uint32 {
|
|
|
|
uint32 result = target ^ source;
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionEOR(DataRegister from, EffectiveAddress with) -> void {
|
|
|
|
auto source = read<Size>(from);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = EOR<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionEORI(EffectiveAddress with) -> void {
|
|
|
|
auto source = readPC<Size>();
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = EOR<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionEORI_TO_CCR() -> void {
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeCCR(readCCR() ^ data);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionEORI_TO_SR() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeSR(readSR() ^ data);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionEXG(DataRegister x, DataRegister y) -> void {
|
|
|
|
auto z = read<Long>(x);
|
|
|
|
write<Long>(x, read<Long>(y));
|
|
|
|
write<Long>(y, z);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionEXG(AddressRegister x, AddressRegister y) -> void {
|
|
|
|
auto z = read<Long>(x);
|
|
|
|
write<Long>(x, read<Long>(y));
|
|
|
|
write<Long>(y, z);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionEXG(DataRegister x, AddressRegister y) -> void {
|
|
|
|
auto z = read<Long>(x);
|
|
|
|
write<Long>(x, read<Long>(y));
|
|
|
|
write<Long>(y, z);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<> auto M68K::instructionEXT<Word>(DataRegister with) -> void {
|
|
|
|
auto result = (int8)read<Byte>(with);
|
|
|
|
write<Word>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Word>(result) == 0;
|
|
|
|
r.n = sign<Word>(result) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
template<> auto M68K::instructionEXT<Long>(DataRegister with) -> void {
|
|
|
|
auto result = (int16)read<Word>(with);
|
|
|
|
write<Long>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Long>(result) == 0;
|
|
|
|
r.n = sign<Long>(result) < 0;
|
|
|
|
}
|
|
|
|
|
Update to v104r07 release.
byuu says:
Changelog:
- md/vdp: added VIP bit to status register; fixes Cliffhanger
- processor/m68k/disassembler: added modes 7 and 8 to LEA address
disassembly
- processor/m68k/disassembler: enhanced ILLEGAL to display LINEA/LINEF
$xxx variants
- processor/m68k: ILLEGAL/LINEA/LINEF do not modify the stack
register; fixes Caesar no Yabou II
- icarus/sfc: request sgb1.boot.rom and sgb2.boot.rom separately; as
they are different
- icarus/sfc: removed support for external firmware when loading ROM
images
The hack to run Mega Drive Ballz 3D isn't in place, as I don't know if
it's correct, and the graphics were corrupted anyway.
The SGB boot ROM change is going to require updating the icarus database
as well. I will add that in when I start dumping more cartridges here
soon.
Finally ... I explained this already, but I'll do so here as well: I
removed icarus' support for loading SNES coprocessor firmware games with
external firmware files (eg dsp1.program.rom + dsp1.data.rom in the same
path as supermariokart.sfc, for example.)
I realize most are going to see this as an antagonizing/stubborn move
given the recent No-Intro discussion, and I won't deny that said thread
is why this came to the forefront of my mind. But on my word, I honestly
believe this was an ineffective solution for many reasons not related to
our disagreements:
1. No-Intro distributes SNES coprocessor firmware as a merged file, eg
"DSP1 (World).zip/DSP1 (World).bin" -- icarus can't possibly know
about every ROM distribution set's naming conventions for firmware.
(Right now, it appears GoodSNES and NSRT are mostly dead; but there
may be more DATs in the future -- including my own.)
2. Even if the user obtains the firmware and tries to rename it, it
won't work: icarus parses manifests generated by the heuristics
module and sees two ROM files: dsp1.program.rom and dsp1.data.rom.
icarus cannot identify a file named dsp1.rom as containing both
of these sub-files. Users are going to have to know how to split
files, which there is no way to do on stock Windows. Merging files,
however, can be done via `copy /b supermariokart.sfc+dsp1.rom
supermariokartdsp.sfc`, and dsp1.rom can be named whatever now.
I am not saying this will be easy for the average user, but it's
easier than splitting files.
3. Separate firmware breaks icarus' database lookup. If you have
pilotwings.sfc but without firmware, icarus will not find a match
for it in the database lookup phase. It will then fall back on
heuristics. The heuristics will pick DSP1B for compatibility with
Ballz 3D which requires it. And so it will try to pull in the
wrong firmware, and the game's intro will not work correctly.
Furthermore, the database information will be unavailable, resulting
in inaccurate mirroring.
So for these reasons, I have removed said support. You must now load
SNES coprocessor games into higan in one of two ways: 1) game paks with
split files; or 2) SFC images with merged firmware.
If and when No-Intro deploys a method I can actually use, I give you all
my word I will give it a fair shot and if it's reasonable, I'll support
it in icarus.
2017-08-28 12:46:14 +00:00
|
|
|
auto M68K::instructionILLEGAL(uint16 code) -> void {
|
2016-08-10 22:02:02 +00:00
|
|
|
r.pc -= 2;
|
2017-08-28 12:46:14 +00:00
|
|
|
if(code.bits(12,15) == 0xa) return exception(Exception::Illegal, Vector::IllegalLineA);
|
|
|
|
if(code.bits(12,15) == 0xf) return exception(Exception::Illegal, Vector::IllegalLineF);
|
2016-08-10 22:02:02 +00:00
|
|
|
return exception(Exception::Illegal, Vector::Illegal);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
auto M68K::instructionJMP(EffectiveAddress target) -> void {
|
|
|
|
r.pc = fetch<Long>(target);
|
|
|
|
}
|
|
|
|
|
2016-07-26 10:46:43 +00:00
|
|
|
auto M68K::instructionJSR(EffectiveAddress target) -> void {
|
2016-08-17 22:04:50 +00:00
|
|
|
auto pc = fetch<Long>(target);
|
2016-07-26 10:46:43 +00:00
|
|
|
push<Long>(r.pc);
|
2016-08-17 22:04:50 +00:00
|
|
|
r.pc = pc;
|
2016-07-26 10:46:43 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionLEA(AddressRegister ar, EffectiveAddress ea) -> void {
|
|
|
|
write<Long>(ar, fetch<Long>(ea));
|
Update to v100r06 release.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's cleaner this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionLINK(AddressRegister with) -> void {
|
|
|
|
auto displacement = (int16)readPC<Word>();
|
|
|
|
auto sp = AddressRegister{7};
|
|
|
|
push<Long>(read<Long>(with));
|
|
|
|
write<Long>(with, read<Long>(sp));
|
|
|
|
write<Long>(sp, read<Long>(sp) + displacement);
|
|
|
|
}
|
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
template<uint Size> auto M68K::LSL(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & msb<Size>();
|
|
|
|
result <<= 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
if(shift) r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionLSL(uint4 immediate, DataRegister dr) -> void {
|
|
|
|
auto result = LSL<Size>(read<Size>(dr), immediate);
|
|
|
|
write<Size>(dr, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionLSL(DataRegister sr, DataRegister dr) -> void {
|
|
|
|
auto shift = read<Long>(sr) & 63;
|
|
|
|
auto result = LSL<Size>(read<Size>(dr), shift);
|
|
|
|
write<Size>(dr, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionLSL(EffectiveAddress ea) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = LSL<Word>(read<Word, Hold>(ea), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(ea, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::LSR(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & lsb<Size>();
|
|
|
|
result >>= 1;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
if(shift) r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionLSR(uint4 immediate, DataRegister dr) -> void {
|
|
|
|
auto result = LSR<Size>(read<Size>(dr), immediate);
|
|
|
|
write<Size>(dr, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionLSR(DataRegister shift, DataRegister dr) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = LSR<Size>(read<Size>(dr), count);
|
|
|
|
write<Size>(dr, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionLSR(EffectiveAddress ea) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = LSR<Word>(read<Word, Hold>(ea), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(ea, result);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
template<uint Size> auto M68K::instructionMOVE(EffectiveAddress to, EffectiveAddress from) -> void {
|
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the extension word fetch order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
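A minimal sketch of the "bool valid / uint32 address" idea described above, assuming a much-reduced model: word-sized accesses only, two addressing modes, and a toy word-indexed memory. The names `Cpu`, `Mode`, `andi` and the choice to commit the register update inside `write` are hypothetical and are not taken from the core; they only illustrate how caching the resolved address lets a read-modify-write opcode hit aN-2 for both the read and the write.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-mode subset; the real core has many more modes.
enum Mode { Indirect, PreDecrement };

struct EA {
  EA(Mode mode, unsigned reg) : mode(mode), reg(reg) {}
  Mode mode;
  unsigned reg;
  bool valid = false;    // has the address been resolved yet?
  uint32_t address = 0;  // cached result of the first resolution
};

struct Cpu {
  uint32_t a[8] = {};       // address registers
  uint16_t memory[32] = {}; // toy memory, indexed by address / 2

  // Resolve the effective address once; later calls reuse the cached value.
  uint32_t fetch(EA& ea) {
    if(ea.valid) return ea.address;
    ea.valid = true;
    ea.address = a[ea.reg];
    if(ea.mode == PreDecrement) ea.address -= 2;  // aN itself is not updated yet
    return ea.address;
  }

  uint16_t read(EA& ea) { return memory[fetch(ea) / 2]; }

  void write(EA& ea, uint16_t data) {
    memory[fetch(ea) / 2] = data;
    if(ea.mode == PreDecrement) a[ea.reg] = ea.address;  // commit after the write
  }

  // Read-modify-write on the same EA, in the style of ANDI.
  void andi(EA& ea, uint16_t immediate) { write(ea, read(ea) & immediate); }
};
```

With `a[0] = 10` and a pre-decrement EA on register 0, `andi` reads and writes the word at address 8 and leaves `a[0]` at 8, so the decrement is applied exactly once.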
2016-07-19 09:12:05 +00:00
|
|
|
auto data = read<Size>(from);
|
|
|
|
write<Size>(to, data);
|
2016-07-16 08:39:44 +00:00
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.z = clip<Size>(data) == 0;
|
|
|
|
r.n = sign<Size>(data) < 0;
|
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
template<uint Size> auto M68K::instructionMOVEA(AddressRegister ar, EffectiveAddress ea) -> void {
|
2016-08-21 22:11:24 +00:00
|
|
|
auto data = sign<Size>(read<Size>(ea));
|
2016-07-23 02:32:35 +00:00
|
|
|
write<Long>(ar, data);
|
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
template<uint Size> auto M68K::instructionMOVEM_TO_MEM(EffectiveAddress to) -> void {
|
|
|
|
auto list = readPC<Word>();
|
|
|
|
auto addr = fetch<Long>(to);
|
2016-07-16 08:39:44 +00:00
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
for(uint n : range(16)) {
|
|
|
|
if(!list.bit(n)) continue;
|
|
|
|
//pre-decrement mode traverses registers in reverse order {A7-A0, D7-D0}
|
2016-08-10 22:02:02 +00:00
|
|
|
uint index = to.mode == AddressRegisterIndirectWithPreDecrement ? 15 - n : n;
|
|
|
|
|
|
|
|
if(to.mode == AddressRegisterIndirectWithPreDecrement) addr -= bytes<Size>();
|
|
|
|
auto data = index < 8 ? read<Size>(DataRegister{index}) : read<Size>(AddressRegister{index});
|
|
|
|
write<Size>(addr, data);
|
2016-08-21 22:11:24 +00:00
|
|
|
if(to.mode != AddressRegisterIndirectWithPreDecrement) addr += bytes<Size>();
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
2016-08-21 22:11:24 +00:00
|
|
|
AddressRegister with{to.reg};
|
|
|
|
if(to.mode == AddressRegisterIndirectWithPreDecrement ) write<Long>(with, addr);
|
|
|
|
if(to.mode == AddressRegisterIndirectWithPostIncrement) write<Long>(with, addr);
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionMOVEM_TO_REG(EffectiveAddress from) -> void {
|
|
|
|
auto list = readPC<Word>();
|
|
|
|
auto addr = fetch<Long>(from);
|
|
|
|
|
|
|
|
for(uint n : range(16)) {
|
|
|
|
if(!list.bit(n)) continue;
|
|
|
|
uint index = from.mode == AddressRegisterIndirectWithPreDecrement ? 15 - n : n;
|
2016-07-25 13:15:54 +00:00
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
if(from.mode == AddressRegisterIndirectWithPreDecrement) addr -= bytes<Size>();
|
|
|
|
auto data = read<Size>(addr);
|
|
|
|
data = sign<Size>(data);
|
|
|
|
index < 8 ? write<Long>(DataRegister{index}, data) : write<Long>(AddressRegister{index}, data);
|
2016-08-21 22:11:24 +00:00
|
|
|
if(from.mode != AddressRegisterIndirectWithPreDecrement) addr += bytes<Size>();
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
2016-07-25 13:15:54 +00:00
|
|
|
|
2016-08-21 22:11:24 +00:00
|
|
|
AddressRegister with{from.reg};
|
|
|
|
if(from.mode == AddressRegisterIndirectWithPreDecrement ) write<Long>(with, addr);
|
|
|
|
if(from.mode == AddressRegisterIndirectWithPostIncrement) write<Long>(with, addr);
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
2016-07-23 02:32:35 +00:00
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
template<uint Size> auto M68K::instructionMOVEP(DataRegister from, EffectiveAddress to) -> void {
|
|
|
|
auto address = fetch<Size>(to);
|
|
|
|
auto data = read<Long>(from);
|
|
|
|
uint shift = bits<Size>();
|
|
|
|
for(auto _ : range(bytes<Size>())) {
|
|
|
|
shift -= 8;
|
|
|
|
write<Byte>(address, data >> shift);
|
|
|
|
address += 2;
|
2016-07-12 10:19:31 +00:00
|
|
|
}
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
Update to v100r08 release.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
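The per-opcode loop approach might look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Table` type, the `demo` handler, and the encoding (a base of 0xA000 with a size field in bits 6-7 and a register in bits 0-2) are invented and do not correspond to a real 68K instruction. `bind` returns false on a double binding so the behaviour is easy to check here, whereas a real table builder could assert instead.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical 64K-entry dispatch table.
struct Table {
  std::vector<std::function<int()>> slots = std::vector<std::function<int()>>(65536);

  // Refuse to bind a slot twice, so overlapping encodings are caught early.
  bool bind(uint16_t opcode, std::function<int()> handler) {
    if(slots[opcode]) return false;
    slots[opcode] = std::move(handler);
    return true;
  }
};

// Stand-in for a size-templated instruction handler.
template<unsigned Size> int demo(unsigned reg) { return Size * 10 + reg; }

// One loop per opcode family; each size binds its own template instance.
bool build(Table& table) {
  bool ok = true;
  for(unsigned reg = 0; reg < 8; reg++) {
    ok = table.bind(0xA000 | 0 << 6 | reg, [=] { return demo<1>(reg); }) && ok;
    ok = table.bind(0xA000 | 1 << 6 | reg, [=] { return demo<2>(reg); }) && ok;
    ok = table.bind(0xA000 | 2 << 6 | reg, [=] { return demo<4>(reg); }) && ok;
  }
  return ok;
}
```

Because the size is known when each slot is bound, the handler stored in the table is already the byte, word, or long instance and needs no size check when it is called.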
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated to ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There were also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
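To make the branch-elimination point concrete, here is a small standalone sketch contrasting a runtime size check with size-templated helpers. It assumes `Byte`, `Word` and `Long` are simply byte counts and uses plain `<cstdint>` types; the helper names mirror those used in the listing (`mask`, `msb`, `clip`, `sign`), but these bodies are an illustration and not the core's actual definitions. `clipRuntime` is a hypothetical name for the older style.

```cpp
#include <cassert>
#include <cstdint>

// Assumed size tags: the number of bytes in each operand size.
enum : unsigned { Byte = 1, Word = 2, Long = 4 };

// Older style: the size arrives at runtime, so every call has to branch.
inline uint32_t clipRuntime(unsigned size, uint32_t value) {
  if(size == Byte) return value & 0xff;
  if(size == Word) return value & 0xffff;
  return value;
}

// Templated style: the size is a compile-time constant, so the mask folds away.
template<unsigned Size> constexpr uint32_t mask() { return 0xffffffffu >> (32 - Size * 8); }
template<unsigned Size> constexpr uint32_t msb() { return 1u << (Size * 8 - 1); }
template<unsigned Size> constexpr uint32_t clip(uint32_t value) { return value & mask<Size>(); }

// Sign-extend the low Size bytes of value to 32 bits.
template<unsigned Size> constexpr int32_t sign(uint32_t value) {
  return (int32_t)((clip<Size>(value) ^ msb<Size>()) - msb<Size>());
}

// Flag tests in the style of the listing: zero and negative for a given size.
template<unsigned Size> constexpr bool isZero(uint32_t value) { return clip<Size>(value) == 0; }
template<unsigned Size> constexpr bool isNegative(uint32_t value) { return sign<Size>(value) < 0; }

static_assert(mask<Byte>() == 0xff, "byte mask");
static_assert(mask<Word>() == 0xffff, "word mask");
static_assert(mask<Long>() == 0xffffffffu, "long mask");
```

The cost is the one noted above: each instruction body is instantiated once per size, trading code size for fewer branches at runtime.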
2016-07-17 22:11:29 +00:00
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
template<uint Size> auto M68K::instructionMOVEP(EffectiveAddress from, DataRegister to) -> void {
|
|
|
|
auto address = fetch<Size>(from);
|
|
|
|
auto data = read<Long>(to);
|
|
|
|
uint shift = bits<Size>();
|
|
|
|
for(auto _ : range(bytes<Size>())) {
|
|
|
|
shift -= 8;
|
|
|
|
data &= ~(0xff << shift);
|
|
|
|
data |= read<Byte>(address) << shift;
|
|
|
|
address += 2;
|
|
|
|
}
|
|
|
|
write<Long>(to, data);
|
2016-07-12 10:19:31 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionMOVEQ(DataRegister dr, uint8 immediate) -> void {
|
2016-08-19 14:11:26 +00:00
|
|
|
write<Long>(dr, sign<Byte>(immediate));
|
2016-07-16 08:39:44 +00:00
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.z = clip<Byte>(immediate) == 0;
|
|
|
|
r.n = sign<Byte>(immediate) < 0;
|
2016-07-12 22:47:04 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionMOVE_FROM_SR(EffectiveAddress ea) -> void {
|
|
|
|
auto data = readSR();
|
|
|
|
write<Word>(ea, data);
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionMOVE_TO_CCR(EffectiveAddress ea) -> void {
|
Update to v102r08 release.
byuu says:
Changelog:
- PCE: restructured VCE, VDCs to run one scanline at a time
- PCE: bound VDCs to 1365x262 timing (in order to decouple the VDCs
from the VCE)
- PCE: the two changes above allow save states to function; also
grants a minor speed boost
- PCE: added cheat code support (uses 21-bit bus addressing; compare
byte will be useful here)
- 68K: fixed `mov *,ccr` to read two bytes instead of one [Cydrak]
- Z80: emulated /BUSREQ, /BUSACK; allows 68K to suspend the Z80
[Cydrak]
- MD: emulated the Z80 executing instructions [Cydrak]
- MD: emulated Z80 interrupts (triggered during each Vblank period)
[Cydrak]
- MD: emulated Z80 memory map [Cydrak]
- MD: added stubs for PSG, YM2612 accesses [Cydrak]
- MD: improved bus emulation [Cydrak]
The PCE core is pretty much ready to go. The only major feature missing
is FM modulation.
The Mega Drive improvements let us start to see the splash screens for
Langrisser II, Shining Force, Shining in the Darkness. I was hoping I
could get them in-game, but no such luck. My Z80 implementation is
probably flawed in some way ... now that I think about it, I believe I
missed the BusAPU::reset() check for having been granted access to the
Z80 first. But I doubt that's the problem.
Next step is to implement Cydrak's PSG core into the Master System
emulator. Once that's in, I'm going to add save states and cheat code
support to the Master System core.
Next, I'll add the PSG core into the Mega Drive. Then I'll add the
'easy' PCM part of the YM2612. Then the rest of the beastly YM2612 core.
Then finally, cap things off with save state and cheat code support.
Should be nearing a new release at that point.
2017-02-20 08:13:10 +00:00
|
|
|
auto data = read<Word>(ea);
|
2016-07-23 02:32:35 +00:00
|
|
|
writeCCR(data);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionMOVE_TO_SR(EffectiveAddress ea) -> void {
|
2016-07-22 12:03:25 +00:00
|
|
|
if(!supervisor()) return;
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto data = read<Word>(ea);
|
|
|
|
writeSR(data);
|
2016-07-22 12:03:25 +00:00
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
auto M68K::instructionMOVE_FROM_USP(AddressRegister to) -> void {
|
2016-07-22 12:03:25 +00:00
|
|
|
if(!supervisor()) return;
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
write<Long>(to, r.sp);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionMOVE_TO_USP(AddressRegister from) -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
r.sp = read<Long>(from);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionMULS(DataRegister with, EffectiveAddress from) -> void {
|
|
|
|
auto source = read<Word>(from);
|
|
|
|
auto target = read<Word>(with);
|
|
|
|
auto result = (int16)source * (int16)target;
|
|
|
|
write<Long>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Long>(result) == 0;
|
|
|
|
r.n = sign<Long>(result) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionMULU(DataRegister with, EffectiveAddress from) -> void {
|
|
|
|
auto source = read<Word>(from);
|
|
|
|
auto target = read<Word>(with);
|
|
|
|
auto result = source * target;
|
|
|
|
write<Long>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Long>(result) == 0;
|
|
|
|
r.n = sign<Long>(result) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionNBCD(EffectiveAddress with) -> void {
|
Update to v103r03 release.
byuu says:
Changelog:
- md/psg: fixed output frequency rate regression from v103r02
- processor/m68k: fixed calculations for ABCD, NBCD, SBCD [hex\_usr,
SuperMikeMan]
- processor/spc700: renamed abbreviated instructions to functional
descriptions (eg `XCN` → `ExchangeNibble`)
- processor/spc700: removed memory.cpp shorthand functions (fetch,
load, store, pull, push)
- processor/spc700: updated all instructions to follow cycle behavior
as documented by Overload with a logic analyzer
Once again, the changes to the SPC700 core are really quite massive. And
this time it's not just cosmetic: the idle cycles have been updated to
pull from various memory addresses. This is why I removed the shorthand
functions -- so that I could handle the at-times very bizarre addresses
the SPC700 has on its address bus during its idle cycles.
There is one behavior Overload mentioned that I don't emulate ... one of
the cycles of the (X) transfer functions seems to not actually access
the $f0-ff internal SMP registers? I don't fully understand what
Overload is getting at, so I haven't tried to support it just yet.
Also, there are limits to logic analyzers. In many cases the same
address is read from twice consecutively. It is unclear which of the two
reads the SPC700 actually utilizes. I tried to choose the most logical
values (usually the first one), but ... I don't know that we'll be able
to figure this one out. It's going to be virtually impossible to test
this through software, because the PC can't really execute out of
registers that have side effects on reads.
2017-06-28 07:24:46 +00:00
|
|
|
auto source = read<Byte, Hold>(with);
|
|
|
|
auto target = 0u;
|
2016-08-17 22:04:50 +00:00
|
|
|
auto result = target - source - r.x;
|
2017-06-28 07:24:46 +00:00
|
|
|
bool c = false;
|
2016-08-10 22:02:02 +00:00
|
|
|
bool v = false;
|
|
|
|
|
|
|
|
const bool adjustLo = (target ^ source ^ result) & 0x10;
|
|
|
|
const bool adjustHi = result & 0x100;
|
|
|
|
|
|
|
|
if(adjustLo) {
|
|
|
|
auto previous = result;
|
|
|
|
result -= 0x06;
|
2017-06-28 07:24:46 +00:00
|
|
|
c = (~previous & 0x80) & ( result & 0x80);
|
|
|
|
v |= ( previous & 0x80) & (~result & 0x80);
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if(adjustHi) {
|
|
|
|
auto previous = result;
|
|
|
|
result -= 0x60;
|
2017-06-28 07:24:46 +00:00
|
|
|
c = true;
|
2016-08-10 22:02:02 +00:00
|
|
|
v |= (previous & 0x80) & (~result & 0x80);
|
|
|
|
}
|
|
|
|
|
|
|
|
write<Byte>(with, result);
|
|
|
|
|
2017-06-28 07:24:46 +00:00
|
|
|
r.c = c;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.v = v;
|
2016-08-17 12:31:22 +00:00
|
|
|
r.z = clip<Byte>(result) ? 0 : r.z;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.n = sign<Byte>(result) < 0;
|
2017-06-28 07:24:46 +00:00
|
|
|
r.x = r.c;
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionNEG(EffectiveAddress with) -> void {
|
Update to v101r11 release.
byuu says:
Changelog:
- 68K: fixed NEG/NEGX operand order
- 68K: fixed bug in disassembler that was breaking trace logging
- VDP: improved sprite rendering (still 100% broken)
- VDP: added horizontal/vertical scrolling (90% broken)
Forgot:
- 68K: fix extension word sign bit on indexed modes for disassembler
as well
- 68K: emulate STOP properly (use r.stop flag; clear on IRQs firing)
I'm really wearing out fast here. The Genesis documentation is somehow
even worse than Game Boy documentation, but this is a far more complex
system.
It's a massive time sink to sit here banging away at every possible
combination of how things could work, only to see no positive
improvements. Nothing I do seems to get sprites to do a goddamn thing.
squee says the sprite Y field is 10-bits, X field is 9-bits. genvdp says
they're both 10-bits. BlastEm treats them like they're both 10-bits,
then masks off the upper bit so it's effectively 9-bits anyway.
Nothing ever bothers to tell you whether the horizontal scroll values
are supposed to add or subtract from the current X position. Probably
the most basic detail you could imagine for explaining horizontal
scrolling and yet ... nope. Nothing.
I can't even begin to understand how the VDP FIFO functionality works,
or what the fuck is meant by "slots".
I'm completely at a loss as to how in the holy hell the 68K works with
8-bit accesses. I don't know whether I need byte/word handlers for every
device, or if I can just hook it right into the 68K core itself. This
one's probably the most major design detail. I need to know this before
I go and implement the PSG/YM2612/IO ports->gamepads/Z80/etc.
Trying to debug the 68K is murder because basically every game likes to
start with a 20,000,000-instruction reset phase of checksumming entire
games, and clearing out the memory as agonizingly slowly as humanly
possible. And like the ARM, there's too many registers so I'd need three
widescreen monitors to comfortably view the entire debugger output lines
onscreen.
I can't get any test ROMs to debug functionality outside of full games
because every **goddamned** test ROM coder thinks it's acceptable to tell
people to go fetch some toolchain from a link that died in the late '90s
and only works on MS-DOS 6.22 to build their fucking shit, because god
forbid you include a 32KiB assembled ROM image in your fucking archives.
... I may have to take a break for a while. We'll see.
2016-08-21 02:50:05 +00:00
|
|
|
auto result = SUB<Size>(read<Size, Hold>(with), 0);
|
2016-08-09 11:07:18 +00:00
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionNEGX(EffectiveAddress with) -> void {
|
2016-08-21 02:50:05 +00:00
|
|
|
auto result = SUB<Size, Extend>(read<Size, Hold>(with), 0);
|
2016-08-09 11:07:18 +00:00
|
|
|
write<Size>(with, result);
|
2016-07-12 22:47:04 +00:00
|
|
|
}
|
|
|
|
|
2016-07-12 10:19:31 +00:00
|
|
|
auto M68K::instructionNOP() -> void {
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
template<uint Size> auto M68K::instructionNOT(EffectiveAddress with) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ~read<Size, Hold>(with);
|
2016-08-09 11:07:18 +00:00
|
|
|
write<Size>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
template<uint Size> auto M68K::OR(uint32 source, uint32 target) -> uint32 {
|
|
|
|
auto result = target | source;
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionOR(EffectiveAddress from, DataRegister with) -> void {
|
|
|
|
auto source = read<Size>(from);
|
|
|
|
auto target = read<Size>(with);
|
|
|
|
auto result = OR<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionOR(DataRegister from, EffectiveAddress with) -> void {
|
|
|
|
auto source = read<Size>(from);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = OR<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionORI(EffectiveAddress with) -> void {
|
|
|
|
auto source = readPC<Size>();
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = OR<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
auto M68K::instructionORI_TO_CCR() -> void {
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeCCR(readCCR() | data);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionORI_TO_SR() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
auto data = readPC<Word>();
|
|
|
|
writeSR(readSR() | data);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionPEA(EffectiveAddress from) -> void {
|
|
|
|
auto data = fetch<Long>(from);
|
|
|
|
push<Long>(data);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionRESET() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
r.reset = true;
|
|
|
|
}
|
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
template<uint Size> auto M68K::ROL(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & msb<Size>();
|
|
|
|
result = result << 1 | carry;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROL(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ROL<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROL(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ROL<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionROL(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ROL<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::ROR(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = false;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
carry = result & lsb<Size>();
|
|
|
|
result >>= 1;
|
|
|
|
if(carry) result |= msb<Size>();
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROR(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ROR<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROR(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ROR<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionROR(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ROR<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::ROXL(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = r.x;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
bool extend = carry;
|
|
|
|
carry = result & msb<Size>();
|
|
|
|
result = result << 1 | extend;
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROXL(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ROXL<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROXL(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ROXL<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionROXL(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ROXL<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::ROXR(uint32 result, uint shift) -> uint32 {
|
|
|
|
bool carry = r.x;
|
|
|
|
for(auto _ : range(shift)) {
|
|
|
|
bool extend = carry;
|
|
|
|
carry = result & lsb<Size>();
|
|
|
|
result >>= 1;
|
|
|
|
if(extend) result |= msb<Size>();
|
|
|
|
}
|
|
|
|
|
|
|
|
r.c = carry;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Size>(result) == 0;
|
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
r.x = r.c;
|
|
|
|
|
|
|
|
return clip<Size>(result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROXR(uint4 shift, DataRegister modify) -> void {
|
|
|
|
auto result = ROXR<Size>(read<Size>(modify), shift);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionROXR(DataRegister shift, DataRegister modify) -> void {
|
|
|
|
auto count = read<Long>(shift) & 63;
|
|
|
|
auto result = ROXR<Size>(read<Size>(modify), count);
|
|
|
|
write<Size>(modify, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionROXR(EffectiveAddress modify) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = ROXR<Word>(read<Word, Hold>(modify), 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
write<Word>(modify, result);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
auto M68K::instructionRTE() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
auto sr = pop<Word>();
|
|
|
|
r.pc = pop<Long>();
|
|
|
|
writeSR(sr);
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionRTR() -> void {
|
|
|
|
writeCCR(pop<Word>());
|
|
|
|
r.pc = pop<Long>();
|
|
|
|
}
|
|
|
|
|
2016-07-22 12:03:25 +00:00
|
|
|
auto M68K::instructionRTS() -> void {
|
|
|
|
r.pc = pop<Long>();
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionSBCD(EffectiveAddress with, EffectiveAddress from) -> void {
|
|
|
|
auto source = read<Byte>(from);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Byte, Hold>(with);
|
2016-08-17 22:04:50 +00:00
|
|
|
auto result = target - source - r.x;
|
2017-06-28 07:24:46 +00:00
|
|
|
bool c = false;
|
2016-08-10 22:02:02 +00:00
|
|
|
bool v = false;
|
|
|
|
|
|
|
|
const bool adjustLo = (target ^ source ^ result) & 0x10;
|
|
|
|
const bool adjustHi = result & 0x100;
|
|
|
|
|
|
|
|
if(adjustLo) {
|
|
|
|
auto previous = result;
|
|
|
|
result -= 0x06;
|
2017-06-28 07:24:46 +00:00
|
|
|
c = (~previous & 0x80) & ( result & 0x80);
|
|
|
|
v |= ( previous & 0x80) & (~result & 0x80);
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
if(adjustHi) {
|
|
|
|
auto previous = result;
|
|
|
|
result -= 0x60;
|
2017-06-28 07:24:46 +00:00
|
|
|
c = true;
|
2016-08-10 22:02:02 +00:00
|
|
|
v |= (previous & 0x80) & (~result & 0x80);
|
|
|
|
}
|
|
|
|
|
|
|
|
write<Byte>(with, result);
|
|
|
|
|
2017-06-28 07:24:46 +00:00
|
|
|
r.c = c;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.v = v;
|
2016-08-17 12:31:22 +00:00
|
|
|
r.z = clip<Byte>(result) ? 0 : r.z;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.n = sign<Byte>(result) < 0;
|
2017-06-28 07:24:46 +00:00
|
|
|
r.x = r.c;
|
2016-08-10 22:02:02 +00:00
|
|
|
}
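The adjustment steps in instructionSBCD above can be tried in isolation. The following is a minimal standalone sketch, not the emulator's code: `BcdResult`, `sbcdSketch`, and `sbcdExamples` are hypothetical names, the operands are passed as plain bytes instead of being read through an EffectiveAddress, and the Z, N, and X flag updates are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this sketch; the emulator stores flags in r.c, r.v, etc.
struct BcdResult {
  uint8_t value;   // low byte of the adjusted result
  bool carry;      // decimal borrow
  bool overflow;   // mirrors the v computation above
};

// Packed-BCD subtraction: target - source - extend, followed by the same
// low-nibble and high-nibble corrections as the listing above.
inline BcdResult sbcdSketch(uint8_t target, uint8_t source, bool extend) {
  uint32_t result = uint32_t(target) - source - extend;
  bool c = false;
  bool v = false;

  // Both adjust conditions are sampled before any correction is applied.
  const bool adjustLo = (target ^ source ^ result) & 0x10;  // borrow out of the low nibble
  const bool adjustHi = result & 0x100;                     // borrow out of the byte

  if(adjustLo) {
    uint32_t previous = result;
    result -= 0x06;
    c  = (~previous & 0x80) & ( result & 0x80);
    v |= ( previous & 0x80) & (~result & 0x80);
  }

  if(adjustHi) {
    uint32_t previous = result;
    result -= 0x60;
    c = true;
    v |= (previous & 0x80) & (~result & 0x80);
  }

  return {uint8_t(result), c, v};
}

// A few hand-checked cases.
inline void sbcdExamples() {
  assert(sbcdSketch(0x42, 0x13, false).value == 0x29);  // 42 - 13 = 29, no borrow
  assert(sbcdSketch(0x10, 0x20, false).value == 0x90);  // 10 - 20 = 90 with borrow
  assert(sbcdSketch(0x10, 0x20, false).carry);
  assert(sbcdSketch(0x00, 0x00, true).value == 0x99);   // 0 - 0 - X = 99 with borrow
}
```

Calling `sbcdExamples()` from a test harness exercises the three cases; none of them triggers the overflow path.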
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
auto M68K::instructionSCC(uint4 condition, EffectiveAddress to) -> void {
|
|
|
|
uint8 result = testCondition(condition) ? ~0 : 0;
|
|
|
|
write<Byte>(to, result);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionSTOP() -> void {
|
|
|
|
if(!supervisor()) return;
|
|
|
|
|
|
|
|
auto sr = readPC<Word>();
|
|
|
|
writeSR(sr);
|
|
|
|
r.stop = true;
|
|
|
|
}
|
|
|
|
|
2017-10-05 06:13:03 +00:00
|
|
|
template<uint Size, bool extend> auto M68K::SUB(uint32 source, uint32 target) -> uint32 {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto result = (uint64)target - source;
|
2017-10-05 06:13:03 +00:00
|
|
|
if(extend) result -= r.x;
|
2016-07-25 13:15:54 +00:00
|
|
|
|
|
|
|
r.c = sign<Size>(result >> 1) < 0;
|
|
|
|
r.v = sign<Size>((target ^ source) & (target ^ result)) < 0;
|
2017-10-05 06:13:03 +00:00
|
|
|
r.z = clip<Size>(result) ? 0 : (extend ? r.z : 1);
|
2016-07-25 13:15:54 +00:00
|
|
|
r.n = sign<Size>(result) < 0;
|
|
|
|
r.x = r.c;
|
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aforementioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
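The setup, step, and normalization rules just described can be sketched as follows. This is only an illustration under stated assumptions: `ThreadSketch` and `normalize` are hypothetical names, and the sketch uses a 64-bit attosecond time base for simplicity, which is exactly the precision the text goes on to call insufficient.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative time base only: 10^18 units per second (attoseconds).
constexpr uint64_t timeBase = 1000000000000000000ULL;

struct ThreadSketch {
  uint64_t scalar = 0;  // time-base units per clock tick
  uint64_t clock = 0;   // elapsed time in time-base units

  // setup: scalar = timeBase / frequency
  explicit ThreadSketch(uint64_t frequency) {
    assert(frequency > 0);
    scalar = timeBase / frequency;
  }

  // step: clock += scalar * clocks
  void step(uint64_t clocks) { clock += scalar * clocks; }
};

// Find the smallest clock value and subtract it from every thread,
// so the counters never grow without bound.
inline void normalize(std::vector<ThreadSketch>& threads) {
  if(threads.empty()) return;
  uint64_t minimum = threads.front().clock;
  for(const auto& thread : threads) minimum = std::min(minimum, thread.clock);
  for(auto& thread : threads) thread.clock -= minimum;
}
```

Because every thread measures time in the same units, any two threads can be compared directly by `clock`, without keeping a separate counter per pair.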
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
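The 1/18th figure can be checked with a little integer arithmetic. The constant names below are hypothetical and exist only for this check.

```cpp
#include <cstdint>

// One attosecond time base: 10^18 units per second.
constexpr uint64_t attosecondBase = 1000000000000000000ULL;

// For a 1Hz clock, scalar = timeBase / frequency = 10^18 units per step.
constexpr uint64_t scalarAt1Hz = attosecondBase / 1;

// Whole 1Hz steps that fit in a uint64 before it would wrap.
constexpr uint64_t stepsBeforeWrap = UINT64_MAX / scalarAt1Hz;

static_assert(stepsBeforeWrap == 18, "each 1Hz step uses about 1/18th of the uint64 range");
```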
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
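As a rough check of how a 2^-96 time base compares with a yoctosecond (10^-24), the ratio of units per second can be computed in floating point. The function name is hypothetical, and `double` is used only because 2^96 does not fit in a 64-bit integer.

```cpp
#include <cmath>

// Units per second with a 2^-96 time base, divided by yoctoseconds per second (10^24).
inline double unitsPerYoctosecond() {
  return std::ldexp(1.0, 96) / 1e24;  // 2^96 / 10^24
}
```

The result is a little over 79,228, comfortably past the "about 100 times greater than a yoctosecond" threshold mentioned above.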
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
  cothread_t handle = nullptr;
  uint64 frequency = 0;
  int64 clock = 0;
  auto create(auto (*entrypoint)() -> void, uint frequency) {
    this->handle = co_create(65536, entrypoint);
    this->frequency = frequency;
    this->clock = 0;
  }
};
struct CPUA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
  static auto Enter() -> void;
  auto main() -> void;
  SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
  offsetA = 0;
  co_switch(resumeA);
}
auto QueueA(uint value) -> void {
  queueA[offsetA++] = value;
  if(offsetA >= iterations) {
    resumeA = co_active();
    co_switch(mainThread);
  }
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
  QueueA(1);
  smpA.clock -= cpuStep * smpA.frequency;
  if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
  QueueA(2);
  smpA.clock += smpStep * cpuA.frequency;
  if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
  cothread_t handle = nullptr;
  uint128_t scalar = 0;
  uint128_t clock = 0;
  auto print128(uint128_t value) {
    string s;
    while(value) {
      s.append((char)('0' + value % 10));
      value /= 10;
    }
    s.reverse();
    print(s, "\n");
  }
  //femtosecond (10^15) =    16306
  //attosecond  (10^18) =   688838
  //zeptosecond (10^21) = 13712691
  //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
  //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
  auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
    this->handle = co_create(65536, entrypoint);
    uint128_t unitOfTime = 1;
    //for(uint n : range(29)) unitOfTime *= 10;
    unitOfTime <<= 96;  //2^96 time units ...
    this->scalar = unitOfTime / frequency;
    print128(this->scalar);
    this->clock = 0;
  }
  auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
  auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
  static auto Enter() -> void;
  auto main() -> void;
  SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
  auto minimum = min(cpuB.clock, smpB.clock);
  cpuB.clock -= minimum;
  smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
  correct();
  offsetB = 0;
  co_switch(resumeB);
}
auto QueueB(uint value) -> void {
  queueB[offsetB++] = value;
  if(offsetB >= iterations) {
    resumeB = co_active();
    co_switch(mainThread);
  }
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
  QueueB(1);
  step(cpuStep);
  synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
  QueueB(2);
  step(smpStep);
  synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
  mainThread = co_active();
  uint masterCounter = 0;
  while(true) {
    print(masterCounter++, " ...\n");
    auto A = clock();
    EnterA();
    auto B = clock();
    print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
    auto C = clock();
    EnterB();
    auto D = clock();
    print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
    for(uint n : range(iterations)) {
      if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
    }
  }
}
...and that's everything.]
2016-07-31 02:11:20 +00:00
|
|
|
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionSUB(EffectiveAddress source_, DataRegister target_) -> void {
|
|
|
|
auto source = read<Size>(source_);
|
|
|
|
auto target = read<Size>(target_);
|
|
|
|
auto result = SUB<Size>(source, target);
|
|
|
|
write<Size>(target_, result);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionSUB(DataRegister source_, EffectiveAddress target_) -> void {
|
|
|
|
auto source = read<Size>(source_);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(target_);
|
2016-07-31 02:11:20 +00:00
|
|
|
auto result = SUB<Size>(source, target);
|
|
|
|
write<Size>(target_, result);
|
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
template<uint Size> auto M68K::instructionSUBA(AddressRegister to, EffectiveAddress from) -> void {
|
2016-08-17 12:31:22 +00:00
|
|
|
auto source = sign<Size>(read<Size>(from));
|
2016-08-21 22:11:24 +00:00
|
|
|
auto target = read<Long>(to);
|
2016-08-08 10:12:03 +00:00
|
|
|
write<Long>(to, target - source);
|
|
|
|
}
|
|
|
|
|
|
|
|
template<uint Size> auto M68K::instructionSUBI(EffectiveAddress with) -> void {
|
|
|
|
auto source = readPC<Size>();
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = SUB<Size>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-08-17 22:04:50 +00:00
|
|
|
template<uint Size> auto M68K::instructionSUBQ(uint4 immediate, EffectiveAddress with) -> void {
|
2016-07-31 02:11:20 +00:00
|
|
|
auto source = immediate;
|
2016-08-17 22:04:50 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
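The 2^96 scheme in the test program above can be reduced to a small standalone sketch. This is only an illustration under assumptions: `u128`, `scalarFor`, `Clock` and `normalize` are hypothetical names, and `unsigned __int128` (a GCC/Clang extension) stands in for the `uint128_t` type used in the listing.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: unsigned __int128 (GCC/Clang extension) stands in for uint128_t.
using u128 = unsigned __int128;

// Hypothetical helper: duration of one clock tick, in units of 2^-96 seconds.
inline u128 scalarFor(uint64_t frequency) {
  assert(frequency != 0);
  return (u128(1) << 96) / frequency;
}

// Hypothetical per-thread timestamp, mirroring ThreadB's clock/scalar pair.
struct Clock {
  u128 scalar = 0;
  u128 time = 0;
  void step(uint64_t clocks) { time += u128(clocks) * scalar; }
};

// Subtract the smaller timestamp from both, as correct() does above,
// so the counters stay small while their difference is preserved.
inline void normalize(Clock& a, Clock& b) {
  u128 minimum = a.time < b.time ? a.time : b.time;
  a.time -= minimum;
  b.time -= minimum;
}
```

With power-of-two frequencies the division is exact, so one tick at 2Hz and two ticks at 4Hz land on the same timestamp; for other frequencies the scalar is truncated, which is the rounding the comments in the listing are concerned with.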
2016-07-31 02:11:20 +00:00
|
|
|
auto result = SUB<Size>(source, target);
|
2016-08-17 22:04:50 +00:00
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-08-19 14:11:26 +00:00
|
|
|
//Size is ignored: always uses Long
|
2016-08-17 22:04:50 +00:00
|
|
|
template<uint Size> auto M68K::instructionSUBQ(uint4 immediate, AddressRegister with) -> void {
|
2016-08-19 14:11:26 +00:00
|
|
|
auto result = read<Long>(with) - immediate;
|
|
|
|
write<Long>(with, result);
|
2016-07-25 13:15:54 +00:00
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
template<uint Size> auto M68K::instructionSUBX(EffectiveAddress with, EffectiveAddress from) -> void {
|
|
|
|
auto source = read<Size>(from);
|
2016-08-17 12:31:22 +00:00
|
|
|
auto target = read<Size, Hold>(with);
|
2016-08-08 10:12:03 +00:00
|
|
|
auto result = SUB<Size, Extend>(source, target);
|
|
|
|
write<Size>(with, result);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
auto M68K::instructionSWAP(DataRegister with) -> void {
|
|
|
|
auto result = read<Long>(with);
|
|
|
|
result = result >> 16 | result << 16;
|
|
|
|
write<Long>(with, result);
|
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Long>(result) == 0;
|
|
|
|
r.n = sign<Long>(result) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionTAS(EffectiveAddress with) -> void {
|
Update to v103r24 release.
byuu says:
Changelog:
- gb/mbc6: mapper is now functional, but Net de Get has some text
corruption¹
- gb/mbc7: mapper is now functional²
- gb/cpu: HDMA syncs other components after each byte transfer now
- gb/ppu: LY,LX forced to zero when LCDC.d7 is lowered (eg disabled),
not when it's raised (eg enabled)
- gb/ppu: the LCD does not run at all when LCDC.d7 is clear³
- fixes graphical corruption between scene transitions in Legend
of Zelda - Oracle of Ages
- thanks to Cydrak, Shonumi, gekkio for their input on the cause
of this issue
- md/controller: renamed "Gamepad" to "Control Pad" per official
terminology
- md/controller: added "Fighting Pad" (6-button controller) emulation
[hex_usr]
- processor/m68k: fixed TAS to set data.d7 when
EA.mode==DataRegisterDirect; fixes Asterix
- hiro/windows: removed carriage returns from mouse.cpp and
desktop.cpp
- ruby/audio/alsa: added device driver selection [SuperMikeMan]
- ruby/audio/ao: set format.matrix=nullptr to prevent a crash on some
systems [SuperMikeMan]
- ruby/video/cgl: rename term() to terminate() to fix a crash on macOS
[Sintendo]
¹: The observation that this mapper split $4000-7fff into two banks
came from MAME's implementation. But their implementation was quite
broken and incomplete, so I didn't actually use any of it. The
observation that this mapper split $a000-bfff into two banks came from
Tauwasser, and I did directly use that information, plus the knowledge
that $0400/$0800 are the RAM bank select registers.
The text corruption is due to a race condition with timing. The game is
transferring font letters via HDMA, but the game code ends up setting
the bank# with the font a bit too late after the HDMA has already
occurred. I'm not sure how to fix this ... as a whole, I assumed my Game
Boy timing was pretty good, but apparently it's not that good.
²: The entire design of this mapper comes from endrift's notes.
endrift gets full credit for higan being able to emulate this mapper.
Note that the accelerometer implementation is still not tested, and
probably won't work right until I tweak the sensitivity a lot.
³: So the fun part of this is ... it breaks the strict 60fps rate of
the Game Boy. This was always inevitable: certain timing conditions can
stretch frames, too. But this is pretty much an absolute deal breaker
for something like Vsync timing. This pretty much requires adaptive sync
to run well without audio stuttering during the transition.
There's currently one very important detail missing: when the LCD is
turned off, presumably the image on the screen fades to white. I do not
know how long this process takes, or how to really go about emulating
it. Right now as an incomplete patch, I'm simply leaving the last
displayed image on the screen until the LCD is turned on again. But I
will have to output white, as well as add code to break out of the
emulation loop periodically when the LCD is left off eg indefinitely, or
bad things would happen. I'll work something out and then implement.
Another detail is I'm not sure how long it takes for the LCD to start
rendering again once enabled. Right now, it's immediate. I've heard it's
as long as 1/60th of a second, but that really seems incredibly
excessive? I'd like to know at least a reasonably well-supported
estimate before I implement that.
2017-08-01 11:41:27 +00:00
|
|
|
uint32 data;
|
|
|
|
|
|
|
|
if(with.mode == DataRegisterDirect) {
|
|
|
|
data = read<Byte, Hold>(with);
|
|
|
|
write<Byte>(with, data | 0x80);
|
|
|
|
} else {
|
|
|
|
//Mega Drive models 1&2 have a bug that prevents TAS write from taking effect
|
|
|
|
//this bugged behavior is required for certain software to function correctly
|
|
|
|
data = read<Byte>(with);
|
|
|
|
step(4);
|
|
|
|
}
|
2016-08-10 22:02:02 +00:00
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
|
|
|
r.z = clip<Byte>(data) == 0;
|
|
|
|
r.n = sign<Byte>(data) < 0;
|
|
|
|
}
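The two TAS paths above can be modelled in isolation. The following is a minimal sketch, not higan's code: `Flags` and `tas` are hypothetical names, and the `writeBack` parameter is an assumption standing in for the `with.mode == DataRegisterDirect` check.

```cpp
#include <cstdint>

// Hypothetical flag bundle; the real core keeps these in r.c, r.v, r.z, r.n.
struct Flags { bool c, v, z, n; };

// Hypothetical standalone model of TAS on a byte operand.
// writeBack == true models the data register case, where bit 7 is set.
// writeBack == false models the Mega Drive memory case, where the write is lost.
inline Flags tas(uint8_t& operand, bool writeBack) {
  uint8_t data = operand;                // flags are taken from the value as read
  if(writeBack) operand = data | 0x80;   // set bit 7 of the operand
  return {false, false, data == 0, (data & 0x80) != 0};
}
```

In both cases the flags reflect the operand before bit 7 is set, which matches the listing computing `r.z` and `r.n` from `data` rather than from the written value.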
|
|
|
|
|
|
|
|
auto M68K::instructionTRAP(uint4 vector) -> void {
|
Update to v102r17 release.
byuu says:
Changelog:
- GBA: process audio at 2MHz instead of 32KHz¹
- MD: do not allow the 68K to stop the Z80, unless it has been granted
bus access first
- MD: do not reset bus requested/granted signals when the 68K resets
the Z80
- the above two fix The Lost Vikings
- MD: clean up the bus address decoding to be more readable
- MD: add support for a13000-a130ff (#TIME) region; pass to cartridge
I/O²
- MD: emulate SRAM mapping used by >16mbit games; bank mapping used
by >32mbit games³
- MD: add 'reset pending' flag so that loading save states won't
reload 68K PC, SP registers
- this fixes save state support ... mostly⁴
- MD: if DMA is not enabled, do not allow CD5 to be set [Cydrak]
- this fixes in-game graphics for Ristar. Title screen still
corrupted on first run
- MD: detect and break sprite lists that form an infinite loop
[Cydrak]
- this fixes the emulator from dead-locking on certain games
- MD: add DC offset to sign DAC PCM samples [Cydrak]
- this improves audio in Sonic 3
- MD: 68K TAS has a hardware bug that prevents writing the result back
to RAM
- this fixes Gargoyles
- MD: 68K TRAP should not change CPU interrupt level
- this fixes Shining Force II, Shining in the Darkness, etc
- icarus: better SRAM heuristics for Mega Drive games
Todo:
- need to serialize the new cartridge ramEnable, ramWritable, bank
variables
¹: so technically, the GBA has its FIFO queue (raw PCM), plus a GB
chipset. The GB audio runs at 2MHz. However, I was being lazy and
running the sequencer 64 times in a row, thus decimating the audio to
32KHz. But simply discarding 63 out of every 64 samples results in
muddier sound with more static in it.
However ... increasing the audio thread processing intensity 64-fold,
and requiring heavy-duty three-chain lowpass and highpass filters is not
cheap. For this bump in sound quality, we're eating a loss of about 30%
of previous performance.
Also note that the GB audio emulation in the GBA core still lacks many
of the improvements made to the GB core. I was hoping to complete the GB
enhancements, but it seems like I'm never going to pass blargg's
psychotic edge case tests. So, first I want to clean up the GB audio to
my current coding standards, and then I'll port that over to the GBA,
which should further increase sound quality. At that point, it should
exceed mGBA's audio quality (due to the ridiculously high sampling rate
and strong-attenuation audio filtering.)
²: word writes are probably not handled correctly ... but games are
only supposed to do byte writes here.
³: the SRAM mapping is used by games like "Story of Thor" and
"Phantasy Star IV." Unfortunately, the former wasn't released in the US
and is region protected. So you'll need to change the NTSCU to NTSCJ in
md/system/system.cpp in order to boot it. But it does work nicely now.
The write protection bit is cleared in the game, and then it fails to
write to SRAM (soooooooo many games with SRAM write protection do this),
so for now I've had to disable checking that bit. Phantasy Star IV has a
US release, but sadly the game doesn't boot yet. Hitting some other bug.
The bank mapping is pretty much just for the 40mbit Super Street Fighter
game. It shows the Sega and Capcom logos now, but is hitting yet another
bug and deadlocking.
For now, I emulate the SRAM/bank mapping registers on all cartridges,
and set sane defaults. So long as games don't write to $a130XX, they
should all continue to work. But obviously, we need to get to a point
where higan/icarus can selectively enable these registers on a per-game
basis.
⁴: so, the Mega Drive has various ways to lock a chip until another
chip releases it. The VDP can lock the 68K, the 68K can lock the Z80,
etc. If this happens when you save a state, it'll dead-lock the
emulator. So that's obviously a problem that needs to be fixed. The fix
will be nasty ... basically, bypassing the dead-lock, creating a
miniature, one-instruction-long race condition. Extremely unlikely to
cause any issues in practice (it's only a little worse than the SNES
CPU/SMP desync), but ... there's nothing I can do about it. So you'll
have to take it or leave it. But yeah, for now, save states may lock up
the emulator. I still need to add code to break the loops while a save
state is being created.
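Footnote ¹ above contrasts discarding 63 of every 64 samples with filtering before decimation. A rough sketch of the difference follows. Note the swap: it uses a plain box average as a stand-in low-pass, not the three-chain lowpass and highpass filters the changelog mentions, and `decimateDrop` and `decimateAverage` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Keep the first sample of every group of `factor` samples and discard the rest.
inline std::vector<int32_t> decimateDrop(const std::vector<int32_t>& in, std::size_t factor) {
  assert(factor != 0);
  std::vector<int32_t> out;
  for(std::size_t n = 0; n + factor <= in.size(); n += factor) out.push_back(in[n]);
  return out;
}

// Average every group of `factor` samples: a crude box low-pass before decimating.
// Assumption: this only illustrates the idea; it is not higan's filter chain.
inline std::vector<int32_t> decimateAverage(const std::vector<int32_t>& in, std::size_t factor) {
  assert(factor != 0);
  std::vector<int32_t> out;
  for(std::size_t n = 0; n + factor <= in.size(); n += factor) {
    int64_t sum = 0;
    for(std::size_t i = 0; i < factor; i++) sum += in[n + i];
    out.push_back(static_cast<int32_t>(sum / static_cast<int64_t>(factor)));
  }
  return out;
}
```

For a signal alternating between 0 and 100, dropping samples with a factor of 2 yields only zeroes, while averaging yields 50 for each output sample; the dropped variant loses everything that happened between the kept samples.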
2017-03-10 10:23:29 +00:00
|
|
|
exception(Exception::Trap, 32 + vector, r.i);
|
2016-08-10 22:02:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionTRAPV() -> void {
|
|
|
|
if(r.v) exception(Exception::Overflow, Vector::Overflow);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
template<uint Size> auto M68K::instructionTST(EffectiveAddress ea) -> void {
|
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
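The "bool valid / uint32 address" idea for the ANDI-style read-modify-write case can be sketched roughly as follows. This is a simplified model under assumptions, not the actual EA struct: `EA`, `Machine`, `fetch`, the word-only access and the 64KiB memory are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified -(aN) effective address for word-sized accesses.
struct EA {
  unsigned reg = 0;      // which address register is used
  bool valid = false;    // has the address already been computed?
  uint32_t address = 0;  // cached address, shared by the read and the write
};

// Hypothetical machine: eight address registers and 64KiB of word memory.
struct Machine {
  uint32_t a[8] = {};
  uint16_t memory[0x8000] = {};

  // Compute the address at most once; aN itself is not updated here.
  uint32_t fetch(EA& ea) {
    assert(ea.reg < 8);
    if(!ea.valid) {
      ea.address = a[ea.reg] - 2;
      ea.valid = true;
    }
    return ea.address;
  }

  uint16_t read(EA& ea) { return memory[(fetch(ea) & 0xffff) >> 1]; }

  void write(EA& ea, uint16_t data) {
    uint32_t address = fetch(ea);
    memory[(address & 0xffff) >> 1] = data;
    a[ea.reg] = address;  // commit the predecrement only after the write
  }
};
```

Because `fetch` caches its result, a read followed by a write through the same `EA` both hit aN-2, and the register only changes once the write has completed.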
2016-07-19 09:12:05 +00:00
|
|
|
auto data = read<Size>(ea);
|
2016-07-12 10:19:31 +00:00
|
|
|
|
|
|
|
r.c = 0;
|
|
|
|
r.v = 0;
|
2016-08-10 22:02:02 +00:00
|
|
|
r.z = clip<Size>(data) == 0;
|
|
|
|
r.n = sign<Size>(data) < 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto M68K::instructionUNLK(AddressRegister with) -> void {
|
|
|
|
auto sp = AddressRegister{7};
|
|
|
|
write<Long>(sp, read<Long>(with));
|
|
|
|
write<Long>(with, pop<Long>());
|
2016-07-12 10:19:31 +00:00
|
|
|
}
|