2016-07-12 10:19:31 +00:00
|
|
|
auto M68K::instruction() -> void {
|
|
|
|
instructionsExecuted++;
|
2016-07-23 02:32:35 +00:00
|
|
|
|
2016-08-12 23:47:30 +00:00
|
|
|
#if 0
|
|
|
|
if(instructionsExecuted >= 10000010) while(true) step(1);
|
|
|
|
if(instructionsExecuted >= 10000000) {
|
|
|
|
print(disassembleRegisters(), "\n");
|
|
|
|
print(disassemble(r.pc), "\n");
|
|
|
|
print("\n");
|
|
|
|
}
|
|
|
|
#endif
|
2016-07-12 10:19:31 +00:00
|
|
|
|
Update to v100r06 release.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
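A minimal sketch of the "bind once, assert on collision" idea described above. It is not the actual higan code: `MiniCore`, `bind` as a member function, and `run` are hypothetical names, and the handlers only return a mnemonic string so the example stays self-contained. It reads the "remapping" case as binding the same opcode twice, which is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a 64K-entry opcode table built in a constructor.
struct MiniCore {
  // One slot per 16-bit opcode; an empty std::function means "not bound".
  std::vector<std::function<std::string()>> table;

  MiniCore() : table(65536) {
    // NOP has a single fixed encoding.
    bind(0x4e71, [] { return std::string("nop"); });
    // UNLK aN: one table entry per address register number.
    for(int reg = 0; reg < 8; reg++) {
      bind(static_cast<uint16_t>(0x4e58 | reg), [reg] { return "unlk a" + std::to_string(reg); });
    }
  }

  // Binding an opcode that is already bound trips the assertion at startup.
  auto bind(uint16_t opcode, std::function<std::string()> handler) -> void {
    assert(!table[opcode] && "opcode bound twice");
    table[opcode] = std::move(handler);
  }

  // Dispatch is a single indexed call; unbound slots are reported as illegal.
  auto run(uint16_t opcode) const -> std::string {
    return table[opcode] ? table[opcode]() : "illegal";
  }
};
```

Constructing a `MiniCore` and calling `run(0x4e5b)` dispatches to the handler bound for register 3, while any opcode that was never bound falls through to "illegal".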
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
2016-07-16 08:39:44 +00:00
|
|
|
opcode = readPC();
|
2016-07-12 22:47:04 +00:00
|
|
|
return instructionTable[opcode]();
|
|
|
|
}
|
2016-07-12 10:19:31 +00:00
|
|
|
|
2016-07-12 22:47:04 +00:00
|
|
|
M68K::M68K() {
|
2016-07-22 12:03:25 +00:00
|
|
|
#define bind(id, name, ...) { \
|
Update to v100r08 release.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
for(uint opcode : range(65536)) {
if(match(...)) bind(opNAME, ...);
}
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated to ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There were also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
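To illustrate the trade-off, here is a rough sketch comparing a runtime size parameter with a template size parameter. The names `clipRuntime`, `mask`, `clip` and `negative`, and the choice of 1/2/4 as size tags, are assumptions made for this example and may not match the real core.

```cpp
#include <cstdint>

// Hypothetical size tags: the operand width in bytes.
enum : unsigned { Byte = 1, Word = 2, Long = 4 };

// Runtime version: every call branches on the size argument.
inline auto clipRuntime(unsigned size, uint32_t data) -> uint32_t {
  if(size == Byte) return data & 0xff;
  if(size == Word) return data & 0xffff;
  return data;
}

// Template version: the mask is a compile-time constant per instantiation,
// so no size branch remains in the generated code.
template<unsigned Size> constexpr auto mask() -> uint32_t {
  static_assert(Size == Byte || Size == Word || Size == Long, "unsupported size");
  return 0xffffffffu >> (32 - Size * 8);
}

template<unsigned Size> constexpr auto clip(uint32_t data) -> uint32_t {
  return data & mask<Size>();
}

// Sign bit of the operand at the given width, as needed for the N flag.
template<unsigned Size> constexpr auto negative(uint32_t data) -> bool {
  return (data >> (Size * 8 - 1)) & 1;
}
```

The cost is the one described above: each instruction body is instantiated once per size, so the code is tripled in exchange for removing the per-access conditionals.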
2016-07-17 22:11:29 +00:00
|
|
|
assert(!instructionTable[id]); \
|
|
|
|
instructionTable[id] = [=] { return instruction##name(__VA_ARGS__); }; \
|
2016-07-22 12:03:25 +00:00
|
|
|
disassembleTable[id] = [=] { return disassemble##name(__VA_ARGS__); }; \
|
|
|
|
}
|
|
|
|
|
|
|
|
#define unbind(id) { \
|
|
|
|
instructionTable[id].reset(); \
|
|
|
|
disassembleTable[id].reset(); \
|
|
|
|
}
|
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
2016-07-19 09:12:05 +00:00
|
|
|
|
|
|
|
#define pattern(s) \
|
|
|
|
std::integral_constant<uint16_t, bit::test(s)>::value
|
2016-07-17 22:11:29 +00:00
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//ABCD
|
|
|
|
for(uint3 xreg : range(8))
|
|
|
|
for(uint3 yreg : range(8)) {
|
|
|
|
auto opcode = pattern("1100 ---1 0000 ----") | xreg << 9 | yreg << 0;
|
|
|
|
|
|
|
|
EffectiveAddress dataWith{DataRegisterDirect, xreg};
|
|
|
|
EffectiveAddress dataFrom{DataRegisterDirect, yreg};
|
|
|
|
bind(opcode | 0 << 3, ABCD, dataWith, dataFrom);
|
|
|
|
|
|
|
|
EffectiveAddress addressWith{AddressRegisterIndirectWithPreDecrement, xreg};
|
|
|
|
EffectiveAddress addressFrom{AddressRegisterIndirectWithPreDecrement, yreg};
|
|
|
|
bind(opcode | 1 << 3, ABCD, addressWith, addressFrom);
|
|
|
|
}
|
|
|
|
|
2016-07-17 22:11:29 +00:00
|
|
|
//ADD
|
2016-08-08 10:12:03 +00:00
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1101 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 5) continue;
|
2016-07-17 22:11:29 +00:00
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
DataRegister with{dreg};
|
|
|
|
bind(opcode | 0 << 6, ADD<Byte>, from, with);
|
|
|
|
bind(opcode | 1 << 6, ADD<Word>, from, with);
|
|
|
|
bind(opcode | 2 << 6, ADD<Long>, from, with);
|
|
|
|
|
|
|
|
if(mode == 1) unbind(opcode | 0 << 6);
|
|
|
|
}
|
2016-07-22 12:03:25 +00:00
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
//ADD
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1101 ---1 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
DataRegister from{dreg};
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, ADD<Byte>, from, with);
|
|
|
|
bind(opcode | 1 << 6, ADD<Word>, from, with);
|
|
|
|
bind(opcode | 2 << 6, ADD<Long>, from, with);
|
2016-07-17 22:11:29 +00:00
|
|
|
}
|
|
|
|
|
2016-07-26 10:46:43 +00:00
|
|
|
//ADDA
|
|
|
|
for(uint3 areg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1101 ---+ 11-- ----") | areg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 5) continue;
|
|
|
|
|
|
|
|
AddressRegister ar{areg};
|
|
|
|
EffectiveAddress ea{mode, reg};
|
|
|
|
bind(opcode | 0 << 8, ADDA<Word>, ar, ea);
|
|
|
|
bind(opcode | 1 << 8, ADDA<Long>, ar, ea);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ADDI
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0000 0110 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress modify{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, ADDI<Byte>, modify);
|
|
|
|
bind(opcode | 1 << 6, ADDI<Word>, modify);
|
|
|
|
bind(opcode | 2 << 6, ADDI<Long>, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ADDQ
|
|
|
|
for(uint3 data : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0101 ---0 ++-- ----") | data << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 2) continue;
|
|
|
|
|
|
|
|
uint4 immediate = data ? (uint4)data : (uint4)8;
|
2016-08-17 22:04:50 +00:00
|
|
|
if(mode != 1) {
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, ADDQ<Byte>, immediate, with);
|
|
|
|
bind(opcode | 1 << 6, ADDQ<Word>, immediate, with);
|
|
|
|
bind(opcode | 2 << 6, ADDQ<Long>, immediate, with);
|
|
|
|
} else {
|
|
|
|
AddressRegister with{reg};
|
|
|
|
bind(opcode | 1 << 6, ADDQ<Word>, immediate, with);
|
|
|
|
bind(opcode | 2 << 6, ADDQ<Long>, immediate, with);
|
|
|
|
}
|
2016-07-26 10:46:43 +00:00
|
|
|
}
|
|
|
|
|
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aforementioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32-bits of precision
allows a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272hz
- SMP/DSP clock = 24576000hz
- Cartridge DSP1 clock = 8000000hz
- Cartridge MSU1 clock = 44100hz
- Controller Port 1 modem controller clock = 57600hz
- Controller Port 2 barcode battler clock = 115200hz
- Expansion Port exercise bike clock = 192000hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
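The four rules above can be sketched without real cooperative threads by recording which side would run at each step. This is only an illustration: `relativeSchedule` and its parameters are hypothetical, and a boolean stands in for the actual thread switch.

```cpp
#include <cstdint>
#include <vector>

// Returns the order in which two threads run: 1 = thread A, 2 = thread B.
// A single signed counter tracks how far A is ahead of B; it is scaled by
// the *other* thread's frequency so no division is ever needed.
inline auto relativeSchedule(int64_t freqA, int64_t freqB,
                             int64_t stepA, int64_t stepB,
                             unsigned events) -> std::vector<int> {
  std::vector<int> order;
  int64_t clock = 0;
  bool runningA = true;  // assumption: thread A is entered first
  while(order.size() < events) {
    if(runningA) {
      order.push_back(1);
      clock += stepA * freqB;            // A steps X clocks
      if(clock >= 0) runningA = false;   // A caught up or ahead: switch to B
    } else {
      order.push_back(2);
      clock -= stepB * freqA;            // B steps X clocks
      if(clock < 0) runningA = true;     // B is ahead: switch to A
    }
  }
  return order;
}
```

With A running at twice B's frequency and both stepping one clock at a time, the schedule settles into two A steps for every B step after a short start-up transient.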
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
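A small sketch of the single-time-base scheme just described. It deliberately uses `uint64_t` and a tiny time base so the rounding problem is easy to see; the `Thread` struct and `normalize` function are illustrative names, not higan's API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Thread {
  uint64_t scalar = 0;  // time units per clock tick (rounded down)
  uint64_t clock = 0;   // absolute position on the shared time base

  // setup: scalar = timeBase / frequency
  Thread(uint64_t timeBase, uint64_t frequency) : scalar(timeBase / frequency) {}

  // step: clock += scalar * clocks
  auto step(uint64_t clocks) -> void { clock += scalar * clocks; }
};

// Periodic normalization: subtract the smallest clock from every thread so
// the counters stay small while their relative distances are preserved.
inline auto normalize(std::vector<Thread>& threads) -> void {
  if(threads.empty()) return;
  uint64_t minimum = threads[0].clock;
  for(auto& thread : threads) minimum = std::min(minimum, thread.clock);
  for(auto& thread : threads) thread.clock -= minimum;
}
```

With a time base of 1,000,000 units per second, a 4Hz thread gets a scalar of exactly 250,000, but a 3Hz thread gets 333,333 and loses one unit per second to truncation. That drift is the kind of rounding error that motivates a much finer time base.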
Unfortunately, these oscillator values are psychotic, unpredictable,
and often times repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273hz (rounded) and SMP running at 24576000hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Given the reciprocal inverse, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
2016-07-31 02:11:20 +00:00
|
|
|
//ADDX
|
2016-08-17 12:31:22 +00:00
|
|
|
for(uint3 xreg : range(8))
|
|
|
|
for(uint3 yreg : range(8)) {
|
|
|
|
auto opcode = pattern("1101 ---1 ++00 ----") | xreg << 9 | yreg << 0;
|
|
|
|
|
|
|
|
EffectiveAddress dataWith{DataRegisterDirect, xreg};
|
|
|
|
EffectiveAddress dataFrom{DataRegisterDirect, yreg};
|
|
|
|
bind(opcode | 0 << 6 | 0 << 3, ADDX<Byte>, dataWith, dataFrom);
|
|
|
|
bind(opcode | 1 << 6 | 0 << 3, ADDX<Word>, dataWith, dataFrom);
|
|
|
|
bind(opcode | 2 << 6 | 0 << 3, ADDX<Long>, dataWith, dataFrom);
|
|
|
|
|
|
|
|
EffectiveAddress addressWith{AddressRegisterIndirectWithPreDecrement, xreg};
|
|
|
|
EffectiveAddress addressFrom{AddressRegisterIndirectWithPreDecrement, yreg};
|
|
|
|
bind(opcode | 0 << 6 | 1 << 3, ADDX<Byte>, addressWith, addressFrom);
|
|
|
|
bind(opcode | 1 << 6 | 1 << 3, ADDX<Word>, addressWith, addressFrom);
|
|
|
|
bind(opcode | 2 << 6 | 1 << 3, ADDX<Long>, addressWith, addressFrom);
|
2016-07-31 02:11:20 +00:00
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
  //AND
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1100 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;

    EffectiveAddress from{mode, reg};
    DataRegister with{dreg};
    bind(opcode | 0 << 6, AND<Byte>, from, with);
    bind(opcode | 1 << 6, AND<Word>, from, with);
    bind(opcode | 2 << 6, AND<Long>, from, with);
  }

  //AND
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1100 ---1 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode <= 1 || (mode == 7 && reg >= 2)) continue;

    DataRegister from{dreg};
    EffectiveAddress with{mode, reg};
    bind(opcode | 0 << 6, AND<Byte>, from, with);
    bind(opcode | 1 << 6, AND<Word>, from, with);
    bind(opcode | 2 << 6, AND<Long>, from, with);
  }
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
  - read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
  - when using +/-(aN), RA can't actually be updated until the transfer
    is completed
- LEA (EA from) case
  - doesn't actually perform the final read; just returns the address
    to be read from
- ANDI (EA from-and-to) case
  - same EA has to be read from and written to
  - for -(aN), the read has to come from aN-2, but can't update aN yet;
    so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the order of extension word fetching for proper
  opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
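The ANDI case above can be sketched roughly as follows. This is only an illustration of caching the effective address in the EA struct, not the code from the core itself: `Core`, `fetch`, `andiWord`, the `std::map` memory and the mode constants are all made-up names, and it assumes aN is only committed once the write has happened.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical mode numbers for (aN) and -(aN), following the 68K encoding.
enum : unsigned { Indirect = 2, PreDecrement = 4 };

struct EA {
  EA(unsigned mode, unsigned reg) : mode(mode), reg(reg) {}
  unsigned mode, reg;
  bool valid = false;    // has the address been computed yet?
  uint32_t address = 0;  // cached effective address
};

struct Core {
  uint32_t a[8] = {};
  std::map<uint32_t, uint16_t> memory;  // sparse word memory, for illustration only

  // Resolve the address once; later calls reuse the cached value.
  uint32_t fetch(EA& ea) {
    if(!ea.valid) {
      ea.address = ea.mode == PreDecrement ? a[ea.reg] - 2 : a[ea.reg];
      ea.valid = true;
    }
    return ea.address;
  }

  uint16_t read(EA& ea) { return memory[fetch(ea)]; }

  void write(EA& ea, uint16_t data) {
    memory[fetch(ea)] = data;
    if(ea.mode == PreDecrement) a[ea.reg] = ea.address;  // commit aN only now
  }

  // ANDI.W #imm,<ea>: the read and the write share one cached address.
  void andiWord(EA ea, uint16_t immediate) {
    uint16_t result = read(ea) & immediate;
    write(ea, result);
  }
};
```

With a0 = 0x1002, an ANDI.W on -(a0) reads from 0x1000, writes back to 0x1000, and leaves a0 at 0x1000; nothing is decremented twice.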
2016-07-19 09:12:05 +00:00
  //ANDI
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 0010 ++-- ----") | mode << 3 | reg << 0;
2016-07-22 12:03:25 +00:00
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;
Update to v100r08 release.
byuu says:
Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!
For building the table, I've decided to move from:
    for(uint opcode : range(65536)) {
      if(match(...)) bind(opNAME, ...);
    }
To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.
And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.
This will amount to the ~70ish instructions being triplicated to ~210ish
instructions; but I think this is really important.
When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There were also lots of
conditionals in all the memory access handlers.
The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
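As a rough sketch of what the size templates buy: with the size as a template parameter, the mask and sign bit become compile-time constants, so the flag code has no run-time size checks. `mask`, `sign`, `Flags` and `opAND` are hypothetical names for illustration, and the numeric values given to `Byte`/`Word`/`Long` here are an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical size tags (bytes per operand); the values are an assumption.
enum : unsigned { Byte = 1, Word = 2, Long = 4 };

// Operand mask and sign bit, resolved at compile time for each Size.
template<unsigned Size> constexpr uint32_t mask() {
  return static_cast<uint32_t>((uint64_t(1) << (Size * 8)) - 1);
}
template<unsigned Size> constexpr uint32_t sign() {
  return uint32_t(1) << (Size * 8 - 1);
}

struct Flags { bool n = false, z = false; };

// One instantiation per size: the body contains no run-time size checks.
template<unsigned Size> uint32_t opAND(uint32_t source, uint32_t target, Flags& flags) {
  uint32_t result = (source & target) & mask<Size>();
  flags.z = result == 0;
  flags.n = (result & sign<Size>()) != 0;
  return result;
}
```

The cost is the one noted above: every instruction gets instantiated up to three times, once per size.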
2016-07-17 22:11:29 +00:00
2016-08-17 12:31:22 +00:00
    EffectiveAddress with{mode, reg};
    bind(opcode | 0 << 6, ANDI<Byte>, with);
    bind(opcode | 1 << 6, ANDI<Word>, with);
    bind(opcode | 2 << 6, ANDI<Long>, with);
2016-07-19 09:12:05 +00:00
  }
2016-07-12 22:47:04 +00:00
2016-07-23 02:32:35 +00:00
  //ANDI_TO_CCR
  { auto opcode = pattern("0000 0010 0011 1100");

    bind(opcode, ANDI_TO_CCR);
  }

  //ANDI_TO_SR
  { auto opcode = pattern("0000 0010 0111 1100");

    bind(opcode, ANDI_TO_SR);
  }
2016-07-25 13:15:54 +00:00
  //ASL (immediate)
  for(uint3 count : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---1 ++00 0---") | count << 9 | dreg << 0;

    auto shift = count ? (uint4)count : (uint4)8;
    DataRegister modify{dreg};
    bind(opcode | 0 << 6, ASL<Byte>, shift, modify);
    bind(opcode | 1 << 6, ASL<Word>, shift, modify);
    bind(opcode | 2 << 6, ASL<Long>, shift, modify);
  }

  //ASL (register)
  for(uint3 sreg : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---1 ++10 0---") | sreg << 9 | dreg << 0;

    DataRegister shift{sreg};
    DataRegister modify{dreg};
    bind(opcode | 0 << 6, ASL<Byte>, shift, modify);
    bind(opcode | 1 << 6, ASL<Word>, shift, modify);
    bind(opcode | 2 << 6, ASL<Long>, shift, modify);
  }

  //ASL (effective address)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1110 0001 11-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress modify{mode, reg};
    bind(opcode, ASL, modify);
  }

  //ASR (immediate)
  for(uint3 count : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---0 ++00 0---") | count << 9 | dreg << 0;

    auto shift = count ? (uint4)count : (uint4)8;
    DataRegister modify{dreg};
    bind(opcode | 0 << 6, ASR<Byte>, shift, modify);
    bind(opcode | 1 << 6, ASR<Word>, shift, modify);
    bind(opcode | 2 << 6, ASR<Long>, shift, modify);
  }

  //ASR (register)
  for(uint3 sreg : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---0 ++10 0---") | sreg << 9 | dreg << 0;

    DataRegister shift{sreg};
    DataRegister modify{dreg};
    bind(opcode | 0 << 6, ASR<Byte>, shift, modify);
    bind(opcode | 1 << 6, ASR<Word>, shift, modify);
    bind(opcode | 2 << 6, ASR<Long>, shift, modify);
  }

  //ASR (effective address)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1110 0000 11-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress modify{mode, reg};
    bind(opcode, ASR, modify);
  }
2016-07-19 09:12:05 +00:00
  //BCC
  for(uint4 condition : range(16))
  for(uint8 displacement : range(256)) {
    auto opcode = pattern("0110 ---- ---- ----") | condition << 8 | displacement << 0;
2016-07-12 22:47:04 +00:00
2016-07-19 09:12:05 +00:00
    bind(opcode, BCC, condition, displacement);
  }
Update to v100r06 release.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
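The table-building idea described above (per-opcode loops with exclusion cases, plus a startup check against binding the same opcode twice) might look something like this in isolation. Everything here is a simplified stand-in: `Decoder`, `bindGroup` and this `pattern` helper are hypothetical, handlers are plain `std::function` objects, and an exception is used in place of the run-time assertion so that the failure can be observed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: '0' and '1' are fixed bits; any other non-space
// character marks a field that is left as zero for the caller to fill in.
inline uint16_t pattern(const char* s) {
  uint16_t data = 0;
  for(; *s; s++) {
    if(*s == ' ') continue;
    data = static_cast<uint16_t>(data << 1 | (*s == '1'));
  }
  return data;
}

struct Decoder {
  std::vector<std::function<void()>> table;
  Decoder() : table(65536) {}

  // Refuse to overwrite an existing entry, so overlapping patterns are caught
  // while the table is being built rather than at dispatch time.
  void bind(uint16_t opcode, std::function<void()> handler) {
    if(table[opcode]) throw std::logic_error("opcode bound twice");
    table[opcode] = std::move(handler);
  }
};

// Bind an AND-style group, skipping the invalid addressing modes.
inline unsigned bindGroup(Decoder& decoder, int& executed) {
  unsigned bound = 0;
  for(unsigned dreg = 0; dreg < 8; dreg++)
  for(unsigned mode = 0; mode < 8; mode++)
  for(unsigned reg = 0; reg < 8; reg++) {
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;  // exclusion case
    unsigned opcode = pattern("1100 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
    for(unsigned size = 0; size < 3; size++) {
      decoder.bind(static_cast<uint16_t>(opcode | size << 6), [&executed] { executed++; });
      bound++;
    }
  }
  return bound;
}
```

Running `bindGroup` a second time on the same `Decoder` trips the check on the very first opcode, which is the kind of overlap the startup assertion is meant to catch.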
2016-07-16 08:39:44 +00:00
2016-08-09 11:07:18 +00:00
  //BCHG (register)
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 ---1 01-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    DataRegister bit{dreg};
    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BCHG<Long>, bit, with);
    if(mode != 0) bind(opcode, BCHG<Byte>, bit, with);
  }

  //BCHG (immediate)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1000 01-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BCHG<Long>, with);
    if(mode != 0) bind(opcode, BCHG<Byte>, with);
  }

  //BCLR (register)
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 ---1 10-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    DataRegister bit{dreg};
    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BCLR<Long>, bit, with);
    if(mode != 0) bind(opcode, BCLR<Byte>, bit, with);
  }

  //BCLR (immediate)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1000 10-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BCLR<Long>, with);
    if(mode != 0) bind(opcode, BCLR<Byte>, with);
  }

  //BSET (register)
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 ---1 11-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    DataRegister bit{dreg};
    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BSET<Long>, bit, with);
    if(mode != 0) bind(opcode, BSET<Byte>, bit, with);
  }

  //BSET (immediate)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1000 11-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress with{mode, reg};
    if(mode == 0) bind(opcode, BSET<Long>, with);
    if(mode != 0) bind(opcode, BSET<Byte>, with);
  }
2016-07-22 12:03:25 +00:00
  //BTST (register)
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 ---1 00-- ----") | dreg << 9 | mode << 3 | reg << 0;
2016-07-25 13:15:54 +00:00
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;
2016-07-22 12:03:25 +00:00
2016-07-23 02:32:35 +00:00
    DataRegister dr{dreg};
    EffectiveAddress ea{mode, reg};
    if(mode == 0) bind(opcode, BTST<Long>, dr, ea);
    if(mode != 0) bind(opcode, BTST<Byte>, dr, ea);
2016-07-22 12:03:25 +00:00
  }

  //BTST (immediate)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1000 00-- ----") | mode << 3 | reg << 0;
2016-08-09 11:07:18 +00:00
    if(mode == 1 || (mode == 7 && reg >= 4)) continue;
2016-07-22 12:03:25 +00:00
2016-07-23 02:32:35 +00:00
    EffectiveAddress ea{mode, reg};
2016-07-22 12:03:25 +00:00
    if(mode == 0) bind(opcode, BTST<Long>, ea);
    if(mode != 0) bind(opcode, BTST<Byte>, ea);
  }
2016-08-10 22:02:02 +00:00
  //CHK
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0100 ---1 10-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;

    DataRegister compare{dreg};
    EffectiveAddress maximum{mode, reg};
    bind(opcode, CHK, compare, maximum);
  }
2016-07-22 12:03:25 +00:00
  //CLR
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0100 0010 ++-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;
2016-07-23 02:32:35 +00:00
    EffectiveAddress ea{mode, reg};
2016-07-22 12:03:25 +00:00
    bind(opcode | 0 << 6, CLR<Byte>, ea);
    bind(opcode | 1 << 6, CLR<Word>, ea);
    bind(opcode | 2 << 6, CLR<Long>, ea);
  }

  //CMP
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1011 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
2016-07-25 13:15:54 +00:00
    if(mode == 7 && reg >= 5) continue;
2016-07-22 12:03:25 +00:00
2016-07-23 02:32:35 +00:00
    DataRegister dr{dreg};
    EffectiveAddress ea{mode, reg};
    bind(opcode | 0 << 6, CMP<Byte>, dr, ea);
    bind(opcode | 1 << 6, CMP<Word>, dr, ea);
    bind(opcode | 2 << 6, CMP<Long>, dr, ea);
2016-07-22 12:03:25 +00:00
    if(mode == 1) unbind(opcode | 0 << 6);
  }
2016-07-26 10:46:43 +00:00
  //CMPA
  for(uint3 areg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1011 ---+ 11-- ----") | areg << 9 | mode << 3 | reg << 0;

    AddressRegister ar{areg};
    EffectiveAddress ea{mode, reg};
    bind(opcode | 0 << 8, CMPA<Word>, ar, ea);
    bind(opcode | 1 << 8, CMPA<Long>, ar, ea);
  }

  //CMPI
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1100 ++-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress ea{mode, reg};
    bind(opcode | 0 << 6, CMPI<Byte>, ea);
    bind(opcode | 1 << 6, CMPI<Word>, ea);
    bind(opcode | 2 << 6, CMPI<Long>, ea);
  }

  //CMPM
  for(uint3 xreg : range(8))
  for(uint3 yreg : range(8)) {
    auto opcode = pattern("1011 ---1 ++00 1---") | xreg << 9 | yreg << 0;

    EffectiveAddress ax{AddressRegisterIndirectWithPostIncrement, xreg};
    EffectiveAddress ay{AddressRegisterIndirectWithPostIncrement, yreg};
    bind(opcode | 0 << 6, CMPM<Byte>, ax, ay);
    bind(opcode | 1 << 6, CMPM<Word>, ax, ay);
    bind(opcode | 2 << 6, CMPM<Long>, ax, ay);
  }
2016-07-22 12:03:25 +00:00
  //DBCC
  for(uint4 condition : range(16))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("0101 ---- 1100 1---") | condition << 8 | dreg << 0;
2016-07-23 02:32:35 +00:00
    DataRegister dr{dreg};
    bind(opcode, DBCC, condition, dr);
  }
2016-08-10 22:02:02 +00:00
  //DIVS
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1000 ---1 11-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;

    DataRegister with{dreg};
    EffectiveAddress from{mode, reg};
    bind(opcode, DIVS, with, from);
  }

  //DIVU
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1000 ---0 11-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 5)) continue;

    DataRegister with{dreg};
    EffectiveAddress from{mode, reg};
    bind(opcode, DIVU, with, from);
  }
2016-08-08 10:12:03 +00:00
  //EOR
  for(uint3 dreg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1011 ---1 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    DataRegister from{dreg};
    EffectiveAddress with{mode, reg};
    bind(opcode | 0 << 6, EOR<Byte>, from, with);
    bind(opcode | 1 << 6, EOR<Word>, from, with);
    bind(opcode | 2 << 6, EOR<Long>, from, with);
  }

  //EORI
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0000 1010 ++-- ----") | mode << 3 | reg << 0;
    if(mode == 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress with{mode, reg};
    bind(opcode | 0 << 6, EORI<Byte>, with);
    bind(opcode | 1 << 6, EORI<Word>, with);
    bind(opcode | 2 << 6, EORI<Long>, with);
  }
2016-07-23 02:32:35 +00:00
  //EORI_TO_CCR
  { auto opcode = pattern("0000 1010 0011 1100");

    bind(opcode, EORI_TO_CCR);
  }

  //EORI_TO_SR
  { auto opcode = pattern("0000 1010 0111 1100");

    bind(opcode, EORI_TO_SR);
2016-07-22 12:03:25 +00:00
  }
2016-08-10 22:02:02 +00:00
  //EXG
  for(uint3 xreg : range(8))
  for(uint3 yreg : range(8)) {
    auto opcode = pattern("1100 ---1 0100 0---") | xreg << 9 | yreg << 0;

    DataRegister x{xreg};
    DataRegister y{yreg};
    bind(opcode, EXG, x, y);
  }

  //EXG
  for(uint3 xreg : range(8))
  for(uint3 yreg : range(8)) {
    auto opcode = pattern("1100 ---1 0100 1---") | xreg << 9 | yreg << 0;

    AddressRegister x{xreg};
    AddressRegister y{yreg};
    bind(opcode, EXG, x, y);
  }

  //EXG
  for(uint3 xreg : range(8))
  for(uint3 yreg : range(8)) {
    auto opcode = pattern("1100 ---1 1000 1---") | xreg << 9 | yreg << 0;

    DataRegister x{xreg};
    AddressRegister y{yreg};
    bind(opcode, EXG, x, y);
  }

  //EXT
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("0100 1000 1+00 0---") | dreg << 0;

    DataRegister with{dreg};
    bind(opcode | 0 << 6, EXT<Word>, with);
    bind(opcode | 1 << 6, EXT<Long>, with);
  }

  //ILLEGAL
  { auto opcode = pattern("0100 1010 1111 1100");

    bind(opcode, ILLEGAL);
  }
2016-08-09 11:07:18 +00:00
  //JMP
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0100 1110 11-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || mode == 3 || mode == 4 || (mode == 7 && reg >= 4)) continue;

    EffectiveAddress target{mode, reg};
    bind(opcode, JMP, target);
  }
2016-07-26 10:46:43 +00:00
  //JSR
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0100 1110 10-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || mode == 3 || mode == 4 || (mode == 7 && reg >= 4)) continue;

    EffectiveAddress target{mode, reg};
    bind(opcode, JSR, target);
  }
2016-07-19 09:12:05 +00:00
  //LEA
  for(uint3 areg : range(8))
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("0100 ---1 11-- ----") | areg << 9 | mode << 3 | reg << 0;
2016-07-25 13:15:54 +00:00
    if(mode <= 1 || mode == 3 || mode == 4 || (mode == 7 && reg >= 4)) continue;
2016-07-16 08:39:44 +00:00
2016-07-23 02:32:35 +00:00
    AddressRegister ar{areg};
    EffectiveAddress ea{mode, reg};
    bind(opcode, LEA, ar, ea);
2016-07-19 09:12:05 +00:00
  }
2016-07-12 10:19:31 +00:00
2016-08-10 22:02:02 +00:00
  //LINK
  for(uint3 areg : range(8)) {
    auto opcode = pattern("0100 1110 0101 0---") | areg << 0;

    AddressRegister with{areg};
    bind(opcode, LINK, with);
  }
2016-07-25 13:15:54 +00:00
  //LSL (immediate)
  for(uint3 data : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---1 ++00 1---") | data << 9 | dreg << 0;

    auto immediate = data ? (uint4)data : (uint4)8;
    DataRegister dr{dreg};
    bind(opcode | 0 << 6, LSL<Byte>, immediate, dr);
    bind(opcode | 1 << 6, LSL<Word>, immediate, dr);
    bind(opcode | 2 << 6, LSL<Long>, immediate, dr);
  }

  //LSL (register)
  for(uint3 sreg : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---1 ++10 1---") | sreg << 9 | dreg << 0;

    DataRegister sr{sreg};
    DataRegister dr{dreg};
    bind(opcode | 0 << 6, LSL<Byte>, sr, dr);
    bind(opcode | 1 << 6, LSL<Word>, sr, dr);
    bind(opcode | 2 << 6, LSL<Long>, sr, dr);
  }

  //LSL (effective address)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1110 0011 11-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress ea{mode, reg};
    bind(opcode, LSL, ea);
  }

  //LSR (immediate)
  for(uint3 data : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---0 ++00 1---") | data << 9 | dreg << 0;

    auto immediate = data ? (uint4)data : (uint4)8;
    DataRegister dr{dreg};
    bind(opcode | 0 << 6, LSR<Byte>, immediate, dr);
    bind(opcode | 1 << 6, LSR<Word>, immediate, dr);
    bind(opcode | 2 << 6, LSR<Long>, immediate, dr);
  }

  //LSR (register)
  for(uint3 count : range(8))
  for(uint3 dreg : range(8)) {
    auto opcode = pattern("1110 ---0 ++10 1---") | count << 9 | dreg << 0;

    DataRegister shift{count};
    DataRegister dr{dreg};
    bind(opcode | 0 << 6, LSR<Byte>, shift, dr);
    bind(opcode | 1 << 6, LSR<Word>, shift, dr);
    bind(opcode | 2 << 6, LSR<Long>, shift, dr);
  }

  //LSR (effective address)
  for(uint3 mode : range(8))
  for(uint3 reg : range(8)) {
    auto opcode = pattern("1110 0010 11-- ----") | mode << 3 | reg << 0;
    if(mode <= 1 || (mode == 7 && reg >= 2)) continue;

    EffectiveAddress ea{mode, reg};
    bind(opcode, LSR, ea);
  }
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
  - read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
  - when using +/-(aN), RA can't actually be updated until the transfer
    is completed
- LEA (EA from) case
  - doesn't actually perform the final read; just returns the address
    to be read from
- ANDI (EA from-and-to) case
  - same EA has to be read from and written to
  - for -(aN), the read has to come from aN-2, but can't update aN yet;
    so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the extension word fetching order for proper
  opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
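The "bool valid / uint32 address" idea described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not higan's code: `Cpu`, `fetch`, `readWord`, `writeWord` and `andiWord` are hypothetical names, only word-sized -(aN) addressing is modelled, and memory is a flat byte vector.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for the EA struct.
struct EffectiveAddress {
  uint8_t mode = 0;      // only mode 4, -(aN), is modelled in this sketch
  uint8_t reg = 0;
  bool valid = false;    // has the address already been resolved?
  uint32_t address = 0;  // cached result of the one-time resolution
};

struct Cpu {
  uint32_t a[8] = {};
  std::vector<uint8_t> ram = std::vector<uint8_t>(64);
  int resolves = 0;  // how many times an address was actually computed

  // Resolve the address once; later calls reuse the cached value.
  uint32_t fetch(EffectiveAddress& ea) {
    if(ea.valid) return ea.address;
    assert(ea.mode == 4);
    resolves++;
    ea.address = a[ea.reg] - 2;  // word pre-decrement; aN itself is not updated yet
    ea.valid = true;
    return ea.address;
  }

  uint16_t readWord(EffectiveAddress& ea) {
    uint32_t addr = fetch(ea);
    return static_cast<uint16_t>(ram[addr] << 8 | ram[addr + 1]);  // big-endian
  }

  void writeWord(EffectiveAddress& ea, uint16_t data) {
    uint32_t addr = fetch(ea);  // same address the read used
    ram[addr] = static_cast<uint8_t>(data >> 8);
    ram[addr + 1] = static_cast<uint8_t>(data & 0xff);
    a[ea.reg] = addr;  // commit the pre-decrement only after the final access
  }
};

// ANDI-style read-modify-write: the read and the write share one EA.
inline uint16_t andiWord(Cpu& cpu, EffectiveAddress ea, uint16_t immediate) {
  uint16_t result = static_cast<uint16_t>(cpu.readWord(ea) & immediate);
  cpu.writeWord(ea, result);
  return result;
}
```

Passing the `EffectiveAddress` by value means each execution starts with `valid == false`, so a cached address never carries over from one instruction to the next.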
2016-07-19 09:12:05 +00:00
|
|
|
//MOVE
|
|
|
|
for(uint3 toReg : range(8))
|
|
|
|
for(uint3 toMode : range(8))
|
|
|
|
for(uint3 fromMode : range(8))
|
|
|
|
for(uint3 fromReg : range(8)) {
|
|
|
|
auto opcode = pattern("00++ ---- ---- ----") | toReg << 9 | toMode << 6 | fromMode << 3 | fromReg << 0;
|
2016-07-22 12:03:25 +00:00
|
|
|
if(toMode == 1 || (toMode == 7 && toReg >= 2)) continue;
|
2016-07-25 13:15:54 +00:00
|
|
|
if(fromMode == 7 && fromReg >= 5) continue;
|
2016-07-19 09:12:05 +00:00
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
EffectiveAddress to{toMode, toReg};
|
|
|
|
EffectiveAddress from{fromMode, fromReg};
|
2016-07-19 09:12:05 +00:00
|
|
|
bind(opcode | 1 << 12, MOVE<Byte>, to, from);
|
|
|
|
bind(opcode | 3 << 12, MOVE<Word>, to, from);
|
|
|
|
bind(opcode | 2 << 12, MOVE<Long>, to, from);
|
2016-07-22 12:03:25 +00:00
|
|
|
|
|
|
|
if(fromMode == 1) unbind(opcode | 1 << 12);
|
2016-07-19 09:12:05 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEA
|
|
|
|
for(uint3 areg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("00++ ---0 01-- ----") | areg << 9 | mode << 3 | reg << 0;
|
2016-07-25 13:15:54 +00:00
|
|
|
if(mode == 7 && reg >= 5) continue;
|
2016-07-19 09:12:05 +00:00
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
AddressRegister ar{areg};
|
|
|
|
EffectiveAddress ea{mode, reg};
|
|
|
|
bind(opcode | 3 << 12, MOVEA<Word>, ar, ea);
|
|
|
|
bind(opcode | 2 << 12, MOVEA<Long>, ar, ea);
|
2016-07-19 09:12:05 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEM (registers to memory)
|
2016-08-10 22:02:02 +00:00
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1000 1+-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || mode == 3 || (mode == 7 && reg >= 2)) continue;
|
2016-07-19 09:12:05 +00:00
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
EffectiveAddress to{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, MOVEM_TO_MEM<Word>, to);
|
|
|
|
bind(opcode | 1 << 6, MOVEM_TO_MEM<Long>, to);
|
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEM (memory to registers)
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1100 1+-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || mode == 4 || (mode == 7 && reg >= 4)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, MOVEM_TO_REG<Word>, from);
|
|
|
|
bind(opcode | 1 << 6, MOVEM_TO_REG<Long>, from);
|
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEP (data register to memory)
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 areg : range(8)) {
|
|
|
|
auto opcode = pattern("0000 ---1 1+00 1---") | dreg << 9 | areg << 0;
|
|
|
|
|
|
|
|
DataRegister from{dreg};
|
|
|
|
EffectiveAddress to{AddressRegisterIndirectWithDisplacement, areg};
|
|
|
|
bind(opcode | 0 << 6, MOVEP<Word>, from, to);
|
|
|
|
bind(opcode | 1 << 6, MOVEP<Long>, from, to);
|
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEP (memory to data register)
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 areg : range(8)) {
|
|
|
|
auto opcode = pattern("0000 ---1 0+00 1---") | dreg << 9 | areg << 0;
|
|
|
|
|
|
|
|
DataRegister to{dreg};
|
|
|
|
EffectiveAddress from{AddressRegisterIndirectWithDisplacement, areg};
|
|
|
|
bind(opcode | 0 << 6, MOVEP<Word>, from, to);
|
|
|
|
bind(opcode | 1 << 6, MOVEP<Long>, from, to);
|
2016-07-19 09:12:05 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
//MOVEQ
|
|
|
|
for(uint3 dreg : range( 8))
|
|
|
|
for(uint8 immediate : range(256)) {
|
|
|
|
auto opcode = pattern("0111 ---0 ---- ----") | dreg << 9 | immediate << 0;
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
DataRegister dr{dreg};
|
|
|
|
bind(opcode, MOVEQ, dr, immediate);
|
2016-07-19 09:12:05 +00:00
|
|
|
}
|
|
|
|
|
2016-07-22 12:03:25 +00:00
|
|
|
//MOVE_FROM_SR
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0000 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
EffectiveAddress ea{mode, reg};
|
2016-07-22 12:03:25 +00:00
|
|
|
bind(opcode, MOVE_FROM_SR, ea);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
//MOVE_TO_CCR
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0100 11-- ----") | mode << 3 | reg << 0;
|
2016-07-25 13:15:54 +00:00
|
|
|
if(mode == 1 || (mode == 7 && reg >= 5)) continue;
|
2016-07-23 02:32:35 +00:00
|
|
|
|
|
|
|
EffectiveAddress ea{mode, reg};
|
|
|
|
bind(opcode, MOVE_TO_CCR, ea);
|
|
|
|
}
|
|
|
|
|
2016-07-22 12:03:25 +00:00
|
|
|
//MOVE_TO_SR
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0110 11-- ----") | mode << 3 | reg << 0;
|
2016-07-25 13:15:54 +00:00
|
|
|
if(mode == 1 || (mode == 7 && reg >= 5)) continue;
|
2016-07-22 12:03:25 +00:00
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
EffectiveAddress ea{mode, reg};
|
2016-07-22 12:03:25 +00:00
|
|
|
bind(opcode, MOVE_TO_SR, ea);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
//MOVE_FROM_USP
|
|
|
|
for(uint3 areg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1110 0110 1---") | areg << 0;
|
2016-07-19 09:12:05 +00:00
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
AddressRegister to{areg};
|
|
|
|
bind(opcode, MOVE_FROM_USP, to);
|
|
|
|
}
|
|
|
|
|
|
|
|
//MOVE_TO_USP
|
|
|
|
for(uint3 areg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1110 0110 0---") | areg << 0;
|
|
|
|
|
|
|
|
AddressRegister from{areg};
|
|
|
|
bind(opcode, MOVE_TO_USP, from);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//MULS
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1100 ---1 11-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 5)) continue;
|
|
|
|
|
|
|
|
DataRegister with{dreg};
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
bind(opcode, MULS, with, from);
|
|
|
|
}
|
|
|
|
|
|
|
|
//MULU
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1100 ---0 11-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 5)) continue;
|
|
|
|
|
|
|
|
DataRegister with{dreg};
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
bind(opcode, MULU, with, from);
|
|
|
|
}
|
|
|
|
|
|
|
|
//NBCD
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1000 00-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode, NBCD, with);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
//NEG
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0100 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, NEG<Byte>, with);
|
|
|
|
bind(opcode | 1 << 6, NEG<Word>, with);
|
|
|
|
bind(opcode | 2 << 6, NEG<Long>, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
//NEGX
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0000 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, NEGX<Byte>, with);
|
|
|
|
bind(opcode | 1 << 6, NEGX<Word>, with);
|
|
|
|
bind(opcode | 2 << 6, NEGX<Long>, with);
|
2016-07-19 09:12:05 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
//NOP
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0001");
|
|
|
|
|
|
|
|
bind(opcode, NOP);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
//NOT
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 0110 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, NOT<Byte>, with);
|
|
|
|
bind(opcode | 1 << 6, NOT<Word>, with);
|
|
|
|
bind(opcode | 2 << 6, NOT<Long>, with);
|
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
//OR (effective address into data register)
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1000 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 5)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
DataRegister with{dreg};
|
|
|
|
bind(opcode | 0 << 6, OR<Byte>, from, with);
|
|
|
|
bind(opcode | 1 << 6, OR<Word>, from, with);
|
|
|
|
bind(opcode | 2 << 6, OR<Long>, from, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
//OR (data register into effective address)
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1000 ---1 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
DataRegister from{dreg};
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, OR<Byte>, from, with);
|
|
|
|
bind(opcode | 1 << 6, OR<Word>, from, with);
|
|
|
|
bind(opcode | 2 << 6, OR<Long>, from, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ORI
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0000 0000 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, ORI<Byte>, with);
|
|
|
|
bind(opcode | 1 << 6, ORI<Word>, with);
|
|
|
|
bind(opcode | 2 << 6, ORI<Long>, with);
|
|
|
|
}
|
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
//ORI_TO_CCR
|
|
|
|
{ auto opcode = pattern("0000 0000 0011 1100");
|
|
|
|
|
|
|
|
bind(opcode, ORI_TO_CCR);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ORI_TO_SR
|
|
|
|
{ auto opcode = pattern("0000 0000 0111 1100");
|
|
|
|
|
|
|
|
bind(opcode, ORI_TO_SR);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//PEA
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1000 01-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || mode == 3 || mode == 4 || (mode == 7 && reg >= 4)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
bind(opcode, PEA, from);
|
|
|
|
}
|
|
|
|
|
|
|
|
//RESET
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0000");
|
|
|
|
|
|
|
|
bind(opcode, RESET);
|
|
|
|
}
|
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
//ROL (immediate)
|
|
|
|
for(uint3 count : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---1 ++01 1---") | count << 9 | dreg << 0;
|
|
|
|
|
|
|
|
auto shift = count ? (uint4)count : (uint4)8;
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROL<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROL<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROL<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROL (register)
|
|
|
|
for(uint3 sreg : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---1 ++11 1---") | sreg << 9 | dreg << 0;
|
|
|
|
|
|
|
|
DataRegister shift{sreg};
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROL<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROL<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROL<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROL (effective address)
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 0111 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress modify{mode, reg};
|
|
|
|
bind(opcode, ROL, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROR (immediate)
|
|
|
|
for(uint3 count : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---0 ++01 1---") | count << 9 | dreg << 0;
|
|
|
|
|
|
|
|
auto shift = count ? (uint4)count : (uint4)8;
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROR<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROR<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROR<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROR (register)
|
|
|
|
for(uint3 sreg : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---0 ++11 1---") | sreg << 9 | dreg << 0;
|
|
|
|
|
|
|
|
DataRegister shift{sreg};
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROR<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROR<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROR<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROR (effective address)
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 0110 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress modify{mode, reg};
|
|
|
|
bind(opcode, ROR, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXL (immediate)
|
|
|
|
for(uint3 count : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---1 ++01 0---") | count << 9 | dreg << 0;
|
|
|
|
|
|
|
|
auto shift = count ? (uint4)count : (uint4)8;
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROXL<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROXL<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROXL<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXL (register)
|
|
|
|
for(uint3 sreg : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---1 ++11 0---") | sreg << 9 | dreg << 0;
|
|
|
|
|
|
|
|
DataRegister shift{sreg};
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROXL<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROXL<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROXL<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXL (effective address)
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 0101 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress modify{mode, reg};
|
|
|
|
bind(opcode, ROXL, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXR (immediate)
|
|
|
|
for(uint3 count : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---0 ++01 0---") | count << 9 | dreg << 0;
|
|
|
|
|
|
|
|
auto shift = count ? (uint4)count : (uint4)8;
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROXR<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROXR<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROXR<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXR (register)
|
|
|
|
for(uint3 sreg : range(8))
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 ---0 ++11 0---") | sreg << 9 | dreg << 0;
|
|
|
|
|
|
|
|
DataRegister shift{sreg};
|
|
|
|
DataRegister modify{dreg};
|
|
|
|
bind(opcode | 0 << 6, ROXR<Byte>, shift, modify);
|
|
|
|
bind(opcode | 1 << 6, ROXR<Word>, shift, modify);
|
|
|
|
bind(opcode | 2 << 6, ROXR<Long>, shift, modify);
|
|
|
|
}
|
|
|
|
|
|
|
|
//ROXR (effective address)
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1110 0100 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress modify{mode, reg};
|
|
|
|
bind(opcode, ROXR, modify);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
//RTE
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0011");
|
|
|
|
|
|
|
|
bind(opcode, RTE);
|
|
|
|
}
|
|
|
|
|
|
|
|
//RTR
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0111");
|
|
|
|
|
|
|
|
bind(opcode, RTR);
|
|
|
|
}
|
|
|
|
|
2016-07-22 12:03:25 +00:00
|
|
|
//RTS
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0101");
|
|
|
|
|
|
|
|
bind(opcode, RTS);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//SBCD
|
|
|
|
for(uint3 xreg : range(8))
|
|
|
|
for(uint3 yreg : range(8)) {
|
|
|
|
auto opcode = pattern("1000 ---1 0000 ----") | xreg << 9 | yreg << 0;
|
|
|
|
|
|
|
|
EffectiveAddress dataWith{DataRegisterDirect, xreg};
|
|
|
|
EffectiveAddress dataFrom{DataRegisterDirect, yreg};
|
|
|
|
bind(opcode | 0 << 3, SBCD, dataWith, dataFrom);
|
|
|
|
|
|
|
|
EffectiveAddress addressWith{AddressRegisterIndirectWithPreDecrement, xreg};
|
|
|
|
EffectiveAddress addressFrom{AddressRegisterIndirectWithPreDecrement, yreg};
|
|
|
|
bind(opcode | 1 << 3, SBCD, addressWith, addressFrom);
|
|
|
|
}
|
|
|
|
|
2016-08-09 11:07:18 +00:00
|
|
|
//SCC
|
|
|
|
for(uint4 condition : range(16))
|
|
|
|
for(uint3 mode : range( 8))
|
|
|
|
for(uint3 reg : range( 8)) {
|
|
|
|
auto opcode = pattern("0101 ---- 11-- ----") | condition << 8 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress to{mode, reg};
|
|
|
|
bind(opcode, SCC, condition, to);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//STOP
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0010");
|
|
|
|
|
|
|
|
bind(opcode, STOP);
|
|
|
|
}
|
|
|
|
|
Update to v100r15 release.
byuu wrote:
Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece
Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.
Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.
[Editor's note: The "aforementioned scheduler changes" were described in
a previous forum post:
Unfortunately, 64-bits just wasn't enough precision (we were
getting misalignments ~230 times a second on 21/24MHz clocks), so
I had to move to 128-bit counters. This of course doesn't exist on
32-bit architectures (and probably not on all 64-bit ones either),
so for now ... higan's only going to compile on 64-bit machines
until we figure something out. Maybe we offer a "lower precision"
fallback for machines that lack uint128_t or something. Using the
Booth algorithm would be way too slow.
Anyway, the precision is now 2^-96, which is roughly 10^-29. That
puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
referring to it as the byuusecond. The other 32 bits of precision
allow a 1Hz clock to run up to one full second before all clocks
need to be normalized to prevent overflow.
I fixed a serious wobbling issue where I was using clock > other.clock
for synchronization instead of clock >= other.clock; and also another
aliasing issue when two threads share a common frequency, but don't
run in lock-step. The latter I don't even fully understand, but I
did observe it in testing.
nall/serialization.hpp has been extended to support 128-bit integers,
but without explicitly naming them (yay generic code), so nall will
still compile on 32-bit platforms for all other applications.
Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.
The "longer explanation" in the linked hastebin is:
Okay, so the idea is that we can have an arbitrary number of
oscillators. Take the SNES:
- CPU/PPU clock = 21477272.727272Hz
- SMP/DSP clock = 24576000Hz
- Cartridge DSP1 clock = 8000000Hz
- Cartridge MSU1 clock = 44100Hz
- Controller Port 1 modem controller clock = 57600Hz
- Controller Port 2 barcode battler clock = 115200Hz
- Expansion Port exercise bike clock = 192000Hz
Is this a pathological case? Of course it is, but it's possible. The
first four do exist in the wild already: see Rockman X2 MSU1
patch. Manifest files with higan let you specify any frequency you
want for any component.
The old trick higan used was to hold an int64 counter for each
thread:thread synchronization, and adjust it like so:
- if thread A steps X clocks; then clock += X * threadB.frequency
- if clock >= 0; switch to threadB
- if thread B steps X clocks; then clock -= X * threadA.frequency
- if clock < 0; switch to threadA
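The pairwise counter above can be sketched without libco by recording which thread would run at each step. This is only an illustration, not higan's code: `schedule` and its parameters are hypothetical names, thread A is assumed to run first, and cothread switching is replaced by a flag.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical sketch of the old pairwise int64 counter.
// Returns the order in which the two threads run, e.g. "ABAB...".
// Assumption: thread A runs first; a boolean stands in for co_switch().
auto schedule(uint64_t freqA, uint64_t stepA,
              uint64_t freqB, uint64_t stepB, int events) -> std::string {
  int64_t clock = 0;
  bool runningA = true;
  std::string order;
  for(int n = 0; n < events; n++) {
    if(runningA) {
      order += 'A';
      clock += int64_t(stepA * freqB);  // thread A steps X clocks
      if(clock >= 0) runningA = false;  // switch to thread B
    } else {
      order += 'B';
      clock -= int64_t(stepB * freqA);  // thread B steps X clocks
      if(clock < 0) runningA = true;    // switch to thread A
    }
  }
  return order;
}
```

With A clocked twice as fast as B and both stepping one clock at a time, A settles into running two steps for every one of B.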
But there are also system configurations where one processor has to
synchronize with more than one other processor. Take the Genesis:
- the 68K has to sync with the Z80 and PSG and YM2612 and VDP
- the Z80 has to sync with the 68K and PSG and YM2612
- the PSG has to sync with the 68K and Z80 and YM2612
Now I could do this by having an int64 clock value for every
association. But these clock values would have to be outside the
individual Thread class objects, and we would have to update every
relationship's clock value. So the 68K would have to update the Z80,
PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
per clock step event instead of one.
As such, we have to account for both possibilities. The only way to
do this is with a single time base. We do this like so:
- setup: scalar = timeBase / frequency
- step: clock += scalar * clocks
Once per second, we look at every thread, find the smallest clock
value. Then subtract that value from all threads. This prevents the
clock counters from overflowing.
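The setup, step, and normalization described above might look like the following. It is a sketch under stated assumptions: it uses a 64-bit attosecond time base so that it compiles anywhere, whereas the commit itself moved to 128-bit counters, and `Thread`, `makeThread`, and `normalize` are illustrative names rather than higan's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: one attosecond per time unit (10^18 units per second).
constexpr uint64_t timeBase = 1'000'000'000'000'000'000ull;

struct Thread {
  uint64_t scalar;  // time units per clock tick
  uint64_t clock;   // elapsed time in time units
};

// setup: scalar = timeBase / frequency
auto makeThread(uint64_t frequency) -> Thread {
  return Thread{timeBase / frequency, 0};
}

// step: clock += scalar * clocks
auto step(Thread& thread, uint64_t clocks) -> void {
  thread.clock += thread.scalar * clocks;
}

// Subtract the smallest clock from every thread so counters never overflow.
auto normalize(std::vector<Thread>& threads) -> void {
  if(threads.empty()) return;
  uint64_t minimum = threads[0].clock;
  for(auto& thread : threads) minimum = std::min(minimum, thread.clock);
  for(auto& thread : threads) thread.clock -= minimum;
}
```

Because every thread shares one time base, a thread only touches its own counter when it steps, regardless of how many other threads it synchronizes with.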
Unfortunately, these oscillator values are psychotic, unpredictable,
and oftentimes repeating fractions. Even with a timeBase of
1,000,000,000,000,000,000 (one attosecond); we get rounding errors
every ~16,300 synchronizations. Specifically, this happens with a CPU
running at 21477273Hz (rounded) and SMP running at 24576000Hz. That
may be good enough for most emulators, but ... you know how I am.
Plus, even at the attosecond level, we're really pushing against the
limits of 64-bit integers. Since the scalar is a reciprocal, a frequency
of 1Hz (which does exist in higan!) would have a scalar that consumes
1/18th of the entire range of a uint64 on every single step. Yes, I
could raise the frequency, and then step by that amount, I know. But
I don't want to have weird gotchas like that in the scheduler core.
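The "1/18th" figure can be checked with a few lines of integer arithmetic. The helper names below are made up for this check, and the time base is assumed to be exactly 10^18 units per second.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: attosecond time base, 10^18 units per second.
constexpr uint64_t attosecond = 1'000'000'000'000'000'000ull;

// Time units consumed by one clock tick at the given frequency.
constexpr auto scalarFor(uint64_t frequency) -> uint64_t {
  return attosecond / frequency;
}

// How many single-clock steps fit in a uint64 before it wraps.
constexpr auto stepsUntilOverflow(uint64_t frequency) -> uint64_t {
  return std::numeric_limits<uint64_t>::max() / scalarFor(frequency);
}

// A 1Hz clock advances by 10^18 per step, so only 18 steps fit.
static_assert(stepsUntilOverflow(1) == 18, "1Hz uses about 1/18th per step");
```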
Until I increase the accuracy to about 100 times greater than a
yoctosecond, the rounding errors are too great. And since the only
choice above 64-bit values is 128-bit values; we might as well use
all the extra headroom. 2^-96 as a timebase gives me the ability to
have both a 1Hz and 4GHz clock; and run them both for a full second;
before an overflow event would occur.
Another hastebin includes demonstration code:
#include <libco/libco.h>
#include <nall/nall.hpp>
using namespace nall;
//
cothread_t mainThread = nullptr;
const uint iterations = 100'000'000;
const uint cpuFreq = 21477272.727272 + 0.5;
const uint smpFreq = 24576000.000000 + 0.5;
const uint cpuStep = 4;
const uint smpStep = 5;
//
struct ThreadA {
cothread_t handle = nullptr;
uint64 frequency = 0;
int64 clock = 0;
auto create(auto (*entrypoint)() -> void, uint frequency) {
this->handle = co_create(65536, entrypoint);
this->frequency = frequency;
this->clock = 0;
}
};
struct CPUA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
CPUA() { create(&CPUA::Enter, cpuFreq); }
} cpuA;
struct SMPA : ThreadA {
static auto Enter() -> void;
auto main() -> void;
SMPA() { create(&SMPA::Enter, smpFreq); }
} smpA;
uint8 queueA[iterations];
uint offsetA;
cothread_t resumeA = cpuA.handle;
auto EnterA() -> void {
offsetA = 0;
co_switch(resumeA);
}
auto QueueA(uint value) -> void {
queueA[offsetA++] = value;
if(offsetA >= iterations) {
resumeA = co_active();
co_switch(mainThread);
}
}
auto CPUA::Enter() -> void { while(true) cpuA.main(); }
auto CPUA::main() -> void {
QueueA(1);
smpA.clock -= cpuStep * smpA.frequency;
if(smpA.clock < 0) co_switch(smpA.handle);
}
auto SMPA::Enter() -> void { while(true) smpA.main(); }
auto SMPA::main() -> void {
QueueA(2);
smpA.clock += smpStep * cpuA.frequency;
if(smpA.clock >= 0) co_switch(cpuA.handle);
}
//
struct ThreadB {
cothread_t handle = nullptr;
uint128_t scalar = 0;
uint128_t clock = 0;
auto print128(uint128_t value) {
string s;
while(value) {
s.append((char)('0' + value % 10));
value /= 10;
}
s.reverse();
print(s, "\n");
}
//femtosecond (10^15) = 16306
//attosecond (10^18) = 688838
//zeptosecond (10^21) = 13712691
//yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
//byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)
auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
this->handle = co_create(65536, entrypoint);
uint128_t unitOfTime = 1;
//for(uint n : range(29)) unitOfTime *= 10;
unitOfTime <<= 96; //2^96 time units ...
this->scalar = unitOfTime / frequency;
print128(this->scalar);
this->clock = 0;
}
auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
};
struct CPUB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
CPUB() { create(&CPUB::Enter, cpuFreq); }
} cpuB;
struct SMPB : ThreadB {
static auto Enter() -> void;
auto main() -> void;
SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
} smpB;
auto correct() -> void {
auto minimum = min(cpuB.clock, smpB.clock);
cpuB.clock -= minimum;
smpB.clock -= minimum;
}
uint8 queueB[iterations];
uint offsetB;
cothread_t resumeB = cpuB.handle;
auto EnterB() -> void {
correct();
offsetB = 0;
co_switch(resumeB);
}
auto QueueB(uint value) -> void {
queueB[offsetB++] = value;
if(offsetB >= iterations) {
resumeB = co_active();
co_switch(mainThread);
}
}
auto CPUB::Enter() -> void { while(true) cpuB.main(); }
auto CPUB::main() -> void {
QueueB(1);
step(cpuStep);
synchronize(smpB);
}
auto SMPB::Enter() -> void { while(true) smpB.main(); }
auto SMPB::main() -> void {
QueueB(2);
step(smpStep);
synchronize(cpuB);
}
//
#include <nall/main.hpp>
auto nall::main(string_vector) -> void {
mainThread = co_active();
uint masterCounter = 0;
while(true) {
print(masterCounter++, " ...\n");
auto A = clock();
EnterA();
auto B = clock();
print((double)(B - A) / CLOCKS_PER_SEC, "s\n");
auto C = clock();
EnterB();
auto D = clock();
print((double)(D - C) / CLOCKS_PER_SEC, "s\n");
for(uint n : range(iterations)) {
if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
}
}
}
...and that's everything.]
2016-07-31 02:11:20 +00:00
|
|
|
//SUB
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1001 ---0 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 5) continue;
|
|
|
|
|
|
|
|
EffectiveAddress source{mode, reg};
|
|
|
|
DataRegister target{dreg};
|
|
|
|
bind(opcode | 0 << 6, SUB<Byte>, source, target);
|
|
|
|
bind(opcode | 1 << 6, SUB<Word>, source, target);
|
|
|
|
bind(opcode | 2 << 6, SUB<Long>, source, target);
|
|
|
|
|
|
|
|
if(mode == 1) unbind(opcode | 0 << 6);
|
|
|
|
}
|
|
|
|
|
|
|
|
//SUB
|
|
|
|
for(uint3 dreg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1001 ---1 ++-- ----") | dreg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode <= 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
DataRegister source{dreg};
|
|
|
|
EffectiveAddress target{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, SUB<Byte>, source, target);
|
|
|
|
bind(opcode | 1 << 6, SUB<Word>, source, target);
|
|
|
|
bind(opcode | 2 << 6, SUB<Long>, source, target);
|
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
//SUBA
|
|
|
|
for(uint3 areg : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("1001 ---+ 11-- ----") | areg << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 5) continue;
|
|
|
|
|
|
|
|
AddressRegister to{areg};
|
|
|
|
EffectiveAddress from{mode, reg};
|
|
|
|
bind(opcode | 0 << 8, SUBA<Word>, to, from);
|
|
|
|
bind(opcode | 1 << 8, SUBA<Long>, to, from);
|
|
|
|
}
|
|
|
|
|
|
|
|
//SUBI
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0000 0100 ++-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, SUBI<Byte>, with);
|
|
|
|
bind(opcode | 1 << 6, SUBI<Word>, with);
|
|
|
|
bind(opcode | 2 << 6, SUBI<Long>, with);
|
|
|
|
}
|
|
|
|
|
2016-07-25 13:15:54 +00:00
|
|
|
//SUBQ
|
|
|
|
for(uint3 data : range(8))
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0101 ---1 ++-- ----") | data << 9 | mode << 3 | reg << 0;
|
|
|
|
if(mode == 7 && reg >= 2) continue;
|
|
|
|
|
|
|
|
auto immediate = data ? (uint4)data : (uint4)8;
|
2016-08-17 22:04:50 +00:00
|
|
|
if(mode != 1) {
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode | 0 << 6, SUBQ<Byte>, immediate, with);
|
|
|
|
bind(opcode | 1 << 6, SUBQ<Word>, immediate, with);
|
|
|
|
bind(opcode | 2 << 6, SUBQ<Long>, immediate, with);
|
|
|
|
} else {
|
|
|
|
AddressRegister with{reg};
|
|
|
|
bind(opcode | 1 << 6, SUBQ<Word>, immediate, with);
|
|
|
|
bind(opcode | 2 << 6, SUBQ<Long>, immediate, with);
|
|
|
|
}
|
2016-07-25 13:15:54 +00:00
|
|
|
}
|
|
|
|
|
2016-08-08 10:12:03 +00:00
|
|
|
//SUBX
|
|
|
|
for(uint3 treg : range(8))
|
|
|
|
for(uint3 sreg : range(8)) {
|
|
|
|
auto opcode = pattern("1001 ---1 ++00 ----") | treg << 9 | sreg << 0;
|
|
|
|
|
|
|
|
EffectiveAddress dataTarget{DataRegisterDirect, treg};
|
|
|
|
EffectiveAddress dataSource{DataRegisterDirect, sreg};
|
|
|
|
bind(opcode | 0 << 6 | 0 << 3, SUBX<Byte>, dataTarget, dataSource);
|
|
|
|
bind(opcode | 1 << 6 | 0 << 3, SUBX<Word>, dataTarget, dataSource);
|
|
|
|
bind(opcode | 2 << 6 | 0 << 3, SUBX<Long>, dataTarget, dataSource);
|
|
|
|
|
|
|
|
EffectiveAddress addressTarget{AddressRegisterIndirectWithPreDecrement, treg};
|
|
|
|
EffectiveAddress addressSource{AddressRegisterIndirectWithPreDecrement, sreg};
|
|
|
|
bind(opcode | 0 << 6 | 1 << 3, SUBX<Byte>, addressTarget, addressSource);
|
|
|
|
bind(opcode | 1 << 6 | 1 << 3, SUBX<Word>, addressTarget, addressSource);
|
|
|
|
bind(opcode | 2 << 6 | 1 << 3, SUBX<Long>, addressTarget, addressSource);
|
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//SWAP
|
|
|
|
for(uint3 dreg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1000 0100 0---") | dreg << 0;
|
|
|
|
|
|
|
|
DataRegister with{dreg};
|
|
|
|
bind(opcode, SWAP, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
//TAS
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1010 11-- ----") | mode << 3 | reg << 0;
|
|
|
|
if(mode == 1 || (mode == 7 && reg >= 2)) continue;
|
|
|
|
|
|
|
|
EffectiveAddress with{mode, reg};
|
|
|
|
bind(opcode, TAS, with);
|
|
|
|
}
|
|
|
|
|
|
|
|
//TRAP
|
|
|
|
for(uint4 vector : range(16)) {
|
|
|
|
auto opcode = pattern("0100 1110 0100 ----") | vector << 0;
|
|
|
|
|
|
|
|
bind(opcode, TRAP, vector);
|
|
|
|
}
|
|
|
|
|
|
|
|
//TRAPV
|
|
|
|
{ auto opcode = pattern("0100 1110 0111 0110");
|
|
|
|
|
|
|
|
bind(opcode, TRAPV);
|
|
|
|
}
|
|
|
|
|
Update to v100r09 release.
byuu says:
Another six hours in ...
I have all of the opcodes, memory access functions, disassembler mnemonics
and table building converted over to the new template<uint Size> format.
Certainly, it would be quite easy for this nightmare chip to throw me
another curveball, but so far I can handle:
- MOVE (EA to, EA from) case
- read(from) has to update register index for +/-(aN) mode
- MOVEM (EA from) case
- when using +/-(aN), RA can't actually be updated until the transfer
is completed
- LEA (EA from) case
- doesn't actually perform the final read; just returns the address
to be read from
- ANDI (EA from-and-to) case
- same EA has to be read from and written to
- for -(aN), the read has to come from aN-2, but can't update aN yet;
so that the write also goes to aN-2
- no opcode can ever fetch the extension words more than once
- manually control the extension word fetching order for proper
opcode decoding
To do all of that without a whole lot of duplicated code (or really
bloating out every single instruction with red tape), I had to bring
back the "bool valid / uint32 address" variables inside the EA struct =(
If weird exceptions creep in like timing constraints only on certain
opcodes, I can use template flags to the EA read/write functions to
handle that.
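The ANDI case above, where one effective address is both read and written in -(aN) mode, can be sketched with a cached address. Everything here is a toy model for illustration: `Machine`, `EA`, `read`, `write`, and `andi` are hypothetical, only word-sized predecrement is handled, and memory is a small word array.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct Machine {
  std::array<uint32_t, 8> a{};        // address registers
  std::array<uint16_t, 16> memory{};  // toy RAM, indexed by address / 2
};

// Mirrors the "bool valid / uint32 address" idea: resolve the address once.
struct EA {
  unsigned reg;
  bool valid = false;
  uint32_t address = 0;
};

auto resolve(Machine& m, EA& ea) -> uint32_t {
  if(!ea.valid) {
    ea.address = m.a[ea.reg] - 2;  // word-sized -(aN); aN is not updated yet
    ea.valid = true;
  }
  return ea.address;
}

auto read(Machine& m, EA& ea) -> uint16_t {
  return m.memory[resolve(m, ea) / 2];
}

auto write(Machine& m, EA& ea, uint16_t data) -> void {
  m.memory[resolve(m, ea) / 2] = data;
  m.a[ea.reg] = ea.address;  // commit the predecrement only after the write
}

// Read-modify-write through the same EA: both accesses hit aN-2.
auto andi(Machine& m, EA ea, uint16_t immediate) -> void {
  uint16_t result = read(m, ea) & immediate;
  write(m, ea, result);
}
```

Caching the resolved address keeps the register from being decremented twice, which is the failure a naive read followed by a write would produce.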
2016-07-19 09:12:05 +00:00
|
|
|
//TST
|
|
|
|
for(uint3 mode : range(8))
|
|
|
|
for(uint3 reg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1010 ++-- ----") | mode << 3 | reg << 0;
|
2016-07-22 12:03:25 +00:00
|
|
|
if(mode == 7 && reg >= 2) continue;
|
2016-07-19 09:12:05 +00:00
|
|
|
|
2016-07-23 02:32:35 +00:00
|
|
|
EffectiveAddress ea{mode, reg};
|
2016-07-19 09:12:05 +00:00
|
|
|
bind(opcode | 0 << 6, TST<Byte>, ea);
|
|
|
|
bind(opcode | 1 << 6, TST<Word>, ea);
|
|
|
|
bind(opcode | 2 << 6, TST<Long>, ea);
|
2016-07-22 12:03:25 +00:00
|
|
|
|
|
|
|
if(mode == 1) unbind(opcode | 0 << 6);
|
Update to v100r06 release.
byuu says:
Up to ten 68K instructions out of somewhere between 61 and 88, depending
upon which PDF you look at. Of course, some of them aren't 100% completed
yet, either. Lots of craziness with MOVEM, and BCC has a BSR variant
that needs stack push/pop functions.
This WIP actually took over eight hours to make, going through every
possible permutation on how to design the core itself. The updated design
now builds both the instruction decoder+dispatcher and the disassembler
decoder into the same main loop during M68K's constructor.
The special cases are also really psychotic on this processor, and
I'm afraid of missing something via the fallthrough cases. So instead,
I'm ordering the instructions alphabetically, and including exclusion
cases to ignore binding invalid cases. If I end up remapping an existing
register, then it'll throw a run-time assertion at program startup.
I wanted very much to get rid of struct EA (EffectiveAddress), but
it's too difficult to keep track of the internal effective address
without it. So I split out the size to a separate parameter, since
every opcode only has one size parameter, and otherwise it was getting
duplicated in opcodes that take two EAs, and was also awkward with the
flag testing. It's a bit more typing, but I feel it's more clean this way.
Overall, I'm really worried this is going to be too slow. I don't want
to turn the EA stuff into templates, because that will massively bloat
out compilation times and object sizes, and will also need a special DSL
preprocessor since C++ doesn't have a static for loop. I can definitely
optimize a lot of EA's address/read/write functions away once the core
is completed, but it's never going to hold a candle to a templatized
68K core.
----
Forgot to include the SA-1 regression fix. I always remember immediately
after I upload and archive the WIP. Will try to get that in next time,
I guess.
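The table-building approach described above (bit patterns bound into a 64K-entry table, with a startup check against double binding) can be sketched as follows. The names `pattern`, `bind`, and `unbind` mirror the macros used in this file, but the bodies are guesses written for illustration: handlers are plain integers rather than lambdas, and `bind` reports a collision by returning false where the real code is described as asserting.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: "0100 1110 0111 0111" -> 0x4e77.
// '1' sets a bit; '0' and placeholder characters ('-', '+') leave it clear
// so that register and mode fields can be OR-ed in afterwards.
constexpr auto pattern(std::string_view s) -> uint16_t {
  uint16_t value = 0;
  for(char c : s) {
    if(c == ' ') continue;
    value = static_cast<uint16_t>(value << 1 | (c == '1'));
  }
  return value;
}

struct Table {
  std::vector<int> slot = std::vector<int>(65536, 0);  // 0 means unbound

  // Refuse to overwrite an existing binding.
  auto bind(uint16_t opcode, int handler) -> bool {
    if(slot[opcode]) return false;
    slot[opcode] = handler;
    return true;
  }

  auto unbind(uint16_t opcode) -> void { slot[opcode] = 0; }

  // Give every remaining entry a fallback handler; returns how many were filled.
  auto fill(int handler) -> int {
    int filled = 0;
    for(auto& s : slot) if(!s) { s = handler; filled++; }
    return filled;
  }
};
```

Rejecting a second binding for the same opcode is what lets the instructions be listed alphabetically with explicit exclusions, instead of relying on the order of fallthrough cases.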
2016-07-16 08:39:44 +00:00
|
|
|
}
|
|
|
|
|
2016-08-10 22:02:02 +00:00
|
|
|
//UNLK
|
|
|
|
for(uint3 areg : range(8)) {
|
|
|
|
auto opcode = pattern("0100 1110 0101 1---") | areg << 0;
|
|
|
|
|
|
|
|
AddressRegister with{areg};
|
|
|
|
bind(opcode, UNLK, with);
|
|
|
|
}
|
|
|
|
|
Update to v101r04 release.
byuu says:
Changelog:
- pulled the (u)intN type aliases into higan instead of leaving them
in nall
- added 68K LINEA, LINEF hooks for illegal instructions
- filled the rest of the 68K lambda table with generic instance of
ILLEGAL
- completed the 68K disassembler effective addressing modes
- still unsure whether I should use An to decode absolute
addresses or not
- pro: way easier to read where accesses are taking place
- con: requires An to be valid; so as a disassembler it does a
poor job
- making it optional: too much work; ick
- added I/O decoding for the VDP command-port registers
- added skeleton timing to all five processor cores
- output at 1280x480 (needed for mixed 256/320 widths; and to handle
interlace modes)
The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.
Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple of 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional *8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.
It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
2016-08-12 01:07:04 +00:00
|
|
|
//ILLEGAL
|
2016-07-16 08:39:44 +00:00
|
|
|
for(uint16 opcode : range(65536)) {
|
|
|
|
if(instructionTable[opcode]) continue;
|
2016-08-12 01:07:04 +00:00
|
|
|
bind(opcode, ILLEGAL);
|
2016-07-12 10:19:31 +00:00
|
|
|
}
|
2016-08-12 01:07:04 +00:00
|
|
|
|
|
|
|
#undef bind
|
|
|
|
#undef unbind
|
|
|
|
#undef pattern
|
2016-07-12 10:19:31 +00:00
|
|
|
}
|