bsnes/higan/processor/arm7tdmi/disassembler.cpp

414 lines
14 KiB
C++
Raw Normal View History

static const string _r[] = {
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
"r8", "r9", "r10", "r11", "r12", "sp", "lr", "pc"
};
static const string _conditions[] = {
"eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
"hi", "ls", "ge", "lt", "gt", "le", "", "nv",
};
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
#define _s save ? "s" : ""
#define _move(mode) (mode == 13 || mode == 15)
#define _comp(mode) (mode >= 8 && mode <= 11)
#define _math(mode) (mode <= 7 || mode == 12 || mode == 14)
Update to v103r31 release. byuu says: Changelog: - gba/cpu: slight speedup to CPU::step() - processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are now playable once again - processor/arm: removed core from codebase - processor/v30mz: code cleanup (renamed functions; updated instruction() for consistency with other cores) It turns out on my much faster system, the new ARM7TDMI core is very slightly slower than the old one (by about 2% or so FPS.) But the CPU::step() improvement basically made it a wash. So yeah, I'm in really serious trouble with how slow my GBA core is now. Sigh. As for higan/processor ... this concludes the first phase of major cleanups and rewrites. There will always be work to do, and I have two more phases in mind. One is that a lot of the instruction disassemblers are very old. One even uses sprintf still. I'd like to modernize them all. Also, the ARM7TDMI core (and the ARM core before it) can't really disassemble because the PC address used for instruction execution is not known prior to calling instruction(), due to pipeline reload fetches that may occur inside of said function. I had a nasty hack for debugging the new core, but I'd like to come up with a clean way to allow tracing the new ARM7TDMI core. Another is that I'd still like to rename a lot of instruction function names in various cores to be more descriptive. I really liked how the LR35902 core came out there, and would like to get that level of detail in with the other cores as well.
2017-08-10 11:26:02 +00:00
auto ARM7TDMI::disassemble(maybe<uint32> pc, maybe<boolean> thumb) -> string {
if(!pc) pc = pipeline.execute.address;
if(!thumb) thumb = cpsr().t;
_pc = pc();
if(!thumb()) {
uint32 opcode = read(Word | Nonsequential, _pc & ~3);
uint12 index = (opcode & 0x0ff00000) >> 16 | (opcode & 0x000000f0) >> 4;
_c = _conditions[opcode >> 28];
return {hex(_pc, 8L), " ", armDisassemble[index](opcode)};
} else {
uint16 opcode = read(Half | Nonsequential, _pc & ~1);
return {hex(_pc, 8L), " ", thumbDisassemble[opcode]()};
}
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::disassembleRegisters() -> string {
string output;
for(uint n : range(16)) {
output.append(_r[n], ":", hex(r(n), 8L), " ");
}
output.append("cpsr:");
output.append(cpsr().n ? "N" : "n");
output.append(cpsr().z ? "Z" : "z");
output.append(cpsr().c ? "C" : "c");
output.append(cpsr().v ? "V" : "v", "/");
output.append(cpsr().i ? "I" : "i");
output.append(cpsr().f ? "F" : "f");
output.append(cpsr().t ? "T" : "t", "/");
output.append(hex(cpsr().m, 2L));
if(cpsr().m == PSR::USR || cpsr().m == PSR::SYS) return output;
Update to v103r31 release. byuu says: Changelog: - gba/cpu: slight speedup to CPU::step() - processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are now playable once again - processor/arm: removed core from codebase - processor/v30mz: code cleanup (renamed functions; updated instruction() for consistency with other cores) It turns out on my much faster system, the new ARM7TDMI core is very slightly slower than the old one (by about 2% or so FPS.) But the CPU::step() improvement basically made it a wash. So yeah, I'm in really serious trouble with how slow my GBA core is now. Sigh. As for higan/processor ... this concludes the first phase of major cleanups and rewrites. There will always be work to do, and I have two more phases in mind. One is that a lot of the instruction disassemblers are very old. One even uses sprintf still. I'd like to modernize them all. Also, the ARM7TDMI core (and the ARM core before it) can't really disassemble because the PC address used for instruction execution is not known prior to calling instruction(), due to pipeline reload fetches that may occur inside of said function. I had a nasty hack for debugging the new core, but I'd like to come up with a clean way to allow tracing the new ARM7TDMI core. Another is that I'd still like to rename a lot of instruction function names in various cores to be more descriptive. I really liked how the LR35902 core came out there, and would like to get that level of detail in with the other cores as well.
2017-08-10 11:26:02 +00:00
output.append(" spsr:");
output.append(spsr().n ? "N" : "n");
output.append(spsr().z ? "Z" : "z");
output.append(spsr().c ? "C" : "c");
output.append(spsr().v ? "V" : "v", "/");
output.append(spsr().i ? "I" : "i");
output.append(spsr().f ? "F" : "f");
output.append(spsr().t ? "T" : "t", "/");
output.append(hex(spsr().m, 2L));
return output;
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
}
//
auto ARM7TDMI::armDisassembleBranch
(int24 displacement, uint1 link) -> string {
return {"b", link ? "l" : "", _c, " 0x", hex(_pc + 8 + displacement * 4, 8L)};
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::armDisassembleBranchExchangeRegister
(uint4 m) -> string {
return {"bx", _c, " ", _r[m]};
}
auto ARM7TDMI::armDisassembleDataImmediate
(uint8 immediate, uint4 shift, uint4 d, uint4 n, uint1 save, uint4 mode) -> string {
static const string opcode[] = {
"and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
"tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn",
};
uint32 data = immediate >> (shift << 1) | immediate << 32 - (shift << 1);
return {opcode[mode], _c,
_move(mode) ? string{_s, " ", _r[d]} : string{},
_comp(mode) ? string{" ", _r[n]} : string{},
_math(mode) ? string{_s, " ", _r[d], ",", _r[n]} : string{},
",#0x", hex(data, 8L)};
}
auto ARM7TDMI::armDisassembleDataImmediateShift
(uint4 m, uint2 type, uint5 shift, uint4 d, uint4 n, uint1 save, uint4 mode) -> string {
static const string opcode[] = {
"and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
"tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn",
};
return {opcode[mode], _c,
_move(mode) ? string{_s, " ", _r[d]} : string{},
_comp(mode) ? string{" ", _r[n]} : string{},
_math(mode) ? string{_s, " ", _r[d], ",", _r[n]} : string{},
",", _r[m],
type == 0 && shift ? string{" lsl #", shift} : string{},
type == 1 ? string{" lsr #", shift ? (uint)shift : 32} : string{},
type == 2 ? string{" asr #", shift ? (uint)shift : 32} : string{},
type == 3 && shift ? string{" ror #", shift} : string{},
type == 3 && !shift ? " rrx" : ""};
}
auto ARM7TDMI::armDisassembleDataRegisterShift
(uint4 m, uint2 type, uint4 s, uint4 d, uint4 n, uint1 save, uint4 mode) -> string {
static const string opcode[] = {
"and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
"tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn",
};
return {opcode[mode], _c,
_move(mode) ? string{_s, " ", _r[d]} : string{},
_comp(mode) ? string{" ", _r[n]} : string{},
_math(mode) ? string{_s, " ", _r[d], ",", _r[n]} : string{},
",", _r[m], " ",
type == 0 ? "lsl" : "",
type == 1 ? "lsr" : "",
type == 2 ? "asr" : "",
type == 3 ? "ror" : "",
" ", _r[s]};
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::armDisassembleLoadImmediate
(uint8 immediate, uint1 half, uint4 d, uint4 n, uint1 writeback, uint1 up, uint1 pre) -> string {
string data;
if(n == 15) data = {" =0x", hex(read((half ? Half : Byte) | Nonsequential,
_pc + 8 + (up ? +immediate : -immediate)), half ? 4L : 2L)};
return {"ldr", _c, half ? "sh" : "sb", " ",
_r[d], ",[", _r[n],
pre == 0 ? "]" : "",
immediate ? string{",", up ? "+" : "-", "0x", hex(immediate, 2L)} : string{},
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : "", data};
}
auto ARM7TDMI::armDisassembleLoadRegister
(uint4 m, uint1 half, uint4 d, uint4 n, uint1 writeback, uint1 up, uint1 pre) -> string {
return {"ldr", _c, half ? "sh" : "sb", " ",
_r[d], ",[", _r[n],
pre == 0 ? "]" : "",
",", up ? "+" : "-", _r[m],
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : ""};
}
auto ARM7TDMI::armDisassembleMemorySwap
(uint4 m, uint4 d, uint4 n, uint1 byte) -> string {
return {"swp", _c, byte ? "b" : "", " ", _r[d], ",", _r[m], ",[", _r[n], "]"};
}
auto ARM7TDMI::armDisassembleMoveHalfImmediate
(uint8 immediate, uint4 d, uint4 n, uint1 mode, uint1 writeback, uint1 up, uint1 pre) -> string {
string data;
if(n == 15) data = {" =0x", hex(read(Half | Nonsequential, _pc + (up ? +immediate : -immediate)), 4L)};
return {mode ? "ldr" : "str", _c, "h ",
_r[d], ",[", _r[n],
pre == 0 ? "]" : "",
immediate ? string{",", up ? "+" : "-", "0x", hex(immediate, 2L)} : string{},
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : "", data};
}
auto ARM7TDMI::armDisassembleMoveHalfRegister
(uint4 m, uint4 d, uint4 n, uint1 mode, uint1 writeback, uint1 up, uint1 pre) -> string {
return {mode ? "ldr" : "str", _c, "h ",
_r[d], ",[", _r[n],
pre == 0 ? "]" : "",
",", up ? "+" : "-", _r[m],
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : ""};
}
auto ARM7TDMI::armDisassembleMoveImmediateOffset
(uint12 immediate, uint4 d, uint4 n, uint1 mode, uint1 writeback, uint1 byte, uint1 up, uint1 pre) -> string {
string data;
if(n == 15) data = {" =0x", hex(read((byte ? Byte : Word) | Nonsequential,
_pc + 8 + (up ? +immediate : -immediate)), byte ? 2L : 4L)};
return {mode ? "ldr" : "str", _c, byte ? "b" : "", " ", _r[d], ",[", _r[n],
pre == 0 ? "]" : "",
immediate ? string{",", up ? "+" : "-", "0x", hex(immediate, 3L)} : string{},
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : "", data};
}
auto ARM7TDMI::armDisassembleMoveMultiple
(uint16 list, uint4 n, uint1 mode, uint1 writeback, uint1 type, uint1 up, uint1 pre) -> string {
string registers;
for(auto index : range(16)) {
if(list.bit(index)) registers.append(_r[index], ",");
}
registers.trimRight(",", 1L);
Update to v103r31 release. byuu says: Changelog: - gba/cpu: slight speedup to CPU::step() - processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are now playable once again - processor/arm: removed core from codebase - processor/v30mz: code cleanup (renamed functions; updated instruction() for consistency with other cores) It turns out on my much faster system, the new ARM7TDMI core is very slightly slower than the old one (by about 2% or so FPS.) But the CPU::step() improvement basically made it a wash. So yeah, I'm in really serious trouble with how slow my GBA core is now. Sigh. As for higan/processor ... this concludes the first phase of major cleanups and rewrites. There will always be work to do, and I have two more phases in mind. One is that a lot of the instruction disassemblers are very old. One even uses sprintf still. I'd like to modernize them all. Also, the ARM7TDMI core (and the ARM core before it) can't really disassemble because the PC address used for instruction execution is not known prior to calling instruction(), due to pipeline reload fetches that may occur inside of said function. I had a nasty hack for debugging the new core, but I'd like to come up with a clean way to allow tracing the new ARM7TDMI core. Another is that I'd still like to rename a lot of instruction function names in various cores to be more descriptive. I really liked how the LR35902 core came out there, and would like to get that level of detail in with the other cores as well.
2017-08-10 11:26:02 +00:00
return {mode ? "ldm" : "stm", _c,
up == 0 && pre == 0 ? "da" : "",
up == 0 && pre == 1 ? "db" : "",
up == 1 && pre == 0 ? "ia" : "",
up == 1 && pre == 1 ? "ib" : "",
" ", _r[n], writeback ? "!" : "",
",{", registers, "}", type ? "^" : ""};
}
auto ARM7TDMI::armDisassembleMoveRegisterOffset
(uint4 m, uint2 type, uint5 shift, uint4 d, uint4 n, uint1 mode, uint1 writeback, uint1 byte, uint1 up, uint1 pre) -> string {
return {mode ? "ldr" : "str", _c, byte ? "b" : "", " ", _r[d], ",[", _r[n],
pre == 0 ? "]" : "",
",", up ? "+" : "-", _r[m],
type == 0 && shift ? string{" lsl #", shift} : string{},
type == 1 ? string{" lsr #", shift ? (uint)shift : 32} : string{},
type == 2 ? string{" asr #", shift ? (uint)shift : 32} : string{},
type == 3 && shift ? string{" ror #", shift} : string{},
type == 3 && !shift ? " rrx" : "",
pre == 1 ? "]" : "",
pre == 0 || writeback ? "!" : ""};
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::armDisassembleMoveToRegisterFromStatus
(uint4 d, uint1 mode) -> string {
return {"mrs", _c, " ", _r[d], ",", mode ? "spsr" : "cpsr"};
}
auto ARM7TDMI::armDisassembleMoveToStatusFromImmediate
(uint8 immediate, uint4 rotate, uint4 field, uint1 mode) -> string {
uint32 data = immediate >> (rotate << 1) | immediate << 32 - (rotate << 1);
return {"msr", _c, " ", mode ? "spsr:" : "cpsr:",
field.bit(0) ? "c" : "",
field.bit(1) ? "x" : "",
field.bit(2) ? "s" : "",
field.bit(3) ? "f" : "",
",#0x", hex(data, 8L)};
}
auto ARM7TDMI::armDisassembleMoveToStatusFromRegister
(uint4 m, uint4 field, uint1 mode) -> string {
return {"msr", _c, " ", mode ? "spsr:" : "cpsr:",
field.bit(0) ? "c" : "",
field.bit(1) ? "x" : "",
field.bit(2) ? "s" : "",
field.bit(3) ? "f" : "",
",", _r[m]};
}
auto ARM7TDMI::armDisassembleMultiply
(uint4 m, uint4 s, uint4 n, uint4 d, uint1 save, uint1 accumulate) -> string {
if(accumulate) {
return {"mla", _c, _s, " ", _r[d], ",", _r[m], ",", _r[s], ",", _r[n]};
} else {
return {"mul", _c, _s, " ", _r[d], ",", _r[m], ",", _r[s]};
}
}
auto ARM7TDMI::armDisassembleMultiplyLong
(uint4 m, uint4 s, uint4 l, uint4 h, uint1 save, uint1 accumulate, uint1 sign) -> string {
return {sign ? "s" : "u", accumulate ? "mlal" : "mull", _c, _s, " ",
_r[l], ",", _r[h], ",", _r[m], ",", _r[s]};
}
auto ARM7TDMI::armDisassembleSoftwareInterrupt
(uint24 immediate) -> string {
return {"swi #0x", hex(immediate, 6L)};
}
auto ARM7TDMI::armDisassembleUndefined
() -> string {
return {"undefined"};
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
//
auto ARM7TDMI::thumbDisassembleALU
(uint3 d, uint3 m, uint4 mode) -> string {
static const string opcode[] = {
"and", "eor", "lsl", "lsr", "asr", "adc", "sbc", "ror",
"tst", "neg", "cmp", "cmn", "orr", "mul", "bic", "mvn",
};
return {opcode[mode], " ", _r[d], ",", _r[m]};
}
auto ARM7TDMI::thumbDisassembleALUExtended
(uint4 d, uint4 m, uint2 mode) -> string {
static const string opcode[] = {"add", "sub", "mov"};
if(d == 8 && m == 8 && mode == 2) return {"nop"};
return {opcode[mode], " ", _r[d], ",", _r[m]};
}
auto ARM7TDMI::thumbDisassembleAddRegister
(uint8 immediate, uint3 d, uint1 mode) -> string {
return {"add ", _r[d], ",", mode ? "sp" : "pc", ",#0x", hex(immediate, 2L)};
}
auto ARM7TDMI::thumbDisassembleAdjustImmediate
(uint3 d, uint3 n, uint3 immediate, uint1 mode) -> string {
return {!mode ? "add" : "sub", " ", _r[d], ",", _r[n], ",#", immediate};
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::thumbDisassembleAdjustRegister
(uint3 d, uint3 n, uint3 m, uint1 mode) -> string {
return {!mode ? "add" : "sub", " ", _r[d], ",", _r[n], ",", _r[m]};
}
auto ARM7TDMI::thumbDisassembleAdjustStack
(uint7 immediate, uint1 mode) -> string {
return {!mode ? "add" : "sub", " sp,#0x", hex(immediate * 4, 3L)};
}
auto ARM7TDMI::thumbDisassembleBranchExchange
(uint4 m) -> string {
return {"bx ", _r[m]};
}
auto ARM7TDMI::thumbDisassembleBranchFarPrefix
(int11 displacementHi) -> string {
uint11 displacementLo = read(Half | Nonsequential, (_pc & ~1) + 2);
int22 displacement = displacementHi << 11 | displacementLo << 0;
uint32 address = _pc + 4 + displacement * 2;
Update to v103r31 release. byuu says: Changelog: - gba/cpu: slight speedup to CPU::step() - processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are now playable once again - processor/arm: removed core from codebase - processor/v30mz: code cleanup (renamed functions; updated instruction() for consistency with other cores) It turns out on my much faster system, the new ARM7TDMI core is very slightly slower than the old one (by about 2% or so FPS.) But the CPU::step() improvement basically made it a wash. So yeah, I'm in really serious trouble with how slow my GBA core is now. Sigh. As for higan/processor ... this concludes the first phase of major cleanups and rewrites. There will always be work to do, and I have two more phases in mind. One is that a lot of the instruction disassemblers are very old. One even uses sprintf still. I'd like to modernize them all. Also, the ARM7TDMI core (and the ARM core before it) can't really disassemble because the PC address used for instruction execution is not known prior to calling instruction(), due to pipeline reload fetches that may occur inside of said function. I had a nasty hack for debugging the new core, but I'd like to come up with a clean way to allow tracing the new ARM7TDMI core. Another is that I'd still like to rename a lot of instruction function names in various cores to be more descriptive. I really liked how the LR35902 core came out there, and would like to get that level of detail in with the other cores as well.
2017-08-10 11:26:02 +00:00
return {"bl 0x", hex(address, 8L)};
}
auto ARM7TDMI::thumbDisassembleBranchFarSuffix
(uint11 displacement) -> string {
Update to v103r31 release. byuu says: Changelog: - gba/cpu: slight speedup to CPU::step() - processor/arm7tdmi: fixed about ten bugs, ST018 and GBA games are now playable once again - processor/arm: removed core from codebase - processor/v30mz: code cleanup (renamed functions; updated instruction() for consistency with other cores) It turns out on my much faster system, the new ARM7TDMI core is very slightly slower than the old one (by about 2% or so FPS.) But the CPU::step() improvement basically made it a wash. So yeah, I'm in really serious trouble with how slow my GBA core is now. Sigh. As for higan/processor ... this concludes the first phase of major cleanups and rewrites. There will always be work to do, and I have two more phases in mind. One is that a lot of the instruction disassemblers are very old. One even uses sprintf still. I'd like to modernize them all. Also, the ARM7TDMI core (and the ARM core before it) can't really disassemble because the PC address used for instruction execution is not known prior to calling instruction(), due to pipeline reload fetches that may occur inside of said function. I had a nasty hack for debugging the new core, but I'd like to come up with a clean way to allow tracing the new ARM7TDMI core. Another is that I'd still like to rename a lot of instruction function names in various cores to be more descriptive. I really liked how the LR35902 core came out there, and would like to get that level of detail in with the other cores as well.
2017-08-10 11:26:02 +00:00
return {"bl (suffix)"};
}
auto ARM7TDMI::thumbDisassembleBranchNear
(int11 displacement) -> string {
uint32 address = _pc + 4 + displacement * 2;
return {"b 0x", hex(address, 8L)};
}
auto ARM7TDMI::thumbDisassembleBranchTest
(int8 displacement, uint4 condition) -> string {
uint32 address = _pc + 4 + displacement * 2;
return {"b", _conditions[condition], " 0x", hex(address, 8L)};
}
auto ARM7TDMI::thumbDisassembleImmediate
(uint8 immediate, uint3 d, uint2 mode) -> string {
static const string opcode[] = {"mov", "cmp", "add", "sub"};
return {opcode[mode], " ", _r[d], ",#0x", hex(immediate, 2L)};
}
auto ARM7TDMI::thumbDisassembleLoadLiteral
(uint8 displacement, uint3 d) -> string {
uint32 address = ((_pc + 4) & ~3) + (displacement << 2);
uint32 data = read(Word | Nonsequential, address);
return {"ldr ", _r[d], ",[pc,#0x", hex(address, 8L), "] =0x", hex(data, 8L)};
}
auto ARM7TDMI::thumbDisassembleMoveByteImmediate
(uint3 d, uint3 n, uint5 offset, uint1 mode) -> string {
return {mode ? "ldrb" : "strb", " ", _r[d], ",[", _r[n], ",#0x", hex(offset, 2L), "]"};
}
auto ARM7TDMI::thumbDisassembleMoveHalfImmediate
(uint3 d, uint3 n, uint5 offset, uint1 mode) -> string {
return {mode ? "ldrh" : "strh", " ", _r[d], ",[", _r[n], ",#0x", hex(offset * 2, 2L), "]"};
}
auto ARM7TDMI::thumbDisassembleMoveMultiple
(uint8 list, uint3 n, uint1 mode) -> string {
string registers;
for(uint m : range(8)) {
if(list.bit(m)) registers.append(_r[m], ",");
}
registers.trimRight(",", 1L);
return {mode ? "ldmia" : "stmia", " ", _r[n], "!,{", registers, "}"};
}
auto ARM7TDMI::thumbDisassembleMoveRegisterOffset
(uint3 d, uint3 n, uint3 m, uint3 mode) -> string {
static const string opcode[] = {"str", "strh", "strb", "ldsb", "ldr", "ldrh", "ldrb", "ldsh"};
return {opcode[mode], " ", _r[d], ",[", _r[n], ",", _r[m], "]"};
}
auto ARM7TDMI::thumbDisassembleMoveStack
(uint8 immediate, uint3 d, uint1 mode) -> string {
return {mode ? "ldr" : "str", " ", _r[d], ",[sp,#0x", hex(immediate * 4, 3L), "]"};
}
auto ARM7TDMI::thumbDisassembleMoveWordImmediate
(uint3 d, uint3 n, uint5 offset, uint1 mode) -> string {
return {mode ? "ldr" : "str", " ", _r[d], ",[", _r[n], ",#0x", hex(offset * 4, 2L), "]"};
}
auto ARM7TDMI::thumbDisassembleShiftImmediate
(uint3 d, uint3 m, uint5 immediate, uint2 mode) -> string {
static const string opcode[] = {"lsl", "lsr", "asr"};
return {opcode[mode], " ", _r[d], ",", _r[m], ",#", immediate};
}
auto ARM7TDMI::thumbDisassembleSoftwareInterrupt
(uint8 immediate) -> string {
return {"swi #0x", hex(immediate, 2L)};
}
auto ARM7TDMI::thumbDisassembleStackMultiple
(uint8 list, uint1 lrpc, uint1 mode) -> string {
string registers;
for(uint m : range(8)) {
if(list.bit(m)) registers.append(_r[m], ",");
}
if(lrpc) registers.append(!mode ? "lr," : "pc,");
registers.trimRight(",", 1L);
return {!mode ? "push" : "pop", " {", registers, "}"};
}
auto ARM7TDMI::thumbDisassembleUndefined
() -> string {
return {"undefined"};
}
#undef _s
#undef _move
#undef _comp
#undef _save