bsnes/higan/processor/arm7tdmi/algorithms.cpp

97 lines
3.5 KiB
C++
Raw Normal View History

Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::ADD(uint32 source, uint32 modify, bool carry) -> uint32 {
uint32 result = source + modify + carry;
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
if(cpsr().t || opcode.bit(20)) {
uint32 overflow = ~(source ^ modify) & (source ^ result);
cpsr().v = 1 << 31 & (overflow);
cpsr().c = 1 << 31 & (overflow ^ source ^ modify ^ result);
cpsr().z = result == 0;
cpsr().n = result.bit(31);
}
return result;
}
auto ARM7TDMI::ASR(uint32 source, uint8 shift) -> uint32 {
carry = cpsr().c;
if(shift == 0) return source;
carry = shift > 32 ? source & 1 << 31 : source & 1 << shift - 1;
source = shift > 31 ? (int32)source >> 31 : (int32)source >> shift;
return source;
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::BIT(uint32 result) -> uint32 {
if(cpsr().t || opcode.bit(20)) {
cpsr().c = carry;
cpsr().z = result == 0;
cpsr().n = result.bit(31);
}
return result;
}
auto ARM7TDMI::LSL(uint32 source, uint8 shift) -> uint32 {
carry = cpsr().c;
if(shift == 0) return source;
carry = shift > 32 ? 0 : source & 1 << 32 - shift;
source = shift > 31 ? 0 : source << shift;
return source;
}
auto ARM7TDMI::LSR(uint32 source, uint8 shift) -> uint32 {
carry = cpsr().c;
if(shift == 0) return source;
carry = shift > 32 ? 0 : source & 1 << shift - 1;
source = shift > 31 ? 0 : source >> shift;
return source;
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::MUL(uint32 product, uint32 multiplicand, uint32 multiplier) -> uint32 {
idle();
if(multiplier >> 8 && multiplier >> 8 != 0xffffff) idle();
if(multiplier >> 16 && multiplier >> 16 != 0xffff) idle();
if(multiplier >> 24 && multiplier >> 24 != 0xff) idle();
product += multiplicand * multiplier;
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
if(cpsr().t || opcode.bit(20)) {
cpsr().z = product == 0;
cpsr().n = product.bit(31);
}
return product;
}
auto ARM7TDMI::ROR(uint32 source, uint8 shift) -> uint32 {
carry = cpsr().c;
if(shift == 0) return source;
if(shift &= 31) source = source << 32 - shift | source >> shift;
carry = source & 1 << 31;
return source;
}
auto ARM7TDMI::RRX(uint32 source) -> uint32 {
carry = source.bit(0);
return cpsr().c << 31 | source >> 1;
}
Update to v103r28 release. byuu says: Changelog: - processor/arm7tdmi: implemented 10 of 19 ARM instructions - processor/arm7tdmi: implemented 1 of 22 THUMB instructions Today's WIP was 6 hours of work, and yesterday's was 5 hours. Half of today was just trying to come up with the design to use a lambda-based dispatcher to map both instructions and disassembly, similar to the 68K core. The problem is that the ARM core has 28 unique bits, which is just far too many bits to have a full lookup table like the 16-bit 68K core. The thing I wanted more than anything else was to perform the opcode bitfield decoding once, and have it decoded for both instructions and the disassembler. It took three hours to come up with a design that worked for the ARM half ... relying on #defines being able to pull in other #defines that were declared and changed later after the first one. But, I'm happy with it. The decoding is in the table building, as it is with the 68K core. The decoding does happen at run-time on each instruction invocation, but it has to be done. As to the THUMB core, I can create a 64K-entry lambda table to cover all possible encodings, and ... even though it's a cache killer, I've decided to go for it, given the outstanding performance it obtained in the M68K core, as well as considering that THUMB mode is far more common in GBA games. As to both cores ... I'm a little torn between two extremes: On the one hand, I can condense the number of ARM/THUMB instructions further to eliminate more redundant code. On the other, I can split them apart to reduce the number of conditional tests needed to execute each instruction. It's really the disassembler that makes me not want to split them up further ... as I have to split the disassembler functions up equally to the instruction functions. But it may be worth it if it's a speed improvement.
2017-08-07 12:20:35 +00:00
auto ARM7TDMI::SUB(uint32 source, uint32 modify, bool carry) -> uint32 {
return ADD(source, ~modify, carry);
}
auto ARM7TDMI::TST(uint4 mode) -> bool {
switch(mode) {
case 0: return cpsr().z == 1; //EQ (equal)
case 1: return cpsr().z == 0; //NE (not equal)
case 2: return cpsr().c == 1; //CS (carry set)
case 3: return cpsr().c == 0; //CC (carry clear)
case 4: return cpsr().n == 1; //MI (negative)
case 5: return cpsr().n == 0; //PL (positive)
case 6: return cpsr().v == 1; //VS (overflow)
case 7: return cpsr().v == 0; //VC (no overflow)
case 8: return cpsr().c == 1 && cpsr().z == 0; //HI (unsigned higher)
case 9: return cpsr().c == 0 || cpsr().z == 1; //LS (unsigned lower or same)
case 10: return cpsr().n == cpsr().v; //GE (signed greater than or equal)
case 11: return cpsr().n != cpsr().v; //LT (signed less than)
case 12: return cpsr().z == 0 && cpsr().n == cpsr().v; //GT (signed greater than)
case 13: return cpsr().z == 1 || cpsr().n != cpsr().v; //LE (signed less than or equal)
case 14: return true; //AL (always)
case 15: return false; //NV (never)
}
unreachable;
}