bsnes/higan/sfc/cpu/cpu.hpp

257 lines
5.3 KiB
C++
Raw Normal View History

struct CPU : Processor::R65816, Thread, public PPUcounter {
Updated to v067r21 release. byuu says: This moves toward a profile-selection mode. Right now, it is incomplete. There are three binaries, one for each profile. The GUI selection doesn't actually do anything yet. There will be a launcher in a future release that loads each profile's respective binary. I reverted away from blargg's SMP library for the time being, in favor of my own. This will fix most of the csnes/bsnes-performance bugs. This causes a 10% speed hit on 64-bit platforms, and a 15% speed hit on 32-bit platforms. I hope to be able to regain that speed in the future, I may also experiment with creating my own fast-SMP core which drops bus hold delays and TEST register support (never used by anything, ever.) Save states now work in all three cores, but they are not cross-compatible. The profile name is stored in the description field of the save states, and it won't load a state if the profile name doesn't match. The debugger only works on the research target for now. Give it time and it will return for the other targets. Other than that, let's please resume testing on all three once again. See how far we get this time :) I can confirm the following games have issues on the performance profile: - Armored Police Metal Jacket (minor logo flickering, not a big deal) - Chou Aniki (won't start, so obviously unplayable) - Robocop vs The Terminator (major in-game flickering, unplayable) Anyone still have that gigantic bsnes thread archive from the ZSNES forum? Maybe I posted about how to fix those two broken games in there, heh. I really want to release this as v1.0, but my better judgment says we need to give it another week. Damn.
2010-10-20 11:22:44 +00:00
enum : bool { Threaded = true };
auto interruptPending() const -> bool override;
auto pio() const -> uint8;
auto joylatch() const -> bool;
CPU();
Updated to v067r23 release. byuu says: Added missing $4200 IRQ lock, which fixes Chou Aniki on the fast CPU core, so slower PCs can get their brotherly love on. Added range-based controller IOBit latching to the fast CPU core, which enables Super Scope and Justifier support. Uses the priority queue as well, so there is zero speed-hit. Given the way range-testing works, the trigger point may vary by 1-2 pixels when firing at the same spot. Not really a big deal when it avoids a massive speed penalty. Fixed PAL and interlace-mode HVIRQs at V=0,H<2 on the fast CPU core. Added the dot-renderer's sprite list update-on-OAM-write functionality to the scanline-based PPU renderer. Unfortunately it looks like all the speed gain was already taken from the global dirty flag I was using before, but this certainly won't hurt speed any, so whatever. Added #ifdef to stop CoInitialize(0) on non-Windows ports. Added #ifdefs to stop gradient fade on Windows port. Not going to fuck over the Linux port aesthetic because of Qt bug #47,326,927. If there's a way to tell what Qt theme is being used, I can leave it enabled for XP/Vista themes. Moved HDMA trigger from 1104 to 1112, and reduced channel overhead from 24 to 16, to better simulate one-cycle DMA->CPU sync. Code clarity: I've re-added my varint.hpp classes, and am actively using them in the accuracy cores. So far, I haven't done anything that would detriment speed, but it is certainly cool. The APU ports exposed by the CPU and SMP now take uint2 address arguments, the CPU WRAM address register is a uint17, and the IRQ H/VTIME values are uint10. This basically allows the source to clearly convey the data sizes, and eliminates the need to manually mask values when writing to registers or reading from memory. I'm going to be doing this everywhere, and it will have a speed impact eventually, because the automation means we can't skip masks when we know the data is already masked off. Source: archive contains the launcher code, so that I can look into why it's crashing on XP tomorrow. It doesn't look like Circuit USA's flags are going to work too well with this new CPU core. Still not sure what the hell Robocop vs The Terminator is doing, I'll read through the mega SNES thread for clues tomorrow. Speedy Gonzales is definitely broken, as modifying the MDR was breaking things with my current core. Probably because the new CPU core doesn't wait for a cycle edge to trigger. I was thinking that perhaps we could keep some form of cheat codes list to work as game-specific hacks for the performance core. Keeps the hacks out of the emulator, but could allow the remaining bugs to be worked around for people who have no choice but to use the performance core.
2010-08-16 09:42:20 +00:00
alwaysinline auto step(uint clocks) -> void;
alwaysinline auto synchronizeSMP() -> void;
auto synchronizePPU() -> void;
auto synchronizeCoprocessors() -> void;
auto synchronizeDevices() -> void;
auto portRead(uint2 port) const -> uint8;
auto portWrite(uint2 port, uint8 data) -> void;
static auto Enter() -> void;
auto main() -> void;
auto enable() -> void;
auto power() -> void;
auto reset() -> void;
//dma.cpp
auto dmaAddClocks(uint clocks) -> void;
auto dmaTransferValid(uint8 bbus, uint24 abus) -> bool;
auto dmaAddressValid(uint24 abus) -> bool;
auto dmaRead(uint24 abus) -> uint8;
auto dmaWrite(bool valid, uint addr = 0, uint8 data = 0) -> void;
auto dmaTransfer(bool direction, uint8 bbus, uint24 abus) -> void;
auto dmaAddressB(uint n, uint channel) -> uint8;
auto dmaAddress(uint n) -> uint24;
auto hdmaAddress(uint n) -> uint24;
auto hdmaIndirectAddress(uint n) -> uint24;
auto dmaEnabledChannels() -> uint;
auto hdmaActive(uint n) -> bool;
auto hdmaActiveAfter(uint s) -> bool;
auto hdmaEnabledChannels() -> uint;
auto hdmaActiveChannels() -> uint;
auto dmaRun() -> void;
auto hdmaUpdate(uint n) -> void;
auto hdmaRun() -> void;
auto hdmaInitReset() -> void;
auto hdmaInit() -> void;
//memory.cpp
auto io() -> void override;
auto read(uint24 addr) -> uint8 override;
auto write(uint24 addr, uint8 data) -> void override;
alwaysinline auto speed(uint24 addr) const -> uint;
auto disassemblerRead(uint24 addr) -> uint8 override;
//mmio.cpp
auto apuPortRead(uint24 addr, uint8 data) -> uint8;
auto cpuPortRead(uint24 addr, uint8 data) -> uint8;
auto dmaPortRead(uint24 addr, uint8 data) -> uint8;
auto apuPortWrite(uint24 addr, uint8 data) -> void;
auto cpuPortWrite(uint24 addr, uint8 data) -> void;
auto dmaPortWrite(uint24 addr, uint8 data) -> void;
//timing.cpp
auto dmaCounter() const -> uint;
auto addClocks(uint clocks) -> void;
auto scanline() -> void;
alwaysinline auto aluEdge() -> void;
alwaysinline auto dmaEdge() -> void;
alwaysinline auto lastCycle() -> void;
//irq.cpp
alwaysinline auto pollInterrupts() -> void;
auto nmitimenUpdate(uint8 data) -> void;
auto rdnmi() -> bool;
auto timeup() -> bool;
alwaysinline auto nmiTest() -> bool;
alwaysinline auto irqTest() -> bool;
//joypad.cpp
auto stepAutoJoypadPoll() -> void;
//serialization.cpp
auto serialize(serializer&) -> void;
uint8 wram[128 * 1024];
vector<Thread*> coprocessors;
privileged:
uint cpu_version = 2; //allowed: 1, 2
struct Status {
bool interrupt_pending;
uint clock_count;
uint line_clocks;
//timing
bool irq_lock;
uint dram_refresh_position;
bool dram_refreshed;
uint hdma_init_position;
bool hdma_init_triggered;
uint hdma_position;
bool hdma_triggered;
bool nmi_valid;
bool nmi_line;
bool nmi_transition;
bool nmi_pending;
bool nmi_hold;
bool irq_valid;
bool irq_line;
bool irq_transition;
bool irq_pending;
bool irq_hold;
bool reset_pending;
//DMA
bool dma_active;
uint dma_counter;
uint dma_clocks;
bool dma_pending;
bool hdma_pending;
bool hdma_mode; //0 = init, 1 = run
Update to v073r01 release. byuu says: While perhaps not perfect, pretty good is better than nothing ... I've added emulation of auto-joypad poll timing. Going off ikari_01's confirmation of what we suspected, that the strobe happens every 256 clocks, I've set up emulation as follows: Upon reset, our clock counter is reset to zero. At the start of each frame, our poll counter is reset to zero. Every 256 clocks, we call the step_auto_joypad_poll() function. If we are at V=225/240+ (based on overscan setting), we check the poll counter. At zero, we poll the actual controller and set the joypad polling flag in $4212.d0 to 1. From zero through fifteen, we read in one bit for each controller and shift it into the register. At sixteen, we turn off the joypad polling flag. The 256-clock divider allows the start point of polling for each frame to fluctuate wildly like real hardware. I count regardless of auto joypad enable, as per $4212.d0's behavior; but only poll when it's actually enabled. I do not consume any actual time from this polling. I honestly don't know if I even should, or if it manages to do it in the background. If it should consume time, then this most likely happens between opcode edges and we'll have to adjust the code a good bit. All commercial games should continue to work fine, but this will likely break some hacks/translations not tested on hardware. Without the timing emulation, reading $4218-421f before V=~228 would basically give you the valid input controller values of the previous frame. Now, like hardware, it should give you a state that is part previous frame, part current frame shifted into it. Button positions won't be reliable and will shift every 256 clocks. I've also removed the Qt GUI, and renamed ui-phoenix to just ui. This removes 400kb of source code (phoenix is a lean 130kb), and drops the archive size from 564KB to 475KB. Combined with the DSP HLE, and we've knocked off ~570KB of source cruft from the entire project. I am looking forward to not having to specify which GUI is included anymore.
2010-12-27 07:29:57 +00:00
//auto joypad polling
bool auto_joypad_active;
bool auto_joypad_latch;
uint auto_joypad_counter;
uint auto_joypad_clock;
Update to v073r01 release. byuu says: While perhaps not perfect, pretty good is better than nothing ... I've added emulation of auto-joypad poll timing. Going off ikari_01's confirmation of what we suspected, that the strobe happens every 256 clocks, I've set up emulation as follows: Upon reset, our clock counter is reset to zero. At the start of each frame, our poll counter is reset to zero. Every 256 clocks, we call the step_auto_joypad_poll() function. If we are at V=225/240+ (based on overscan setting), we check the poll counter. At zero, we poll the actual controller and set the joypad polling flag in $4212.d0 to 1. From zero through fifteen, we read in one bit for each controller and shift it into the register. At sixteen, we turn off the joypad polling flag. The 256-clock divider allows the start point of polling for each frame to fluctuate wildly like real hardware. I count regardless of auto joypad enable, as per $4212.d0's behavior; but only poll when it's actually enabled. I do not consume any actual time from this polling. I honestly don't know if I even should, or if it manages to do it in the background. If it should consume time, then this most likely happens between opcode edges and we'll have to adjust the code a good bit. All commercial games should continue to work fine, but this will likely break some hacks/translations not tested on hardware. Without the timing emulation, reading $4218-421f before V=~228 would basically give you the valid input controller values of the previous frame. Now, like hardware, it should give you a state that is part previous frame, part current frame shifted into it. Button positions won't be reliable and will shift every 256 clocks. I've also removed the Qt GUI, and renamed ui-phoenix to just ui. This removes 400kb of source code (phoenix is a lean 130kb), and drops the archive size from 564KB to 475KB. Combined with the DSP HLE, and we've knocked off ~570KB of source cruft from the entire project. I am looking forward to not having to specify which GUI is included anymore.
2010-12-27 07:29:57 +00:00
//MMIO
Updated to v067r23 release. byuu says: Added missing $4200 IRQ lock, which fixes Chou Aniki on the fast CPU core, so slower PCs can get their brotherly love on. Added range-based controller IOBit latching to the fast CPU core, which enables Super Scope and Justifier support. Uses the priority queue as well, so there is zero speed-hit. Given the way range-testing works, the trigger point may vary by 1-2 pixels when firing at the same spot. Not really a big deal when it avoids a massive speed penalty. Fixed PAL and interlace-mode HVIRQs at V=0,H<2 on the fast CPU core. Added the dot-renderer's sprite list update-on-OAM-write functionality to the scanline-based PPU renderer. Unfortunately it looks like all the speed gain was already taken from the global dirty flag I was using before, but this certainly won't hurt speed any, so whatever. Added #ifdef to stop CoInitialize(0) on non-Windows ports. Added #ifdefs to stop gradient fade on Windows port. Not going to fuck over the Linux port aesthetic because of Qt bug #47,326,927. If there's a way to tell what Qt theme is being used, I can leave it enabled for XP/Vista themes. Moved HDMA trigger from 1104 to 1112, and reduced channel overhead from 24 to 16, to better simulate one-cycle DMA->CPU sync. Code clarity: I've re-added my varint.hpp classes, and am actively using them in the accuracy cores. So far, I haven't done anything that would detriment speed, but it is certainly cool. The APU ports exposed by the CPU and SMP now take uint2 address arguments, the CPU WRAM address register is a uint17, and the IRQ H/VTIME values are uint10. This basically allows the source to clearly convey the data sizes, and eliminates the need to manually mask values when writing to registers or reading from memory. I'm going to be doing this everywhere, and it will have a speed impact eventually, because the automation means we can't skip masks when we know the data is already masked off. Source: archive contains the launcher code, so that I can look into why it's crashing on XP tomorrow. It doesn't look like Circuit USA's flags are going to work too well with this new CPU core. Still not sure what the hell Robocop vs The Terminator is doing, I'll read through the mega SNES thread for clues tomorrow. Speedy Gonzales is definitely broken, as modifying the MDR was breaking things with my current core. Probably because the new CPU core doesn't wait for a cycle edge to trigger. I was thinking that perhaps we could keep some form of cheat codes list to work as game-specific hacks for the performance core. Keeps the hacks out of the emulator, but could allow the remaining bugs to be worked around for people who have no choice but to use the performance core.
2010-08-16 09:42:20 +00:00
//$2140-217f
uint8 port[4];
//$2181-$2183
Updated to v067r23 release. byuu says: Added missing $4200 IRQ lock, which fixes Chou Aniki on the fast CPU core, so slower PCs can get their brotherly love on. Added range-based controller IOBit latching to the fast CPU core, which enables Super Scope and Justifier support. Uses the priority queue as well, so there is zero speed-hit. Given the way range-testing works, the trigger point may vary by 1-2 pixels when firing at the same spot. Not really a big deal when it avoids a massive speed penalty. Fixed PAL and interlace-mode HVIRQs at V=0,H<2 on the fast CPU core. Added the dot-renderer's sprite list update-on-OAM-write functionality to the scanline-based PPU renderer. Unfortunately it looks like all the speed gain was already taken from the global dirty flag I was using before, but this certainly won't hurt speed any, so whatever. Added #ifdef to stop CoInitialize(0) on non-Windows ports. Added #ifdefs to stop gradient fade on Windows port. Not going to fuck over the Linux port aesthetic because of Qt bug #47,326,927. If there's a way to tell what Qt theme is being used, I can leave it enabled for XP/Vista themes. Moved HDMA trigger from 1104 to 1112, and reduced channel overhead from 24 to 16, to better simulate one-cycle DMA->CPU sync. Code clarity: I've re-added my varint.hpp classes, and am actively using them in the accuracy cores. So far, I haven't done anything that would detriment speed, but it is certainly cool. The APU ports exposed by the CPU and SMP now take uint2 address arguments, the CPU WRAM address register is a uint17, and the IRQ H/VTIME values are uint10. This basically allows the source to clearly convey the data sizes, and eliminates the need to manually mask values when writing to registers or reading from memory. I'm going to be doing this everywhere, and it will have a speed impact eventually, because the automation means we can't skip masks when we know the data is already masked off. Source: archive contains the launcher code, so that I can look into why it's crashing on XP tomorrow. It doesn't look like Circuit USA's flags are going to work too well with this new CPU core. Still not sure what the hell Robocop vs The Terminator is doing, I'll read through the mega SNES thread for clues tomorrow. Speedy Gonzales is definitely broken, as modifying the MDR was breaking things with my current core. Probably because the new CPU core doesn't wait for a cycle edge to trigger. I was thinking that perhaps we could keep some form of cheat codes list to work as game-specific hacks for the performance core. Keeps the hacks out of the emulator, but could allow the remaining bugs to be worked around for people who have no choice but to use the performance core.
2010-08-16 09:42:20 +00:00
uint17 wram_addr;
//$4016-$4017
bool joypad_strobe_latch;
uint32 joypad1_bits;
uint32 joypad2_bits;
//$4200
bool nmi_enabled;
bool hirq_enabled, virq_enabled;
bool auto_joypad_poll;
//$4201
uint8 pio;
//$4202-$4203
uint8 wrmpya;
uint8 wrmpyb;
//$4204-$4206
uint16 wrdiva;
Updated to v067r23 release. byuu says: Added missing $4200 IRQ lock, which fixes Chou Aniki on the fast CPU core, so slower PCs can get their brotherly love on. Added range-based controller IOBit latching to the fast CPU core, which enables Super Scope and Justifier support. Uses the priority queue as well, so there is zero speed-hit. Given the way range-testing works, the trigger point may vary by 1-2 pixels when firing at the same spot. Not really a big deal when it avoids a massive speed penalty. Fixed PAL and interlace-mode HVIRQs at V=0,H<2 on the fast CPU core. Added the dot-renderer's sprite list update-on-OAM-write functionality to the scanline-based PPU renderer. Unfortunately it looks like all the speed gain was already taken from the global dirty flag I was using before, but this certainly won't hurt speed any, so whatever. Added #ifdef to stop CoInitialize(0) on non-Windows ports. Added #ifdefs to stop gradient fade on Windows port. Not going to fuck over the Linux port aesthetic because of Qt bug #47,326,927. If there's a way to tell what Qt theme is being used, I can leave it enabled for XP/Vista themes. Moved HDMA trigger from 1104 to 1112, and reduced channel overhead from 24 to 16, to better simulate one-cycle DMA->CPU sync. Code clarity: I've re-added my varint.hpp classes, and am actively using them in the accuracy cores. So far, I haven't done anything that would detriment speed, but it is certainly cool. The APU ports exposed by the CPU and SMP now take uint2 address arguments, the CPU WRAM address register is a uint17, and the IRQ H/VTIME values are uint10. This basically allows the source to clearly convey the data sizes, and eliminates the need to manually mask values when writing to registers or reading from memory. I'm going to be doing this everywhere, and it will have a speed impact eventually, because the automation means we can't skip masks when we know the data is already masked off. Source: archive contains the launcher code, so that I can look into why it's crashing on XP tomorrow. It doesn't look like Circuit USA's flags are going to work too well with this new CPU core. Still not sure what the hell Robocop vs The Terminator is doing, I'll read through the mega SNES thread for clues tomorrow. Speedy Gonzales is definitely broken, as modifying the MDR was breaking things with my current core. Probably because the new CPU core doesn't wait for a cycle edge to trigger. I was thinking that perhaps we could keep some form of cheat codes list to work as game-specific hacks for the performance core. Keeps the hacks out of the emulator, but could allow the remaining bugs to be worked around for people who have no choice but to use the performance core.
2010-08-16 09:42:20 +00:00
uint8 wrdivb;
//$4207-$420a
uint9 hirq_pos;
uint9 virq_pos;
//$420d
uint rom_speed;
//$4214-$4217
uint16 rddiv;
uint16 rdmpy;
//$4218-$421f
Update to v073r01 release. byuu says: While perhaps not perfect, pretty good is better than nothing ... I've added emulation of auto-joypad poll timing. Going off ikari_01's confirmation of what we suspected, that the strobe happens every 256 clocks, I've set up emulation as follows: Upon reset, our clock counter is reset to zero. At the start of each frame, our poll counter is reset to zero. Every 256 clocks, we call the step_auto_joypad_poll() function. If we are at V=225/240+ (based on overscan setting), we check the poll counter. At zero, we poll the actual controller and set the joypad polling flag in $4212.d0 to 1. From zero through fifteen, we read in one bit for each controller and shift it into the register. At sixteen, we turn off the joypad polling flag. The 256-clock divider allows the start point of polling for each frame to fluctuate wildly like real hardware. I count regardless of auto joypad enable, as per $4212.d0's behavior; but only poll when it's actually enabled. I do not consume any actual time from this polling. I honestly don't know if I even should, or if it manages to do it in the background. If it should consume time, then this most likely happens between opcode edges and we'll have to adjust the code a good bit. All commercial games should continue to work fine, but this will likely break some hacks/translations not tested on hardware. Without the timing emulation, reading $4218-421f before V=~228 would basically give you the valid input controller values of the previous frame. Now, like hardware, it should give you a state that is part previous frame, part current frame shifted into it. Button positions won't be reliable and will shift every 256 clocks. I've also removed the Qt GUI, and renamed ui-phoenix to just ui. This removes 400kb of source code (phoenix is a lean 130kb), and drops the archive size from 564KB to 475KB. Combined with the DSP HLE, and we've knocked off ~570KB of source cruft from the entire project. I am looking forward to not having to specify which GUI is included anymore.
2010-12-27 07:29:57 +00:00
uint16 joy1;
uint16 joy2;
uint16 joy3;
uint16 joy4;
} status;
struct ALU {
uint mpyctr;
uint divctr;
uint shift;
} alu;
struct Channel {
//$420b
bool dma_enabled;
//$420c
bool hdma_enabled;
//$43x0
bool direction;
bool indirect;
bool unused;
bool reverse_transfer;
bool fixed_transfer;
uint3 transfer_mode;
//$43x1
uint8 dest_addr;
//$43x2-$43x3
uint16 source_addr;
//$43x4
uint8 source_bank;
//$43x5-$43x6
union {
uint16_t transfer_size;
uint16_t indirect_addr;
};
//$43x7
uint8 indirect_bank;
//$43x8-$43x9
uint16 hdma_addr;
//$43xa
uint8 line_counter;
//$43xb/$43xf
uint8 unknown;
//internal state
bool hdma_completed;
bool hdma_do_transfer;
} channel[8];
struct Pipe {
bool valid;
uint addr;
uint8 data;
} pipe;
struct Debugger {
hook<auto (uint24) -> void> op_exec;
hook<auto (uint24, uint8) -> void> op_read;
hook<auto (uint24, uint8) -> void> op_write;
hook<auto () -> void> op_nmi;
hook<auto () -> void> op_irq;
} debugger;
};
extern CPU cpu;