bsnes/snes/fast/dsp/dsp.hpp

183 lines
4.4 KiB
C++

Updated to 20100813 release. byuu says: Since we're now talking about three splits, that's getting a bit out of hand. This WIP combines everything back into one project again. Added the src/fast folder that has all the speed-oriented cores. A slight slowdown to csnes from what it was before, I'm using blargg's accurate DSP. I just don't like the idea of releasing a less accurate DSP core than Snes9X v1.52 has. Plus the fast DSP core doesn't serialize yet. I moved back to snes_spc 0.9.0 because I care more about Tales and Star Ocean than I do about Earthworm Jim 2. So if you try EWJ2 on csnes, expect it to sound like it does on Snes9X. In other words, don't wear headphones if you value your hearing. The middle-of-the-road bsnes core uses blargg's accurate DSP, because it's about 3% faster than mine which removes all of blargg's optimizations. There is absolutely no accuracy loss here. bsnes v067.20 that is included should be equal to v067 official. Performance: Code: asnes = 58fps bsnes = 172fps +2.97x csnes = 274fps +1.59x +4.72x The binaries are not profiled, so that's an additional 15% slower from the previous builds. Save states only work on asnes, as I don't know how to serialize blargg's cores yet. The copy_func thing is very confusing to me for some reason. The debugger won't work anywhere. Outside of that, please go ahead and bug test. Once I get the debugger and save states working, I'll build some profiled v1.0 releases for all three, and we can test that for a bit and then release.
2010-10-20 11:20:39 +00:00
class DSP : public Processor {
public:
Updated to v067r21 release. byuu says: This moves toward a profile-selection mode. Right now, it is incomplete. There are three binaries, one for each profile. The GUI selection doesn't actually do anything yet. There will be a launcher in a future release that loads each profile's respective binary. I reverted away from blargg's SMP library for the time being, in favor of my own. This will fix most of the csnes/bsnes-performance bugs. This causes a 10% speed hit on 64-bit platforms, and a 15% speed hit on 32-bit platforms. I hope to be able to regain that speed in the future, I may also experiment with creating my own fast-SMP core which drops bus hold delays and TEST register support (never used by anything, ever.) Save states now work in all three cores, but they are not cross-compatible. The profile name is stored in the description field of the save states, and it won't load a state if the profile name doesn't match. The debugger only works on the research target for now. Give it time and it will return for the other targets. Other than that, let's please resume testing on all three once again. See how far we get this time :) I can confirm the following games have issues on the performance profile: - Armored Police Metal Jacket (minor logo flickering, not a big deal) - Chou Aniki (won't start, so obviously unplayable) - Robocop vs The Terminator (major in-game flickering, unplayable) Anyone still have that gigantic bsnes thread archive from the ZSNES forum? Maybe I posted about how to fix those two broken games in there, heh. I really want to release this as v1.0, but my better judgment says we need to give it another week. Damn.
2010-10-20 11:22:44 +00:00
enum : bool { Threaded = false };
alwaysinline void step(unsigned clocks);
alwaysinline void synchronize_smp();
uint8 read(uint8 addr);
void write(uint8 addr, uint8 data);
void enter();
void power();
void reset();
void serialize(serializer&);
DSP();
~DSP();
private:
Updated to v067r25 release. byuu says: Removed snes_spc, and the fast/smp + fast/dsp wrappers around it. Cloned dsp to fast/dsp, and re-added the state machine, affects Compatibility and Performance cores. Added debugger support to fast/cpu, with full properties list and Qt debugger functionality. Rewrote all debugger property functions to return data directly: - this avoids some annoying conflicts where ChipDebugger::foo() overshadows Chip::foo() - this removes the need for an extra 20-200 functions per debugger core - this makes the overall code size a good bit smaller - this currently makes PPU::oam_basesize() inaccessible, so the OAM viewer will show wrong sprite sizes Used an evil trick to simplify MMIO read/write address decoding: - MMIO *mmio[0x8000], where only 0x2000-5fff are used, allows direct indexing without -0x2000 adjust So end result: both save states and debugger support work on all three cores now. Dual Orb II sound is fixed. The speed hit was worse than I thought, -7% for compatibility, and -10% for performance. At this point, the compatibility core is the exact same code and speed as v067 official, and the performance core is now only ~36-40% faster than the compatibility core. Sigh, so much for my dream of using this on my netbook. At 53fps average now, compared to 39fps before. Profiling will only get that to ~58fps, and that's way too low for the more intensive scenes (Zelda 3 rain, CT black omen, etc.) It would probably be a good idea to find out why my DSP is so much slower than blargg's, given that it's based upon the same code. The simple ring buffer stuff can't possibly slow things down that much. More precisely, it would probably be best to leave blargg's DSP in the performance core since it's a pretty minor issue, but then I'd have to have three DSPs: accuracy=threaded, compatibility=state-machine, performance=blargg. Too much hassle. 
Only code in the core emulator now that wasn't at the very least rewritten for bsnes would be the DSP-3 and DSP-4 modules, which are really, really lazily done #define hacks around the original C code.
2010-08-19 06:54:15 +00:00
unsigned phase;
//global registers
enum global_reg_t {
r_mvoll = 0x0c, r_mvolr = 0x1c,
r_evoll = 0x2c, r_evolr = 0x3c,
r_kon = 0x4c, r_koff = 0x5c,
r_flg = 0x6c, r_endx = 0x7c,
r_efb = 0x0d, r_pmon = 0x2d,
r_non = 0x3d, r_eon = 0x4d,
r_dir = 0x5d, r_esa = 0x6d,
r_edl = 0x7d, r_fir = 0x0f, //8 coefficients at 0x0f, 0x1f, ... 0x7f
};
//voice registers
enum voice_reg_t {
v_voll = 0x00, v_volr = 0x01,
v_pitchl = 0x02, v_pitchh = 0x03,
v_srcn = 0x04, v_adsr0 = 0x05,
v_adsr1 = 0x06, v_gain = 0x07,
v_envx = 0x08, v_outx = 0x09,
};
//internal envelope modes
enum env_mode_t { env_release, env_attack, env_decay, env_sustain };
//internal constants
enum { echo_hist_size = 8 };
enum { brr_buf_size = 12 };
enum { brr_block_size = 9 };
//global state
struct state_t {
uint8 regs[128];
modulo_array<int, echo_hist_size> echo_hist[2]; //echo history keeps most recent 8 samples
int echo_hist_pos;
bool every_other_sample; //toggles every sample
int kon; //KON value when last checked
int noise;
int counter;
int echo_offset; //offset from ESA in echo buffer
int echo_length; //number of bytes that echo_offset will stop at
//hidden registers also written to when main register is written to
int new_kon;
int endx_buf;
int envx_buf;
int outx_buf;
//temporary state between clocks
//read once per sample
int t_pmon;
int t_non;
int t_eon;
int t_dir;
int t_koff;
//read a few clocks ahead before used
int t_brr_next_addr;
int t_adsr0;
int t_brr_header;
int t_brr_byte;
int t_srcn;
int t_esa;
int t_echo_disabled;
//internal state that is recalculated every sample
int t_dir_addr;
int t_pitch;
int t_output;
int t_looped;
int t_echo_ptr;
//left/right sums
int t_main_out[2];
int t_echo_out[2];
int t_echo_in [2];
} state;
//voice state
struct voice_t {
modulo_array<int, brr_buf_size> buffer; //decoded samples
int buf_pos; //place in buffer where next samples will be decoded
int interp_pos; //relative fractional position in sample (0x1000 = 1.0)
int brr_addr; //address of current BRR block
int brr_offset; //current decoding offset in BRR block
int vbit; //bitmask for voice: 0x01 for voice 0, 0x02 for voice 1, etc
int vidx; //voice channel register index: 0x00 for voice 0, 0x10 for voice 1, etc
int kon_delay; //KON delay/current setup phase
int env_mode;
int env; //current envelope level
int t_envx_out;
int hidden_env; //used by GAIN mode 7, very obscure quirk
} voice[8];
//gaussian
static const int16 gaussian_table[512];
int gaussian_interpolate(const voice_t &v);
//counter
enum { counter_range = 2048 * 5 * 3 }; //30720 (0x7800)
static const uint16 counter_rate[32];
static const uint16 counter_offset[32];
void counter_tick();
bool counter_poll(unsigned rate);
//envelope
void envelope_run(voice_t &v);
//brr
void brr_decode(voice_t &v);
//misc
void misc_27();
void misc_28();
void misc_29();
void misc_30();
//voice
void voice_output(voice_t &v, bool channel);
void voice_1 (voice_t &v);
void voice_2 (voice_t &v);
void voice_3 (voice_t &v);
void voice_3a(voice_t &v);
void voice_3b(voice_t &v);
void voice_3c(voice_t &v);
void voice_4 (voice_t &v);
void voice_5 (voice_t &v);
void voice_6 (voice_t &v);
void voice_7 (voice_t &v);
void voice_8 (voice_t &v);
void voice_9 (voice_t &v);
//echo
int calc_fir(int i, bool channel);
int echo_output(bool channel);
void echo_read(bool channel);
void echo_write(bool channel);
void echo_22();
void echo_23();
void echo_24();
void echo_25();
void echo_26();
void echo_27();
void echo_28();
void echo_29();
void echo_30();
//dsp
static void Enter();
alwaysinline void tick();
friend class DSPDebugger;
};
#if defined(DEBUGGER)
#include "debugger/debugger.hpp"
extern DSPDebugger dsp;
#else
extern DSP dsp;
#endif
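
The comments in the header describe how the 128-byte register file is addressed: each voice owns a 0x10-wide block (`vidx`), each voice has a one-bit mask (`vbit`), and the eight FIR coefficients sit at 0x0f, 0x1f, ... 0x7f. The following is a minimal sketch of that arithmetic, not code from bsnes itself. The helper names `voice_index`, `voice_bit`, `voice_reg_addr` and `fir_addr` are hypothetical; only the enum values are copied from the header.

```cpp
#include <cassert>

// Register offsets copied from the enums in the header.
enum { r_fir = 0x0f };
enum { v_voll = 0x00, v_pitchl = 0x02, v_outx = 0x09 };
enum { counter_range = 2048 * 5 * 3 };

// Hypothetical helpers for illustration; they are not declared in dsp.hpp.

// vidx: 0x00 for voice 0, 0x10 for voice 1, and so on.
constexpr int voice_index(int n) { return n << 4; }

// vbit: 0x01 for voice 0, 0x02 for voice 1, and so on.
constexpr int voice_bit(int n) { return 1 << n; }

// Address of a per-voice register inside regs[128].
constexpr int voice_reg_addr(int n, int reg) { return voice_index(n) + reg; }

// Address of FIR coefficient i (0..7): 0x0f, 0x1f, ... 0x7f.
constexpr int fir_addr(int i) { return r_fir + (i << 4); }

// Compile-time checks against the values stated in the header comments.
static_assert(voice_index(1) == 0x10, "voice 1 starts at 0x10");
static_assert(voice_bit(1) == 0x02, "voice 1 uses bit 1");
static_assert(fir_addr(7) == 0x7f, "last FIR coefficient is at 0x7f");
static_assert(voice_reg_addr(7, v_outx) < 128, "fits in regs[128]");
static_assert(counter_range == 30720, "2048 * 5 * 3 is 30720 (0x7800)");
```

Under these assumptions, `voice_reg_addr(1, v_pitchl)` evaluates to 0x12, the low pitch byte of voice 1, and every address stays within the bounds of `regs[128]`.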