(there was no r23 release posted to the WIP thread)
byuu says:
Fixes OAM latchdata for all 3 cores and CGRAM address invalidation for
the accuracy core (the only one that really does it because it needs
cycle PPU timing.) Also removes the OAM/CGRAM read/write functions to
put them inside the actual MMIO functions. Easier to handle all the
special cases that way. VRAM is a bit more annoying since there's two
read and two write functions.
This should fix battletech/mechwarrior 3050 palette at least.
byuu says:
Wanted to torture myself, so I implemented the hardest window of all,
the cheat code editor. I had to sacrifice checkboxes inside lists,
unfortunately, but it's a necessary evil. Maybe some day if we can port
checkboxes inside list items to Windows and GTK+, we can add it back. It
should be the only really apparent GUI sacrifice, though. You toggle
cheats by double-clicking them in the list. Easy to do, but not
apparent.
I also added in the focus policy.
byuu says:
Changelog:
- adds synchronize video
- adds synchronize audio
- adds mute audio
- adds advanced settings window with driver selection only
- adds the pretty section header thing I started going toward with Qt
- adds a configuration file, saves to bsnes-phoenix.cfg (I don't make
the .bsnes folder yet)
- the status bar shows [A] for accuracy, [C] for compatibility and [P]
for carburetor
byuu says:
Changelog:
- fixed window casting crash in phoenix
- added perfect forwarding to nall::string variadic templates to fix
file load dialog crash in phoenix
- disabled copy constructors in utf8_t to prevent this problem from
occurring again in the future
- separated canvas window proc by creating a separate class for it
(ironically it was a desktop window causing the first crash)
- use processorArchitecture="*" to make compilation easier
- fixed status bar font assignment in phoenix/Windows
- added InitCommonControls + CoInitialize for XAudio2 on XP
- had to use DirectSound for audio; XAudio2 is crashing on exit which
breaks the profiling (only a problem because you can't change the
drivers without recompiling, there's really no reason to profile
XAudio2 anyway)
byuu says:
This adds proper manifest files to get the nice XP/Vista controls. Need
to find a way to auto-detect MinGW 32 vs 64 since I can't use $shell or
`` on gcc -v on Windows. For now you have to edit the
ui-phoenix/Makefile by hand.
I've implemented the video settings window. I am going to be using
separate windows this time. As nice as having everything in one place
was, I didn't like being forced to stretch things to fill out the
one-size-fits-all tab window I was using before. That and I don't feel
like implementing tab support with phoenix anyway.
The menu gets a load cartridge command, and bsnes writes save RAM files
now. Loading by file dialog crashes on 64-bit. Something's fucked up
there, but I don't know what. Again, help would be great here :)
byuu says:
This WIP fixes the Mode7 repeat issue in the accuracy core.
More importantly, it's the first build to include phoenix. There is
a stub GUI that does basically nothing right now. It will give you
a window, a command to close the emulator, and an FPS meter so you can
tell how fast it is. To load a ROM, you have to drag the ROM on top of
the binary. I don't know if it will work if the filename+path has spaces
in it or not, so avoid that to be safe.
[...]
For some reason, the 64-bit binary sometimes crashes on start, maybe 1:6
times. So just keep trying. I don't know what's up with that, I'd
appreciate if someone here wanted to debug that for me though :D
One really good bit of news, there was that old hiro bug where keyboard
input would cause the main window to beep. I spied on the main event
loop and, as suspected, the status bar was getting focus and rejecting
key presses. What. The. Fuck. Why would a status bar ever need focus? So
I set WM_DISABLED on it, which luckily leaves the font color alone.
I also had to use WM_DISABLED on the Viewport widget that I use for
video output. These two combined let me have my main window with no
keyboard beeping AND allow tab+shift-tab to work as you'd expect on
other windows, so hooray.
Now, at the moment there's no Manifest included, because Microsoft for
some reason includes the processorArcitecture in the file. So I can't
use the same manifest for 32-bit and 64-bit mode, or the binary will
crash on one or the other. Fuck. So the status bar may look old-school
or something, whatever, it's only temporary.
Next up, my goal is to avoid the hiro icon corruption bullshit by making
phoenix itself try and use an internal resource icon. So just compile
your app with that resource icon and voila, perfect icon. Not in there
yet so you get the white box.
Input is hard-coded, up/down/left/right/z/x/a/s/d/c/apostrophe/return.
Lastly, compilation is ... in a serious state of flux. The code is set
to compile bsnes/phoenix-gtk right now. Try it at your own risk. Give me
a few WIPs to get everything nice and refined. Ubuntu users will need
gcc-4.5, which you can get by adding the Maverick Meerkat repository,
updating apt, installing the gcc-4.5 + g++-4.5 packages, and then
removing and re-updating your apt/sources.list file so you don't end up
fucking your whole system when you run apt again in the future.
For anyone who can work with all of that, great! Please post a framerate
comparison between 32-bit and 64-bit builds. Any game, any screen, so
long as the FPS is not fluctuating when you measure it (eg don't do it
during an attract sequence.)
If anyone complains about the 64-bit binary not working and it turns out
they are on 32-bit Windows, they are going to be removed from this WIP
forum :P
byuu says:
This hopefully fixes vertical and horizontal mosaic with the accurate
S-PPU core.
It also adds frame-skipping support to the compatibility and performance
S-PPU cores. This is not visible to the user interface, because I feel
that virtually nobody is going to play with frame-skipping on when they
could use a faster emulation with it off. However, I did use it to
enhance the 'Fast Forward' key. When you press it, it will now turn off
video and audio sync, and set a frameskip of 9. This allows you to speed
things up an additional ~60% over your max theoretical speed without
frameskipping. ~35% with the compatible core, because the PPU is less of
a bottleneck for it. Slowdown was improved, it will turn on audio sync
if it is off until you release the key to ensure you actually get 30fps.
Eventually it'd probably be wise to set a cap on the fast forward speed,
but I hardly think it's an issue on today's computers.
byuu says:
Testing this one and all the new features would be appreciated.
New features?
http://img203.imageshack.us/img203/505/bsnes20100907.jpg
[For future reference, the above URL points to a screenshot of bsnes' Qt
GUI running with the following windows (apart from the main window) visible:
- A new 'state selection' dialog wih buttons labelled 'Slot 1' through
'Slot 10'.
- The 'Tools' dialog open to a new 'Effect Toggle' tab, which has
checkboxes for the various PPU backgrounds and sprites, and SPU
channels.
- The 'Configuration Settings' dialog open to the 'Advanced' tab, with
a "Use Native OS File Dialogs" checkbox.
- A 'Load Cartridge' dialog, which is the standard Windows file-open
dialog.
It seems these are the new features referred to. -- Ed]
byuu says:
This adds a new "Effect Toggle" window to the tools tab list. There's no
menu entry for it yet, oversight, but you can go to another tool and tab
to it.
The effect toggle window lets you toggle background/OAM layers on
a per-priority basis and DSP sound channels. This only works with the
compatibility and performance cores, because I'm not going to allow
accuracy-violating hacks like that into the core.
Non-essential icons have been removed, there's six of them left at the
moment. I am too much of a consistency guy to only have some scattered
around. I know other apps do that, but I'm not going to do that, and
I grow tired of trying to hammer in icons that don't really represent
the actions. Anyway, it still looks pretty good I think.
byuu says:
Holy hell, that was a total brain twister. After hours of crazy bit
twiddling and debug printf's, I finally figured out how to allow both
lores and hires scrolling in the accurate PPU renderer. In the process,
I modified the main loop to run from -7 to 255, regardless of the hires
setting, and perform X adjustment inside the tile fetching. This fixed
a strange main/subscreen misalignment issue, so I was able to restore
the proper sub-then-main rendering for the final screen output stage.
Code looks a good bit cleaner this way overall.
I also added load state and save state menus to the tools menu, so you
can use the menubar to load and save to ten slots. I am thinking that
I should nuke the icons. As pretty as they are, it's getting tiresome
trying to find icons for everything, I have no pictures to represent
loading or saving a slot, nor to represent individual slots. I'll just
stick to radios and checkboxes.
byuu says:
Bug-fix night for the new PPUs.
Accuracy:
Fixed BG palette clamping, which fixes Taz-Mania.
Added blocking for CGRAM writes during active display, to match the
compatibility core. It really should override to the last fetched
palette color, I'll probably try that out soon enough.
Performance:
Mosaic should match the other renderers. Unfortunately, as suspected, it
murders speed. 290->275fps. It's now only 11fps faster, hardly worth it
at all. But the old rendering code is really awful, so maybe it's for
the best it gets refreshed.
It's really tough to understand why this is such a performance hit, it's
just a decrement+compare check four times per pixel. But yeah, it hits
it really, really hard.
Fixed a missing check in Mode4 offset-per-tile, fixes vertical alignment
of a test image in the SNES Test Program.
(there was no r11 release posted to the WIP thread)
byuu says:
This took ten hours of mind boggling insanity to pull off.
It upgrades the S-PPU dot-based renderer to fetch one tile, and then
output all of its pixels before fetching again. It sounds easy enough,
but it's insanely difficult. I ended up taking one small shortcut, in
that rather than fetch at -7, I fetch at the first instance where a tile
is needed to plot to x=0. So if you have {-3 to +4 } as a tile, it
fetches at -3. That won't work so well on hardware, if two BGs fetch at
the same X offset, they won't have time.
I have had no luck staggering the reads at BG1=-7, BG3=-5, etc. While
I can shift and fetch just fine, what happens is that when a new tile is
fetched in, that gives a new palette, priority, etc; and this ends up
happening between two tiles which results in the right-most edges of the
screen ending up with the wrong colors and such.
Offset-per-tile is cheap as always. Although looking at it, I'm not sure
how BG3 could pre-fetch, especially with the way one or two OPT modes
can fetch two tiles.
There's no magic in Hoffset caching yet, so the SMW1 pixel issue is
still there.
Mode 7 got a bugfix, it was off-by-one horizontally from the mosaic
code. After re-designing the BG mosaic, I ended up needing a separate
mosaic for Mode7, and in the process I fixed that bug. The obvious
change is that the Chrono Trigger Mode7->Mode2 transition doesn't cause
the pendulum to jump anymore.
Windows were simplified just a tad. The range testing is shared for all
modes now. Ironically, it's a bit slower, but I'll take less code over
more speed for the accuracy core.
Speaking of speed, because there's so much less calculations per pixel
for BGs, performance for the entire emulator has gone up by 30% in the
accuracy core. Pretty neat overall, I can maintain 60fps in all but,
yeah you can guess can't you?
(there was no r09 release posted to the WIP thread)
byuu says:
It is feature-complete, but horizontal mosaic is less accurate. I have
an idea for a mosaic color ring buffer to get it equally accurate, but
I haven't implemented it yet. For now it's just a simple x & ~(mosaic >>
1) trick that is passable.
Hires blending was left out, as it's more processor intensive and
blargg's NTSC does a better job with that anyway.
There's some OPT vertical positioning issues in the SNES Test Program's
character test; Goodbye, Anthrox has some sort of fast CPU DMA issue;
etc.
Total speedup is a mere 13.5%. Not quite the 50% I wanted in the best
case, but I'll take what I can get.
254->289fps in Zelda 3 on my E8400 now. There's another 15% hiding with
blargg's SMP and 5-10% with blargg's fast DSP, but they lose too much
accuracy. It'd put me at or below Snes9X accuracy, while still being 50%
slower.
SSE2 was performing worse this time, both on x86 and amd64, so I left
that optimization off.
So, barring a miracle, this is about the best it's going to get.
byuu says:
This gets the basic new PPU skeleton up and running, still missing
a lot:
- Mode7
- direct color mode
- OAM color exemption (this one will impact performance negatively)
- vertical mosaic
- horizontal mosaic (this one may impact performance negatively)
- offset per tile
- interlace
- hires
- pseudo-hires
But it's correct enough to play most games okay.
So far, the new PPU is about 11% faster on my Atom, and 17% faster on my
E8400. I was hoping for more, but the window masking and sprite
calculation is just kicking my ass.
The 11/17 figure is total emulator overhead, so that means raw PPU vs
PPU, the new one is at least 22-34% faster than the old one.
I don't really have any ideas for additional optimizations. I'm even
using little-endian word reads where applicable. But at any rate, I need
to get all the above implemented correctly before trying to push
optimizations even further.
(there was no r06 release posted to the WIP thread)
byuu says:
New PPU renderer is coming along. Lots of new ideas, especially with
regards to the way the background renders in tiles rather than in
pixels. That skips the need for tile caching and compares, and it even
gets scrolling right.
The sprite item+tile lists are computing, but they are not rendering
yet.
It's a good deal faster for now, obviously, because 90% of the PPU
features are missing.
byuu says:
Started over again on the new S-PPU, heh. Three hours wasted last night
I guess.
I wanted to make the new one more like the accurate core in its
structure: multiple class instances for each background. It makes the
code easier to read, results in less structure insanity
(regs.main_enabled[bg] -> main_enabled), and will probably make it a tad
easier to parallelize if we want to go that route eg for the ARM.
Still very incomplete.
(there was no r03 release posted to the WIP thread)
byuu says:
This should provide hardware-accurate mosaic support in the accurate
renderer, with the exception that I'm still not sure what mid-frame
vertical mosaic or mid-scanline horizontal mosaic writes do. Either the
code I have is correct, or it bypasses the mosaic adjust and gives the
exact H/V positions.
I've also renamed the fast folder to alternative (thinking about naming
it simply alt instead), and started on a brand new PPU renderer. So far
it's just a barebones setup with some MMIO support and VRAM/OAM/CGRAM
writing. I'm not even confident that I can get this to be faster than
the current scanline renderer, but it's the only avenue that we have
left for any kind of significant bsnes speedup, so I have to try. I'm
going to finish up the MMIO stuff first, that way we have a clean slate
with no actual rendering. And then from here we can try various
different approaches.
byuu says:
This adds mosaic improvements to the S-PPU dot renderer. Specifically,
it eliminates the mosaic_table entirely, and performs mosaic adjustment
per pixel instead. It also moves from a mosaic countdown for mosaic Y to
a mosaic counter (incrementing).
In the process, I realized Sim Earth's map was broken, so I fixed that.
In doing so, I also fixed my old Mode7 demo that was always off-by-one,
causing different results on real hardware versus emulation. But then
I broke both Final Fantasy 5 and Air Strike Patrol effects that use
Mode7 but no mosaic.
I'm not really sure what's going on, but I think I am close. This is the
first time I can reproduce the Mode7 test ROM results without screwing
with M7Y which was obviously wrong. I think that somehow a mosaic >=
1 is glitching the Ycounter for the BG layers to tick one extra time.
There's a workaround that's not very nice to get everything going right
now. It could very well be that the workaround is hardware accurate, but
I can't help but feel there's a more eloquent way of doing this.
byuu says:
This adds RTS/CTS support to the serial communications emulation. Okay,
well the PC acts as if it is always ready, because it always is even on
the real thing, but the PC-waiting-for-SNES side works.
Source only, hardware communication only works on OS X and Linux
(Windows serial communication is totally different, I don't feel like
writing a Windows version), more documentation will come later.
byuu describes the changes since v067:
This release officially introduces the accuracy and performance cores,
alongside the previously-existing compatibility core. The accuracy core
allows the most accurate SNES emulation ever seen, with every last
processor running at the lowest possible clock synchronization level.
The performance core allows slower computers the chance to finally use
bsnes. It is capable of attaining 60fps in standard games even on an
entry-level Intel Atom processor, commonly found in netbooks.
The accuracy core is absolutely not meant for casual gaming at all. It
is meant solely for getting as close to 100% perfection as possible, no
matter the cost to speed. It should only be used for testing,
development or debugging.
The compatibility core is identical to bsnes v067 and earlier, but is
now roughly 10% faster. This is the default and recommended core for
casual gaming.
The performance core contains an entirely new S-CPU core, with
range-tested IRQs; and uses blargg's heavily-optimized S-DSP core
directly. Although there are very minor accuracy tradeoffs to increase
speed, I am confident that the performance core is still more accurate
and compatible than any other SNES emulator. The S-CPU, S-SMP, S-DSP,
SuperFX and SA-1 processors are all clock-based, just as in the accuracy
and compatibility cores; and as always, there are zero game-specific
hacks. Its compatibility is still well above 99%, running even the most
challenging games flawlessly.
If you have held off from using bsnes in the past due to its system
requirements, please give the performance core a try. I think you will
be impressed. I'm also not finished: I believe performance can be
increased even further.
I would also strongly suggest Windows Vista and Windows 7 users to take
advantage of the new XAudio2 driver by OV2. Not only does it give you
a performance boost, it also lowers latency and provides better sound by
way of skipping an API emulation layer.
Changelog:
- Split core into three profiles: accuracy, compatibility and
performance
- Accuracy core now takes advantage of variable-bitlength integers (eg
uint24_t)
- Performance core uses a new S-CPU core, written from scratch for speed
- Performance core uses blargg's snes_dsp library for S-DSP emulation
- Binaries are now compiled using GCC 4.5
- Added a workaround in the SA-1 core for a bug in GCC 4.5+
- The clock-based S-PPU renderer has greatly improved OAM emulation;
fixing Winter Gold and Megalomania rendering issues
- Corrected pseudo-hires color math in the clock-based S-PPU renderer;
fixing Super Buster Bros backgrounds
- Fixed a clamping bug in the Cx4 16-bit triangle operation [Jonas
Quinn]; fixing Mega Man X2 "gained weapon" star background effect
- Updated video renderer to properly handle mixed-resolution screens
with interlace enabled; fixing Air Strike Patrol level briefing screen
- Added mightymo's 2010-08-19 cheat code pack
- Windows port: added XAudio2 output support [OV2]
- Source: major code restructuring; virtual base classes for processor
- cores removed, build system heavily modified, etc.
byuu says:
Re-added blargg's DSP, but this time I merged only spc_dsp and also
fixed the initial regs for Dual Orb II. Which restores the 5-10% speedup
on the compatibility and performance cores. Fixed the initial register
values for the fast CPU core, which fixes Armored Police and the other
Atlus game's title screen in the performance core. Added a missing
debugvirtual prefix to op_step, which fixes CPU stepping in the
performance core. I was using the description field for profile
identification in savestates, but that was of course the description for
the state manager. Whoops. Added a new field for profile name to the
save states, and fixed the state manager to work with that change.
Adjusted the about screen colors, which is how you can tell which core
you're using without it being annoying. Probably did some other stuff
too, meh.
byuu says:
Removed snes_spc, and the fast/smp + fast/dsp wrappers around it.
Cloned dsp to fast/dsp, and re-added the state machine, affects
Compatibility and Performance cores.
Added debugger support to fast/cpu, with full properties list and Qt
debugger functionality.
Rewrote all debugger property functions to return data directly:
- this avoids some annoying conflicts where ChipDebugger::foo()
overshadows Chip::foo()
- this removes the need for an extra 20-200 functions per debugger
core
- this makes the overall code size a good bit smaller
- this currently makes PPU::oam_basesize() inaccessible, so the OAM
viewer will show wrong sprite sizes
Used an evil trick to simplify MMIO read/write address decoding:
- MMIO *mmio[0x8000], where only 0x2000-5fff are used, allows direct
indexing without -0x2000 adjust
So end result: both save states and debugger support work on all three
cores now. Dual Orb II sound is fixed. The speed hit was worse than
I thought, -7% for compatibility, and -10% for performance. At this
point, the compatibility core is the exact same code and speed as v067
official, and the performance core is now only ~36-40% faster than the
compatibility core. Sigh, so much for my dream of using this on my
netbook. At 53fps average now, compared to 39fps before. Profiling will
only get that to ~58fps, and that's way too low for the more intensive
scenes (Zelda 3 rain, CT black omen, etc.)
It would probably be a good idea to find out why my DSP is so much
slower than blargg's, given that it's based upon the same code. The
simple ring buffer stuff can't possibly slow things down that much.
More precisely, it would probably be best to leave blargg's DSP in the
performance core since it's a pretty minor issue, but then I'd have to
have three DSPs: accuracy=threaded, compatibility=state-machine,
performance=blargg. Too much hassle.
Only code in the core emulator now that wasn't at the very least
rewritten for bsnes would be the DSP-3 and DSP-4 modules, which are
really, really lazily done #define hacks around the original C code.
byuu says:
Fixed bsnes launcher on Windows XP
Fixed Windows bsnes launcher internationalization support (emulator can
be in a folder with spaces and Japanese characters, and you can drag
a Japanese file name onto the launcher, and it will load it properly)
Moved fast CPU to use a switch table for MMIO, unfortunately for no
speed gain
Bus::read/write take uint24 parameters for address, luckily no speed
penalty
MMIOAccess gained a handle() function, and hid the mmio[] table. Makes
hooking it cleaner
Added malloc.h header to nall/function.hpp to fix a ridiculous GCC 4.5.0
error
Fixed a fairly large bug in the fast CPU IRQ handler, which fixes
Robocop et al
Forgot to bump revision to .24 in the compiled binaries, too lazy to
recompile or hex edit to change them
Unfortunately, in order to add nice battery usage, I have to add the
sleep calls to the video and audio wait loops. But they don't know
anything about the GUI and its settings, nor do I really want to make
them know about this setting. I do not want to force allow it. Even with
the media timer trick, Sleep(0) makes Vsync+Async fail a lot more
frequently than never sleeping at all. I would rather laptop users
suffer 100% utilization of a single core than for all users to not be
able to get good audio+video sync. Not sure what to do about that, so
I'll probably just remove the battery usage comment from performance
mode for now.
byuu says:
Added missing $4200 IRQ lock, which fixes Chou Aniki on the fast CPU
core, so slower PCs can get their brotherly love on.
Added range-based controller IOBit latching to the fast CPU core, which
enables Super Scope and Justifier support. Uses the priority queue as
well, so there is zero speed-hit. Given the way range-testing works, the
trigger point may vary by 1-2 pixels when firing at the same spot. Not
really a big deal when it avoids a massive speed penalty.
Fixed PAL and interlace-mode HVIRQs at V=0,H<2 on the fast CPU core.
Added the dot-renderer's sprite list update-on-OAM-write functionality
to the scanline-based PPU renderer. Unfortunately it looks like all the
speed gain was already taken from the global dirty flag I was using
before, but this certainly won't hurt speed any, so whatever.
Added #ifdef to stop CoInitialize(0) on non-Windows ports.
Added #ifdefs to stop gradient fade on Windows port. Not going to fuck
over the Linux port aesthetic because of Qt bug #47,326,927. If there's
a way to tell what Qt theme is being used, I can leave it enabled for
XP/Vista themes.
Moved HDMA trigger from 1104 to 1112, and reduced channel overhead from
24 to 16, to better simulate one-cycle DMA->CPU sync.
Code clarity: I've re-added my varint.hpp classes, and am actively using
them in the accuracy cores. So far, I haven't done anything that would
detriment speed, but it is certainly cool. The APU ports exposed by the
CPU and SMP now take uint2 address arguments, the CPU WRAM address
register is a uint17, and the IRQ H/VTIME values are uint10. This
basically allows the source to clearly convey the data sizes, and
eliminates the need to manually mask values when writing to registers or
reading from memory. I'm going to be doing this everywhere, and it will
have a speed impact eventually, because the automation means we can't
skip masks when we know the data is already masked off.
Source: archive contains the launcher code, so that I can look into why
it's crashing on XP tomorrow.
It doesn't look like Circuit USA's flags are going to work too well with
this new CPU core. Still not sure what the hell Robocop vs The
Terminator is doing, I'll read through the mega SNES thread for clues
tomorrow. Speedy Gonzales is definitely broken, as modifying the MDR was
breaking things with my current core. Probably because the new CPU core
doesn't wait for a cycle edge to trigger.
I was thinking that perhaps we could keep some form of cheat codes list
to work as game-specific hacks for the performance core. Keeps the hacks
out of the emulator, but could allow the remaining bugs to be worked
around for people who have no choice but to use the performance core.
byuu says:
Added OV2's XAudio2 driver (it's better and faster than the DirectSound
one)
Fixed DirectInput keypad number codes
Added launcher to make the profiles work
Profiles now called: Accuracy, Compatibility, Performance (not debating
names anymore)
The launcher isn't going to work on OS X because of the .app folder
bullshit (yes, yes, .sfc folders.)
It also crashes on Windows XP for god only knows what reason. Works fine
on Windows 7 and Linux. So XP users, rename the .dll files to .exe to
test this release. I'll fix it on Monday.
The color highlighting fucks up the radio boxes on the Windows classic
theme, because Nokia can't afford a god damn QA team.
Lastly, I forgot to add launcher to the make archive-all command, so the
source for it will be in the next WIP.
byuu says:
This moves toward a profile-selection mode. Right now, it is incomplete.
There are three binaries, one for each profile. The GUI selection
doesn't actually do anything yet. There will be a launcher in a future
release that loads each profile's respective binary.
I reverted away from blargg's SMP library for the time being, in favor
of my own. This will fix most of the csnes/bsnes-performance bugs. This
causes a 10% speed hit on 64-bit platforms, and a 15% speed hit on
32-bit platforms. I hope to be able to regain that speed in the future,
I may also experiment with creating my own fast-SMP core which drops bus
hold delays and TEST register support (never used by anything, ever.)
Save states now work in all three cores, but they are not
cross-compatible. The profile name is stored in the description field of
the save states, and it won't load a state if the profile name doesn't
match.
The debugger only works on the research target for now. Give it time and
it will return for the other targets.
Other than that, let's please resume testing on all three once again.
See how far we get this time :)
I can confirm the following games have issues on the performance
profile:
- Armored Police Metal Jacket (minor logo flickering, not a big deal)
- Chou Aniki (won't start, so obviously unplayable)
- Robocop vs The Terminator (major in-game flickering, unplayable)
Anyone still have that gigantic bsnes thread archive from the ZSNES
forum? Maybe I posted about how to fix those two broken games in there,
heh.
I really want to release this as v1.0, but my better judgment says we
need to give it another week. Damn.
byuu says:
Since we're now talking about three splits, that's getting a bit out of
hand.
This WIP combines everything back into one project again. Added the
src/fast folder that has all the speed-oriented cores.
A slight slowdown to csnes from what it was before, I'm using blargg's
accurate DSP. I just don't like the idea of releasing a less accurate
DSP core than Snes9X v1.52 has. Plus the fast DSP core doesn't serialize
yet.
I moved back to snes_spc 0.9.0 because I care more about Tales and Star
Ocean than I do about Earthworm Jim 2. So if you try EWJ2 on csnes,
expect it to sound like it does on Snes9X. In other words, don't wear
headphones if you value your hearing.
The middle-of-the-road bsnes core uses blargg's accurate DSP, because
it's about 3% faster than mine which removes all of blargg's
optimizations. There is absolutely no accuracy loss here. bsnes v067.20
that is included should be equal to v067 official.
Performance:
Code:
asnes = 58fps
bsnes = 172fps +2.97x
csnes = 274fps +1.59x +4.72x
The binaries are not profiled, so that's an additional 15% slower from
the previous builds.
Save states only work on asnes, as I don't know how to serialize
blargg's cores yet. The copy_func thing is very confusing to me for some
reason. The debugger won't work anywhere.
Outside of that, please go ahead and bug test. Once I get the debugger
and save states working, I'll build some profiled v1.0 releases for all
three, and we can test that for a bit and then release.
byuu says:
12-15% faster than v067.10, and my Atom never goes below 58fps for
normal lo-res games at this point. Just a little more and I can leave
Async on. That's pretty much it though for the low hanging fruit.
Everything else will be a lot of work for a little gain. Speedups are
from range testing across scanline boundaries and from using blargg's
fast DSP core.
Snes9X is now only 1.93x faster than bsnes, and bsnes is now faster than
Super Sleuth.
I also fixed the Circuit USA menus (HDMA timing adjustment), Wild Guns
flickering (IRQ lock) and Jumpin' Derby (external IRQ triggering.)
There's definitely a lot of troublesome games, mostly the same ones we
had in the past (Koushien 2, Robocop vs The Terminator, etc.) I'm
definitely going to debug Starfox, but I may not bother with some of the
more obscure ones.
byuu says:
I wrote a new CPU core from scratch. It has range-based IRQs, and is
good enough even to run F-1 Grand Prix and Sink or Swim. It also uses
a binary min-heap array for the timing priority queue. This resulted in
a ~40% speedup.
I also added in blargg's snes_spc library, which is an S-SMP + S-DSP
emulator. I am still using his accurate DSP core, and not the fast one.
This gives an additional ~10% speedup.
THIS IS NOT PERFECT, THERE WILL BE BUGS!
I already know that Tales of Phantasia and Star Ocean are hitting some
edge cases. Now that it's fast enough, hopefully blargg can take a look
at it. Something he couldn't test before because you can't rip SPCs of
these games, so it's probably something simple.
My CPU core also doesn't nail every last possible edge case. So things
like Wild Guns and the two or three games that rely on NMI/IRQ hold
aren't going to work ... yet. Be patient.
The SuperFX and SA-1 cores are still cycle-accurate. It wouldn't hurt
compatibility to reduce their precision a bit.
End result is that you can now get well over 60fps in normal games even
n a first-generation Intel Atom.
byuu says:
This adds some sync.sh improvements to make it handle errors more
gracefully.
It also updates asnes a good bit. All of the four base processors now
have all publicly accessible functions right at the top of the main
headers, and everything else is private. This is to allow these headers
to essentially take the place of the previous base classes in the old
bsnes-merged format. So if there's something public there, you need to
implement that exact function to make your own module.
I removed the frame counter from the PPU, as it has nothing to do with
emulation. That now resides inside the Qt -> SNES interface code. Quite
amazing, I was actually saving the frame counter into the save state
files before, yuck.
Removed some baggage in the System class: it was friending a bunch of
long-dead functions and classes.
Forgot to re-add the CHEAT_SYSTEM define to info.hpp, so that's been put
back.
byuu says:
This fixes libsnes and debugger builds, and collapses bsnes/ppu/bppu to
bsnes/ppu and bsnes/dsp/sdsp to bsnes/dsp. It also introduces
bsnes/sync.sh, which will synchronize all of asnes/ with bsnes/,
excepting the custom speed-focused modules. So far, that's bsnes/ppu
(scanline renderer) and bsnes/dsp (state machine.)
Should make keeping the two ports in sync much, much easier. It's
basically the same thing as before, only you run sync.sh and have a few
duplicated folders now. May make it clearer by creating a stub/ or src/
folder inside bsnes to do all of the copying, so that you only see the
custom folders in bsnes/' root directory.