byuu says:
- bsnes: reset all thread clocks on power cycle
- bsnes: use uint64 instead of uint128 for scheduler clocks
- bsnes: use float instead of double for audio resampling
- bsnes: begin work of integrating SameBoy (incomplete; needs
additional features)
byuu says:
- converted (u)int(8,16,32,64) from Natural/Integer<T> to
(u)int(8,16,32,64)_t types
- SFC: mostly rewritten WDC65816 CPU core
- removed 487KiB of code! (unused CPU cores from other higan cores)
byuu says:
- bsnes: allow video filtering even when the emulator is paused
- bsnes: improve overscan masking, especially with HD mode 7
- bsnes: improve snow support, especially with HD mode 7
- bsnes: replaced real-time cheat code replacement with per-frame
replacement (a la Pro Action Replay, Snes9X)
- bsnes: treat the latter step() half of CPU::read() calls as idle
cycles
- bsnes: templatize step() where possible (not always practical)
- bsnes: removed Natural<T> templates from key portions of the fast
PPU renderer
- bsnes: dethreaded peripherals (controllers and expansion port
devices)
- bsnes: above optimizations result in a ~20-25% speedup over v107.4
with no accuracy loss
Note that light guns aren't going to work for now, I'll have to fix them
before we can release v108.
byuu says:
- bsnes: added video filters from bsnes v082
- bsnes: added ZSNES snow effect option when games paused or unloaded
(no, I'm not joking)
- bsnes: added 7-zip support (LZMA 19.00 SDK)
[Recent higan WIPs have also mentioned bsnes changes, although the higan code
no longer includes the bsnes code. These changes include:
- higan, bsnes: added EXLOROM, EXLOROM-RAM, EXHIROM mappings
- higan, bsnes: focus the viewport after leaving fullscreen exclusive
mode
- bsnes: re-added mightymo's cheat code database
- bsnes: improved make install rules for the game and cheat code
databases
- bsnes: delayed construction of hiro::Window objects to properly show
bsnes window icons
- Ed.]
byuu says:
Changes: HD mode 7 supersampling support, HD mode 7 mosaic disable option,
various HD mode 7 bugfixes, default waveOut audio latency to 128 instead of 512,
removed 512x240 hires mode 7 mode.
There's also a small experiment, which makes this release a beta release as
well: when in EXTBG mode, I'm bypassing rendering BG1 for a large speedup.
EXTBG is only used as a priority layer, and is overwritten by BG2 except in one
extremely pathological case.
byuu says:
Added DerKoun's HD mode 7 (up to 2160p), ~100fps boost for fast forwarding,
configurable latency settings for waveOut (please configure this yourself),
filename case insensitivity, and a few other things.
byuu says:
Don't let the point release fool you, there are many significant changes in this
release. I will be keeping bsnes releases using a point system until the new
higan release is ready.
Changelog:
- GUI: added high DPI support
- GUI: fixed the state manager image preview
- Windows: added a new waveOut driver with support for dynamic rate control
- Windows: corrected the XAudio 2.1 dynamic rate control support [BearOso]
- Windows: corrected the Direct3D 9.0 fullscreen exclusive window centering
- Windows: fixed XInput controller support on Windows 10
- SFC: added high-level emulation for the DSP1, DSP2, DSP4, ST010, and Cx4
coprocessors
- SFC: fixed a slight rendering glitch in the intro to Megalomania
If the coprocessor firmware is missing, bsnes will fall back on HLE where it is
supported, which is everything other than SD Gundam GX and the two Hayazashi
Nidan Morita Shougi games.
The Windows dynamic rate control works best with Direct3D in fullscreen
exclusive mode. I recommend the waveOut driver over the XAudio 2.1 driver, as it
is not possible to target a single XAudio2 version on all Windows OS releases.
The waveOut driver should work everywhere out of the box.
Note that with DRC, the synchronization source is your monitor, so you will
want to be running at 60Hz (NTSC) or 50Hz (PAL). If you have an adaptive sync
monitor, you should instead use the WASAPI (exclusive) or ASIO audio driver.
[This is specifically a release of bsnes, not the whole higan suite, even though
it contains all the higan source. -Ed.]
byuu says:
Today I am posting the first release of the new bsnes emulator.
bsnes is designed to be a revival of the classic bsnes design, focusing
specifically on performance and ease of use for SNES emulation.
In addition to all of the features of higan, bsnes supports the following
features:
- 300% faster (than higan) scanline-based, multi-threaded graphics renderer
- option to disable sprite limits in games
- option to enable hires mode 7 graphics
- option to enable more accurate pixel-based graphics renderer
- option to overclock SuperFX games by up to 800%
- periodic auto-saving of game save RAM
- save state manager with state screenshots
- several new save state hotkeys such as increment/decrement slot#
- option to auto-save states when unloading a game or closing the emulator
- option to auto-load aforementioned states when loading games
- save state undo and redo support (with associated hotkeys)
- speed override modes (50%, 75%, 100%, 150%, 200%)
- recent games list
- frame advance mode
- screenshot hotkey
- path selection for games, patches, saves, cheats, states, and screenshots
- dynamic video, audio, input driver changes
- direct loading and playing of games without the use of the higan library
- ZIP archive and multiple file extension support for games
- firmware folder for unappended coprocessor firmware (see documentation for
more)
- compatibility with sd2snes and Snes9X MSU1 game file naming
- compatibility with higan gamepaks (game folders)
- soft-patching support for both BPS and IPS patches
- menubar that does not pause emulation when entered
- video pixel shaders (requires OpenGL 3.2)
- built-in game database with over 1,200 games to ensure perfect memory
mapping
- (Linux, BSD only:) audio dynamic rate control to eliminate stuttering
- and much more!
The one feature I regret not being able to support in this release is Windows
dynamic rate control. I put in my best attempt, but XAudio2's API is simply not
fine-grained enough, and the WASAPI driver is not mature enough. I hope that DRC
support can be added to the Windows port in the near future, and I would like to
offer a large cash bounty to anyone who can help me make this happen.
byuu says:
The bad instruction was due to the instruction before it fetching one
too many bytes. Didn't notice right away as the disassembler got it
right.
The register map was incorrect on the active 16-bit flags.
I fixed and improved some other things along those lines. Hooked up some
basic KnGE (VPU) timings, made it print out VRAM and some of the WRAM
onto the screen each frame, tried to drive Vblank and Hblank IRQs, but
... I don't know for sure what vector addresses they belong to.
MAME says "INT4" for Vblank, and says nothing for Hblank. I am wildly
guessing INT4==SWI 4==0xffff10, but ... I have no idea. I'm also not
emulating the interrupts properly based on line levels, I'm just firing
on the 0→1 transitions. Sounds like Vblank is more nuanced too, but I
guess we'll see.
Emulation is running further along now, even to the point of it
successfully enabling the KnGE IRQs, but VRAM doesn't appear to get much
useful stuff written into it yet.
I reverted the nall/primitive changes, so request for testing is I guess
rescinded, for whatever it was worth.
byuu says:
Changelog:
- fixed a few TLCS900H CPU and disassembler bugs
- hooked up a basic Neo Geo Pocket emulator skeleton and memory map;
can run a few instructions from the BIOS
- emulated the flash memory used by Neo Geo Pocket games
- added sourcery to the higan source archives
- fixed ternary expressions in sfc/ppu-fast [hex_usr]
byuu says:
Changelog:
- reverted nall/inline-if.hpp usage for now, since the
nall/primitives.hpp math operators still cast to (u)int64_t
- improved nall/primitives.hpp more; integer8 x = -128; print(-x) will
now print 128 (unary operator+ and - cast to (u)int64_t)
- renamed processor/lr35902 to processor/sm83; after the Sharp SM83
CPU core [gekkio discovered the name]
- a few bugfixes to the TLCS900H CPU core
- completed the disassembler for the TLCS900H core
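The `integer8 x = -128; print(-x)` case above can be illustrated with plain fixed-width types. This is only a sketch using `int8_t`/`int64_t` rather than nall's classes, and the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Negating and staying within 8 bits: -(-128) = 128 does not fit in int8_t,
// so on two's-complement targets the value wraps back to -128.
auto negateNarrow(int8_t x) -> int8_t {
  return int8_t(-x);
}

// Negating after widening to 64 bits: the result 128 is representable.
auto negateWide(int8_t x) -> int64_t {
  return -int64_t(x);
}
```

With these, `negateNarrow(-128)` stays at -128 while `negateWide(-128)` yields 128, which matches the behavior described for the unary operators.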
As a result of reverting most of the inline if stuff, I guess the
testing priority has been reduced. Which is probably a good thing,
considering I seem to have a smaller pool of testers these days.
Indeed, the TLCS900H core has ended up at 131KiB compared to the M68000
core at 128KiB. So it's now the largest CPU core in all of higan. It's
even more ridiculous because the M68000 core would ordinarily be quite a
bit smaller, had I not gone overboard with the extreme templating to
reduce instruction decoding overhead (you kind of have to do this for
RISC CPUs, and the inverted design of the TLCS900H kind of makes it
infeasible to do the same there.)
This CPU core is bound to have dozens of extremely difficult CPU bugs,
and there's no easy way for me to test them. I would greatly appreciate
any help in looking over the core for bugs. A fresh pair of eyes to spot
a mistake could save me up to several days of tedious debugging work.
The core still isn't ready to actually be tested: I have to hook up
cartridge loading, a memory bus, interrupts, timers, and the micro DMA
controller before it's likely that anything happens at all.
byuu says:
Half of the disassembler is implemented now. Well, the decoding half
anyway. I'm splitting the decoding and string building into separate
components this time around, on account of the instruction encoding
being in reverse order. The string building portion hasn't been written
yet, either.
We're up to 112KiB now, compared to 128KiB for the 68K.
byuu says:
First 32 instructions implemented in the TLCS900H disassembler. Only 992
to go!
I removed the use of anonymous namespaces in nall. It was something I
rarely used, because it rarely did what I wanted.
I updated all nested namespaces to use C++17-style namespace Foo::Bar {}
syntax instead of classic C++-style namespace Foo { namespace Bar {}}.
I updated ruby::Video::acquire() to return a struct, so we can use C++17
structured bindings. Long term, I want to get away from all functions
that take references for output only. Even though C++ botched structured
bindings by not allowing you to bind to existing variables, it's even
worse to have function calls that take arguments by reference and then
write to them. From the caller side, you can't tell the value is being
written, nor that the value passed in doesn't matter, which is terrible.
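A minimal sketch of the two calling styles, for comparison. `Surface`, `acquireByReference`, and `acquire` are hypothetical names chosen for illustration and are not ruby's actual interface:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a locked video surface; not ruby's real API.
struct Surface {
  uint32_t* data;
  uint32_t pitch;
};

static uint32_t buffer[256 * 240];
constexpr uint32_t Pitch = 256 * sizeof(uint32_t);

// Reference style: at the call site, nothing shows that data and pitch are
// overwritten, or that their previous values are ignored.
auto acquireByReference(uint32_t*& data, uint32_t& pitch) -> bool {
  data = buffer;
  pitch = Pitch;
  return true;
}

// Struct style: every output is part of the return value.
auto acquire() -> Surface {
  return {buffer, Pitch};
}

auto pitchOf() -> uint32_t {
  auto [data, pitch] = acquire();  // C++17 structured binding
  assert(data == buffer);
  return pitch;
}
```

Note that `auto [data, pitch]` always declares new variables, which is the limitation mentioned above.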
byuu says:
Any usage of natural and integer now casts to 64-bit for math operations.
Hopefully this will be the last of the major changes for a bit on
nall/primitives, at least until serious work begins on removing implicit
conversion to primitive types.
I also completed the initial TLCS900H core, sans SWI (I'm kind of a ways off
from supporting interrupts.) I really shouldn't say completed, though. The
micro DMA unit is missing, interrupt priority handling is missing,
there's no debugger, and, of course, there's surely dozens of absolutely
critical CPU bugs that are going to be an absolute hellscape nightmare
to track down.
It was a damn shame: right up until the very last eight instructions,
[CP|LD][I|D](R), the instruction encoding was consistent. Of course,
there could be other inconsistencies that I missed. In fact, that's
somewhat likely ... sigh.
byuu says:
This WIP is just work on nall/primitives ...
Basically, I'm coming to the conclusion that it's just not practical to
try and make Natural/Integer implicitly castable to primitive signed and
unsigned integers. C++ just has too many edge cases there.
I also want to get away from the problem of C++ deciding that all math
operations return 32-bit values, unless one of the parameters is 64-bit,
in which case you get a 64-bit value. You know, so things like
array[-1] won't end up accessing the 4 billionth element of the array.
It's nice to be fancy and minimally size operations (eg 32-bit+32-bit =
33-bit), but it's just too unintuitive. I think all
Natural<X>+Natural<Y> expressions should result in a Natural<64> (eg
natural) type.
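The array[-1] problem can be reproduced with built-in types. The sketch below assumes a platform where int is 32 bits wide; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// The subtraction happens in 32 bits and wraps to 0xffffffff before the
// result is widened, so index 0 "minus one" becomes roughly 4 billion.
auto previousIndex32(uint32_t i) -> uint64_t {
  return i - 1;
}

// Widening first means the wrap happens at 64 bits, which is the same bit
// pattern as -1 in a 64-bit address computation.
auto previousIndex64(uint32_t i) -> uint64_t {
  return uint64_t(i) - 1;
}
```

Always producing a 64-bit result, as proposed above, corresponds to the second form.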
nall/primitives/operators.hpp has been removed, and new
Natural<>Natural / Integer<>Integer casts exist. My feeling is that
signed and unsigned types should not be implicitly convertible where
data loss can occur. In the future, I think an integer8*natural8 is
fine to return an integer64, and the bitwise operators are probably all
fine between the two types. I could probably add
(Integer,Natural)+Boolean conversions as well.
To simplify expressions, there are new user-defined literals for _b
(boolean), _n (natural), _i (integer), _r (real), _n# (eg _n8),
_i# (eg _i8), _r# (eg _r32), and _s (nall::string).
In the long-term, my intention is to make the conversion and cast
constructors explicit for primitive types, but obviously that'll shatter
most of higan, so for now that won't be the case.
Something I can do in the future is allow implicit conversion and
casting to (u)int64_t. That may be a nice balance.
byuu says:
I've implemented a lot more TLCS900H instructions. There are currently
20 missing spots, all of which are unique instructions (well, MINC and
MDEC could be considered pairs of 3 each), from a map of 1024 slots.
After that, I have to write the disassembler. Then the memory bus. Then
I get to start the fun process of debugging this monstrosity.
Also new is nall/inline-if.hpp. Note that this file is technically a war
crime, so be careful when opening it. This replaces ternary() from the
previous WIP.
byuu says:
So this turned out to be a rather unproductive ten-hour rabbit hole, but
...
I reworked nall/primitives.hpp a lot. And because the changes are
massive, testing of this WIP for regressions is critically important. I
really can't stress that enough, we're almost certainly going to have
some hidden regressions here ...
We now have a nall/primitives/ subfolder that splits up the classes into
manageable components. The bit-field support is now shared between both
Natural and Integer. All of the assignment operator overloads are now
templated and take references instead of values. Things like the
GSU::Register class are non-copyable on account of the function<>
object inside of it, and previously only operator= would work with
classes like that.
The big change is nall/primitives/operators.hpp, which is a really
elaborate system to compute the minimum number of bits needed for any
operation, and to return a Natural<T> or Integer<T> when one or both of
the arguments are such a type.
Unfortunately, it doesn't really work yet ... Kirby's Dream Land 3
breaks if we include operators.hpp. Zelda 3 runs fine with this, but I
had to make a huge amount of core changes, including introducing a new
ternary(bool, lhs, rhs) function to nall/algorithm to get past
Natural<X> and Natural<Y> not being equivalent (is_integral types get a
special exemption to ternary ?: type equivalence, yet it's impossible to
simulate with our own classes, which is bullshit.) The horrifying part
is that ternary() will evaluate both lhs and rhs, unlike ?:
I converted some of the functions to test ? uint(x) : uint(y), and
others to ternary(test, x, y) ... I don't have a strong preference
either way yet.
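The evaluation difference is easy to demonstrate. In this sketch, `ternary` is a simplified, hypothetical stand-in for the nall helper (its real signature and return type are not shown in these notes), and `readPort` is an invented function with a side effect:

```cpp
#include <cassert>

// Simplified stand-in: both arguments are ordinary function arguments, so
// both are evaluated before the call, regardless of test.
template<typename T, typename U>
auto ternary(bool test, T lhs, U rhs) -> unsigned {
  return test ? unsigned(lhs) : unsigned(rhs);
}

int reads = 0;

// Hypothetical read with a side effect, like a hardware register access.
auto readPort() -> unsigned {
  reads++;
  return 0x42;
}

// ?: only evaluates the selected operand.
auto viaOperator(bool test) -> unsigned {
  return test ? 0u : readPort();
}

// ternary() evaluates readPort() even when test is true.
auto viaFunction(bool test) -> unsigned {
  return ternary(test, 0u, readPort());
}
```

Calling `viaOperator(true)` leaves `reads` untouched, while `viaFunction(true)` increments it, which is why moving expressions into ternary() can change behavior.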
But the part where things may have gotten broken is in the changes to
where ternary() was placed. In some cases, like the GBA PPU renderer, the
order of evaluation was rather unclear, so I may have made a mistake
somewhere.
So again, please please test this if you can. Or even better, look over
the diff.
Longer-term, I'd really like to enable nall/primitives/operators.hpp,
but right now I'm not sure why Kirby's Dream Land 3 is breaking. Help
would be appreciated, but ... it's gonna be really complex and difficult
to debug, so I'm probably gonna be on my own here ... sigh.
byuu says:
I added some useful new functions to nall/primitives:
    auto Natural<T>::integer() const -> Integer<T>;
    auto Integer<T>::natural() const -> Natural<T>;
These let you cast between signed and unsigned representation without
having to care about the value of T (eg if you take a Natural<T> as a
template parameter.) So for instance when you're given an unsigned type
but it's supposed to be a sign-extended type (example: signed
multiplication), eg Natural<T> → Integer<T>, you can just say:
    x = y.integer() * z.integer();
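A heavily simplified sketch of how such a pair of casts could work. These `Natural` and `Integer` templates are stripped-down stand-ins written for this example, not nall's implementation, and `signedMultiply` is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

template<unsigned Bits> struct Integer;

// Unsigned value clamped to Bits bits.
template<unsigned Bits> struct Natural {
  static_assert(Bits >= 1 && Bits <= 64, "unsupported width");
  static constexpr uint64_t mask = ~0ull >> (64 - Bits);
  uint64_t data;
  Natural(uint64_t value = 0) : data(value & mask) {}
  auto integer() const -> Integer<Bits>;
};

// Signed value, sign-extended from bit Bits-1.
template<unsigned Bits> struct Integer {
  static constexpr uint64_t sign = 1ull << (Bits - 1);
  int64_t data;
  Integer(int64_t value = 0) {
    uint64_t low = uint64_t(value) & Natural<Bits>::mask;
    data = int64_t((low ^ sign) - sign);
  }
  auto natural() const -> Natural<Bits> { return Natural<Bits>(uint64_t(data)); }
};

template<unsigned Bits> auto Natural<Bits>::integer() const -> Integer<Bits> {
  return Integer<Bits>(int64_t(data));
}

// The caller never has to name the width: it follows from the argument type.
template<unsigned Bits>
auto signedMultiply(Natural<Bits> y, Natural<Bits> z) -> int64_t {
  return y.integer().data * z.integer().data;
}
```

Here `signedMultiply(Natural<8>(0xff), Natural<8>(2))` treats 0xff as -1 and returns -2, where an unsigned multiply would have produced 510.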
The TLCS900H core gained some more pesky instructions such as DAA, BS1F,
BS1B.
I stole an optimization from RACE for calculating the overflow flag on
addition. Assuming: z = x + y + c;
    Before: ~(x ^ y) & (x ^ z) & signBit;
    After: (x ^ z) & (y ^ z) & signBit;
Subtraction stays the same. Assuming: z = x - y - c;
    Same: (x ^ y) & (x ^ z) & signBit;
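The two addition formulas can be checked against each other exhaustively for 8-bit operands. This is a small verification sketch with hypothetical function names, using 0x80 as signBit:

```cpp
#include <cassert>
#include <cstdint>

// Original formula: x and y share a sign, and z differs from x.
auto overflowBefore(uint8_t x, uint8_t y, uint8_t z) -> bool {
  return ~(x ^ y) & (x ^ z) & 0x80;
}

// Replacement formula: z differs in sign from both x and y.
auto overflowAfter(uint8_t x, uint8_t y, uint8_t z) -> bool {
  return (x ^ z) & (y ^ z) & 0x80;
}

// Counts the inputs (x, y, carry-in) for which the two formulas disagree.
auto countOverflowMismatches() -> unsigned {
  unsigned mismatches = 0;
  for(unsigned x = 0; x < 256; x++) {
    for(unsigned y = 0; y < 256; y++) {
      for(unsigned c = 0; c < 2; c++) {
        uint8_t z = x + y + c;
        if(overflowBefore(x, y, z) != overflowAfter(x, y, z)) mismatches++;
      }
    }
  }
  return mismatches;
}
```

The formulas agree because a sign bit can only differ from two other sign bits if those two are equal to each other.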
However, taking a speed penalty, I've implemented the carry computation
in a way that doesn't require an extra bit.
Adding before:
    uint9 z = x + y + c;
    c = z & 0x100;
Subtracting before:
    uint9 z = x - y - c;
    c = z & 0x100;
Adding after:
    uint8 z = x + y + c;
    c = z < x || z == x && c;
Subtracting after:
    uint8 z = x - y - c;
    c = z > x || z == x && c;
I haven't been able to code golf the new carry computation to be any
shorter, unless I include an extra bit, eg for adding:
    c = z < x + c;
But that defeats the entire point of the change. I want the computation
to work even when T is uintmax_t.
If anyone can come up with a faster method, please let me know.
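As a sanity check, the "after" formulas can be compared exhaustively against the 9-bit versions for 8-bit operands, and then instantiated with uintmax_t. The function names in this sketch are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Carry out of x + y + c, computed without any bit wider than T.
template<typename T> auto addCarry(T x, T y, bool c) -> bool {
  T z = x + y + c;
  return z < x || (z == x && c);
}

// Borrow out of x - y - c, computed without any bit wider than T.
template<typename T> auto subCarry(T x, T y, bool c) -> bool {
  T z = x - y - c;
  return z > x || (z == x && c);
}

// Compares both formulas against the extra-bit reference for all 8-bit inputs.
auto countCarryMismatches() -> unsigned {
  unsigned mismatches = 0;
  for(unsigned x = 0; x < 256; x++) {
    for(unsigned y = 0; y < 256; y++) {
      for(unsigned c = 0; c < 2; c++) {
        bool addReference = (x + y + c) & 0x100;
        bool subReference = (x - y - c) & 0x100;
        if(addCarry<uint8_t>(x, y, c) != addReference) mismatches++;
        if(subCarry<uint8_t>(x, y, c) != subReference) mismatches++;
      }
    }
  }
  return mismatches;
}
```

The `z == x && c` term covers the single case where the second operand plus the carry-in wraps the result all the way around to x.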
Anyway ... I also had to split off INC and DEC because they compute
flags differently (word and long modes don't set flags at all, byte mode
doesn't set carry at all.)
I also added division by zero support, although I don't know if it's
actually hardware accurate. It's what other emulators do, though.
byuu says:
So tired ... so much left to do still ... sigh.
If someone's up for some code golf, open to suggestions on how to handle
the INTNEST control register. It's the only pure 16-bit register on the
system, and it breaks my `map`/`load`/`store<uint8,16,32>` abstraction.
Basically what I suspect happens is when you access INTNEST in 32-bit
mode, the upper 16-bits are just undefined (probably zero.) But
`map<uint32>(INTNEST)` must return a uint32& or nothing at all. So for the
time being, I'm just making store(ControlRegister) check if it's the
INTNEST ID, and clearing the upper bits of the written byte in that
case. It's hacky, but ... it's the best I can think of.
I added LDX, which is a 900H-only instruction, and the control register
map is for the 900/H CPU. I found the detailed differences between the
CPUs, and it doesn't look likely that I'm gonna support the 900 or
900/H1 at all. Not that there was a reason to anyway, but it's nice to
support more stuff when possible. Oh well.
The 4-byte instruction fetch queue is going to have to get implemented
inside fetch, or just not implemented at all ... not like I'd be able to
figure out the details of it anyway.
The manual isn't clear on how the MULA flags are calculated, but since
MUL doesn't set any flags, I assume the flags are based on the addition
after the multiplication, eg:
    uint32 a = indirect<int16>(XDE) * indirect<int16>(XHL);
    uint32 b = reg16; //opcode parameter
    uint32 c = a + b; //flags set based on a+b
No idea if it's right or not. It doesn't set carry or half-carry, so
it's not just simply the same as calling algorithmAdd.
Up to almost 70KB, not even halfway done, don't even have a disassembler
started yet. There's a real chance this could overtake the 68K for the
biggest CPU core in higan, although at this point I'm still thinking the
68K will end up larger.
byuu says:
So I spent the better part of eight hours refactoring the TLCS900H core
to be more flexible in light of new edge cases.
I'm now including the size information inside of the types (eg
Register<Byte>, Memory<Word>) rather than as parameters to the
instruction handlers. This allows me to eg implement RETI without
needing template arguments in all the instructions. pop(SR), pop(PC) can
deduce how much to pop off of the stack. It's still highly templated,
but not unrolling the 3-bit register indexes and instead going through
the switch table to access registers is going to hurt the performance a
good deal.
A benefit of this is that Register{A} != Register{WA} != Register{XWA}
anymore, despite them sharing IDs.
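The idea of carrying the size in the type can be sketched as follows. This toy `CPU` is hypothetical and far simpler than the real core (flat byte array, little-endian descending stack), but it shows how pop() can deduce how many bytes to read from the register type alone:

```cpp
#include <cassert>
#include <cstdint>

// The operand width is part of the type; the ID only selects the register.
template<typename T> struct Register { uint8_t id; };

struct CPU {
  uint8_t memory[16] = {};
  uint32_t sp = 16;  // descending stack, starts past the end of memory
  uint32_t r[4] = {};

  // Pushes sizeof(T) bytes, leaving the low byte at the lowest address.
  template<typename T> auto push(T value) -> void {
    for(unsigned n = sizeof(T); n--;) memory[--sp] = uint8_t(value >> (n * 8));
  }

  // Pops sizeof(T) bytes; the size is deduced from the register type.
  template<typename T> auto pop(Register<T> target) -> void {
    uint32_t value = 0;
    for(unsigned n = 0; n < sizeof(T); n++) {
      value |= uint32_t(memory[sp++]) << (n * 8);
    }
    r[target.id] = value;
  }
};
```

Two registers with the same ID but different widths, such as `Register<uint16_t>{0}` and `Register<uint32_t>{0}`, are distinct types here, so each call pops the right number of bytes without extra template arguments.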
I also renamed read/write to load/store for the CPU core, because
implicit conversions are nasty. They all call the virtual read/write.
I added more instructions, improved the memory addressing mode support,
and some other things.
I got rid of Byte, Word, Long because there's too many alternate sizes
needed: int8, int16, uint24, etc.
Ran into a really annoying C++ case ...
    struct TLCS900H {
      template<typename T> auto store(Register<T> target, T source) -> void;
    };
If you call store(Register<uint32>(x), uint16(y)); it errors out since
the T types don't match. But you can't specialize it:
    template<typename T, typename U> auto store(Register<T>, U) -> void;
    template<typename U> auto TLCS900H::store<uint32, U>(Register<uint32>, U) -> void;
Because somehow it's 2019 and we still can't do partial template
specialization inside classes ...
So as a result, I had to make T source be type uint32 even for
Register<uint8> and Register<uint16>. Doesn't matter too much, just
annoying.
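A compact sketch of the workaround, plus one possible alternative. The `CPU` struct, `storeAs`, and the `Identity` helper are hypothetical and only illustrate the deduction rules; the alternative is an assumption and not something these notes say was used:

```cpp
#include <cassert>
#include <cstdint>

template<typename T> struct Register { uint8_t id; };

// Hand-rolled helper that makes a parameter a non-deduced context
// (std::type_identity only exists from C++20 onward).
template<typename T> struct Identity { using type = T; };

struct CPU {
  uint32_t r[8] = {};

  // The workaround described above: T is deduced from the register alone,
  // and the source is always passed as a 32-bit value.
  template<typename T> auto store(Register<T> target, uint32_t source) -> void {
    r[target.id & 7] = T(source);  // truncate to the register's width
  }

  // Possible alternative: keep the source typed as T, but exclude it from
  // deduction so that ordinary implicit conversions apply to it.
  template<typename T>
  auto storeAs(Register<T> target, typename Identity<T>::type source) -> void {
    r[target.id & 7] = source;
  }
};
```

Both forms accept `store(Register<uint32_t>{0}, uint16_t(y))`, because only the first parameter takes part in deducing T.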
byuu says:
This probably won't fix the use of register yet (I imagine ruby and hiro
will complain now), but ... oh well, it's a start. We'll get it
compiling again eventually.
I added JP, JR, JRL, LD instructions this time around. I'm also starting
to feel that Byte, Word, Long labels for the TLCS900H aren't really
working. There's cases of needing uint24, int8, int16, ... it may just
be better to name the types instead of trying to be fancy.
At this point, all of the easy instructions are in. Now it's down to a
whole lot of very awkward bit-manipulation and special-use instructions.
Sigh.
byuu says:
For this WIP, I added more TLCS900H instructions. All of the
ADC,ADD,SBB/SBC,SUB,AND,OR,XOR,CP,PUSH,POP instructions are in.
Still an incredible amount of work left to do on this core ... it has all kinds
of novel instructions that aren't on any other processors.
Still no disassembler support yet, so I can't even test what I'm doing. Fun!
byuu says:
I started working on the Toshiba TLCS900H CPU core today.
It's basically, "what if we took the Z80, added in 32-bit support, added
in SPARC register windows, added a ton of additional addressing modes,
added control registers, and added a bunch of additional instructions?"
-- or in other words, it's basically hell for me.
It took several hours just to wrap my head around the way the opcode
decoder needed to function, but I think I have a decent strategy for
implementing it now.
I should have all of the first-byte register/memory address decoding in
place, although I'm sure there's lots of bugs. I don't have anything in
the way of a disassembler yet.
byuu says:
Changelog:
- Interface::displays() -> vector<Display> → Interface::display() -> Display
- Platform::videoRefresh(display, ...) → Platform::videoFrame(...)
- Platform::audioSample(...) → Platform::audioFrame(...)
- higan, icarus: use AboutDialog class instead of ad-hoc
implementations
- about dialog is now modal, but now has a clickable website URL
- icarus: reverted if constexpr for now
- MSX: implemented basic CPU, VDP support
I took out the multiple displays support thing because it was never
really implemented fully (Emulator::Video and the GUIs both ignored it)
or used anyway. If it ends up necessary in the future, I'll worry about
it then.
There's enough MSX emulation now to run Mr. Do! without sound or input.
I'm shipping higan with C-BIOS 0.29a, although it likely won't be good
enough in the future (eg it can't do BASIC, floppy disk, or cassette
loading.) I have keyboard and (not working) AY-3-8910 support in a
different branch, so that won't take too long to implement. Main problem
is naming all the darned keyboard keys. I think I need to change
settings.bml's input mapping lines so that the key names are values
instead of node names, so that any characters can appear inside of them.
It turns out my MSX set uses .rom for the file extensions ... gods. So,
icarus can't really import them like this. I may have to re-design
icarus' importer to stop caring about the file extension and instead ask
you what kind of games you are importing. There's no way icarus can
heuristically guess what systems the images belong to, because many
systems don't have any standardized magic bytes.
I'm struggling with where to put SG-1000, SC-3000, ColecoVision, Coleco
Adam stuff. I think they need to be split to two separate higan
subfolders (sg and cv, most likely ...) The MS/GG share a very
customized and extended VDP that the other systems don't have. The Sega
and Coleco older hardware share the same TMS9918 as the MSX, yet have
very different memory maps and peripherals that I don't want to mix
together. Especially if we start getting into the computer-variants
more.
Debian has served us well, but byuu would like to start using C++17 features,
which generally require GCC 7. Debian Stable only has GCC 6 right now, while
Ubuntu LTS has the required version, so that should get things going again.
byuu says:
The biggest change was improving WonderSwan emulation. With help from
trap15, I tracked down a bug where I was checking the wrong bit for
reverse DMA transfers. Then I also emulated VTOTAL to support variable
refresh rate. Then I improved HyperVoice emulation, which should use
unsigned samples in three of its four modes. That got Fire Lancer running
great. I also rewrote the disassembler. The old one disassembled many
instructions completely wrong, and deviated too much from any known x86
syntax. I also emulated some of the quirks of the V30 (two-byte POP into
registers fails, SALC is just XLAT mirrored, etc) which probably don't
matter unless someone tries to run code to verify it's a NEC CPU and not
an Intel CPU, but hey, why not?
I also put more work into the MSX skeleton, but it's still just a
skeleton with no real emulation yet.
byuu says:
Changelog:
- nall: converted range, iterator, vector to 64-bit
- added (very poor) ColecoVision emulation (including Coleco Adam
expansion)
- added MSX skeleton
- added Neo Geo Pocket skeleton
- moved audio,video,resource folders into emulator folder
- SFC heuristics: BS-X Town cart is "ZBSJ" [hex_usr]
The nall change is for future work on things like BPA: I need to be able
to handle files larger than 4GB. It is extremely possible that there are
still some truncations to 32-bit lurking around, and even more
disastrously, possibly some -1s lurking that won't sign-extend to
`(uint64_t)0-1`. There's a lot more classes left to do: `string`,
`array_view`, `array_span`, etc.
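The "-1 that won't sign-extend" hazard looks like this with built-in types. The function names are hypothetical, and the 32-bit sentinel stands in for any leftover code that still stores -1 in a 32-bit unsigned value:

```cpp
#include <cassert>
#include <cstdint>

// A "not found" style sentinel as leftover 32-bit code might produce it.
auto legacySentinel() -> uint32_t {
  return -1;  // 0xffffffff
}

// Widening an unsigned 32-bit value zero-extends: the upper 32 bits are 0.
auto widenedSentinel() -> uint64_t {
  return legacySentinel();
}

// What 64-bit code compares against.
auto expectedSentinel() -> uint64_t {
  return (uint64_t)0 - 1;  // 0xffffffffffffffff
}
```

Since `widenedSentinel() != expectedSentinel()`, a comparison against the 64-bit sentinel silently stops matching once a value has passed through a 32-bit type.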