Commit Graph

119 Commits

Author SHA1 Message Date
Tim Allen 3bcf3c24c9 Update to v102r20 release.
byuu says:

Changelog:

  - nall: `#undef OUT` on Windows platform
  - GBA: add missing CPU prefetch state to serialization (this was
    breaking serialization in games using ROM prefetch)
  - GBA: reset all PPU data in the power() function (some things were
    missing before, causing issues on reset)
  - GBA: restored horizontal mosaic emulation to the new pixel-based
    renderer
  - GBA: fixed tilemap background horizontal flipping (Legend of Spyro -
    warning screen)
  - GBA: fixed d8 bits of scroll registers (ATV - Thunder Ridge Racers -
    menu screen)
  - SFC: DRAM refresh ticks the ALU MUL/DIV registers five steps forward
    [reported by kevtris]
  - SFC: merged dmaCounter and autoJoypadCounter into new shared
    clockCounter
      - left stub for old dmaCounter so that I can do some traces to
        ensure the new code's 100% identical

GBA save states would have been broken since whenever I emulated ROM
prefetch. I guess not many people are using the GBA core ...
2017-06-06 11:39:27 +10:00
Tim Allen 1ca4609079 Update to v102r18 release.
byuu says:

This WIP fixes all the critical pending issues I had open. I'm sure
there's many more that simply didn't make their way into said list. So
by all means, please report important issues you're aware of so they can
get fixed.

Changelog:

  - ruby: add variable texture support to GDI video driver [bug
    reported by Cydrak]
  - ruby: minor cleanups to XShm video driver
  - ruby: fix handling of up+down, left+right hat cases for XInput
    driver [bug reported by Cydrak]
  - nall: fixed vector class so that compilation with GCC 7.1 should
    succeed [SuperMikeMan]
  - sfc: initialize most DSP registers to random values to fix Magical
    Drop [Jonas Quinn]
  - sfc: lower PPU brightness when luma=0 from 50% scale to 25% scale;
    helps scenes like Final Fantasy III's intro
2017-05-30 17:48:41 +10:00
Tim Allen 7e7003fd29 Update to v102r15 release.
byuu says:

Changelog:

  - nall: added DSP::IIR::OnePole (which is a first-order IIR filter)
  - FC/APU: removed strong highpass, weak hipass filters (and the
    dummied out lowpass filter)
  - MS,GG,MD/PSG: removed lowpass filter
  - MS,GG,MD/PSG: audio was not being centered properly; removed
    centering for now
  - MD/YM2612: fixed clipping of accumulator from 18 signed bits to 14
    signed bits (-0x2000 to +0x1fff) [Cydrak]
  - MD/YM2612: removed lowpass filter
  - PCE/PSG: audio was not being centered properly; removed centering
    for now

First thing is that I've removed all of the ad-hoc audio filtering.
Emulator::Stream intrinsically provides a three-pass, second-order
biquad IIR butterworth lowpass filter that clips frequencies above 20KHz
with very good attenuation (as good as IIR gets, anyway.)

It doesn't really make sense to have the various cores running
additional lowpass filters. If we want to filter frequencies below
20KHz, then I can adapt Emulator::Audio::createStream() to take a cutoff
frequency value, and we can do it all at once, with much better quality.

Right now, I don't know what frequencies are best to cut off the various
other audio cores, so they're just gone for now.

As for the highpass filters for the Famicom core, well ... you don't get
aliasing from resampling low frequencies. And generally speaking, too
low a frequency will be inaudible anyway. All these were doing was
killing possible bass (if they were too strong.) We can add them again,
but only if someone can convert Ryphecha's ad-hoc magic integers into a
frequency cutoff. In which case, I'll use my biquad IIR filter to do it
even better. On this note, it may prove useful to do this for the MD PSG
as well, to try and head off unnecessary clamping when mixing with the
YM2612.

Finally, there was the audio centering issue that affected the
MS,GG,MD,PCE,SG cores. It was flooring the "silent" audio level, which
was resulting in extremely heavy distortion if you tried listening to
higan and, say, audacious at the same time. Without the botched
centering, this distortion is completely gone now.

However, without any centering, we've halved the potential volume range.
This means the audio slider in higan's audio settings panel will start
clamping twice as quickly. So ultimately, we need to figure out how to
fix the centering. This isn't as simple as just subtracting less. We
will probably have to center every individual audio channel before
summing them to do this properly.

Results:

On the Mega Drive, Altered Beast sounds quite a bit better, a lot less
distortion now. But it's still not perfect, especially sound effects.
Further, Bare Knuckle / Streets of Rage still has really bad sound
effects. It looks like I broke something in Cydrak's code when trying to
adapt it to my style =(
2017-03-07 07:23:22 +11:00
Tim Allen 4c3f9b93e7 Update to v102r12 release.
byuu says:

Changelog:

  - MD/PSG: fixed 68K bus Z80 status read address location
  - MS, GG, MD/PSG: channels post-decrement their counters, not
    pre-decrement [Cydrak]¹
  - MD/VDP: cache screen width registers once per scanline; screen
    height registers once per frame
  - MD/VDP: support 256-width display mode (used in Shining Force, etc)
  - MD/YM2612: implemented timers²
  - MD/YM2612: implemented 8-bit PCM DAC²
  - 68000: TRAP instruction should index the vector location by 32 (eg
    by 128 bytes), fixes Shining Force
  - nall: updated hex(), octal(), binary() functions to take uintmax
    instead of template<typename T> parameter³

¹: this one makes an incredible difference. Sie noticed that lots of
games set a period of 0, which would end up being a really long period
with pre-decrement. By fixing this, noise shows up in many more games,
and sounds way better in games even where it did before. You can hear
extra sound on Lunar - Sanposuru Gakuen's title screen, the noise in
Sonic The Hedgehog (Mega Drive) sounds better, etc.

²: this also really helps sound. The timers allow PSG music to play
back at the correct speed instead of playing back way too quickly. And
the PCM DAC lets you hear a lot of drum effects, as well as the
"Sega!!" sound at the start of Sonic the Hedgehog, and the infamous,
"Rise from your grave!" line from Altered Beast.

Still, most music on the Mega Drive comes from the FM channels, so
there's still not a whole lot to listen to.

I didn't implement Cydrak's $02c test register just yet. Sie wasn't 100%
certain on how the extended DAC bit worked, so I'd like to play it a
little conservative and get sound working, then I'll go back and add a
toggle or something to enable undocumented registers, that way we can
use that to detect any potential problems they might be causing.

³: unfortunately we lose support for using hex() on nall/arithmetic
types. If I have a const Pair& version of the function, then the
compiler gets confused on whether Natural<32> should use uintmax or
const Pair&, because compilers are stupid, and you can't have explicit
arguments in overloaded functions. So even though either function would
work, it just decides to error out instead >_>

This is actually really annoying, because I want hex() to be useful for
printing out nall/crypto keys and hashes directly.

But ... this change had to be made. Negative signed integers would crash
programs, and that was taking out my 68000 disassembler.
2017-02-27 19:45:51 +11:00
Tim Allen fa6cbac251 Update to v102r06 release.
byuu says:

Changelog:

  - added higan/emulator/platform.hpp (moved out Emulator::Platform from
    emulator/interface.hpp)
  - moved gmake build paramter to nall/GNUmakefile; both higan and
    icarus use it now
  - added build=profile mode
  - MD: added the region select I/O register
  - MD: started to add region selection support internally (still no
    external select or PAL support)
  - PCE: added cycle stealing when reading/writing to the VDC or VCE;
    and when using ST# instructions
  - PCE: cleaned up PSG to match the behavior of Mednafen (doesn't
    improve sound at all ;_;)
      - note: need to remove loadWaveSample, loadWavePeriod
  - HuC6280: ADC/SBC decimal mode consumes an extra cycle; does not set
    V flag
  - HuC6280: block transfer instructions were taking one cycle too many
  - icarus: added code to strip out PC Engine ROM headers
  - hiro: added options support to BrowserDialog

The last one sure ended in failure. The plan was to put a region
dropdown directly onto hiro::BrowserDialog, and I had all the code for
it working. But I forgot one important detail: the system loads
cartridges AFTER powering on, so even though I could technically change
the system region post-boot, I'd rather not do so.

So that means we have to know what region we want before we even select
a game. Shit.
2017-02-11 10:56:42 +11:00
Tim Allen c40e9754bc Update to v102r01 release.
byuu says:

Changelog:

  - MS, MD, PCE: remove controllers from scheduler in destructor
    [hex_usr]
  - PCE: no controller should return all bits set (still causing errant
    key presses when swapping gamepads)
  - PCE: emulate MDR for hardware I/O $0800-$17ff
  - PCE: change video resolution to 1140x242
  - PCE: added tertiary background Vscroll register (secondary cache)
  - PCE: create classes out of VDC VRAM, SATB, CRAM for cleaner access
    and I/O registers
  - PCE: high bits of CRAM read should be set
  - PCE: partially emulated VCE display registers: color frequency, HDS,
    HDW, VDS, VDW
  - PCE: 32-width sprites now split to two 16-width sprites to handle
    overflow properly
  - PCE: hopefully emulated sprite zero hit correctly (it's not well
    documented, and not often used)
  - PCE: trigger line coincidence interrupts during the previous
    scanline's Hblank period
  - tomoko: raise viewport from 320x240 to 326x242 to accommodate PC
    Engine's max resolution
  - nall: workaround for Clang compilation bug that can't figure out
    that a char is an integral data type
2017-01-22 11:33:36 +11:00
Tim Allen 5bdf55f08f Update to v101r25 release.
byuu says:

Changelog:

  - SMS: emulated VDP mode 4 graphical output (background, sprites)
  - added $(windres) to icarus as well

I'm sure the VDP emulation is still really, really buggy, but
essentially I handle:

  - mode 4 rendering
  - background scrolling
  - background hscroll lock
  - background vscroll lock
  - background nametable relocation
  - sprite nametable relocation
  - sprite tiledata relocation
  - sprite 192-line y=0xd0 edge case (end sprite rendering)
  - sprite 8-pixel x-coordinate displacement
  - sprite extended size (height only in mode 4)
  - sprite overflow
  - sprite collision
  - left column masking
  - display disable
  - backdrop color
  - 192, 224, 240 height

I do not support:

  - mode 2 rendering
  - sprite zoom
  - disallowing 240 height in NTSC mode
  - PAL mode
  - probably lots more
2016-12-30 18:24:35 +11:00
Tim Allen bab2ac812a Update to v101r24 release.
byuu says:

Changelog:

  - SMS: extended bus mapping of in/out ports: now decoding them fully
    inside ms/bus
  - SMS: moved Z80 disassembly code from processor/z80 to ms/cpu
    (cosmetic)
  - SMS: hooked up non-functional silent PSG sample generation, so I can
    cap the framerate at 60fps
  - SMS: hooked up the VDP main loop: 684 clocks/scanline, 262
    scanlines/frame (no PAL support yet)
  - SMS: emulated the VDP Vcounter and Hcounter polling ... hopefully
    it's right, as it's very bizarre
  - SMS: emulated VDP in/out ports (data read, data write, status read,
    control write, register write)
  - SMS: decoding and caching all VDP register flags (variable names
    will probably change)
  - nall: \#undef IN on Windows port (prevent compilation warning on
    processor/z80)

Watching Sonic the Hedgehog, I can definitely see some VDP register
writes going through, which is a good sign.

Probably the big thing that's needed before I can get enough into the
VDP to start showing graphics is interrupt support. And interrupts are
never fun to figure out :/

What really sucks on this front is I'm flying blind on the Z80 CPU core.
Without a working VDP, I can't run any Z80 test ROMs to look for CPU
bugs. And the CPU is certainly too buggy still to run said test ROM
anyway. I can't find any SMS emulators with trace logging from reset.
Such logs vastly accelerate tracking down CPU logic bugs, so without
them, it's going to take a lot longer.
2016-12-17 22:31:34 +11:00
Tim Allen f3e67da937 Update to v101r19 release.
byuu says:

Changelog:

-   added \~130 new PAL games to icarus (courtesy of Smarthuman
    and aquaman)
-   added all three Korean-localized games to icarus
-   sfc: removed SuperDisc emulation (it was going nowhere)
-   sfc: fixed MSU1 regression where the play/repeat flags were not
    being cleared on track select
-   nall: cryptography support added; will be used to sign future
    databases (validation will always be optional)
-   minor shims to fix compilation issues due to nall changes

The real magic is that we now have 25-30% of the PAL SNES library in
icarus!

Signing will be tricky. Obviously if I put the public key inside the
higan archive, then all anyone has to do is change that public key for
their own releases. And if you download from my site (which is now over
HTTPS), then you don't need the signing to verify integrity. I may just
put the public key on my site on my site and leave it at that, we'll
see.
2016-10-28 08:16:58 +11:00
Tim Allen c6fc15f8d2 Update to v101r18 release.
byuu says:

Changelog:

  - added 30 new PAL games to icarus (courtesy of Mikerochip)
  - new version of libco no longer requires mprotect nor W|X permissions
  - nall: default C compiler to -std=c11 instead of -std=c99
  - nall: use `-fno-strict-aliasing` during compilation
  - updated nall/certificates (hopefully for the last time)
  - updated nall/http to newer coding conventions
  - nall: improve handling of range() function

I didn't really work on higan at all, this is mostly just a release
because lots of other things have changed.

The most interesting is `-fno-strict-aliasing` ... basically, it joins
`-fwrapv` as being "stop the GCC developers from doing *really* evil
shit that could lead to security vulnerabilities or instabilities."

For the most part, it's a ~2% speed penalty for higan. Except for the
Sega Genesis, where it's a ~10% speedup. I have no idea how that's
possible, but clearly something's going very wrong with strict aliasing
on the Genesis core.

So ... it is what it is. If you need the performance for the non-Genesis
cores, you can turn it off in your builds. But I'm getting quite sick of
C++'s "surprises" and clever compiler developers, so I'm keeping it on
in all of my software going forward.
2016-09-14 21:55:53 +10:00
Tim Allen 4c3f58150c Update to v101r15 release.
byuu says:

Changelog:

  - added (poorly-named) castable<To, With> template
  - Z80 debugger rewritten to make declaring instructions much simpler
  - Z80 has more instructions implemented; supports displacement on
    (IX), (IY) now
  - added `Processor::M68K::Bus` to mirror `Processor::Z80::Bus`
      - it does add a pointer indirection; so I'm not sure if I want to
        do this for all of my emulator cores ...
2016-09-04 23:51:27 +10:00
Tim Allen d91f3999cc Update to v101r14 release.
byuu says:
Changelog:

  - rewrote the Z80 core to properly handle 0xDD (IX0 and 0xFD (IY)
    prefixes
  - added Processor::Z80::Bus as a new type of abstraction
  - all of the instructions implemented have their proper T-cycle counts
    now
  - added nall/certificates for my public keys

The goal of `Processor::Z80::Bus` is to simulate the opcode fetches being
2-read + 2-wait states; operand+regular reads/writes being 3-read. For
now, this puts the cycle counts inside the CPU core. At the moment, I
can't think of any CPU core where this wouldn't be appropriate. But it's
certainly possible that such a case exists. So this may not be the
perfect solution.

The reason for having it be a subclass of Processor::Z80 instead of
virtual functions for the MasterSystem::CPU core to define is due to
naming conflicts. I wanted the core to say `in(addr)` and have it take
the four clocks. But I also wanted a version of the function that didn't
consume time when called. One way to do that would be for the core to
call `Z80::in(addr)`, which then calls the regular `in(addr)` that goes to
`MasterSystem::CPU::in(addr)`. But I don't want to put the `Z80::`
prefix on all of the opcodes. Very easy to forget it, and then end up not
consuming any time. Another is to use uglier names in the
`MasterSystem::CPU` core, like `read_`, `write_`, `in_`, `out_`, etc. But,
yuck.

So ... yeah, this is an experiment. We'll see how it goes.
2016-09-03 21:26:04 +10:00
Tim Allen 043f6a8b33 Update to v101r08 release.
byuu says:

Changelog:

  - 68K: fixed read-modify-write instructions
  - 68K: fixed ADDX bug (using wrong target)
  - 68K: fixed major bug with SUB using wrong argument ordering
  - 68K: fixed sign extension when reading address registers from
    effective addressing
  - 68K: fixed sign extension on CMPA, SUBA instructions
  - VDP: improved OAM sprite attribute table caching behavior
  - VDP: improved DMA fill operation behavior
  - added Master System / Game Gear stubs (needed for developing the Z80
    core)
2016-08-17 22:31:22 +10:00
Tim Allen ffd150735b Update to v101r07 release.
byuu says:

Added VDP sprite rendering. Can't get any games far enough in to see if
it actually works. So in other words, it doesn't work at all and is 100%
completely broken.

Also added 68K exceptions and interrupts. So far only the VDP interrupt
is present. It definitely seems to be firing in commercial games, so
that's promising. But the implementation is almost certainly completely
wrong. There is fuck all of nothing for documentation on how interrupts
actually work. I had to find out the interrupt vector numbers from
reading the comments from the Sonic the Hedgehog disassembly. I have
literally no fucking clue what I0-I2 (3-bit integer priority value in
the status register) is supposed to do. I know that Vblank=6, Hblank=4,
Ext(gamepad)=2. I know that at reset, SR.I=7. I don't know if I'm
supposed to block interrupts when I is >, >=, <, <= to the interrupt
level. I don't know what level CPU exceptions are supposed to be.

Also implemented VDP regular DMA. No idea if it works correctly since
none of the commercial games run far enough to use it. So again, it's
horribly broken for usre.

Also improved VDP fill mode. But I don't understand how it takes
byte-lengths when the bus is 16-bit. The transfer times indicate it's
actually transferring at the same speed as the 68K->VDP copy, strongly
suggesting it's actually doing 16-bit transfers at a time. In which case,
what happens when you set an odd transfer length?

Also, both DMA modes can now target VRAM, VSRAM, CRAM. Supposedly there's
all kinds of weird shit going on when you target VSRAM, CRAM with VDP
fill/copy modes, but whatever. Get to that later.

Also implemented a very lazy preliminary wait mechanism to to stall out
a processor while another processor exerts control over the bus. This
one's going to be a major work in progress. For one, it totally breaks
the model I use to do save states with libco. For another, I don't
know if a 68K->VDP DMA instantly locks the CPU, or if it the CPU could
actually keep running if it was executing out of RAM when it started
the DMA transfer from ROM (eg it's a bus busy stall, not a hard chip
stall.) That'll greatly change how I handle the waiting.

Also, the OSS driver now supports Audio::Latency. Sound should be
even lower latency now. On FreeBSD when set to 0ms, it's absolutely
incredible. Cannot detect latency whatsoever. The Mario jump sound seems
to happen at the very instant I hear my cherry blue keyswitch activate.
2016-08-15 14:56:38 +10:00
Tim Allen ac2d0ba1cf Update to v101r05 release.
byuu says:

Changelog:

  - 68K: fixed bug that affected BSR return address
  - VDP: added very preliminary emulation of planes A, B, W (W is
    entirely broken though)
  - VDP: added command/address stuff so you can write to VRAM, CRAM,
    VSRAM
  - VDP: added VRAM fill DMA

I would be really surprised if any commercial games showed anything at
all, so I'd probably recommend against wasting your time trying, unless
you're really bored :P

Also, I wanted to add: I am accepting patches\! So if anyone wants to
look over the 68K core for bugs, that would save me untold amounts of
time in the near future :D
2016-08-13 09:47:30 +10:00
Tim Allen 1df2549d18 Update to v101r04 release.
byuu says:

Changelog:

  - pulled the (u)intN type aliases into higan instead of leaving them
    in nall
  - added 68K LINEA, LINEF hooks for illegal instructions
  - filled the rest of the 68K lambda table with generic instance of
    ILLEGAL
  - completed the 68K disassembler effective addressing modes
      - still unsure whether I should use An to decode absolute
        addresses or not
      - pro: way easier to read where accesses are taking place
      - con: requires An to be valid; so as a disassembler it does a
        poor job
      - making it optional: too much work; ick
  - added I/O decoding for the VDP command-port registers
  - added skeleton timing to all five processor cores
  - output at 1280x480 (needed for mixed 256/320 widths; and to handle
    interlace modes)

The VDP, PSG, Z80, YM2612 are all stepping one clock at a time and
syncing; which is the pathological worst case for libco. But they also
have no logic inside of them. With all the above, I'm averaging around
250fps with just the 68K core actually functional, and the VDP doing a
dumb "draw white pixels" loop. Still way too early to tell how this
emulator is going to perform.

Also, the 320x240 mode of the Genesis means that we don't need an aspect
correction ratio. But we do need to ensure the output window is a
multiple 320x240 so that the scale values work correctly. I was
hard-coding aspect correction to stretch the window an additional \*8/7.
But that won't work anymore so ... the main higan window is now 640x480,
960x720, or 1280x960. Toggling aspect correction only changes the video
width inside the window.

It's a bit jarring ... the window is a lot wider, more black space now
for most modes. But for now, it is what it is.
2016-08-12 11:07:04 +10:00
Tim Allen 0a57cac70c Update to v101r02 release.
byuu says:

Changelog:

  - Emulator: use `(uintmax)-1 >> 1` for the units of time
  - MD: implemented 13 new 68K instructions (basically all of the
    remaining easy ones); 21 remain
  - nall: replaced `(u)intmax_t` (64-bit) with *actual* `(u)intmax` type
    (128-bit where available)
      - this extends to everything: atoi, string, etc. You can even
        print 128-bit variables if you like

22,552 opcodes still don't exist in the 68K map. Looking like quite a
few entries will be blank once I finish.
2016-08-09 21:07:18 +10:00
Tim Allen e39987a3e3 Update to v101 release.
byuu says (in the public announcement):

Not a large changelog this time, sorry. This release is mostly to fix
the SA-1 issue, and to get some real-world testing of the new scheduler
model. Most of the work in the past month has gone into writing a 68000
CPU core; yet it's still only about half-way finished.

Changelog (since the previous release):

  - fixed SNES SA-1 IRQ regression (fixes Super Mario RPG level-up
    screen)
  - new scheduler for all emulator cores (precision of 2^-127)
  - icarus database adds nine new SNES games
  - added Input/Frequency to settings file (allows simulation of
    latency)

byuu says (in the WIP forum):

Changelog:

  - in 32-bit mode, Thread uses uint64\_t with 2^-63 time units (10^-7
    precision in the worst case)
      - nearly ten times the precision of an attosecond
  - in 64-bit mode, Thread uses uint128\_t with 2^-127 time units
    (10^-26 precision in the worst case)
      - far more accurate than yoctoseconds; almost closing in on planck
        time

Note: a quartz crystal is accurate to 10^-4 or 10^-5. A cesium fountain
atomic clock is accurate to 10^-15. So ... yeah. 2^-63 was perfectly
fine; but there was no speed penalty whatsoever for using uint128\_t in
64-bit mode, so why not?
2016-08-08 20:04:15 +10:00
Tim Allen f5e5bf1772 Update to v100r16 release.
byuu says:

(Windows users may need to include <sys/time.h> at the top of
nall/chrono.hpp, not sure.)

Unchangelog:
- forgot to add the Scheduler clock=0 fix because I have the memory of
  a goldfish

Changelog:
- new icarus database with nine additional games
- hiro(GTK,Qt) won't constantly write its settings.bml file to disk
  anymore
- added latency simulator for fun (settings.bml => Input/Latency in
  milliseconds)

So the last one ... I wanted to test out nall::chrono, and I was also
thinking that by polling every emulated frame, it's pretty wasteful when
you are using Fast Forward and hitting 200+fps. As I've said before,
calls to ruby::input::poll are not cheap.

So to get around this, I added a limiter so that if you called the
hardware poll function within N milliseconds, it'll return without
doing any actual work. And indeed, that increases my framerate of Zelda
3 uncapped from 133fps to 142fps. Yay. But it's not a "real" speedup,
as it only helps you when you exceed 100% speed (theoretically, you'd
need to crack 300% speed since the game itself will poll at 16ms at 100%
speed, but yet it sped up Zelda 3, so who am I to complain?)

I threw the latency value into the settings file. It should be 16,
but I set it to 5 since that was the lowest before it started negatively
impacting uncapped speeds. You're wasting your time and CPU cycles setting
it lower than 5, but if people like placebo effects it might work. Maybe
I should let it be a signed integer so people can set it to -16 and think
it's actually faster :P (I'm only joking. I took out the 96000hz audio
placebo effect as well. Not really into psychological tricks anymore.)

But yeah seriously, I didn't do this to start this discussion again for
the billionth time. Please don't go there. And please don't tell me this
WIP has higher/lower latency than before. I don't want to hear it.

The only reason I bring it up is for the fun part that is worth
discussing: put up or shut up time on how sensitive you are to
latency! You can set the value above 5 to see how games feel.

I personally can't really tell a difference until about 50. And I can't
be 100% confident it's worse until about 75. But ... when I set it to
150, games become "extra difficult" ... the higher it goes, the worse
it gets :D

For this WIP, I've left no upper limit cap. I'll probably set a cap of
something like 500ms or 1000ms for the official release. Need to balance
user error/trolling with enjoyability. I'll think about it.

[...]

Now, what I worry about is stupid people seeing it and thinking it's an
"added latency" setting, as if anyone would intentionally make things
worse by default. This is a limiter. So if 5ms have passed since the
game last polled, and that will be the case 99.9% of the time in games,
the next poll will happen just in time, immediately when the game polls
the inputs. Thus, a value below 1/<framerate>ms is not only pointless,
if you go too low it will ruin your fast forward max speeds.

I did say I didn't want to resort to placebo tricks, but I also don't
want to spark up public discussion on this again either. So it might
be best to default Input/Latency to 0ms, and internally have a max(5,
latency) wrapper around the value.
2016-08-03 22:32:40 +10:00
Tim Allen c50723ef61 Update to v100r15 release.
byuu wrote:

Aforementioned scheduler changes added. Longer explanation of why here:
http://hastebin.com/raw/toxedenece

Again, we really need to test this as thoroughly as possible for
regressions :/
This is a really major change that affects absolutely everything: all
emulation cores, all coprocessors, etc.

Also added ADDX and SUB to the 68K core, which brings us just barely
above 50% of the instruction encoding space completed.

[Editor's note: The "aformentioned scheduler changes" were described in
a previous forum post:

    Unfortunately, 64-bits just wasn't enough precision (we were
    getting misalignments ~230 times a second on 21/24MHz clocks), so
    I had to move to 128-bit counters. This of course doesn't exist on
    32-bit architectures (and probably not on all 64-bit ones either),
    so for now ... higan's only going to compile on 64-bit machines
    until we figure something out. Maybe we offer a "lower precision"
    fallback for machines that lack uint128_t or something. Using the
    booth algorithm would be way too slow.

    Anyway, the precision is now 2^-96, which is roughly 10^-29. That
    puts us far beyond the yoctosecond. Suck it, MAME :P I'm jokingly
    referring to it as the byuusecond. The other 32-bits of precision
    allows a 1Hz clock to run up to one full second before all clocks
    need to be normalized to prevent overflow.

    I fixed a serious wobbling issue where I was using clock > other.clock
    for synchronization instead of clock >= other.clock; and also another
    aliasing issue when two threads share a common frequency, but don't
    run in lock-step. The latter I don't even fully understand, but I
    did observe it in testing.

    nall/serialization.hpp has been extended to support 128-bit integers,
    but without explicitly naming them (yay generic code), so nall will
    still compile on 32-bit platforms for all other applications.

    Speed is basically a wash now. FC's a bit slower, SFC's a bit faster.

The "longer explanation" in the linked hastebin is:

    Okay, so the idea is that we can have an arbitrary number of
    oscillators. Take the SNES:

    - CPU/PPU clock = 21477272.727272hz
    - SMP/DSP clock = 24576000hz
    - Cartridge DSP1 clock = 8000000hz
    - Cartridge MSU1 clock = 44100hz
    - Controller Port 1 modem controller clock = 57600hz
    - Controller Port 2 barcode battler clock = 115200hz
    - Expansion Port exercise bike clock = 192000hz

    Is this a pathological case? Of course it is, but it's possible. The
    first four do exist in the wild already: see Rockman X2 MSU1
    patch. Manifest files with higan let you specify any frequency you
    want for any component.

    The old trick higan used was to hold an int64 counter for each
    thread:thread synchronization, and adjust it like so:

    - if thread A steps X clocks; then clock += X * threadB.frequency
      - if clock >= 0; switch to threadB
    - if thread B steps X clocks; then clock -= X * threadA.frequency
      - if clock <  0; switch to threadA

    But there are also system configurations where one processor has to
    synchronize with more than one other processor. Take the Genesis:

    - the 68K has to sync with the Z80 and PSG and YM2612 and VDP
    - the Z80 has to sync with the 68K and PSG and YM2612
    - the PSG has to sync with the 68K and Z80 and YM2612

    Now I could do this by having an int64 clock value for every
    association. But these clock values would have to be outside the
    individual Thread class objects, and we would have to update every
    relationship's clock value. So the 68K would have to update the Z80,
    PSG, YM2612 and VDP clocks. That's four expensive 64-bit multiply-adds
    per clock step event instead of one.

    As such, we have to account for both possibilities. The only way to
    do this is with a single time base. We do this like so:

    - setup: scalar = timeBase / frequency
    - step: clock += scalar * clocks

    Once per second, we look at every thread, find the smallest clock
    value. Then subtract that value from all threads. This prevents the
    clock counters from overflowing.

    Unfortunately, these oscillator values are psychotic, unpredictable,
    and often times repeating fractions. Even with a timeBase of
    1,000,000,000,000,000,000 (one attosecond); we get rounding errors
    every ~16,300 synchronizations. Specifically, this happens with a CPU
    running at 21477273hz (rounded) and SMP running at 24576000hz. That
    may be good enough for most emulators, but ... you know how I am.

    Plus, even at the attosecond level, we're really pushing against the
    limits of 64-bit integers. Given the reciprocal inverse, a frequency
    of 1Hz (which does exist in higan!) would have a scalar that consumes
    1/18th of the entire range of a uint64 on every single step. Yes, I
    could raise the frequency, and then step by that amount, I know. But
    I don't want to have weird gotchas like that in the scheduler core.

    Until I increase the accuracy to about 100 times greater than a
    yoctosecond, the rounding errors are too great. And since the only
    choice above 64-bit values is 128-bit values; we might as well use
    all the extra headroom. 2^-96 as a timebase gives me the ability to
    have both a 1Hz and 4GHz clock; and run them both for a full second;
    before an overflow event would occur.

Another hastebin includes demonstration code:

    #include <libco/libco.h>

    #include <nall/nall.hpp>
    using namespace nall;

    //

    cothread_t mainThread = nullptr;
    const uint iterations = 100'000'000;
    const uint cpuFreq = 21477272.727272 + 0.5;
    const uint smpFreq = 24576000.000000 + 0.5;
    const uint cpuStep = 4;
    const uint smpStep = 5;

    //

    struct ThreadA {
      cothread_t handle = nullptr;
      uint64 frequency = 0;
      int64 clock = 0;

      auto create(auto (*entrypoint)() -> void, uint frequency) {
        this->handle = co_create(65536, entrypoint);
        this->frequency = frequency;
        this->clock = 0;
      }
    };

    struct CPUA : ThreadA {
      static auto Enter() -> void;
      auto main() -> void;
      CPUA() { create(&CPUA::Enter, cpuFreq); }
    } cpuA;

    struct SMPA : ThreadA {
      static auto Enter() -> void;
      auto main() -> void;
      SMPA() { create(&SMPA::Enter, smpFreq); }
    } smpA;

    uint8 queueA[iterations];
    uint offsetA;
    cothread_t resumeA = cpuA.handle;

    auto EnterA() -> void {
      offsetA = 0;
      co_switch(resumeA);
    }

    auto QueueA(uint value) -> void {
      queueA[offsetA++] = value;
      if(offsetA >= iterations) {
        resumeA = co_active();
        co_switch(mainThread);
      }
    }

    auto CPUA::Enter() -> void { while(true) cpuA.main(); }

    auto CPUA::main() -> void {
      QueueA(1);
      smpA.clock -= cpuStep * smpA.frequency;
      if(smpA.clock < 0) co_switch(smpA.handle);
    }

    auto SMPA::Enter() -> void { while(true) smpA.main(); }

    auto SMPA::main() -> void {
      QueueA(2);
      smpA.clock += smpStep * cpuA.frequency;
      if(smpA.clock >= 0) co_switch(cpuA.handle);
    }

    //

    struct ThreadB {
      cothread_t handle = nullptr;
      uint128_t scalar = 0;
      uint128_t clock = 0;

      auto print128(uint128_t value) {
        string s;
        while(value) {
          s.append((char)('0' + value % 10));
          value /= 10;
        }
        s.reverse();
        print(s, "\n");
      }

      //femtosecond (10^15) =    16306
      //attosecond  (10^18) =   688838
      //zeptosecond (10^21) = 13712691
      //yoctosecond (10^24) = 13712691 (hitting a dead-end on a rounding error causing a wobble)
      //byuusecond? ( 2^96) = (perfect? 79,228 times more precise than a yoctosecond)

      auto create(auto (*entrypoint)() -> void, uint128_t frequency) {
        this->handle = co_create(65536, entrypoint);

        uint128_t unitOfTime = 1;
      //for(uint n : range(29)) unitOfTime *= 10;
        unitOfTime <<= 96;  //2^96 time units ...

        this->scalar = unitOfTime / frequency;
        print128(this->scalar);
        this->clock = 0;
      }

      auto step(uint128_t clocks) -> void { clock += clocks * scalar; }
      auto synchronize(ThreadB& thread) -> void { if(clock >= thread.clock) co_switch(thread.handle); }
    };

    struct CPUB : ThreadB {
      static auto Enter() -> void;
      auto main() -> void;
      CPUB() { create(&CPUB::Enter, cpuFreq); }
    } cpuB;

    struct SMPB : ThreadB {
      static auto Enter() -> void;
      auto main() -> void;
      SMPB() { create(&SMPB::Enter, smpFreq); clock = 1; }
    } smpB;

    auto correct() -> void {
      auto minimum = min(cpuB.clock, smpB.clock);
      cpuB.clock -= minimum;
      smpB.clock -= minimum;
    }

    uint8 queueB[iterations];
    uint offsetB;
    cothread_t resumeB = cpuB.handle;

    auto EnterB() -> void {
      correct();
      offsetB = 0;
      co_switch(resumeB);
    }

    auto QueueB(uint value) -> void {
      queueB[offsetB++] = value;
      if(offsetB >= iterations) {
        resumeB = co_active();
        co_switch(mainThread);
      }
    }

    auto CPUB::Enter() -> void { while(true) cpuB.main(); }

    auto CPUB::main() -> void {
      QueueB(1);
      step(cpuStep);
      synchronize(smpB);
    }

    auto SMPB::Enter() -> void { while(true) smpB.main(); }

    auto SMPB::main() -> void {
      QueueB(2);
      step(smpStep);
      synchronize(cpuB);
    }

    //

    #include <nall/main.hpp>
    auto nall::main(string_vector) -> void {
      mainThread = co_active();

      uint masterCounter = 0;
      while(true) {
        print(masterCounter++, " ...\n");

        auto A = clock();
        EnterA();
        auto B = clock();
        print((double)(B - A) / CLOCKS_PER_SEC, "s\n");

        auto C = clock();
        EnterB();
        auto D = clock();
        print((double)(D - C) / CLOCKS_PER_SEC, "s\n");

        for(uint n : range(iterations)) {
          if(queueA[n] != queueB[n]) return print("fail at ", n, "\n");
        }
      }
    }

...and that's everything.]
2016-07-31 12:11:20 +10:00
Tim Allen ca277cd5e8 Update to v100r14 release.
byuu says:

(Windows: compile with -fpermissive to silence an annoying error. I'll
fix it in the next WIP.)

I completely replaced the time management system in higan and overhauled
the scheduler.

Before, processor threads would have "int64 clock"; and there would
be a 1:1 relationship between two threads. When thread A ran for X
cycles, it'd subtract X * B.Frequency from clock; and when thread B ran
for Y cycles, it'd add Y * A.Frequency from clock. This worked well
and allowed perfect precision; but it doesn't work when you have more
complicated relationships: eg the 68K can sync to the Z80 and PSG; the
Z80 to the 68K and PSG; so the PSG needs two counters.

The new system instead uses a "uint64 clock" variable that represents
time in attoseconds. Every time the scheduler exits, it subtracts
the smallest clock count from all threads, to prevent an overflow
scenario. The only real downside is that rounding errors mean that
roughly every 20 minutes, we have a rounding error of one clock cycle
(one 20,000,000th of a second.) However, this only applies to systems
with multiple oscillators, like the SNES. And when you're in that
situation ... there's no such thing as a perfect oscillator anyway. A
real SNES will be thousands of times less out of spec than 1hz per 20
minutes.

The advantages are pretty immense. First, we obviously can now support
more complex relationships between threads. Second, we can build a
much more abstracted scheduler. All of libco is now abstracted away
completely, which may permit a state-machine / coroutine version of
Thread in the future. We've basically gone from this:

    auto SMP::step(uint clocks) -> void {
      clock += clocks * (uint64)cpu.frequency;
      dsp.clock -= clocks;
      if(dsp.clock < 0 && !scheduler.synchronizing()) co_switch(dsp.thread);
      if(clock >= 0 && !scheduler.synchronizing()) co_switch(cpu.thread);
    }

To this:

    auto SMP::step(uint clocks) -> void {
      Thread::step(clocks);
      synchronize(dsp);
      synchronize(cpu);
    }

As you can see, we don't have to do multiple clock adjustments anymore.
This is a huge win for the SNES CPU that had to update the SMP, DSP, all
peripherals and all coprocessors. Likewise, we don't have to synchronize
all coprocessors when one runs, now we can just synchronize the active
one to the CPU.

Third, when changing the frequencies of threads (think SGB speed setting
modes, GBC double-speed mode, etc), it no longer causes the "int64
clock" value to be erroneous.

Fourth, this results in a fairly decent speedup, mostly across the
board. Aside from the GBA being mostly a wash (for unknown reasons),
it's about an 8% - 12% speedup in every other emulation core.

Now, all of this said ... this was an unbelievably massive change, so
... you know what that means >_> If anyone can help test all types of
SNES coprocessors, and some other system games, it'd be appreciated.

----

Lastly, we have a bitchin' new about screen. It unfortunately adds
~200KiB onto the binary size, because the PNG->C++ header file
transformation doesn't compress very well, and I want to keep the
original resource files in with the higan archive. I might try some
things to work around this file size increase in the future, but for now
... yeah, slightly larger archive sizes, sorry.

The logo's a bit busted on Windows (the Label control's background
transparency and alignment settings aren't working), but works well on
GTK. I'll have to fix Windows before the next official release. For now,
look on my Twitter feed if you want to see what it's supposed to look
like.

----

EDIT: forgot about ICD2::Enter. It's doing some weird inverse
run-to-save thing that I need to implement support for somehow. So, save
states on the SGB core probably won't work with this WIP.
2016-07-30 13:56:12 +10:00
Tim Allen 92fe5b0813 Update to v100r08 release.
byuu says:

Six and a half hours this time ... one new opcode, and all old opcodes
now in a deprecated format. Hooray, progress!

For building the table, I've decided to move from:

    for(uint opcode : range(65536)) {
      if(match(...)) bind(opNAME, ...);
    }

To instead having separate for loops for each supported opcode. This
lets me specialize parts I want with templates.

And to this aim, I'm moving to replace all of the
(read,write)(size, ...) functions with (read,write)<Size>(...) functions.

This will amount to the ~70ish instructions being triplicated ot ~210ish
instructions; but I think this is really important.

When I was getting into flag calculations, a ton of conditionals
were needed to mask sizes to byte/word/long. There was also lots of
conditionals in all the memory access handlers.

The template code is ugly, but we eliminate a huge amount of branch
conditions this way.
2016-07-18 08:11:29 +10:00
Tim Allen 059347e575 Update to v100r07 release.
byuu says:

Four and a half hours of work and ... zero new opcodes implemented.

This was the best job I could do refining the effective address
computations. Should have all twelve 68000 modes implemented now. Still
have a billion questions about when and how I'm supposed to perform
certain edge case operations, though.
2016-07-17 13:24:28 +10:00
Tim Allen b72f35a13e Update to v100r05 release.
byuu says:

Alright, I'm definitely going to need to find some people willing to
tolerate my questions on this chip, so I'm going to go ahead and announce
I'm working on this I guess.

This core is way too big for a surprise like the NES and WS cores
were. It'll probably even span multiple v10x releases before it's
even ready.
2016-07-13 08:47:04 +10:00
Tim Allen 13ad9644a2 Update to v099r16 release (public beta).
byuu says:

Changelog:
- hiro: BrowserDialog can navigate up to drive selection on Windows
- nall: (file,path,dir,base,prefix,suffix)name =>
  Location::(file,path,dir,base,prefix,suffix)
- higan/tomoko: rename audio filter label from "Sinc" to "IIR - Biquad"
- higan/tomoko: allow loading files via icarus on the command-line
  once again
- higan/tomoko: (begrudging) quick hack to fix presentation window focus
  on startup
- higan/audio: don't divide output audio volume by number of streams
- processor/r65816: fix a regression in (read,write)DB; fixes Taz-Mania
- fixed compilation regressions on Windows and Linux

I'm happy with where we are at with code cleanups and stability, so I'd
like to release v100. But even though I'm not assigning any special
significance to this version, we should probably test it more thoroughly
first.
2016-07-04 21:53:24 +10:00
Tim Allen 8d5cc0c35e Update to v099r15 release.
byuu says:

Changelog:
- nall::lstring -> nall::string_vector
- added IntegerBitField<type, lo, hi> -- hopefully it works correctly...
- Multitap 1-4 -> Super Multitap 2-5
- fixed SFC PPU CGRAM read regression
- huge amounts of SFC PPU IO register cleanups -- .bits really is lovely
- re-added the read/write(VRAM,OAM,CGRAM) helpers for the SFC PPU
  - but they're now optimized to the realities of the PPU (16-bit data
    sizes / no address parameter / where appropriate)
  - basically used to get the active-display overrides in a unified place;
    but also reduces duplicate code in (read,write)IO
2016-07-04 21:48:17 +10:00
Tim Allen 82293c95ae Update to v099r14 release.
byuu says:

Changelog:
- (u)int(max,ptr) abbreviations removed; use _t suffix now [didn't feel
  like they were contributing enough to be worth it]
- cleaned up nall::integer,natural,real functionality
  - toInteger, toNatural, toReal for parsing strings to numbers
  - fromInteger, fromNatural, fromReal for creating strings from numbers
  - (string,Markup::Node,SQL-based-classes)::(integer,natural,real)
    left unchanged
  - template<typename T> numeral(T value, long padding, char padchar)
    -> string for print() formatting
    - deduces integer,natural,real based on T ... cast the value if you
      want to override
    - there still exists binary,octal,hex,pointer for explicit print()
      formatting
- lstring -> string_vector [but using lstring = string_vector; is
  declared]
  - would be nice to remove the using lstring eventually ... but that'd
    probably require 10,000 lines of changes >_>
- format -> string_format [no using here; format was too ambiguous]
- using integer = Integer<sizeof(int)*8>; and using natural =
  Natural<sizeof(uint)*8>; declared
  - for consistency with boolean. These three are meant for creating
    zero-initialized values implicitly (various uses)
- R65816::io() -> idle() and SPC700::io() -> idle() [more clear; frees
  up struct IO {} io; naming]
- SFC CPU, PPU, SMP use struct IO {} io; over struct (Status,Registers) {}
  (status,registers); now
  - still some CPU::Status status values ... they didn't really fit into
    IO functionality ... will have to think about this more
- SFC CPU, PPU, SMP now use step() exclusively instead of addClocks()
  calling into step()
- SFC CPU joypad1_bits, joypad2_bits were unused; killed them
- SFC PPU CGRAM moved into PPU::Screen; since nothing else uses it
- SFC PPU OAM moved into PPU::Object; since nothing else uses it
  - the raw uint8[544] array is gone. OAM::read() constructs values from
    the OAM::Object[512] table now
  - this avoids having to determine how we want to sub-divide the two
    OAM memory sections
  - this also eliminates the OAM::synchronize() functionality
- probably more I'm forgetting

The FPS fluctuations are driving me insane. This WIP went from 128fps to
137fps. Settled on 133.5fps for the final build. But nothing I changed
should have affected performance at all. This level of fluctuation makes
it damn near impossible to know whether I'm speeding things up or slowing
things down with changes.
2016-07-01 21:50:32 +10:00
Tim Allen 7a68059f78 Update to v099r12 release.
byuu says:

Changelog:
- fixed FC AxROM / VRC7 regression
- BitField split to BooleanBitField/NaturalBitField (in preparation
  for IntegerBitField)
- BitFieldReference removed
- GB CPU cleaned up
- GB Cartridge + Mappers cleaned up
- SFC CGRAM is now emulated as uint15[256] instead of uint[512]
- sfc/ppu/memory.cpp no longer needed; removed
- purged SFC Debugger hooks for now (some of the operator[] calls were
  bypassing them anyway)

Unfortunately, for reasons that defy all semblance of logic, the CGRAM
change caused a slight speed hit. As have the last few changes. We're
now down to around 129.5fps compared to 123.fps for v099 and 134.5fps
at our peak (v099r01-r02).

I really like the style I came up with for the Game Boy mappers to settle
the purpose(ROM,RAM) vs (rom,ram)Purpose naming convention. If I ever get
around to redoing the NES mappers, that's likely the approach I'll take.
2016-06-28 20:43:47 +10:00
Tim Allen 3e807946b8 Update to v099r11 release.
byuu says:

Changelog:
- NES PPU core updated to use BitFields (absolutely massive improvement
  in code readability)
- NES APU core updated to new coding style
- NES cartridge/board and cartridge/chip updated to new coding style
- pushed NES PPU rendering one dot forward (doesn't fix King's Quest V
  yet, sadly)
- fixed SNES PPU BG tilemask for 128KiB VRAM mode (doesn't fix Yoshi's
  Island, though)

So ... I kind of went overboard with the fc/cartridge changes. This WIP
diff is 185KiB >_>
I didn't realize it was going to be as big a task as it was, but once
I started everything broke in a chain reaction, so I had to do it all
at once.

There's a massive chance we've broken a bunch of NES things. Any typos
in this WIP are going to be absolutely insidious to track down =(

But ... supposing I pulled it off, this means the Famicom core is now
fully converted to the new coding style as well. That leaves only the
GB and GBA cores. Once those are finished, then we'll finally be free
of these gigantic hellspawn diffs.
2016-06-27 23:07:57 +10:00
Tim Allen 3a9c7c6843 Update to v099r09 release.
byuu says:

Changelog:
- Emulator::Interface::Medium::bootable removed
- Emulator::Interface::load(bool required) argument removed
  [File::Required makes no sense on a folder]
- Super Famicom.sys now has user-configurable properties (CPU,PPU1,PPU2
  version; PPU1 VRAM size, Region override)
- old nall/property removed completely
- volatile flags supported on coprocessor RAM files now (still not in
  icarus, though)
- (hopefully) fixed SNES Multitap support (needs testing)
- fixed an OAM tiledata range clipping limit in 128KiB VRAM mode (doesn't
  fix Yoshi's Island, sadly)
- (hopefully, again) fixed the input polling bug hex_usr reported
- re-added dialog box for when File::Required files are missing
  - really cool: if you're missing a boot ROM, BIOS ROM, or IPL ROM,
    it warns you immediately
  - you don't have to select a game before seeing the error message
    anymore
- fixed cheats.bml load/save location
2016-06-25 18:53:11 +10:00
Tim Allen ccd8878d75 Update to v099r07 release.
byuu says:

Changelog:
- (hopefully) fixed BS Memory and Sufami Turbo slot loading
- ported GB, GBA, WS cores to use nall/vfs
- completely removed loadRequest, saveRequest functionality from
  Emulator::Interface and ui-tomoko
  - loadRequest(folder) is now load(folder)
- save states now use a shared Emulator::SerializerVersion string
  - whenever this is bumped, all older states will break; but this makes
    bumping state versions way easier
  - also, the version string makes it a lot easier to identify
    compatibility windows for save states
- SNES PPU now uses uint16 vram[32768] for memory accesses [hex_usr]

NOTE: Super Game Boy loading is currently broken, and I'm not entirely
sure how to fix it :/
The file loading handoff was -really- complicated, and so I'm kind of
at a loss ... so for now, don't try it.
Everything else should theoretically work, so please report any bugs
you find.

So, this is pretty much it. I'd be very curious to hear feedback from
people who objected to the old nall/stream design, whether they are
happy with the new file loading system or think it could use further
improvements.

The 16-bit VRAM turned out to be a wash on performance (roughly the same
as before. 1fps slower on Zelda 3, 1fps faster on Yoshi's Island.) The
main reason for this was because Yoshi's Island was breaking horribly
until I changed the vramRead, vramWrite functions to take uint15 instead
of uint16.

I suspect the issue is we're using uint16s in some areas now that need
to be uint15, and this game is setting the VRAM address to 0x8000+,
causing us to go out of bounds on memory accesses.

But ... I want to go ahead and do something cute for fun, and just because
we can ... and this new interface is so incredibly perfect for it!! I
want to support an SNES unit with 128KiB of VRAM. Not out of the box,
but as a fun little tweakable thing. The SNES was clearly designed to
support that, they just didn't use big enough VRAM chips, and left one
of the lines disconnected. So ... let's connect it anyway!

In the end, if we design it right, the only code difference should be
one area where we mask by 15-bits instead of by 16-bits.
2016-06-24 22:09:30 +10:00
Tim Allen 875f031182 Update to v099r06 release.
byuu says:

Changelog:
- Super Famicom core converted to use nall/vfs
  - excludes Super Game Boy; since that's invoked from inside the GB core

This was definitely the major obstacle to test nall/vfs'
applicability. Things worked out pretty great in the end.

We went from 22.0KiB (cartridge) + 18.6KiB (interface) to 24.5KiB
(cartridge) + 11.4KiB (interface). Or 40.7KiB to 36.0KiB. This removes
a very large source of indirection. Before it was: "coprocessor <=>
cartridge <=> interface" for loading and saving data, and now it's just
"coprocessor <=> cartridge". And it may make sense to eventually turn
this into just "cartridge -> coprocessor" by making each coprocessor
class handle its own markup parsing.

It's nice to have all the manifest parsing in one location (well, sans
MSU1); but it's also nice for loading/unloading to be handled by each
coprocessor itself. So I'll have to think longer about that one.

I've also started handling Interface::save() differently. Instead of
keeping track of memory IDs and filenames, and iterating through that
vector of objects ... instead I now have a system that mirrors the markup
parsing on loading, but handles saving instead. This was actually the
reason the code size savings weren't more significant, but I like this
style more. As before, it removes an extra level of indirection.

So ... next up, I need to port over the GB, then GBA, then WS
cores. These shouldn't take too long since they're all very simple with
just ROM+RAM(+RTC) right now. Then get the SGB callbacks using vfs. Then
after that, gut all the old stream stuff from nall and higan. Kill the
(load,save)Request stuff, rename the load(Gamepak)Request to something
simpler, and then we should be good.

Anyway ... these are some huge changes.
2016-06-24 22:01:03 +10:00
Tim Allen f04d9d58f5 Update to v099r05 release.
byuu says:

Changelog:
- added nall/vfs
- converted Famicom core to use nall/vfs interface instead of nall/stream
  interface
2016-06-20 21:00:32 +10:00
Tim Allen 44a8c5a2b4 Update to v099r03 release.
byuu says:

Changelog:
- finished cleaning up the SFC core to my new coding conventions
- removed sfc/controller/usart (superseded by 21fx)
- hid Synchronize Video option from the menu (still in the configuration
  file)

Pretty much the only minor detail left is some variable names in the
SA-1 core that really won't look good at all if I move to camelCase,
so I'll have to rethink how I handle those. It's probably a good area
to attempt using BitFields, to see how it impacts performance. But I'll
do that in a test branch first.

But for the most part, this should be the end of the gigantic diffs (this
one was 174KiB), at least for the SFC/WS cores. Still have the FC/GB/GBA
cores to clean up more fully. Assuming we don't spot any new regressions,
we should be ~95% out of the woods on code cleanups breaking things.
2016-06-17 23:03:54 +10:00
Tim Allen ae5b4c3bb3 Update to v099r01 release.
byuu says:

Changelog:
- massive cleanups and optimizations on the PPU core
- ~9% speedup over v099 official

This is pretty much it for the low-hanging fruit of speeding up higan. Any
more gains from this point will be extremely hard-fought, unfortunately.
2016-06-14 20:51:54 +10:00
Tim Allen 50420e3dd2 Update to v098r19 release.
byuu says:

Changelog:
- added nall/bit-field.hpp
- updated all CPU cores (sans LR35902 due to some complexities) to use
  BitFields instead of bools
- updated as many CPU cores as I could to use BitFields instead of union {
  struct { uint8_t ... }; }; pairs

The speed changes are mostly a wash for this. In some instances,
I noticed a ~2-3% speedup (eg SNES emulation), and in others a 2-3%
slowdown (eg Famicom emulation.) It's within the margin of error, so
it's safe to say it has no impact.

This does give us a lot of new useful things, however:

- no more manual reconstruction of flag values from lots of left shifts
  and ORs
- no more manual deconstruction of flag values from lots of ANDs
- ability to get completely free aliases to flag groups (eg GSU can
  provide alt2, alt1 and also alt (which is alt2,alt1 combined)
- removes the need for the nasty order_lsbN macro hack (eventually will
  make higan 100% endian independent)
- saves us from insane compilers that try and do nasty things with
  alignment on union-structs
- saves us from insane compilers that try to store bit-field bits in
  reverse order
- will allow some really novel new use cases (I'm planning an
  instant-decode ARM opcode function, for instance.)
- reduces code size (we can serialize flag registers in one line instead
  of one for each flag)

However, I probably won't use it for super critical code that's constantly
reading out register values (eg PPU MMIO registers.) I think there we
would end up with a performance penalty.
2016-06-09 08:26:35 +10:00
Tim Allen b08449215a Update to v098r18 release.
byuu says:

Changelog:
- hiro: fixed the BrowserDialog column resizing when navigating to new
  folders (prevents clipping of filenames)
  - note: this is kind of a quick-fix; but I have a good idea how to do
    the proper fix now
- nall: added BitField<T, Lo, Hi> class
  - note: not yet working on the SFC CPU class; need to go at it with
    a debugger to find out what's happening
- GB: emulated DMG/SGB STAT IRQ bug; fixes Zerd no Densetsu and Road Rash
  (won't fix anything else; don't get hopes up)
2016-06-07 21:55:03 +10:00
Tim Allen 20ac95ee49 Update to v098r15 release.
byuu says:

Changelog:
- removed template usage from processor/spc700; cleaned up many function
  names and the switch table
  - object size: 176.8kb => 127.3kb
  - source code size: 43.5kb => 37.0kb
- fixed processor/r65816 BRK/COP vector regression [hex_usr]
- corrected HuC3 unmapped RAM read value; fixes Robopon [endrift]
- cosmetic: simplified the butterworth constant calculation
  [Wolfram|Alpha]

The SPC700 core changes took forever, about three hours of work.

Only the LR35902 and R6502 still need their template functions
removed. The point of this is that it doesn't cause any speed penalty
to do so, and it results in smaller binary sizes and faster compilation
times.
2016-06-05 14:52:43 +10:00
Tim Allen fdc41611cf Update to v098r14 release.
byuu says:

Changelog:
- improved attenuation of biquad filter by computing butterworth Q
  coefficients correctly (instead of using the same constant)
- adding 1e-25 to each input sample into the biquad filters to try and
  prevent denormalization
- updated normalization from [0.0 to 1.0] to [-1.0 to +1.0]; volume/reverb
  happen in floating-point mode now
- good amount of work to make the base Emulator::Audio support any number
  of output channels
  - so that we don't have to do separate work on left/right channels;
    and can instead share the code for each channel
- Emulator::Interface::audioSample(int16 left, int16 right); changed to:
  - Emulator::Interface::audioSample(double* samples, uint channels);
  - samples are normalized [-1.0 to +1.0]
  - for now at least, channels will be the value given to
    Emulator::Audio::reset()
- fixed GUI crash on startup when audio driver is set to None

I'm probably going to be updating ruby to accept normalized doubles as
well; but I'm not sure if I will try and support anything other 2-channel
audio output. It'll depend on how easy it is to do so; perhaps it'll be
a per-driver setting.

The denormalization thing is fierce. If that happens, it drops the
emulator framerate from 220fps to about 20fps for Game Boy emulation. And
that happens basically whenever audio output is silent. I'm probably
also going to make a nall/denormal.hpp file at some point with
platform-specific functionality to set the CPU state to "denormals as
zero" where applicable. I'll still add the 1e-25 offset (inaudible)
as another fallback.
2016-06-01 21:23:22 +10:00
Tim Allen 839813d0f1 Update to v098r13 release.
byuu says:

Changelog:
- nall/dsp returns with new iir/biquad.hpp and resampler/cubic.hpp files
- nall/queue.hpp added (simple ring buffer ... nall/vector wouldn't
  cause too many moves with FIFO)
- audio streams now only buffer 20ms; so even if multiple audio streams
  desync, latency can never exceed 20ms
- replaced blackman windwed sinc FIR hermite audio filter with transposed
  direct form II biquadratic sixth-order IIR butterworth filter (better
  attenuation of frequencies above 20KHz, faster, no need for decimation,
  less code)
- put in experimental eight-tap echo filter (a lot better than what I
  had before, but still rather weak)
- substantial cleanups to the SuperFX GSU processor core (slightly
  faster, 479KB->100KB object file, 42.7KB->33.4KB source code size,
  way less code duplication)

We'll definitely want to test the whole SuperFX library (not many games)
just to make sure there's no regressions caused by this one.

Not sure what I want to do with audio processing effects yet. I've always
really wanted lots of fun controls to customize audio, and now finally
with this new biquad filter, I can finally start implementing real
effects. For instance, an equalizer wouldn't be too complicated anymore.

The new reverb effect is still a poor man's version. I need to find human
readable source for implementing a comb-filter properly. I'm pretty sure
I can already treat nall::queue as an all-pass filter since all that
does is phase shift (fancy audio term for "delay audio"). What's really
going to be hard is figuring out how to expose user-friendly settings for
controlling it. It looks like you need a bunch of coprime coefficients,
and I don't think casual users are going to be able to hand-enter coprime
values to get the echo effect they want. I uh ... don't even know how
to calculate coprime values dynamically right now >_> But we're going
to have to, as they are correlated to the output sampling rate.

We'll definitely want to make some audio profiles so that users can
quickly select pre-configured themes that sound nice, but expose the
underlying coefficients so that they can tweak stuff to their liking. This
isn't just about higan, this is about me trying to learn digital signal
processing, so please don't be too upset about feature creep or anything
on this.

Anyway ... I'm having some difficulties with my audio right now. When
the reverb effect is enabled, there's a bunch of static on system
reset for just a moment. But this should not be possible. nall::queue
is initializing all previous reverb sample elements to 0.0. I don't
understand where static is coming in from. Further, we have the same
issue with both the windowed sinc and the biquad filters ... a bit of
a popping sound when starting a game. Any help tracking this down would
be appreciated.

There's also one really annoying issue ... I can't seem to do reverb
or volume adjustments with normalized samples. If I say "volume *= 0.5"
in higan/audio/audio.cpp line 68, it doesn't just halve the volume, it
adds a whole bunch of distortion. This makes absolutely zero sense to
me. The sample values are between 0.0 (mute) and 1.0 (full volume) here,
so multiplying a double by 0.5 shouldn't cause distortion. So right now,
I'm doing these adjustments with less precision after denormalizing back
to int16. Anyone ever see something like that? :/
2016-06-01 08:29:36 +10:00
Tim Allen ae5d380d06 Update to v098r11 release.
byuu says:

Changelog:
- fixed nall/path.hpp compilation issue
- fixed ruby/audio/xaudio header declaration compilation issue (again)
- cleaned up xaudio2.hpp file to match my coding syntax (12.5% of the
  file was whitespace overkill)
- added null terminator entry to nall/windows/utf8.hpp argc[] array
- nall/windows/guid.hpp uses the Windows API for generating the GUID
  - this should stop all the bug reports where two nall users were
    generating GUIDs at the exact same second
- fixed hiro/cocoa compilation issue with uint# types
- fixed major higan/sfc Super Game Boy audio latency issue
- fixed higan/sfc CPU core bug with pei, [dp], [dp]+y instructions
- major cleanups to higan/processor/r65816 core
  - merged emulation/native-mode opcodes
  - use camel-case naming on memory.hpp functions
  - simplify address masking code for memory.hpp functions
  - simplify a few opcodes themselves (avoid redundant copies, etc)
  - rename regs.* to r.* to match modern convention of other CPU cores
- removed device.order<> concept from Emulator::Interface
  - cores will now do the translation to make the job of the UI easier
- fixed plurality naming of arrays in Emulator::Interface
  - example: emulator.ports[p].devices[d].inputs[i]
  - example: vector<Medium> media
- probably more surprises

Major show-stoppers to the next official release:
- we need to work on GB core improvements: LY=153/0 case, multiple STAT
  IRQs case, GBC audio output regs, etc.
- we need to re-add software cursors for light guns (Super Scope,
  Justifier)
- after the above, we need to fix the turbo button for the Super Scope

I really have no idea how I want to implement the light guns. Ideally,
we'd want it in higan/video, so we can support the NES Zapper with the
same code. But this isn't going to be easy, because only the SNES knows
when its output is interlaced, and its resolutions can vary as
{256,512}x{224,240,448,480} which requires pixel doubling that was
hard-coded to the SNES-specific behavior, but isn't appropriate to be
exposed in higan/video.
2016-05-25 21:13:02 +10:00
Tim Allen 3ebc77c148 Update to v098r10 release.
byuu says:

Changelog:
- synchronized tomoko, loki, icarus with extensive changes to nall
  (118KiB diff)
2016-05-16 19:51:12 +10:00
Tim Allen 6ae0abe3d3 Update to v098r09 release.
byuu says:

Changelog:
- fixed major nall/vector/prepend bug
- renamed hiro/ListView to hiro/TableView
- added new hiro/ListView control which is a simplified abstraction of
  hiro/TableView
- updated higan's cheat database window and icarus' scan dialog to use
  the new ListView control
- compilation works once again on all platforms (Windows, Cocoa, GTK,
  Qt)
- the loki skeleton compiles once again (removed nall/DSP references;
  updated port/device ID names)

Small catch: need to capture layout resize events internally in Windows
to call resizeColumns. For now, just resize the icarus window to get it
to use the full window width for list view items.
2016-05-04 20:07:13 +10:00
Tim Allen 0955295475 Update to v098r08 release.
byuu says:

Changelog:
- nall/vector rewritten from scratch
- higan/audio uses nall/vector instead of raw pointers
- higan/sfc/coprocessor/sdd1 updated with new research information
- ruby/video/glx and ruby/video/glx2: fuck salt glXSwapIntervalEXT!

The big change here is definitely nall/vector. The Windows, OS X and Qt
ports won't compile until you change some first/last strings to
left/right, but GTK will compile.

I'd be really grateful if anyone could stress-test nall/vector. Pretty
much everything I do relies on this class. If we introduce a bug, the
worst case scenario is my entire SFC game dump database gets corrupted,
or the byuu.org server gets compromised. So it's really critical that we
test the hell out of this right now.

The S-DD1 changes mean you need to update your installation of icarus
again. Also, even though the Lunar FMV never really worked on the
accuracy core anyway (it didn't initialize the PPU properly), it really
won't work now that we emulate the hard-limit of 16MiB for S-DD1 games.
2016-05-02 19:57:04 +10:00
Tim Allen 1929ad47d2 Update to v098r03 release.
byuu says:

It took several hours, but I've rebuilt much of the SNES' bus memory
mapping architecture.

The new design unifies the cartridge string-based mapping
("00-3f,80-bf:8000-ffff") and internal bus.map calls. The map() function
now has an accompanying unmap() function, and instead of a fixed 256
callbacks, it'll scan to find the first available slot. unmap() will
free slots up when zero addresses reference a given slot.

The controllers and expansion port are now both entirely dynamic.
Instead of load/unload/power/reset, they only have the constructor
(power/reset/load) and destructor (unload). What this means is you can
now dynamically change even expansion port devices after the system is
loaded.

Note that this is incredibly dangerous and stupid, but ... oh well. The
whole point of this was for 21fx. There's no way to change the expansion
port device prior to loading a game, but if the 21fx isn't active, then
the reset vector hijack won't work. Now you can load a 21fx game, change
the expansion port device, and simply reset the system to active the
device.

The unification of design between controller port devices and expansion
port devices is nice, and overall this results in a reduction of code
(all of the Mapping stuff in Cartridge is gone, replaced with direct bus
mapping.) And there's always the potential to expand this system more in
the future now.

The big missing feature right now is the ability to push/pop mappings.
So if you look at how the 21fx does the reset vector, you might vomit
a little bit. But ... it works.

Also changed exit(0) to _exit(0) in the POSIX version of nall::execute.

[The _exit(0) thing is an attempt to make higan not crash when it tries
to launch icarus and it's not on $PATH. The theory is that higan forks,
then the child tries to exec icarus and fails, so it exits, all the
unique_ptrs clean up their resources and tell the X server to free
things the parent process is still using. Calling _exit() prevents
destructors from running, and seems to prevent the problem. -Ed.]
2016-04-09 20:21:18 +10:00
Tim Allen 19e1d89f00 Update to v098r01 release.
byuu says:

Changelog:
- SFC: balanced profile removed
- SFC: performance profile removed
- SFC: code for handling non-threaded CPU, SMP, DSP, PPU removed
- SFC: Coprocessor, Controller (and expansion port) shared Thread code
  merged to SFC::Cothread
  - Cothread here just means "Thread with CPU affinity" (couldn't think
    of a better name, sorry)
- SFC: CPU now has vector<Thread*> coprocessors, peripherals;
  - this is the beginning of work to allow expansion port devices to be
    dynamically changed at run-time
- ruby: all audio drivers default to 48000hz instead of 22050hz now if
  no frequency is assigned
  - note: the WASAPI driver can default to whatever the native frequency
    is; doesn't have to be 48000hz
- tomoko: removed the ability to change the frequency from the UI (but
  it will display the frequency used)
- tomoko: removed the timing settings panel
  - the goal is to work toward smooth video via adaptive sync
  - the model is broken by not being in control of the audio frequency
    anyway
  - it's further broken by PAL running at 50hz and WSC running at 75hz
  - it was always broken anyway by SNES interlace timing varying from
    progressive timing
- higan: audio/ stub created (for now, it's just nall/dsp/ moved here
  and included as a header)
- higan: video/ stub created
- higan/GNUmakefile: now includes build rules for essential components
  (libco, emulator, audio, video)

The audio changes are in preparation to merge wareya's awesome WASAPI
work without the need for the nall/dsp resampler.
2016-04-09 13:40:12 +10:00
Tim Allen 7dc62e3a69 Update to v097r19 release.
byuu says:

Changelog:
- fixed nall/windows/guard.hpp
- fixed hiro/(windows,gtk)/header.hpp
- fixed Famicom PPU OAM reads (mask the correct bits when writing)
  [hex_usr]
- removed the need for (system := system) lines from higan/GNUmakefile
- added "All" option to filetype dropdown for ROM loading
  - allows loading GBC games in SGB mode (and technically non-GB(C)
    games, which will obviously fail to do anything)
- loki can load and play game folders now (command-line only) (extremely
  unimpressive; don't waste your time :P)
  - the input is extremely hacked in as a quick placeholder; not sure
    how I'm going to do mapping yet for it
2016-03-13 11:22:14 +11:00
Tim Allen fc7d5991ce Update to v097r18 release.
byuu says:

Changelog:
- fixed SNES sprite priority regression from r17
- added nall/windows/guard.hpp to guard against global namespace
  pollution (similar to nall/xorg/guard.hpp)
- almost fixed Windows compilation (still accuracy profile only, sorry)
- finished porting all of gba/ppu's registers over to the new .bit,.bits
  format ... all GBA registers.cpp files gone now
- the "processors :=" line in the target-$(ui)/GNUmakefile is no longer
  required
  - processors += added to each emulator core
  - duplicates are removed using the new nall/GNUmakefile's $(unique)
    function
- SFC core can be compiled without the GB core now
  - "-DSFC_SUPERGAMEBOY" is required to build in SGB support now (it's
    set in target-tomoko/GNUmakefile)
- started once again on loki (higan/target-loki/) [as before, loki is
  Linux/BSD only on account of needing hiro::Console]

loki shouldn't be too horrendous ... I hope. I just have the base
skeleton ready for now. But the code from v094r08 should be mostly
copyable over to it. It's just that it's about 50KiB of incredibly
tricky code that has to be just perfect, so it's not going to be quick.
But at least with the skeleton, it'll be a lot easier to pick away at it
as I want.

Windows compilation fix: move hiro/windows/header.hpp line 18 (header
guard) to line 16 instead.
2016-03-13 11:22:14 +11:00
Tim Allen 29be18ce0c Update to v097r17 release.
byuu says:

Changelog:
- ruby: if DirectSoundCreate fails (no sound device present), return
  false from init instead of crashing
- nall: improved edge case return values for
  (basename,pathname,dirname,...)
- nall: renamed file_system_object class to inode
- nall: varuint_t replaced with VariadicNatural; which contains
  .bit,.bits,.byte ala Natural/Integer
- nall: fixed boolean compilation error on Windows
- WS: popa should not restore SP
- GBA: rewrote the CPU/APU cores to use the .bit,.bits functions;
  removed registers.cpp from each

Note that the GBA changes are extremely major. This is about five hours
worth of extremely delicate work. Any slight errors could break
emulation in extremely bad ways. Let's hold off on extensive testing
until the next WIP, after I do the same to the PPU.

So far ... endrift's SOUNDCNT_X I/O test is failing, although that code
didn't change, so clearly I messed up SOUNDCNT_H somehow ...

To compile on Windows:

1. change nall/string/platform.hpp line 47 to

    return slice(result, 0, 3);

2. change ruby/video.wgl.cpp line 72 to

    auto lock(uint32_t*& data, uint& pitch, uint width, uint height) -> bool {

3. add this line to the very top of hiro/windows/header.cpp:

    #define boolean FuckYouMicrosoft
2016-03-13 11:22:14 +11:00
Tim Allen 4b29f4bad7 Update to v097r15 release.
byuu says:

Changelog:
- higan now uses Natural<Size>/Integer<Size> for its internal types
- Super Famicom emulation now uses uint24 instead of uint for bus
  addresses (it's a 24-bit bus)
- cleaned up gb/apu MMIO writes
- cleaned up sfc/coprocessor/msu1 MMIO writes
- ~3% speed penalty

I've wanted to do that 24-bit bus thing for so long, but have always
been afraid of the speed impact. It's probably going to hurt
balanced/performance once they compile again, but it wasn't significant
enough to harm the accuracy core's frame rate, thankfully. Only lost one
frame per second.

The GBA core handlers are clearly going to take a lot more work. The
bit-ranges will make it substantially easier to handle, though. Lots of
32-bit registers where certain values span multiple bytes, but we have
to be able to read/write at byte-granularity.
2016-02-16 20:32:49 +11:00