Update to bsnes v038r05? release.

New WIP, this one's fairly big as nightlies go.

First, moved the priority queue to a generic implementation so I can
re-use it elsewhere in the future. Took a ~1% speed hit or so by using
functors for the callback and using the signed math trick to avoid the
need for a normalize() function. Sadly it gets up to 3% slower if the
priorityqueue class code isn't placed right next to the CPU core.

Second, while I failed miserably at using the queues for IRQ / NMI
testing, I did come up with a neat compromise. NMI is only tested once
per scanline, IRQs only have PPU dot precision (every 4 clocks), the
hold time for both is four clock cycles, and scanlines for both NTSC
and PAL, even on the short colorburst scanline, are always evenly
divisible by four.
... so testing every 2 clock cycles was kind of pointless, as it'd
always be false. Since the delays between the PPU counter and CPU
trigger for NMI is 2, and IRQ is 10, they even align again with an
offset of 2.
... hence, I can call poll_interrupts() half as often by using
if(ppu.hcounter() & 2). I reverse that for the Super Scope / Justifier
dot testing and cut their overhead in half as well.

That gives us a nice ~10-15% speedup. Nowhere near the idealistic
~30-40% for range tested IRQs, because that only actually tests once
per scanline (~1364 cycles). This just cuts ~682 tests down to ~341
tests. Still, it's pretty close to half as good while still being
super clean and easy. It greatly diminishes the value of a range-based
IRQ tester, as that will only offer a ~15-20% speedup now at best.
Getting PGO working again is the new lowest-hanging fruit.

I also eked out a tiny bit more speed by adding some previous missed
"else" statements in the irq_valid testing part.

With the newfound speed, I gave a tiny bit up (1-2%) to simplify and
improve some old edge cases. It's known that IRQs won't trigger on the
very last dot of each field. It's due to the way the V and H counters
are misaligned, that we can't easily emulate.

So before I had a bunch of cruft to support that, update_interrupts()
was called at the start of each scanline, which would call irq_valid()
to run a bunch of tests to make sure the latch positions would
actually work on hardware. Writes to $4207-420a would also call the
update_interrupts() proc.

I killed all that, and now compute the HTIME position inline in
poll_interrupts(), and perform the last dot check there. Since testing
is ten clocks behind anyway, then we need only check to see if VTIME >
0 and ppu.vcounter(-6 clocks) == 0 to know that it was set for the
last dot on any given field.

This gives us two nice perks for free: one, no more need to hard-code
scanlines/frame inside the CPU core; and two, the old version was
missing an edge case in interlace mode where odd fields would allow an
IRQ on the last dot, which was simply because my old irq_valid() test
didn't have a third condition for that.

All that said, I'm getting ~157.5fps instead of ~137.5fps now in Zelda
3.

Third, I removed grayscale/sepia/invert from the video settings panel,
and stuck them in advanced. Used the new space to add checkboxes for
NTSC merge fields and the start in fullscreen thing.

Reference:
    //called once every four clock cycles;
    //as NMI steps by scanlines (divisible by 4) and IRQ by PPU
    4-cycle dots.
    //
    //ppu.(vh)counter(n) returns the value of said counters n-clocks
    before current time;
    //it is used to emulate hardware communication delay between
    opcode and interrupt units.
    alwaysinline void sCPU::poll_interrupts() {
      //NMI hold
      if(status.nmi_hold) {
        status.nmi_hold = false;
        if(status.nmi_enabled) status.nmi_transition = true;
      }

      //NMI test
      bool nmi_valid = (ppu.vcounter(2) >= (!ppu.overscan() ? 225 :
    240));
      if(!status.nmi_valid && nmi_valid) {
        //0->1 edge sensitive transition
        status.nmi_line = true;
        status.nmi_hold = true;  //hold /NMI for four cycles
      } else if(status.nmi_valid && !nmi_valid) {
        //1->0 edge sensitive transition
        status.nmi_line = false;
      }
      status.nmi_valid = nmi_valid;

      //IRQ hold
      status.irq_hold = false;
      if(status.irq_line) {
        if(status.virq_enabled || status.hirq_enabled)
    status.irq_transition = true;
      }

      //IRQ test (unrolling the duplicate Nirq_enabled tests causes
    speed hit)
      bool irq_valid = (status.virq_enabled || status.hirq_enabled);
      if(irq_valid) {
        if((status.virq_enabled && ppu.vcounter(10) !=
    (status.virq_pos))
        || (status.hirq_enabled && ppu.hcounter(10) !=
    (status.hirq_pos + 1) * 4)
        || (status.virq_pos && ppu.vcounter(6) == 0)  //IRQs cannot
    trigger on last dot of field
        ) irq_valid = false;
      }
      if(!status.irq_valid && irq_valid) {
        //0->1 edge sensitive transition
        status.irq_line = true;
        status.irq_hold = true;  //hold /IRQ for four cycles
      }
      status.irq_valid = irq_valid;
    }

[No archive available]
This commit is contained in:
byuu 2009-01-04 15:41:00 +00:00
parent 155b4fbfcd
commit 3908890072

Diff Content Not Available