Update to bsnes v031r02? release.

New WIP. Please be sure to test a few games with this one to look for
regressions.

I got tired of using bit packing for CPU / SMP register flags, because
they do not mask the upper bits properly.

In other words, (assume big endian) if you have struct { uint8_t n:1,
v:1, m:1, x:1, d:1, i:1, z:1, c:1; } p; and you set p.m = 7; it will
set p.v and p.n as well. It doesn't cast the type to bool.

So I rewrote the old template struct trick, but bound it with a
reference rather than relying upon union alignment. Looks something
like this:

    template<int mask>
    struct CPUFlag {
      uint8 &data;

      inline operator bool() const { return data & mask; }
      inline CPUFlag& operator=(bool i) { data = (data & ~mask) | (-i
    & mask); return *this; }

      CPUFlag(uint8 &data_) : data(data_) {}
    };

    class CPURegFlags {

    public:
      uint8 data;
      CPUFlag<0x80> n;
      CPUFlag<0x40> v;
    ...
      CPURegFlags() : data(0), n(data), v(data), m(data), x(data),
    d(data), i(data), z(data), c(data) {}

    };


Surprisingly, benchmarks show this method is ~2x faster, but flags
were never a bottleneck so it won't affect bsnes' speed.

Anyway, with this, I decided to get rid of the confusing and stupid
!!() stuff all throughout the CMP and SMP opfn.cpp files. It's no
longer needed since the template assignment takes only a boolean
argument. Anything not zero becomes one with that.

So code such as this:

    uint8 sSMP::op_adc(uint8 x, uint8 y) {
    int16 r = x + y + regs.p.c;
      regs.p.n = !!(r & 0x80);
      regs.p.v = !!(~(x ^ y) & (y ^ (uint8)r) & 0x80);
      regs.p.h = !!((x ^ y ^ (uint8)r) & 0x10);
      regs.p.z = ((uint8)r == 0);
      regs.p.c = (r > 0xff);
      return r;
    }


Now looks like this:

    uint8 sSMP::op_adc(uint8 x, uint8 y) {
      int r = x + y + regs.p.c;
      regs.p.n = r & 0x80;
      regs.p.v = ~(x ^ y) & (x ^ r) & 0x80;
      regs.p.h = (x ^ y ^ r) & 0x10;
      regs.p.z = (uint8)r == 0;
      regs.p.c = r > 0xff;
      return r;
    }


I also took the time to figure out how the hell the overflow stuff
worked. Pretty neat stuff.

Essentially, overflow is set when you add/subtract two positive or two
negative numbers, and the result ends up with a different sign. Hence,
the sign overflowed, so your negative number is now positive, or vice
versa.

A simple way to simulate it is:
int result = (int8_t)x + (int8_t)y;
bool overflow = (result < -128 || result > 127);

But there's no reason to perform signed math, since the result can't
be used for anything else, not even any other flags, as the opcode
math is always unsigned.

So to implement it with this:
int result = (uint8_t)x + (uint8_t)y;

We just verify that both signs in x and y are the same, and that their
sign is different from the result to set overflow, eg:
bool overflow = (x & 0x80) == (y & 0x80) && (x & 0x80) != (result &
0x80);

But that's kind of slow. We can test a single bit for equality and
merge the &0x80's by using a XOR table:
0^0=0, 0^1=1, 1^0=1, 1^1=0
The trick here is that if the two bits are equal, we get 0, if they
are not equal, we get one.

So if we want to see if x&0x80 == y&0x80, we can do:
!((x ^ y) & 0x80);

... or we can simply invert the XOR result so that 1 = equal, 0 =
different, eg ~(x ^ y) & 0x80;

The latter is nice because it keeps the bit positions in-tact. Whereas
the former reduces to 1 or 0, the latter remains 0x80 or 0x00. This is
good for chaining, as I'll demonstrate below.

Do the same for the second test and we get:
bool overflow = ~(x ^ y) & 0x80 && (x ^ result) & 0x80;

We complement the former because we want to verify they are the same,
we don't for the latter because we want to verify that they have
changed.

Now we can basically use one more trick to combine the two bit masks
here. We want to return 1 when overflow is set, so we can look for a
pattern that will only return one when both the first and second tests
pass.

An AND table works great here. 0&0=0, 0&1=0, 1&0=0, 1&1=1. Only if
both are true do we end up with 1.

So this means we can AND the two results, and then mask the only bit
we care about once to get the result, eg:
bool overflow = ~(x ^ y) & (x ^ result) & 0x80;

And there we go, that's where that bizarre math trick comes from. I
realized while doing this something that bugged me in the past.

I used to think that for some reason, the S-SMP add overflow test
required x^y & y^r, whereas S-CPU add overflow used x^y & x^r.
Probably because I read the algorithm from Snes9x's sources or
something.

But that was flawed -- since addition is commutative, it doesn't
matter whether the latter is x^result or y^result. Only in subtraction
does the order matter, where you must always use x^result to test the
initial value every time.

Subtraction switches up things a little. It sets overflow only when
the signs of x and y are _different_, and when x and the result are
also different, eg:
bool overflow = (x ^ y) & (x ^ result) & 0x80;

Fun stuff, huh?

So I was wanting this tested thoroughly, just in case there was a typo
or something when updating the opfn.cpp files.

---

That said, I also polished up the UI a bit. Moved disabled to the
bottom of the speed regulation list, and added key / joypad bindings
for "exit emulator", "speed regulation increase / decrease" and
"frameskip increase / decrease".

I know these key bindings do not update the menubar radiobox positions
yet. I'll get that taken care of shortly.

[No archive available]
This commit is contained in:
byuu 2008-04-19 12:03:00 +00:00
parent b895f29bed
commit 340d86845a

Diff Content Not Available