11 Commits

Author SHA1 Message Date
Tim Allen fbc1571889 Update to v106r85 release.
byuu says:

The bad instruction was due to the instruction before it fetching one
too many bytes. Didn't notice right away as the disassembler got it
right.

The register map was incorrect on the active 16-bit flags.

I fixed and improved some other things along those lines. Hooked up some
basic KnGE (VPU) timings, made it print out VRAM and some of the WRAM
onto the screen each frame, tried to drive Vblank and Hblank IRQs, but
... I don't know for sure what vector addresses they belong to.

MAME says "INT4" for Vblank, and says nothing for Hblank. I am wildly
guessing INT4==SWI 4==0xffff10, but ... I have no idea. I'm also not
emulating the interrupts properly based on line levels, I'm just firing
on the 0→1 transitions. Sounds like Vblank is more nuanced too, but I
guess we'll see.

Emulation is running further along now, even to the point of it
successfully enabling the KnGE IRQs, but VRAM doesn't appear to get much
useful stuff written into it yet.

I reverted the nall/primitives changes, so the request for testing is, I
guess, rescinded, for whatever it was worth.
2019-01-22 11:26:20 +11:00
Tim Allen 53843934c0 Update to v106r84 release.
byuu says:

Changelog:

  - fixed a few TLCS900H CPU and disassembler bugs
  - hooked up a basic Neo Geo Pocket emulator skeleton and memory map;
    can run a few instructions from the BIOS
  - emulated the flash memory used by Neo Geo Pocket games
  - added sourcery to the higan source archives
  - fixed ternary expressions in sfc/ppu-fast [hex_usr]
2019-01-21 16:27:24 +11:00
Tim Allen 37b610da53 Update to v106r83 release.
byuu says:

Changelog:

  - reverted nall/inline-if.hpp usage for now, since the
    nall/primitives.hpp math operators still cast to (u)int64_t
  - improved nall/primitives.hpp more; integer8 x = -128; print(-x) will
    now print 128 (unary operator+ and - cast to (u)int64_t)
  - renamed processor/lr35902 to processor/sm83, after the Sharp SM83
    CPU core [gekkio discovered the name]
  - a few bugfixes to the TLCS900H CPU core
  - completed the disassembler for the TLCS900H core

As a result of reverting most of the inline if stuff, I guess the
testing priority has been reduced. Which is probably a good thing,
considering I seem to have a smaller pool of testers these days.

Indeed, the TLCS900H core has ended up at 131KiB compared to the M68000
core at 128KiB. So it's now the largest CPU core in all of higan. It's
even more ridiculous because the M68000 core would ordinarily be quite a
bit smaller, had I not gone overboard with the extreme templating to
reduce instruction decoding overhead (you kind of have to do this for
RISC CPUs, and the inverted design of the TLCS900H kind of makes it
infeasible to do the same there.)

This CPU core is bound to have dozens of extremely difficult CPU bugs,
and there's no easy way for me to test them. I would greatly appreciate
any help in looking over the core for bugs. A fresh pair of eyes to spot
a mistake could save me up to several days of tedious debugging work.

The core still isn't ready to actually be tested: I have to hook up
cartridge loading, a memory bus, interrupts, timers, and the micro DMA
controller before it's likely that anything happens at all.
2019-01-19 12:34:17 +11:00
Tim Allen 25145f59cc Update to v106r80 release.
byuu says:

All math operations on natural and integer types now cast to 64-bit.
Hopefully this will be the last of the major changes for a bit on
nall/primitives, at least until serious work begins on removing implicit
conversion to primitive types.

I also completed the initial TLCS900H core, sans SWI (kind of a ways off
from supporting interrupts.) I really shouldn't say completed, though. The
micro DMA unit is missing, interrupt priority handling is missing,
there's no debugger, and, of course, there's surely dozens of absolutely
critical CPU bugs that are going to be an absolute hellscape nightmare
to track down.

It was a damn shame: right up until the very last eight instructions,
[CP|LD][I|D](R), the instruction encoding was consistent. Of course,
there could be other inconsistencies that I missed. In fact, that's
somewhat likely ... sigh.
2019-01-16 00:09:50 +11:00
Tim Allen 6871e0e32a Update to v106r78 release.
byuu says:

I've implemented a lot more TLCS900H instructions. There are currently
20 missing spots, all of which are unique instructions (well, MINC and
MDEC could be considered pairs of 3 each), from a map of 1024 slots.

After that, I have to write the disassembler. Then the memory bus. Then
I get to start the fun process of debugging this monstrosity.

Also new is nall/inline-if.hpp. Note that this file is technically a war
crime, so be careful when opening it. This replaces ternary() from the
previous WIP.
2019-01-14 17:16:28 +11:00
Tim Allen c9f7c6c4be Update to v106r76 release.
byuu says:

I added some useful new functions to nall/primitives:

    auto Natural<T>::integer() const -> Integer<T>;
    auto Integer<T>::natural() const -> Natural<T>;

These let you cast between signed and unsigned representation without
having to care about the value of T (eg if you take a Natural<T> as a
template parameter.) So for instance when you're given an unsigned type
but it's supposed to be a sign-extended type (example: signed
multiplication), eg Natural<T> → Integer<T>, you can just say:

    x = y.integer() * z.integer();
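
A minimal standalone sketch of the same idea in standard C++, without nall. The names `toInteger` and `signedMultiply` are made up for this example and are not part of nall/primitives; `toInteger` only stands in for what `Natural<T>::integer()` is described as doing above.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for Natural<T>::integer(): reinterpret an unsigned
// value as the signed type of the same width, without naming that type.
// (The out-of-range conversion is modular in C++20 and on common compilers.)
template<typename T>
constexpr auto toInteger(T value) -> std::make_signed_t<T> {
  static_assert(std::is_unsigned_v<T>, "expects an unsigned type");
  return static_cast<std::make_signed_t<T>>(value);
}

// Signed multiplication of operands that arrive as unsigned bit patterns.
// The product is widened to 64 bits; T is assumed to be at most 32 bits.
template<typename T>
constexpr auto signedMultiply(T y, T z) -> std::int64_t {
  static_assert(sizeof(T) <= 4, "sketch only covers 8/16/32-bit operands");
  return std::int64_t(toInteger(y)) * std::int64_t(toInteger(z));
}
```

With this, `signedMultiply<std::uint8_t>(0xFF, 2)` treats 0xFF as -1 rather than 255, which is the point of going through the signed view before multiplying.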

The TLCS900H core gained some more pesky instructions such as DAA, BS1F,
BS1B.

I stole an optimization from RACE for calculating the overflow flag on
addition. Assuming: z = x + y + c;

    Before: ~(x ^ y) & (x ^ z) & signBit;
    After: (x ^ z) & (y ^ z) & signBit;

Subtraction stays the same. Assuming: z = x - y - c;

    Same: (x ^ y) & (x ^ z) & signBit;
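
The two addition forms can be checked against each other by brute force. The sketch below does this for 8-bit operands only, with `signBit` fixed at 0x80; the function names are invented for the example, and the reference simply redoes the sum in signed arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Overflow flag for z = x + y + c, in the "before" form quoted above.
inline auto overflowBefore(std::uint8_t x, std::uint8_t y, std::uint8_t z) -> bool {
  return (~(x ^ y) & (x ^ z) & 0x80) != 0;
}

// The same flag in the "after" form.
inline auto overflowAfter(std::uint8_t x, std::uint8_t y, std::uint8_t z) -> bool {
  return ((x ^ z) & (y ^ z) & 0x80) != 0;
}

// Reference: add as signed values and test whether the sum fits in int8.
inline auto overflowReference(std::uint8_t x, std::uint8_t y, bool c) -> bool {
  int sum = std::int8_t(x) + std::int8_t(y) + int(c);
  return sum < -128 || sum > 127;
}

// Counts the (x, y, c) inputs for which either form disagrees with the reference.
inline auto countOverflowMismatches() -> int {
  int mismatches = 0;
  for(int x = 0; x < 256; x++)
  for(int y = 0; y < 256; y++)
  for(int c = 0; c < 2; c++) {
    std::uint8_t z = std::uint8_t(x + y + c);
    bool reference = overflowReference(x, y, c);
    if(overflowBefore(x, y, z) != reference) mismatches++;
    if(overflowAfter(x, y, z) != reference) mismatches++;
  }
  return mismatches;
}
```

The forms agree because, looking at the sign bit alone, x differing from z and y differing from z implies x and y are equal.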

However, taking a speed penalty, I've implemented the carry computation
in a way that doesn't require an extra bit.

Adding before:

    uint9 z = x + y + c;
    c = z & 0x100;

Subtracting before:

    uint9 z = x - y - c;
    c = z & 0x100;

Adding after:

    uint8 z = x + y + c;
    c = z < x || z == x && c;

Subtracting after:

    uint8 z = x - y - c;
    c = z > x || z == x && c;

I haven't been able to code golf the new carry computation to be any
shorter, unless I include an extra bit, eg for adding:

    c = z < x + c;

But that defeats the entire point of the change. I want the computation
to work even when T is uintmax_t.

If anyone can come up with a faster method, please let me know.
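
A small sketch of the "after" carry computation as templates, with an exhaustive 8-bit comparison against the 9-bit "before" version. The names `addCarry`, `subCarry`, and `countCarryMismatches` are invented for this example; the parentheses around the `&&` term are only there to silence compiler warnings and do not change the result.

```cpp
#include <cassert>
#include <cstdint>

// Carry-out of z = x + y + c, using no bits beyond the width of T.
template<typename T>
auto addCarry(T x, T y, bool c, T& z) -> bool {
  z = T(x + y + c);
  return z < x || (z == x && c);
}

// Borrow-out of z = x - y - c, using no bits beyond the width of T.
template<typename T>
auto subCarry(T x, T y, bool c, T& z) -> bool {
  z = T(x - y - c);
  return z > x || (z == x && c);
}

// Compares both helpers against the extra-bit method for every 8-bit input.
inline auto countCarryMismatches() -> int {
  int mismatches = 0;
  for(int x = 0; x < 256; x++)
  for(int y = 0; y < 256; y++)
  for(int c = 0; c < 2; c++) {
    std::uint8_t z = 0;
    bool addReference = ((x + y + c) & 0x100) != 0;  // bit 8 of the wide sum
    bool subReference = ((x - y - c) & 0x100) != 0;  // bit 8 of the wide difference
    if(addCarry<std::uint8_t>(x, y, c, z) != addReference) mismatches++;
    if(subCarry<std::uint8_t>(x, y, c, z) != subReference) mismatches++;
  }
  return mismatches;
}
```

Because nothing wider than T is needed, the same helpers can be instantiated with `std::uint64_t`, where no extra bit is available.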

Anyway ... I also had to split off INC and DEC because they compute
flags differently (word and long modes don't set flags at all, byte mode
doesn't set carry at all.)

I also added division by zero support, although I don't know if it's
actually hardware accurate. It's what other emulators do, though.
2019-01-11 12:51:18 +11:00
Tim Allen 95d0020297 Update to v106r75 release.
byuu says:

So tired ... so much left to do still ... sigh.

If someone's up for some code golf, open to suggestions on how to handle
the INTNEST control register. It's the only pure 16-bit register on the
system, and it breaks my `map`/`load`/`store<uint8,16,32>` abstraction.
Basically what I suspect happens is when you access INTNEST in 32-bit
mode, the upper 16 bits are just undefined (probably zero.) But
`map<uint32>(INTNEST)` must return a uint32& or nothing at all. So for the
time being, I'm just making store(ControlRegister) check if it's the
INTNEST ID, and clearing the upper 16 bits of the written value in that
case. It's hacky, but ... it's the best I can think of.

I added LDX, which is a 900H-only instruction, and the control register
map is for the 900/H CPU. I found the detailed differences between the
CPUs, and it doesn't look likely that I'm gonna support the 900 or
900/H1 at all. Not that there was a reason to anyway, but it's nice to
support more stuff when possible. Oh well.

The 4-byte instruction fetch queue is going to have to get implemented
inside fetch, or just not implemented at all ... not like I'd be able to
figure out the details of it anyway.

The manual isn't clear on how the MULA flags are calculated, but since
MUL doesn't set any flags, I assume the flags are based on the addition
after the multiplication, eg:

    uint32 a = indirect<int16>(XDE) * indirect<int16>(XHL);
    uint32 b = reg16; //opcode parameter
    uint32 c = a + b; //flags set based on a+b

No idea if it's right or not. It doesn't set carry or half-carry, so
it's not just simply the same as calling algorithmAdd.
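
To make that guess concrete, here is a hedged sketch of flags derived from the addition step. Everything in it is an assumption: the `MulaFlags` struct and `mulaSketch` function are invented for illustration, the choice of sign, zero, and overflow as the affected flags is a guess consistent with "no carry or half-carry", and none of it has been checked against hardware.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag container: only the flags this sketch assumes are touched.
struct MulaFlags {
  bool sign = false;
  bool zero = false;
  bool overflow = false;
};

// Assumed behaviour: multiply two signed 16-bit operands, add the product to
// a 32-bit accumulator, and derive the flags from that addition alone.
inline auto mulaSketch(std::uint32_t accumulator, std::int16_t m, std::int16_t n,
                       MulaFlags& flags) -> std::uint32_t {
  // int16 * int16 always fits in int32, so the product cannot overflow here.
  std::uint32_t product = std::uint32_t(std::int32_t(m) * std::int32_t(n));
  std::uint32_t result = accumulator + product;  // wraps modulo 2^32
  flags.sign = (result >> 31) != 0;
  flags.zero = result == 0;
  // Signed-overflow test on the addition: both inputs differ in sign from the result.
  flags.overflow = ((accumulator ^ result) & (product ^ result) & 0x80000000u) != 0;
  return result;
}
```

Carry and half-carry are deliberately left out, which is what separates this from a plain add.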

Up to almost 70KB, not even halfway done, don't even have a disassembler
started yet. There's a real chance this could overtake the 68K for the
biggest CPU core in higan, although at this point I'm still thinking the
68K will end up larger.
2019-01-10 13:21:18 +11:00
Tim Allen 41148b1024 Update to v106r74 release.
byuu says:

So I spent the better part of eight hours refactoring the TLCS900H core
to be more flexible in light of new edge cases.

I'm now including the size information inside of the types (eg
Register<Byte>, Memory<Word>) rather than as parameters to the
instruction handlers. This allows me to eg implement RETI without
needing template arguments in all the instructions. pop(SR), pop(PC) can
deduce how much to pop off of the stack. It's still highly templated,
but not unrolling the 3-bit register indexes and instead going through
the switch table to access registers is going to hurt the performance a
good deal.

A benefit of this is that Register{A}, Register{WA}, and Register{XWA}
are now distinct from one another, despite them sharing IDs.

I also renamed read/write to load/store for the CPU core, because
implicit conversions are nasty. They all call the virtual read/write.

I added more instructions, improved the memory addressing mode support,
and some other things.

I got rid of Byte, Word, Long because there are too many alternate sizes
needed: int8, int16, uint24, etc.

Ran into a really annoying C++ case ...

    struct TLCS900H {
      template<typename T> auto store(Register<T> target, T source) -> void;
    };

If you call store(Register<uint32>(x), uint16(y)); it errors out since
the T types don't match. But you can't specialize it:

    template<typename T, typename U> auto store(Register<T>, U) -> void;
    template<typename U> auto TLCS900H::store<uint32, U>(Register<uint32>, U) -> void;

Because somehow it's 2019 and we still can't do partial specialization
of function templates ...

So as a result, I had to make T source be type uint32 even for
Register<uint8> and Register<uint16>. Doesn't matter too much, just
annoying.
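
For comparison, one common alternative is to keep a single template parameter and put `source` in a non-deduced context, so T is taken from the register alone and the source argument is converted to it. This is only a sketch of that technique, not what the core does: `Identity`, `CpuSketch`, and the register storage layout are invented for the example (C++20 offers `std::type_identity` for the same purpose).

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Simplified register handle: the type carries the access size, id picks the slot.
template<typename T> struct Register { std::uint8_t id; };

// Naming a parameter type through this wrapper stops the compiler from
// deducing T from that argument.
template<typename T> struct Identity { using type = T; };

struct CpuSketch {
  std::uint32_t registers[8] = {};  // illustrative storage, not the real layout

  // T is deduced from target only; source is implicitly converted to T.
  template<typename T>
  auto store(Register<T> target, typename Identity<T>::type source) -> void {
    std::uint32_t mask = std::numeric_limits<T>::max();
    std::uint32_t& slot = registers[target.id & 7];
    slot = (slot & ~mask) | (std::uint32_t(source) & mask);  // write low bits only
  }
};
```

With this signature, `store(Register<std::uint32_t>{0}, std::uint16_t(y))` compiles because the second argument no longer takes part in deduction. The trade-off is that mismatched sizes are converted silently instead of being rejected.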
2019-01-09 10:36:03 +11:00
Tim Allen dbee893408 Update to v106r73 release.
byuu says:

This probably won't fix the use of register yet (I imagine ruby and hiro
will complain now), but ... oh well, it's a start. We'll get it
compiling again eventually.

I added JP, JR, JRL, LD instructions this time around. I'm also starting
to feel that Byte, Word, Long labels for the TLCS900H aren't really
working. There's cases of needing uint24, int8, int16, ... it may just
be better to name the types instead of trying to be fancy.

At this point, all of the easy instructions are in. Now it's down to a
whole lot of very awkward bit-manipulation and special-use instructions.
Sigh.
2019-01-07 18:59:04 +11:00
Tim Allen cb86cd116c Update to v106r72 release.
byuu says:

For this WIP, I added more TLCS900H instructions. All of the
ADC, ADD, SBB/SBC, SUB, AND, OR, XOR, CP, PUSH, POP instructions are in.

Still an incredible amount of work left to do on this core ... it has all kinds
of novel instructions that aren't on any other processors.

Still no disassembler support yet, so I can't even test what I'm doing. Fun!
2019-01-05 18:04:27 +11:00
Tim Allen 1a889ae232 Update to v106r71 release.
byuu says:

I started working on the Toshiba TLCS900H CPU core today.

It's basically, "what if we took the Z80, added in 32-bit support, added
in SPARC register windows, added a ton of additional addressing modes,
added control registers, and added a bunch of additional instructions?"
-- or in other words, it's basically hell for me.

It took several hours just to wrap my head around the way the opcode
decoder needed to function, but I think I have a decent strategy for
implementing it now.

I should have all of the first-byte register/memory address decoding in
place, although I'm sure there's lots of bugs. I don't have anything in
the way of a disassembler yet.
2019-01-05 11:35:26 +11:00