bsnes/higan/gba/ppu/object.cpp

96 lines
2.9 KiB
C++
Raw Normal View History

Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
auto PPU::Objects::scanline(uint y) -> void {
mosaicOffset = 0;
Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
for(auto& pixel : buffer) pixel = {};
if(ppu.blank() || !io.enable) return;
for(auto& object : ppu.object) {
uint8 py = y - object.y;
if(object.affine == 0 && object.affineSize == 1) continue; //hidden
if(py >= object.height << object.affineSize) continue; //offscreen
uint rowSize = io.mapping == 0 ? 32 >> object.colors : object.width >> 3;
uint baseAddress = object.character << 5;
if(object.mosaic && io.mosaicHeight) {
int mosaicY = (y / (1 + io.mosaicHeight)) * (1 + io.mosaicHeight);
py = object.y >= 160 || mosaicY - object.y >= 0 ? mosaicY - object.y : 0;
}
int16 pa = ppu.objectParam[object.affineParam].pa;
int16 pb = ppu.objectParam[object.affineParam].pb;
int16 pc = ppu.objectParam[object.affineParam].pc;
int16 pd = ppu.objectParam[object.affineParam].pd;
//center-of-sprite coordinates
int16 centerX = object.width >> 1;
int16 centerY = object.height >> 1;
Update to v087r30 release. byuu says: Changelog: - DMA channel masks added (some are 27-bit source/target and some are 14-bit length -- hooray, varuint_t class.) - No more state.pending flags. Instead, we set dma.pending flag when we want a transfer (fixes GBA Video - Pokemon audio) [Cydrak] - fixed OBJ Vmosaic [Cydrak, krom] - OBJ cannot read <=0x13fff in BG modes 3-5 (fixes the garbled tile at the top-left of some games) - DMA timing should be much closer to hardware now, but probably not perfect - PPU frame blending uses blargg's bit-perfect, rounded method (slower, but what can you do?) - GBA carts really unload now - added nall/gba/cartridge.hpp: used when there is no manifest. Scans ROMs for library tags, and selects the first valid one found - added EEPROM auto-detection when EEPROM size=0. Forces disk/save state size to 8192 (otherwise states could crash between pre and post detect.) - detects first read after a set read address command when the size is zero, and sets all subsequent bit-lengths to that value, prints detected size to terminal - added nall/nes/cartridge.hpp: moves iNES detection out of emulation core. Important to note: long-term goal is to remove all nall/(system)/cartridge.hpp detections from the core and replace with databases. All in good time. Anyway, the GBA workarounds should work for ~98.5% of the library, if my pre-scanning was correct (~40 games with odd tags. I reject ones without numeric versions now, too.) I think we're basically at a point where we can release a new version now. Compatibility should be relatively high (at least for a first release), and fixes are only going to affect one or two games at a time. I'd like to start doing some major cleaning house internally (rename NES->Famicom, SNES->SuperFamicom and such.) Would be much wiser to do that on a .01 WIP to minimize regressions. The main problems with a release now: - speed is pretty bad, haven't really optimized much yet (not sure how much we can improve it yet, this usually isn't easy) - sound isn't -great-, but the GBA audio sucks anyway :P - couple of known bugs (Sonic X video, etc.)
2012-04-22 10:49:19 +00:00
Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
//origin coordinates (top-left of sprite)
int28 originX = -(centerX << object.affineSize);
int28 originY = -(centerY << object.affineSize) + py;
//fractional pixel coordinates
int28 fx = originX * pa + originY * pb;
int28 fy = originX * pc + originY * pd;
for(uint px : range(object.width << object.affineSize)) {
uint sx, sy;
if(!object.affine) {
sx = px ^ (object.hflip ? object.width - 1 : 0);
sy = py ^ (object.vflip ? object.height - 1 : 0);
} else {
sx = (fx >> 8) + centerX;
sy = (fy >> 8) + centerY;
}
uint9 bx = object.x + px;
if(bx < 240 && sx < object.width && sy < object.height) {
uint offset = (sy >> 3) * rowSize + (sx >> 3);
offset = offset * 64 + (sy & 7) * 8 + (sx & 7);
uint8 color = ppu.readObjectVRAM(baseAddress + (offset >> !object.colors));
if(object.colors == 0) color = sx & 1 ? color >> 4 : color & 15;
if(color) {
if(object.mode & 2) {
buffer[bx].window = true;
} else if(!buffer[bx].enable || object.priority < buffer[bx].priority) {
if(object.colors == 0) color = object.palette * 16 + color;
buffer[bx].enable = true;
buffer[bx].priority = object.priority;
buffer[bx].color = ppu.pram[256 + color];
buffer[bx].translucent = object.mode == 1;
buffer[bx].mosaic = object.mosaic;
}
}
}
fx += pa;
fy += pc;
}
Update to v087r30 release. byuu says: Changelog: - DMA channel masks added (some are 27-bit source/target and some are 14-bit length -- hooray, varuint_t class.) - No more state.pending flags. Instead, we set dma.pending flag when we want a transfer (fixes GBA Video - Pokemon audio) [Cydrak] - fixed OBJ Vmosaic [Cydrak, krom] - OBJ cannot read <=0x13fff in BG modes 3-5 (fixes the garbled tile at the top-left of some games) - DMA timing should be much closer to hardware now, but probably not perfect - PPU frame blending uses blargg's bit-perfect, rounded method (slower, but what can you do?) - GBA carts really unload now - added nall/gba/cartridge.hpp: used when there is no manifest. Scans ROMs for library tags, and selects the first valid one found - added EEPROM auto-detection when EEPROM size=0. Forces disk/save state size to 8192 (otherwise states could crash between pre and post detect.) - detects first read after a set read address command when the size is zero, and sets all subsequent bit-lengths to that value, prints detected size to terminal - added nall/nes/cartridge.hpp: moves iNES detection out of emulation core. Important to note: long-term goal is to remove all nall/(system)/cartridge.hpp detections from the core and replace with databases. All in good time. Anyway, the GBA workarounds should work for ~98.5% of the library, if my pre-scanning was correct (~40 games with odd tags. I reject ones without numeric versions now, too.) I think we're basically at a point where we can release a new version now. Compatibility should be relatively high (at least for a first release), and fixes are only going to affect one or two games at a time. I'd like to start doing some major cleaning house internally (rename NES->Famicom, SNES->SuperFamicom and such.) Would be much wiser to do that on a .01 WIP to minimize regressions. The main problems with a release now: - speed is pretty bad, haven't really optimized much yet (not sure how much we can improve it yet, this usually isn't easy) - sound isn't -great-, but the GBA audio sucks anyway :P - couple of known bugs (Sonic X video, etc.)
2012-04-22 10:49:19 +00:00
}
Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
}
auto PPU::Objects::run(uint x, uint y) -> void {
output = {};
if(ppu.blank() || !io.enable) {
mosaic = {};
return;
}
Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
output = buffer[x];
//horizontal mosaic
if(!output.mosaic || ++mosaicOffset >= 1 + io.mosaicWidth) {
mosaicOffset = 0;
mosaic = output;
}
Update to v102r19 release. byuu says: Note: add `#undef OUT` to the top of higan/gba/ppu/ppu.hpp to compile on Windows (ugh ...) Now to await posts about this in four more threads again ;) Changelog: - GBA: rewrote PPU from a scanline-based renderer to a pixel-based renderer - ruby: fixed video/gdi bugs Note that there's an approximately 21% speed penalty compared to v102r18 for the pixel-based renderer. Also, horizontal mosaic effects are not yet implemented. But they should be prior to v103. This one is a little tricky as it currently works on fully rendered scanlines. I need to roll the mosaic into the background renderers, and then for sprites, well ... see below. The trickiest part by far of this new renderer is the object (sprite) system. Unlike every other system I emulate, the GBA supports affine rendering of its sprites. Or in other words, rotation effects. And it also has a very complex priority system. Right now, I can't see any way that the GBA PPU could render pixels in real-time like this. My belief is that there's a 240-entry buffer that fills up the next scanline's row of pixels. Which means it probably also runs on the last scanline of Vblank so that the first scanline has sprite data. However, I didn't design my object renderer like this just yet. For now, it creates a buffer of all 240 pixels right away at the start of the scanline. I know\!\! That's technically scanline-based. But it's only for fetching object tiledata, and it's only temporary. What needs to happen is I need a way to run something like a "mini libco thread" inside of the main thread, so that the object renderer can run in parallel with the rest of the PPU, yet not be a hideous abomination of a state machine, yet also not be horrendously slow as a full libco thread would be. I'm envisioning some kind of stackless yielding coroutine. But I'll need to think through how to design that, given the absence of coroutines even in C++17.
2017-06-04 03:16:44 +00:00
}
auto PPU::Objects::power() -> void {
memory::fill(&io, sizeof(IO));
for(auto& pixel : buffer) pixel = {};
output = {};
mosaic = {};
mosaicOffset = 0;
Update to v087r30 release. byuu says: Changelog: - DMA channel masks added (some are 27-bit source/target and some are 14-bit length -- hooray, varuint_t class.) - No more state.pending flags. Instead, we set dma.pending flag when we want a transfer (fixes GBA Video - Pokemon audio) [Cydrak] - fixed OBJ Vmosaic [Cydrak, krom] - OBJ cannot read <=0x13fff in BG modes 3-5 (fixes the garbled tile at the top-left of some games) - DMA timing should be much closer to hardware now, but probably not perfect - PPU frame blending uses blargg's bit-perfect, rounded method (slower, but what can you do?) - GBA carts really unload now - added nall/gba/cartridge.hpp: used when there is no manifest. Scans ROMs for library tags, and selects the first valid one found - added EEPROM auto-detection when EEPROM size=0. Forces disk/save state size to 8192 (otherwise states could crash between pre and post detect.) - detects first read after a set read address command when the size is zero, and sets all subsequent bit-lengths to that value, prints detected size to terminal - added nall/nes/cartridge.hpp: moves iNES detection out of emulation core. Important to note: long-term goal is to remove all nall/(system)/cartridge.hpp detections from the core and replace with databases. All in good time. Anyway, the GBA workarounds should work for ~98.5% of the library, if my pre-scanning was correct (~40 games with odd tags. I reject ones without numeric versions now, too.) I think we're basically at a point where we can release a new version now. Compatibility should be relatively high (at least for a first release), and fixes are only going to affect one or two games at a time. I'd like to start doing some major cleaning house internally (rename NES->Famicom, SNES->SuperFamicom and such.) Would be much wiser to do that on a .01 WIP to minimize regressions. The main problems with a release now: - speed is pretty bad, haven't really optimized much yet (not sure how much we can improve it yet, this usually isn't easy) - sound isn't -great-, but the GBA audio sucks anyway :P - couple of known bugs (Sonic X video, etc.)
2012-04-22 10:49:19 +00:00
}