bsnes/ruby/video/opengl/main.hpp

auto OpenGL::setShader(const string& pathname) -> void {
  for(auto& program : programs) program.release();
  programs.reset();
  settings.reset();
  format = inputFormat;
  filter = GL_LINEAR;
  wrap = GL_CLAMP_TO_BORDER;
  absoluteWidth = 0, absoluteHeight = 0;
  relativeWidth = 0, relativeHeight = 0;
  uint historySize = 0;
  if(pathname == "None") {
    filter = GL_NEAREST;
  } else if(pathname == "Blur") {
    filter = GL_LINEAR;
  } else if(directory::exists(pathname)) {
    auto document = BML::unserialize(file::read({pathname, "manifest.bml"}));
    for(auto node : document["settings"]) {
      settings.insert({node.name(), node.text()});
    }
    for(auto node : document["input"]) {
      if(node.name() == "history") historySize = node.natural();
      if(node.name() == "format") format = glrFormat(node.text());
      if(node.name() == "filter") filter = glrFilter(node.text());
      if(node.name() == "wrap") wrap = glrWrap(node.text());
    }
    for(auto node : document["output"]) {
      string text = node.text();
      if(node.name() == "width") {
        if(text.endsWith("%")) relativeWidth = toReal(text.trimRight("%", 1L)) / 100.0;
        else absoluteWidth = text.natural();
      }
      if(node.name() == "height") {
        if(text.endsWith("%")) relativeHeight = toReal(text.trimRight("%", 1L)) / 100.0;
        else absoluteHeight = text.natural();
      }
    }
    for(auto node : document.find("program")) {
      uint n = programs.size();
      programs(n).bind(this, node, pathname);
    }
  }
  //changing shaders may change input format, which requires the input texture to be recreated
  if(texture) { glDeleteTextures(1, &texture); texture = 0; }
  glGenTextures(1, &texture);
  glBindTexture(GL_TEXTURE_2D, texture);
  glTexImage2D(GL_TEXTURE_2D, 0, format, width, height, 0, getFormat(), getType(), buffer);
  allocateHistory(historySize);
}
auto OpenGL::allocateHistory(uint size) -> void {
  for(auto& frame : history) glDeleteTextures(1, &frame.texture);
  history.reset();
  while(size--) {
    OpenGLTexture frame;
    frame.filter = filter;
    frame.wrap = wrap;
    glGenTextures(1, &frame.texture);
    glBindTexture(GL_TEXTURE_2D, frame.texture);
    glTexImage2D(GL_TEXTURE_2D, 0, format, frame.width = width, frame.height = height, 0, getFormat(), getType(), buffer);
    history.append(frame);
  }
}
auto OpenGL::clear() -> void {
  for(auto& p : programs) {
    glUseProgram(p.program);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, p.framebuffer);
    glClearColor(0, 0, 0, 1);
    glClear(GL_COLOR_BUFFER_BIT);
  }
  glUseProgram(0);
  glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
  glClearColor(0, 0, 0, 1);
  glClear(GL_COLOR_BUFFER_BIT);
}

auto OpenGL::lock(uint32_t*& data, uint& pitch) -> bool {
  pitch = width * sizeof(uint32_t);
  return data = buffer;
}
auto OpenGL::output() -> void {
  clear();
  glActiveTexture(GL_TEXTURE0);
  glBindTexture(GL_TEXTURE_2D, texture);
  glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, getFormat(), getType(), buffer);
  struct Source {
    GLuint texture;
    uint width, height;
    GLuint filter, wrap;
  };
  vector<Source> sources;
  sources.prepend({texture, width, height, filter, wrap});
  for(auto& p : programs) {
    uint targetWidth = p.absoluteWidth ? p.absoluteWidth : outputWidth;
    uint targetHeight = p.absoluteHeight ? p.absoluteHeight : outputHeight;
    if(p.relativeWidth) targetWidth = sources[0].width * p.relativeWidth;
    if(p.relativeHeight) targetHeight = sources[0].height * p.relativeHeight;
    p.size(targetWidth, targetHeight);
    glUseProgram(p.program);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, p.framebuffer);
    glrUniform1i("phase", p.phase);
    glrUniform1i("historyLength", history.size());
    glrUniform1i("sourceLength", sources.size());
    glrUniform1i("pixmapLength", p.pixmaps.size());
    glrUniform4f("targetSize", targetWidth, targetHeight, 1.0 / targetWidth, 1.0 / targetHeight);
    glrUniform4f("outputSize", outputWidth, outputHeight, 1.0 / outputWidth, 1.0 / outputHeight);
    uint aid = 0;
    for(auto& frame : history) {
      glrUniform1i({"history[", aid, "]"}, aid);
      glrUniform4f({"historySize[", aid, "]"}, frame.width, frame.height, 1.0 / frame.width, 1.0 / frame.height);
      glActiveTexture(GL_TEXTURE0 + (aid++));
      glBindTexture(GL_TEXTURE_2D, frame.texture);
      glrParameters(frame.filter, frame.wrap);
    }
    uint bid = 0;
    for(auto& source : sources) {
      glrUniform1i({"source[", bid, "]"}, aid + bid);
      glrUniform4f({"sourceSize[", bid, "]"}, source.width, source.height, 1.0 / source.width, 1.0 / source.height);
      glActiveTexture(GL_TEXTURE0 + aid + (bid++));
      glBindTexture(GL_TEXTURE_2D, source.texture);
      glrParameters(source.filter, source.wrap);
    }
    uint cid = 0;
    for(auto& pixmap : p.pixmaps) {
      glrUniform1i({"pixmap[", cid, "]"}, aid + bid + cid);
      glrUniform4f({"pixmapSize[", cid, "]"}, pixmap.width, pixmap.height, 1.0 / pixmap.width, 1.0 / pixmap.height);
      glActiveTexture(GL_TEXTURE0 + aid + bid + (cid++));
      glBindTexture(GL_TEXTURE_2D, pixmap.texture);
      glrParameters(pixmap.filter, pixmap.wrap);
    }
    glActiveTexture(GL_TEXTURE0);
    glrParameters(sources[0].filter, sources[0].wrap);
    p.render(sources[0].width, sources[0].height, targetWidth, targetHeight);
    glBindTexture(GL_TEXTURE_2D, p.texture);
    p.phase = (p.phase + 1) % p.modulo;
    sources.prepend({p.texture, p.width, p.height, p.filter, p.wrap});
  }
  uint targetWidth = absoluteWidth ? absoluteWidth : outputWidth;
  uint targetHeight = absoluteHeight ? absoluteHeight : outputHeight;
  if(relativeWidth) targetWidth = sources[0].width * relativeWidth;
  if(relativeHeight) targetHeight = sources[0].height * relativeHeight;
  glUseProgram(program);
  glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);
  glrUniform1i("source[0]", 0);
  glrUniform4f("targetSize", targetWidth, targetHeight, 1.0 / targetWidth, 1.0 / targetHeight);
  glrUniform4f("outputSize", outputWidth, outputHeight, 1.0 / outputWidth, 1.0 / outputHeight);
  glrParameters(sources[0].filter, sources[0].wrap);
  render(sources[0].width, sources[0].height, outputWidth, outputHeight);
  if(history.size() > 0) {
    OpenGLTexture frame = history.takeRight();
    glBindTexture(GL_TEXTURE_2D, frame.texture);
    if(width == frame.width && height == frame.height) {
      glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height, getFormat(), getType(), buffer);
    } else {
      glTexImage2D(GL_TEXTURE_2D, 0, format, frame.width = width, frame.height = height, 0, getFormat(), getType(), buffer);
    }
    history.prepend(frame);
  }
}
auto OpenGL::initialize(const string& shader) -> bool {
  if(!OpenGLBind()) return false;
  glDisable(GL_BLEND);
  glDisable(GL_DEPTH_TEST);
  glDisable(GL_POLYGON_SMOOTH);
  glDisable(GL_STENCIL_TEST);
  glEnable(GL_DITHER);
  program = glCreateProgram();
  vertex = glrCreateShader(program, GL_VERTEX_SHADER, OpenGLOutputVertexShader);
//geometry = glrCreateShader(program, GL_GEOMETRY_SHADER, OpenGLGeometryShader);
  fragment = glrCreateShader(program, GL_FRAGMENT_SHADER, OpenGLFragmentShader);
  OpenGLSurface::allocate();
  glrLinkProgram(program);
  setShader(shader);
  return initialized = true;
}

auto OpenGL::terminate() -> void {
  if(!initialized) return;
  setShader("");  //release shader resources (eg frame[] history)
  OpenGLSurface::release();
  if(buffer) { delete[] buffer; buffer = nullptr; }
  initialized = false;
}