The OpenGL backend's FrameBufferManager::CreateTexture method, introduced by commit 69cedf41, incorrectly compares the texture variable (which holds a name generated by glGenTextures) against GL_TEXTURE_2D_MULTISAMPLE_ARRAY and GL_TEXTURE_2D_MULTISAMPLE.
It should compare the texture_type variable instead (as the first branch of the if already does).
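A minimal sketch of the distinction, not the actual Dolphin code. The helper `IsMultisampleTarget` and the `kGL...` constants are hypothetical stand-ins so the example compiles without GL headers; the constants mirror the values of the corresponding OpenGL enums. The point is that the check has to look at the target enum, because a texture name is just a small integer handle.

```cpp
#include <cstdint>

// Stand-ins for the OpenGL types and enum values (assumption: no GL headers here).
using GLuint = std::uint32_t;
using GLenum = std::uint32_t;
constexpr GLenum kGLTexture2DArray = 0x8C1A;             // GL_TEXTURE_2D_ARRAY
constexpr GLenum kGLTexture2DMultisample = 0x9100;       // GL_TEXTURE_2D_MULTISAMPLE
constexpr GLenum kGLTexture2DMultisampleArray = 0x9102;  // GL_TEXTURE_2D_MULTISAMPLE_ARRAY

// Hypothetical helper: true if the texture target needs multisample storage.
// It must be given the target (texture_type), never the texture name.
bool IsMultisampleTarget(GLenum texture_type)
{
  return texture_type == kGLTexture2DMultisample ||
         texture_type == kGLTexture2DMultisampleArray;
}
```

Passing a name returned by glGenTextures (typically 1, 2, 3, ...) to a check like this would practically never match, which is why the original comparison was wrong.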
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
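The loop described above could look roughly like the following sketch. It is not the code from this commit: `BoundingBox`, `FindBoundingBox`, the row-major one-byte-per-pixel layout, and the convention that a non-zero stencil value marks a covered pixel are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical result type: inclusive pixel coordinates of the box.
struct BoundingBox
{
  int left, top, right, bottom;
};

// Scans every pixel of a stencil readback and returns the smallest box
// containing all non-zero pixels, or nothing if no pixel was marked.
std::optional<BoundingBox> FindBoundingBox(const std::vector<std::uint8_t>& stencil,
                                           int width, int height)
{
  std::optional<BoundingBox> box;
  if (width <= 0 || height <= 0 ||
      stencil.size() < static_cast<std::size_t>(width) * static_cast<std::size_t>(height))
    return box;  // Buffer too small for the claimed dimensions.

  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
    {
      if (stencil[static_cast<std::size_t>(y) * width + x] == 0)
        continue;
      if (!box)
      {
        box = BoundingBox{x, y, x, y};  // First marked pixel seeds the box.
        continue;
      }
      box->left = std::min(box->left, x);
      box->top = std::min(box->top, y);
      box->right = std::max(box->right, x);
      box->bottom = std::max(box->bottom, y);
    }
  }
  return box;
}
```

Note that `glReadPixels` returns rows starting from the bottom of the framebuffer, so a real implementation would have to decide which row counts as "top"; the sketch simply treats row 0 as the top.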
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of the speed they reach without this patch
(though they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
Keeps associated data together. It also eliminates the possibility of out
parameters being left uninitialized. For example:
-- some FramebufferManager implementation --
void FBMgrImpl::GetTargetSize(u32* width, u32* height)
{
  // Do nothing; width and height are never written
}

-- somewhere else where the function is used --
u32 width, height;
framebuffer_manager_instance->GetTargetSize(&width, &height);
if (texture_width != width)  // <-- Uninitialized variable usage
{
  ...
}
Returning a value makes initialization issues much easier to spot, because
something has to be returned, as opposed to allowing an
implementation to just not do anything.
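For contrast, a return-by-value version might look like the sketch below. The `TargetSize` struct, the `FramebufferManagerBase` class name, and the 640x528 values are illustrative assumptions, not the types or values this commit actually uses.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

// Hypothetical aggregate that keeps the two related values together.
struct TargetSize
{
  u32 width;
  u32 height;
};

// Hypothetical base interface: implementations must return a size.
class FramebufferManagerBase
{
public:
  virtual ~FramebufferManagerBase() = default;
  virtual TargetSize GetTargetSize() const = 0;
};

class FBMgrImpl final : public FramebufferManagerBase
{
public:
  // An empty body is no longer an option: falling off the end of a
  // value-returning function triggers a compiler warning (-Wreturn-type).
  TargetSize GetTargetSize() const override { return {640, 528}; }
};
```

A caller then writes `const TargetSize size = framebuffer_manager_instance->GetTargetSize();` and there is no separately declared variable that could be left uninitialized.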
SSAA relies on MSAA being active to work. We only support 4x SSAA, while in fact you can enable SSAA at any MSAA level.
I even managed to run 64xMSAA + SSAA on my Quadro, which made for some pretty sleek-looking games. They were very cinematic though.
This also properly fixes up SSAA and MSAA support in GLES. Before, they were broken when stereo rendering was enabled.
Now GLES can properly support MSAA, including stereo rendering with MSAA enabled (with the proper extensions).
We used to use the texture parameters for all util draw calls,
but AMD seems to have a bug where the sampler parameters
of stage 0 are used if no sampler is bound to the stage in use.
So as a workaround (and for slightly nicer code), we now use sampler
objects everywhere.