These are only ever used with ShaderCode instances and nothing else.
Given that, we can convert these helper functions to expect that type of
object as an argument and remove the need for templates, improving
compiler throughput a marginal amount, as the template instantiation
process doesn't need to be performed.
We can also move the definitions of these functions into the cpp file,
which allows us to remove a few inclusions from the ShaderGenCommon
header. This uncovered a few instances of indirect inclusions being
relied upon in other source files.
One other benefit is this allows changes to be made to the definitions
of the functions without needing to recompile all translation units that
make use of these functions, making change testing a little quicker.
Moving the definitions into the cpp file also allows us to completely
hide DefineOutputMember() from external view, given it's only ever used
inside of GenerateVSOutputMembers().
This was a regression from the remove-everything-static-from-renderer
PR. As the comment indicates, it would be nice to move all of this logic
out of the Renderer constructor, but this is a much larger change.
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
This stops the virtual method call from within the Renderer constructor.
The initialization here for GL had to be moved to VideoBackend, as the
Renderer constructor will not have been executed before the value is
required.
It's not available in OpenGL ES and officially it's not supported on OpenGL 3.0/3.1.
Fallback to old depth range code if there is no method to disable depth clipping.
It's more important to have correct clipping than to have accurate depth values.
Inaccurate depth values can be fixed by slow depth.