The loop was allocating one-too-many levels, as well as incorrect sizes
for each level. Probably not an issue as mipmapped render targets aren't
used, but the logic should be correct anyway.
This was a regression from the remove-everything-static-from-renderer
PR. As the comment indicates, it would be nice to move all of this logic
out of the Renderer constructor, but this is a much larger change.
The FrameBufferManager::CreateTexture (from the OpenGL backend) method introduced by commit 69cedf41 incorrectly compares the texture variable (which contains a name provided by glGenTextures) against GL_TEXTURE_2D_MULTISAMPLE_ARRAY and GL_TEXTURE_2D_MULTISAMPLE.
It should instead use the texture_type variable for this (as done in the first branch of the if).
This commit should have zero performance effect if SSBOs are supported.
If they aren't (e.g. on all Macs), this commit alters FramebufferManager
to attach a new stencil buffer and VertexManager to draw to it when
bounding box is active. `BBoxRead` gets the pixel data from the buffer
and dumbly loops through it to find the bounding box.
This patch can run Paper Mario: The Thousand-Year Door at almost full
speed (50–60 FPS) without Dual-Core enabled for all common bounding
box-using actions I tested (going through pipes, Plane Mode, Paper
Mode, Prof. Frankly's gate, combat, walking around the overworld, etc.)
on my computer (macOS 10.12.3, 2.8 GHz Intel Core i7, 16 GB 1600 MHz
DDR3, and Intel Iris 1536 MB).
A few more demanding scenes (e.g. the self-building bridge on the way
to Petalburg) slow to ~15% of their speed without this patch (though
they don't run quite at full speed even on master). The slowdown is
caused almost solely by `glReadPixels` in `OGL::BoundingBox::Get`.
Other implementation ideas:
- Use a stencil buffer that's separate from the depth buffer. This would
require ARB_texture_stencil8 / OpenGL 4.4, which isn't available on
macOS.
- Use `glGetTexImage` instead of `glReadPixels`. This is ~5 FPS slower
on my computer, presumably because it has to transfer the entire
combined depth-stencil buffer instead of only the stencil data.
Getting only stencil data from `glGetTexImage` requires
ARB_texture_stencil8 / OpenGL 4.4, which (again) is not available on
macOS.
- Don't use a PBO, and use `glReadPixels` synchronously. This has no
visible performance effect on my computer, and is theoretically
slower.
This stops the virtual method call from within the Renderer constructor.
The initialization here for GL had to be moved to VideoBackend, as the
Renderer constructor will not have been executed before the value is
required.
This moves all the byte swapping utilities into a header named Swap.h.
A dedicated header is much more preferable here due to the size of the
code itself. In general usage throughout the codebase, CommonFuncs.h was
generally only included for these functions anyway. These being in their
own header avoids dumping the lesser used utilities into scope. As well
as providing a localized area for more utilities related to byte
swapping in the future (should they be needed). This also makes it nicer
to identify which files depend on the byte swapping utilities in
particular.
Since this is a completely new header, moving the code uncovered a few
indirect includes, as well as making some other inclusions unnecessary.
Before #4581, an invocation of `SetBlendMode` could invoke
`glBlendEquationSeparate` and `glBlendFuncSeparate` even when it was
setting `glDisable(GL_BLEND)`. I couldn't figure out how to map the old
behavior over to the new BlendingState code, so I changed it to always
call the two blend functions.
Fixes https://bugs.dolphin-emu.org/issues/10120 : "Sonic Adventure 2
Battle: graphics crash when loading first Dark level".
Keeps associated data together. It also eliminates the possibility of out
parameters not being initialized properly. For example, consider the
following example:
-- some FramebufferManager implementation --
void FBMgrImpl::GetTargetSize(u32* width, u32* height) override
{
// Do nothing
}
-- somewhere else where the function is used --
u32 width, height;
framebuffer_manager_instance->GetTargetSize(&width, &height);
if (texture_width != width) <-- Uninitialized variable usage
{
...
}
It makes it much more obvious to spot any initialization issues, because
it requires something to be returned, as opposed to allowing an
implementation to just not do anything.
Making changes to ConfigManager.h has always been a pain, because
it means rebuilding half of Dolphin, since a lot of files depend on
and include this header.
However, it turns out some includes are unnecessary. This commit
removes ConfigManager includes from files which don't contain
SConfig or GPUDeterminismMode or GPU_DETERMINISM (which means the
ConfigManager include is not used).
(I've also had to get rid of some indirect includes.)
This fixes the screenshot stutter, as this needs more than a frame.
So we won't stall on the png writing at all until emulation stops or
a new screenshot is requested.
This is needed because for some reason the WSI for NV Vulkan drivers
doesn't return VK_ERROR_OUT_OF_DATE_KHR, so there is no other way to know
that a resize has occured apart from polling, which is a poor solution for
X11 (since it is blocking).
It's not available in OpenGL ES and officially it's not supported on OpenGL 3.0/3.1.
Fallback to old depth range code if there is no method to disable depth clipping.
It's more important to have correct clipping than to have accurate depth values.
Inaccurate depth values can be fixed by slow depth.
In TimeSplitters: Future Perfect, PerfQuery is used to detect
the visibility of lights and draw coronas. 25 points are drawn
for each light. However, the returned count was incorrectly
being divided by four leading to dim coronas.
Using 4x antialiasing was a workaround because of a bug where
antialiasing multiplied the PerfQuery results. This commit
fixes that bug too (but only for OpenGL).
Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.