Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.
By default unique_ptr will call delete on the given type if an array
qualifier isn't present, not delete[]. It's important to explicitly
specify an array is being handled.
This fixes the crashes occuring at startup with a non-empty shader cache.
Because LinearDiskCache reads/writes to the storage of ShaderUid, ShaderUid must be trivially copyable.
Additionally, adds a static assert to LinearDiskCache to ensure this doesn't happen in the future.
The initialization of ShaderUid data has been moved to the code generation functions, so the above condition holds true.
This reverts commit 81414b4fa2, reversing
changes made to b926061f64.
Conflicts:
Source/Core/DolphinWX/Frame.cpp
Source/Core/VideoCommon/VideoConfig.cpp
Source/Core/VideoCommon/VideoConfig.h
Approximately three or four times now, the issue of pointers being
in an inconsistent state been an issue in the video backend renderers
with regards to tripping up other developers.
Global (ugh) resources are put into a unique_ptr and will always have a
well-defined state of being - null or not null
This was relying on behaviour that GLExtensions was adding fake extensions to the supported list with ES.
This no longer happens so it needed to be changed.
The spec says that vendors can set the max texture size to be 65KB and we want 1MB.
Check the maximum supported and drop to the max if it is less than 1MB
They were only called at once, so no need to seperate them.
This also removes the only dereference of the NativeVertexFormat in VideoCommon, so backends may just return nullptr.
It was only implemented in OpenGL, though the option was visible in both
backends, leading to memory leaks if you enabled it in DirectX.
And it wasn't particularly useful as a debug feature as it only showed
where in the EFB the copies were taken from, not what format it was, or
what the copy was used for, or what content was in the EFB at that point
in time.
Also, it stretched the copy regions relative to the window, so the
on-screen regions don't even line up with the window unless the game used
the full EFB (some pal games) and you game image stretched to the full
window.
Removed Quality Levels from D3D AA options
Dropdown text now shows whether you're applying MSAA or SSAA
Added a description for SSAA
Moved SSAA checkbox
Cleaned up AA in backends slightly. Supported modes is now a list of ints.
Instead of having special case code for efb2tex that ignores hashes,
the only diffence between efb2tex and efb2ram now is that efb2tex
writes zeros to the memory instead of actual texture data.
Though keep in mind, all efb2tex copies will have hashes of zero as
their hash.
Addded a few duplicated depth copy texture formats to the enum
in TextureDecoder.h. These texture formats were already implemented
in TextureCacheBase and the ogl/dx11 texture cache implementations.
SSAA relies on MSAA being active to work. We only supports 4x SSAA while in fact you can enable SSAA at any MSAA level.
I even managed to run 64xMSAA + SSAA on my Quadro which made some pretty sleek looking games. They were very cinematic though.
With this, it properly fixes up SSAA and MSAA support in GLES as well. Before they were broken when stereo rendering was enabled.
Now in GLES they can properly support MSAA and also stereo rendering with MSAA enabled(with proper extensions).
OpenGL ES 3.2 adds this feature to core
It was available to GLES 3.1 as GL_{EXT, OES}_texture_buffer as well.
For the non-Nvidia vendors that implemented this is:
- Qualcomm's Adreno 4xx
- IMGTec's PowerVR Rogue
Samsung updated the video drivers on the SGS6 which introduced a bug when disabling vsync.
Both the driver versions are r5p0, but the md5sums of the blob differ.
To work around the issue, make sure to never disable vsync by calling eglSwapInterval.
We can't actually determine the driver version on Android yet.
So until the driver version lands that displays the driver version string in the GL_VERSION string
we will need to keep this workaround enabled at all times, which is a bit annoying.
Current mali drivers return the video driver version in one of the EGL strings you can query.
The issue with that is that Android eats all of those strings, so we can't query it.
OpenGL ES 3.2 adds a few things we care about supporting in core. In particular:
- GL_{ARB,EXT,OES}_draw_elements_base_vertex
- KHR_Debug
- Sample Shading
- GL_{ARB,EXT,OES,NV}_copy_image
- Geometry shaders
- Geometry shader instancing (If they support GL_{EXT,OES}_geometry_point_size)
Nvidia was the first to release an OpenGL ES 3.2 driver which I uesd to test this on.
This also enables GS Instancing on GLES 3.1 hardware if it supports all of the required extensions.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
When calculating the size of the undisplayed margin in the case where
fbWidth != fbStride for RealXFB for displaying in the output window,
we do not scale by IR - RealXFB is implicitly 1x.
The default EGL_RENDERABLE_TYPE is GLES1, so vendors have the ability to choose between returning only the bits requested, or all of the bits
supported in addition to the one requested.
PowerVR chose to take the route where they only return the bits requested, everyone else returns all of the bits supported.
Instead of letting the vendor have control of this, let's incrementally go through each renderable type and make sure it supports everything we want.
This will cover all devices for now, and for the future.
I tried to change messages that contained instructions for users,
while avoiding messages that are so technical that most users
wouldn't understand them even if they were in the right language.
We are used to use the texture parameter for all util draw calls,
but AMD seems to have a bug where they use the sampler parameter
of stage 0 if no sampler is bound to the used stage.
So as workaround (and a bit as nicer code), we now use sampler
objects everywhere.
- FileSearch is now just one function, and it converts the original glob
into a regex on all platforms rather than relying on native Windows
pattern matching on there and a complete hack elsewhere. It now
supports recursion out of the box rather than manually expanding
into a full list of directories in multiple call sites.
- This adds a GCC >= 4.9 dependency due to older versions having
outright broken <regex>. MSVC is fine with it.
- ScanDirectoryTree returns the parent entry rather than filling parts
of it in via reference. The count is now stored in the entry like it
was for subdirectories.
- .glsl file search is now done with DoFileSearch.
- IOCTLV_READ_DIR now uses ScanDirectoryTree directly and sorts the
results after replacements for better determinism.
We are declaring we require ARB_shader_image_load_store in the shader, this isn't an extension on GLES because it is part of the GLSL ES 3.1 spec.
If we are running as GLES then just not put it in the shaders.
This drops the "feature" to load level 0 from the custom texture
and all other levels from the native one if the size matches.
But in my opinion, when a custom texture only provide one level,
no more should be used at all.
GLES3 spec is worthless and only returns a boolean result for occlusion queries. This is fine for simple cellular games but we need more than a
boolean result.
Thankfully Nvidia exposes GL_NV_occlusion_queries under a OpenGL ES extension, which allows us to get full samples rendered.
The only device this change affects is the Nexus 9, since it is an Nvidia K1 crippled to only support OpenGL ES.
No other OpenGL ES device that I know of supports this extension.
A number of games make an EFB copy in I4/I8 format, then use it as a
texture in C4/C8 format. Detect when this happens, and decode the copy on
the GPU using the specified palette.
This has a few advantages: it allows using EFB2Tex for a few more games,
it, it preserves the resolution of scaled EFB copies, and it's probably a
bit faster.
D3D only at the moment, but porting to OpenGL should be straightforward..
- Calculate ZSlope every flush but only set PixelShader Constant on Reset Buffer when zfreeze
- Fixed another Pixel Shader bug in D3D that was giving me grief
Results are still not correct, but things are getting closer.
* Don't cull CULLALL primitives so early so they can be used as reference
planes.
* Convert CalculateZSlope to screenspace coordinates.
* Convert Pixelshader to screenspace coordinates (instead of worldspace
xy coordinates, which is totally wrong)
* Divide depth by 2^24 instead of clamping to 0.0-1.0 as was done
before.
Progress:
* Rouge Squadron 2/3 appear correct in game (videos in rs2 save file
selection are missing)
* Shadows draw 100% correctly in NHL 2003.
* Mario golf menu renders correctly.
* NFS: HP2, shadows sometimes render on top of car or below the road.
* Mario Tennis, courts and shadows render correctly, but at wrong depth
* Blood Omen 2, doesn't work.
Based on the feedback from pull request #1767 I have put in most of
degasus's suggestions in here now.
I think we have a real winner here as moving the code to
VertexManagerBase for a function has allowed OGL to utilize zfreeze now
:)
Correct use of the vertex pointer has also corrected most of the issue
found in pull request #1767 that JMC47 stated. Which also for me now
has Mario Tennis working with no polygon spikes on the characters
anymore! Shadows are still an issue and probably in the other games
with shadow problems. Rebel Strike also seems better but random skybox
glitches can show up.
Initial port of original zfreeze branch (3.5-1729) by neobrain into
most recent build of Dolphin.
Makes Rogue Squadron 2 very playable at full speed thanks to recent core
speedups made to Dolphin. Works on DirectX Video plugin only for now.
Enjoy! and Merry Xmas!!
The maths appears to give crazy impossible answers without this fix, but the cause is all the ints being "promoted" to unsigned because of the single unsigned division at the end.
This reverts an optimization which isn't worth imo. Every texture uploads have to alloc vram and a staging buffer, so there is no need to do both in the same call.
This fixes running Dolphin on the Nexus 9.
Android's EGL stack has internal arrays that they use for tracking OpenGL function usage. Probably has something to do with their OpenGL profiling
garbage that used to be in ADT.
Android has three of these arrays, each statically allocated.
One array is for all GLES 1.x functions
One array is for all GLES 2.0/3.0/3.1 and a couple of extensions they deem worthy of being in this array.
The last array is for all function pointers grabbed via eglGetProcAddress that isn't in the other two arrays.
The last array is the issue that we are having problems with. This array is 256 members in length.
So if you are pulling more than 256 function pointers that Google doesn't track in their internal array, the function will return NULL and yell at you
in logcat.
The Nvidia Shield Tablet gets around this by replacing part of the EGL stack with their own implementation that doesn't have this garbage.
The Nexus 9 on the other hand doesn't get away with this. So we pull >100 more function pointers than the array can handle, and some of those we need
to use.
The workaround for this is to grab OpenGL 1.1 functions last because we won't actually be using those functions, so we get away with not grabbing the
function pointers.
Fixes a typo where the official IMGTec drivers were said to be the OSS driver support.
Removes Mali GPU family detection just like I removed the Adreno family detection.
We don't support Mali Utgard anyway.
If we need family detection we can properly add it, right now it isn't needed.
Adreno 300 and 400 have the same video driver performance issues because they are very similar architectures which use basically the same thing with
everything.
There isn't any need to detect the family of the driver with Qualcomm anyway. If we ever need family specific bugs then we can implement real support
for that.
Performance issue on Adreno 400 series was due to us only detecting Adreno 300 series, and with Adreno 400 it wouldn't use the bugs, which would cause
it to use glBufferSubData, causing the huge performance hit.