Samsung updated the video drivers on the SGS6 which introduced a bug when disabling vsync.
Both the driver versions are r5p0, but the md5sums of the blob differ.
To work around the issue, make sure to never disable vsync by calling eglSwapInterval.
We can't actually determine the driver version on Android yet.
So until the driver version lands that displays the driver version string in the GL_VERSION string
we will need to keep this workaround enabled at all times, which is a bit annoying.
Current mali drivers return the video driver version in one of the EGL strings you can query.
The issue with that is that Android eats all of those strings, so we can't query it.
OpenGL ES 3.2 adds a few things we care about supporting in core. In particular:
- GL_{ARB,EXT,OES}_draw_elements_base_vertex
- KHR_Debug
- Sample Shading
- GL_{ARB,EXT,OES,NV}_copy_image
- Geometry shaders
- Geometry shader instancing (If they support GL_{EXT,OES}_geometry_point_size)
Nvidia was the first to release an OpenGL ES 3.2 driver which I uesd to test this on.
This also enables GS Instancing on GLES 3.1 hardware if it supports all of the required extensions.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
When calculating the size of the undisplayed margin in the case where
fbWidth != fbStride for RealXFB for displaying in the output window,
we do not scale by IR - RealXFB is implicitly 1x.
The default EGL_RENDERABLE_TYPE is GLES1, so vendors have the ability to choose between returning only the bits requested, or all of the bits
supported in addition to the one requested.
PowerVR chose to take the route where they only return the bits requested, everyone else returns all of the bits supported.
Instead of letting the vendor have control of this, let's incrementally go through each renderable type and make sure it supports everything we want.
This will cover all devices for now, and for the future.
- Refactored Light Attenuation into inline function in Software Renderer
- Corrected zero length light direction vector to resolve with normal direction (essentially becomes LIGHTDIF_NONE which was what I was after)
- Change the API of this shared function to use points for output variables (degasus)
- Fixes remaining lighting issues (Mario Tennis, etc)
- Apply same fixes to Software Renderer
- Corrected zero length light direction vector to resolve with normal direction (essentially becomes LIGHTDIF_NONE which was what I was after)
The new implementation has 3 options:
SyncGpuMaxDistance
SyncGpuMinDistance
SyncGpuOverclock
The MaxDistance controlls how many CPU cycles the CPU is allowed to be in front
of the GPU. Too low values will slow down extremly, too high values are as
unsynchronized and half of the games will crash.
The -MinDistance (negative) set how many cycles the GPU is allowed to be in
front of the CPU. As we are used to emulate an infinitiv fast GPU, this may be
set to any high (negative) number.
The last parameter is to hack a faster (>1.0) or slower(<1.0) GPU. As we don't
emulate GPU timing very well (eg skip the timings of the pixel stage completely),
an overclock factor of ~0.5 is often much more accurate than 1.0
I tried to change messages that contained instructions for users,
while avoiding messages that are so technical that most users
wouldn't understand them even if they were in the right language.
We are used to use the texture parameter for all util draw calls,
but AMD seems to have a bug where they use the sampler parameter
of stage 0 if no sampler is bound to the used stage.
So as workaround (and a bit as nicer code), we now use sampler
objects everywhere.
- FileSearch is now just one function, and it converts the original glob
into a regex on all platforms rather than relying on native Windows
pattern matching on there and a complete hack elsewhere. It now
supports recursion out of the box rather than manually expanding
into a full list of directories in multiple call sites.
- This adds a GCC >= 4.9 dependency due to older versions having
outright broken <regex>. MSVC is fine with it.
- ScanDirectoryTree returns the parent entry rather than filling parts
of it in via reference. The count is now stored in the entry like it
was for subdirectories.
- .glsl file search is now done with DoFileSearch.
- IOCTLV_READ_DIR now uses ScanDirectoryTree directly and sorts the
results after replacements for better determinism.
We are declaring we require ARB_shader_image_load_store in the shader, this isn't an extension on GLES because it is part of the GLSL ES 3.1 spec.
If we are running as GLES then just not put it in the shaders.
This drops the "feature" to load level 0 from the custom texture
and all other levels from the native one if the size matches.
But in my opinion, when a custom texture only provide one level,
no more should be used at all.
GLES3 spec is worthless and only returns a boolean result for occlusion queries. This is fine for simple cellular games but we need more than a
boolean result.
Thankfully Nvidia exposes GL_NV_occlusion_queries under a OpenGL ES extension, which allows us to get full samples rendered.
The only device this change affects is the Nexus 9, since it is an Nvidia K1 crippled to only support OpenGL ES.
No other OpenGL ES device that I know of supports this extension.
A number of games make an EFB copy in I4/I8 format, then use it as a
texture in C4/C8 format. Detect when this happens, and decode the copy on
the GPU using the specified palette.
This has a few advantages: it allows using EFB2Tex for a few more games,
it, it preserves the resolution of scaled EFB copies, and it's probably a
bit faster.
D3D only at the moment, but porting to OpenGL should be straightforward..
Rendering EFB textures currently crashes with the D3D backend when MSAA is enabled, because the depth texture wasn't correctly resolved. An example for a crash would be starting Pokemon Snap with D3D and MSAA enabled.