The D3D / OGL backends only ever used RGBA textures, and the Software
backend uses its own custom code for sampling. The ARGB path seems to
just be dead code.
Since ARGB and RGBA formats are similar, I don't think this will make
the code more difficult to read or unable to be used as
reference. Somebody who wants to use this code to output ARGB can simply
modify the MakeRGBA function to put the shift at the other end.
This causes glDrawArrays to fail in core profile, and thus on OS X, see:
http://renderingpipeline.com/2012/03/attribute-less-rendering/
There must be something bound, even though it is not used.
Fixes#7599. I'm not sure this is actually the best way to fix it,
since AFAICT it makes a nonobvious assumption that *something* will be
bound before the first attributeless rendering in
TextureConverter::DecodeToTexture, but it's what degasus suggested and
seems to work.
This is required to make packing consistent between compilers: with u32, MSVC
would not allocate a bitfield that spans two u32s (it would leave a "hole").
We were generating a texture without ever setting the data to a known value.
This happened on the old code as well, just that PP shaders are receiving some love and people are using it and noticing some of its issues.
The only possible functionality change is that s_efbAccessRequested and
s_swapRequested are no longer reset at init and shutdown of the OGL
backend (only; this is the only interaction any files other than
MainBase.cpp have with them). I am fairly certain this was entirely
vestigial.
Possible performance implications: efbAccessReady now uses an Event
rather than spinning, which might be slightly slower, but considering
the slow loop the flags are being checked in from the GPU thread, I
doubt it's noticeable.
Also, this uses sequentially consistent rather than release/acquire
memory order, which might be slightly slower, especially on ARM...
something to improve in Event/Flag, really.
This is effectively unused, as the window handles that we pass to the
GLInterface are window handles for the frame which isn't ever a real
toplevel window. Host_UpdateTitle is what actually sets the proper title
on the render window.
Now that MainNoGUI is properly architected and GLX doesn't need to
sometimes craft its own windows sometimes which we have to thread back
into MainNoGUI, we don't need to thread the window handle that GLX
creates at all.
This removes the reference to pass back here, and the g_pWindowHandle
always be the same as the window returned by Host_GetRenderHandle().
A future cleanup could remove g_pWindowHandle entirely.
Seems mesa has a quirk where
define THING(x) (#x)
is the same as
define THING(x) (##x)
Didn't realize I messed it up since it just worked since I only tested on Mesa.
This class loads all the common PP shader configuration options and passes those options through to a inherited class that OpenGL or D3D will have.
Makes it so all the common code for PP shaders is in VideoCommon instead of duplicating the code across each backend.
Fixes a bug where "Use Fullscreen" would initialize into exclusive fullscreen regardless of the borderless fullscreen setting.
Also relieves the need for the video renderer to check the borderless fullscreen setting each time.
It was only used for Windows XP and lower.
This also bumps the _WIN32_WINNT define in the stdafx precompiled headers to set the minimum version as Windows Vista.
The hack was needed because the Nvidia 3D Vision heuristics are documented to only support surfaces that are the same size as the backbuffer. This would be the case if you enabled the hack and selected the "Auto (Window Size)" internal resolution.
However, on recent drivers the same effect is achieved by selecting the "Auto (Multiple)" internal resolution. Therefore the hack is no longer required.
Also have the renderer remember its own fullscreen state. This is done to prevent a case where we exit exclusive fullscreen through the configuration and a focus shift at the same time. In this case the renderer would fail to detect that the fullscreen state was changed.
This ensures the transition from/to exclusive mode happens while the RenderFrame is fullscreen.
This prevents fullscreen loops and relieves us of having to restore the window size after we exit fullscreen.
In the cases where we support the binding layout keyword, use it for more than binding UBO location.
This changes it so it is supported for samplers as well.
Instances when this is enabled is if a device supports GL_ARB_shading_language_420pack, or if it supports GLES 3.10.
Old code dumped the efb, which was no-longer relevant since the
backend gained xfb support.
New code dumps the colour texture which is about to be rendered to
the screen so correctly reflects the bypassXFB option.
A previous PR changed a whole lot of min/maxes to std::min/std::max
but made a mistake here and used a templated min which cast it's
arguments to unsigned instead of casting return value.
This resulted in glitchy artifacts in bright areas (See issue 7439)
I rewrote the code to use a proper clamping function so it's cleaner
to read.
s_encodingPrograms is defined as an array with a length of 64
NUM_ENCODING_PROGRAMS is also defined as 64.
However 64 is out of bounds, so we want to be comparing for "equal to or
greater than here"
With strings, we don't need to care about passing in a length, since it internally stores it. So now, we don't even need a length parameter for these functions anymore as well.
This also kills off some sprintf_s calls.
- Isolate it into it's own namespace
- Shorten function names, the namespace self-documents.
- Just use the std I/O, we can just write directly to the stream for
logging.
gcc doesn't optimize this loops with -O2, so using memset now.
A flag to skip the clear funktion was added as the cache is already cleared most of the time.
These were only compiled in on Windows and x86_32.
They provided "optimized" copies and compares based on blocksizes for the AMD Athlon and Duron CPU families.
The code was taken from something that AMD provides with a as-is license.
Just get rid of this crap.
For whatever reason, the hardware doesn't do a full divide by 255, but
instead uses an approximation with shifting, similar to the way it is done
in TEV.
For quite a while this has been causing integer division to generate a warning as error, blocking shader compiling. This means probably no one has even been running D3D in debug builds...
I tried disabling the warning with a #pragma, but it doesn't seem to apply when this flag is used.
We need to pull in function pointers for OpenGL 3.0 in order to use glAttribIPointer.
This isn't too big of an issue, and this code will be gone in the future when we change over to libepoxy.
Just need to push code upstream to libepoxy to support Android with GLES and GL first.
This matches how ARM handles their naming in their drivers for different models.
Really it's that way because both Mali-T6xx and Mali-T7xx fall under Midgard.
While everything else (except Mali-55) fall under Utgard.
We need to explicitly round when converting colors from float to uint
because multiplying a normalized float by 255 might not result in a whole
number. (The exact result here may vary depending on your
drivers/hardware.)
Ideally, we shouldn't be using floating point here, but fixing that is a
much more complicated patch.
Fixes gxtest TEV tests using Intel HD 4000.
They are similar enough that they will share bugs with their drivers, so make them fall under the same Mali-Txxx umbrella of bug issues.
If there is ever a need in the future for having separate bugs depending on family, we can support that then.
This is the only way we can determine the video driver version with mali.
Really it's a good thing that they only push driver updates once every two years, makes it easy to determine what driver anybody is running.
CreateInputLayout requires a shader as an input, but it only cares about
the signature; we don't need to recompute it for different shaders with
the same inputs.
GLSL ES 3.10 adds implicit support for the binding layout qualifier that we use.
Changes our GLSL version enums to bit values so we can check for both ES versions easily.
Trying to use GetDepthMatrixProgram outside of
TCacheEntry::FromRenderTarget is a bad idea, so don't. Instead, use a
shader which only copies the input.
Fixes lens flare depth test in Twilight Princess. See
http://code.google.com/p/dolphin-emu/issues/detail?id=5999 .
The variable is already dereferenced both before and after this
check which means that if this variable would ever be zero it would
have crashed dolphin already.
Our defines were never clear between what meant 64bit or x86_64
This makes a clear cut between bitness and architecture.
This commit also has the side effect of bringing up aarch64 compiling support.
Fifoplayer depends on the old behaviour of videosoftware (and the other
hardware backends in non virtual/real xfb modes) where the framebuffer
gets rendered directly to the screen.
Really fifoplayer should call BeginFrame/EndFrame when it finished
rendering a frame, but adding this hack back in is simpler.
- remove unused variables
- reduce the scope where it makes sense
- correct limits (did you know that strcat()'s last parameter does not
include the \0 that is always added?)
- set some free()'d pointers to NULL
If there is an issue with a reported extension, disable it instead of failing out entirely.
Fixes an issue with buffer_storage that I had overlooked as well.
This changes from using logical and to bitwise and, which causes the compile time to drop from an absurd amount of time to around five seconds on my
crappy laptop.
This structure fields should match byte-to-byte the layout of MMIO registers:
it is addressed using the MMIO reg address when doing a CP MMIO read. This was
unfortunately not the case, causing CP reads to be mostly broken with the
software renderer.
This option was known to break every second game and only boost a bit.
It also seems to be broken because of streaming into pinned memory and buffer storage buffers.
v2: also remove dlc_desc
Older Qualcomm drivers rotated the framebuffer 90 degrees and this fix didn't work.
Now for some obscene reason it rotates a full 180 degrees.
This can at least be worked around by flipping around the image on our end.
On Windows, nvidia don't give us their driver version, so we can't workaround any issues.
As buffer_storage is broken on some drivers, we wanted to disble it for them.
So we can't.
Luckyly only "some" released driver versions are affected as this extension is only available since some months. Let's hope that nobody have to use one of this driver version, else they will get a black screen ...
OSX has their own driver, so performance issues aren't shared with the nvidia driver (unlike the closed source linux and windows nvidia driver). So now they'll also use the MapAndSync backend like all other osx drivers.
fixes issue 6596
I've also cleaned up the if/else block selecting the best backend a bit.
The only two devices that do this are Mesa software rasterizer and Intel Ironlake(With a few hacks).
Basically since it doesn't support OpenGL 3.0, it can't grab the version the new way.
So failing that, it sets to GL 2.1, and continues.
Further along, on Ironlake at least, it tries grabbing the extensions the new GL 3.0 way and fails.
So have a fallback that grabs the extensions string the old way, in probably the most elegant way possible.
The old way was to use big switch/case statements based on a type of buffer.
The new one is to use inheritance.
This change prohibits us to change the buffer type while running, but I doubt we'll ever do so.
Performance should also be a bit better. Also a nice cleanup.
Added some comments about this different kind of buffers.
This is a bit slower on map_and_* because of flushing and _very_ much slower on buffer(sub)?data because of a new memcpy.
But this design allow us to decode directly into a gpu buffer, eg vertexloader will profit :)
gl.h and glext.h provide most of the function pointer typedefs and defines for extensions and core features.
The only one it doesn't provide is GL 1.1 function typedefs, but this is to be expected.
If anything needs defines or typedefs in their header in the future, that's as easy as before.
This one was introduced to reduce the glBindTexture and glActiveTexture calls. But it was quite a bit of logic and only an improvment on uploading/creating a texture, which is done rarely.
This adds xfb support to the videosoftware backend, which increases it's
accuracy and more imporantly, enables the usage of many homebrew apps
which write directly to the xfb on the videosoftware backend.
Conflicts:
Source/Core/VideoBackends/Software/SWRenderer.cpp
Source/Core/VideoBackends/Software/SWmain.cpp
This branch is the final step of fully supporting both OpenGL and OpenGL ES in the same binary.
This of course only applies to EGL and won't work for GLX/AGL/WGL since they don't really support GL ES.
The changes here actually aren't too terrible, basically change every #ifdef USE_GLES to a runtime check.
This adds a DetectMode() function to the EGL context backend.
EGL will iterate through each of the configs and check for GL, GLES3_KHR, and GLES2 bits
After that it'll change the mode from _DETECT to whichever one is the best supported.
After that point we'll just create a context with the mode that was detected
We are used to render them out of order as long as everything else matches, but rendering order does matter, so we have to flush on primitive switch. This commit implements this flush.
Also as we flush on primitive switch, we don't have to create three different index buffers. All indices are now stored in one buffer.
This will slow down games which switch often primitive types (eg ztp), but it should be more accurate.
add the GL include (back) to Base.props
use a similar technique to GLX.cpp (by Sonic) in WGL.cpp to get
wglSwapIntervalEXT without the WGLEW check
Conflicts:
Source/Core/VideoBackends/OGL/OGL.vcxproj
Source/Core/VideoBackends/OGL/OGL.vcxproj.filters
Source/VSProps/Base.props
This "u32 components" is a list of flags which attributes of the vertex loader are present.
We are used to append this variable to lots of vertex generation functions, but some of them don't need it at all.
The usual way to handle this kind of request is to rise a flag which the gpu thread polls.
The gpu thread itself either generates the result or just write zeros if disabled.
After this, it rise another flag which says that this work is done.
So if disabled, we still have the cpu-gpu round trip time. This commit just returns 0 on the cpu thread
instead of playing ping pong...
fixes issue 6898
OpenGL defaults are GL_REPEAT, which is even more unlikely than GL_CLAMP_TO_EDGE.
As I can't test the behavoir of the real hardware, I changed it to how it works before,
but I guess just clip the texture makes more sense.
Real xfb didn't provide any read_stride, so there is a division by zero.
This commit calculates the correct read_stride for real_xfb, so there is also no hack for texture vs xfb needed.
This is done with a pixel buffer object. We still have to stall the GPU, but
we only do it once per efb2ram call.
As the cpu can't access the vram, it has to queue a memcpy for the gpu and
wait for the gpu to finish this copy. We did this for every cache line which
is just stupid. Now we copy the complete texture into a pbo and readback this
at once. So we don't have to wait for lots of round-trip-times.
Also use attributeless rendering. But we need the src rect, so set it by uniform.
If there is a slowdown here (I doubt as the driver likely has a fast path to update uniforms)
then we should check if this rect changes and only then update the uniform.
We use attributeless rendering, so officially we have to bind _any_ VAO.
As the state of this VAO doesn't matter, we don't have to switch it.
Also fix an AMD issue as they don't like to render from an empty VAO.
We neither scale nor render from subimages, so we by using gl_Position, we don't have to generate _any_ vertices for this converting.
Also remove the glTexSubImage optimization as every driver does it when needed. But there are some workflows (eg on APU) where it's better to realloc this texture instead of a second memcpy or stall.
This fixes Real XFB Jaggies in OpenGL on games which use yscaling, such
as most PAL games.
This fixes the last of the "Real XFB Macroblocking" issues for opengl,
see issue #6503
Seems OpenGL ES 3 Requires you must have an lod argument, while Desktop
versions require you must not have a lod argument if you are using a
Sampler2DRect (which doesn't do Mipmapping).
The pal version of SSBB has a 640x568 xfb, which is larger than the efb.
Increase the size of the static textures and put in some runtime checks
to prevent buffer overruns.
Now OGL doesn't rely on WX for PNG saving.
FlipImageData supports (pixel data len > 3) now.
TextureToPng is now in ImageWrite.cpp/h
Video Common depends on zlib and png.
D3D no longer depends on zlib and png.
Revert "Actually, filename really does need to be a parameter because of some random debug thing."
Revert "fix non-HAVE_WX case"
Revert "Handle screenshot saving in RenderBase. Removes dependency on D3DX11 for screenshots (texture dumping is still broken)."
This reverts commits 00fe5057f1, 74b5fb3ab4, cd46138d29 and 5f72542e06 because taking screenshots in D3D still crashed for me so there was no point in the code changes (which I found ugly anyway).
The Dolphin development team is incapable of providing sufficient replacement for its previous usage in Dolphin and the advantages of dropping the dependency do not justify the removal of screenshots and texture dumping.
From now on, d3dx11.h, d3dx11async.h, d3dx11core.h and d3dx11tex.h are required to be stored somewhere in the header include path. I don't know if this is the case for anyone else than me, but I can't really say that I care after having people randomly merge unfinished branches into master.
D3D doesn't allow bigger viewports than rendertargets. But flipper does, so the viewport will be clipped and the transformation matrix will be changed.
This was done in the D3D backend itself. This is now moved into VideoCommon. This don't reduce code, but in this way, VideoCommon doesn't depend on the backends.
* Currently there is no DEBUGFAST configuration. Defining DEBUGFAST as a preprocessor definition in Base.props (or a global header) enables it for now, pending a better method. This was done to make managing the build harder to screw up. However it may not even be an issue anymore with the new .props usage.
* D3DX11SaveTextureToFile usage is dropped and not replaced.
* If you have $(DXSDK_DIR) in your global property sheets (Microsoft.Cpp.$(PlatformName).user), you need to remove it. The build will error out with a message if it's configured incorrectly.
* If you are on Windows 8 or above, you no longer need the June 2010 DirectX SDK installed to build dolphin. If you are in this situation, it is still required if you want your built binaries to be able to use XAudio2 and XInput on previous Windows versions.
* GLew updated to 1.10.0
* compiler switches added: /volatile:iso, /d2Zi+
* LTCG available via msbuild property: DolphinRelease
* SDL updated to 2.0.0
* All Externals (excl. OpenAL and SDL) are built from source.
* Now uses STL version of std::{mutex,condition_variable,thread}
* Now uses Build as root directory for *all* intermediate files
* Binary directory is populated as post-build msbuild action
* .gitignore is simplified
* UnitTests project is no longer compiled
The upload in the backend isn't done, it's just pushed by the mostly removed SetMulti*SConstant4fv.
Also no optimizations was done on VideoCommon side, but I can start now :-)
Sorry for the hacky way, but I think this is a nice (working) snapshot for a much bigger change.