Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with diffrent indirect texture targets to map to
the same UID.
Once again, it dpesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length) but I added a
bit for per-pixel lighiting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generated seperate
codepaths for alpha and color channels instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post 2008) GPUs generally don't calcualte all 4 channels
in a single vector, this optimisation is pointless. The shader compiler
will undo it during the GLSL/HLSL to IR step.
Bug Fix: The about optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
euqation. But doesn't look like anything trigged this bug.
Fixes a major preformance regression in Skies of Arcadia during
battle transisions.
I had plans for a more advanced version of this code after 5.0,
but here is a minimal implemenation for now.
Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
This is an oversight from pr https://github.com/dolphin-emu/dolphin/pull/3266 . Thanks to degasus for pointing this out.
It's possible that MAX_TEXTURE_BINARY_SIZE can be optimised, but i wanted to play it safe considering the 5.0 stable release.
Drivers that don't support GL_ARB_shading_language_420pack require that
the storage qualifier be specified even when inside an interface block.
AMD's driver throws a compile error when "centroid in/out" is used within
an interface block.
Our previous behavior was to include the storage qualifier regardless, but
this wasn't working on AMD, therefore we should check for the presence of
the extension and include based on this, instead.
I'm not entirely sure what is happening, but this optimisation is causing an issue in Sonic Riders: Zero Gravity. Apparently the issue would also be fixed by PR#3747, but this PR should also fix similar issues.
Games that use partial updates might get slower with this, so some performance regression testing would be nice. Games like New Super Mario Bros, RS2, Zelda TP and Silent Hill. Testing with high graphics settings makes sense, since this would mostly end up in more work for the GPU.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.
- remove an outdated comment about the efb to ram and scaled efb restriction
- when upscaling efb copies, mark the new texture as efb copy
- dx12 fixes for the src box, especially the number of layers for 3D
OS X's shader compiler has a bug with interface blocks where interface block members don't properly inherit the layout qualifier from the interface
block.
Work around this limitation by explicitly stating the layout qualifier on both the interface block and every single member inside of that block.
As confirmed by a hardware test if we are using the texgen type of COLOR_STRGBC0/STRGBC1 then it sets the texture coordinates to those values
regardless of what the input form or source row is.
Thanks to Ornox for testing again
Removes a couple asserts in the vertex shader gen when dealing with the input form.
Typically input form ABC1 is used, so it'll pull in the first three elements and always set the fourth to 1.0
The other input form available is AB11, which sets the last two components to 1.0 (Theoretically).
No titles actually use this input form that we know of except for Project M, but it can have some fairly drastic visual differences.
Confirmed correct by hardware test
This should get Donkey Kong Country Returns characters to be as broken as they should be. They will be fixed in a later pr.
Expected result is:
efbtex: characters are always flickering or invisible, no matter what scaling or IR setting
efb2ram: characters are always working properly at 1xIR, no matter what scaling or IR setting
Fast depth is now more accurate than slow depth and should always be used.
The option will be kept in a different form as it is still used as a hack to fix some games.
Also, the slow depth code path will still be relied upon by cards that don't support GL_ARB_clip_control.
CBoot::BootUp() did call CoreTiming::Advance which itself blocks on the GPU,
but the GPU thread wasn't started already. This commit moves the SyncGPU
initialization into the Fifo.cpp file and call it after BootUp().
Setting this is not required anymore as of commit 40cf1bbacc622 of
FFmpeg.
For users of older versions of the libavcodec library we guard the
change with an #if.
(x % y) is not defined in GLSL when sign(x) != sign(y).
This also has the added benefit of behaving the same as sampler wrapping modes, in regards to negative inputs.
This was causing crashes/driver resets when odd-dimension textures were
being loaded, due to the size we were uploading being larger than the size
of the higher-level texture calculated by the runtime.
People who make texture packs usually release them using a specific ID
(for instance SX4E01). Users who have a different version of the game
(like the PAL version SX4P01) then need to rename the custom texture
folder to match. This is a lot simpler than renaming every texture file,
as was required with the old texture format, but it's still something
that users can forget to do. To make that unnecessary, this change makes
it possible to use three-character region-free IDs for custom texture
folders, similarly to how game INIs can use three-character IDs. Once
most people have updated to Dolphin versions that include this change,
those who make texture packs will be able to name them with
three-character IDs, removing the need for users to rename anything.
We don't throttle by frames, we throttle by coretiming speed.
So looking up VI for calculating the speed was just very wrong.
The new ini option is a float, 1.0f for fullspeed.
In the GUI, percentual values are used.
This fixes the crashes occuring at startup with a non-empty shader cache.
Because LinearDiskCache reads/writes to the storage of ShaderUid, ShaderUid must be trivially copyable.
Additionally, adds a static assert to LinearDiskCache to ensure this doesn't happen in the future.
The initialization of ShaderUid data has been moved to the code generation functions, so the above condition holds true.
This reverts commit 81414b4fa2, reversing
changes made to b926061f64.
Conflicts:
Source/Core/DolphinWX/Frame.cpp
Source/Core/VideoCommon/VideoConfig.cpp
Source/Core/VideoCommon/VideoConfig.h
Approximately three or four times now, the issue of pointers being
in an inconsistent state been an issue in the video backend renderers
with regards to tripping up other developers.
Global (ugh) resources are put into a unique_ptr and will always have a
well-defined state of being - null or not null
This should fix this panic message I saw when playing Super Mario Strikers:
Failed to compile pixel shader [...]: error C7011: implicit cast from "int" to "float"
They were only called at once, so no need to seperate them.
This also removes the only dereference of the NativeVertexFormat in VideoCommon, so backends may just return nullptr.
It was only implemented in OpenGL, though the option was visible in both
backends, leading to memory leaks if you enabled it in DirectX.
And it wasn't particularly useful as a debug feature as it only showed
where in the EFB the copies were taken from, not what format it was, or
what the copy was used for, or what content was in the EFB at that point
in time.
Also, it stretched the copy regions relative to the window, so the
on-screen regions don't even line up with the window unless the game used
the full EFB (some pal games) and you game image stretched to the full
window.
On x86_64 and arm64 builds Common/MsgHandler.h and Common/Logging/Log.h are
indirectly included through the corresponding VertexLoaders, Emitters
and lastly Assert.h. Because the generic build does not build a vertex
loader JIT it does not include those and fails at compile time.
Thanks to HdkR and mibofra!
Gets rid of magic numbers in cases where the array size is known at compile time.
This is also useful for future entries that are stack allocated arrays as these
functions prevent incorrect sizes being provided.
Removed Quality Levels from D3D AA options
Dropdown text now shows whether you're applying MSAA or SSAA
Added a description for SSAA
Moved SSAA checkbox
Cleaned up AA in backends slightly. Supported modes is now a list of ints.
Added 3 depth/convergence presets. They are adjustable via (existing) hotkeys - changes to depth and convergence are applied to current preset.
Added 3 hotkeys for activating presets. Added hotkey for toggle between first and second preset.
Added OSD message for convergence/depth changes.
Presets are saved into per-game configs.
Texture updates have been moved into TextureCache, while
TMEM updates where moved into bpmem. Code for handling
efb2ram updates was added to TextureCache.
There was a bug for preloaded RGBA8 textures, it only copied
half the texture. The TODO was wrong too.
This checks every TEXTURE_KILL_THRESHOLD frames, to see if the hash for the memory area of the efb copy has hanged. If it has changed, the efb copy can be removed, it wouldn't be used anymore. Before this pr, some efb copies would never be deleted.
Fixes issue https://bugs.dolphin-emu.org/issues/6101 and possibly some other VRAM leaks.
Instead of having special case code for efb2tex that ignores hashes,
the only diffence between efb2tex and efb2ram now is that efb2tex
writes zeros to the memory instead of actual texture data.
Though keep in mind, all efb2tex copies will have hashes of zero as
their hash.
Addded a few duplicated depth copy texture formats to the enum
in TextureDecoder.h. These texture formats were already implemented
in TextureCacheBase and the ogl/dx11 texture cache implementations.
SSAA relies on MSAA being active to work. We only supports 4x SSAA while in fact you can enable SSAA at any MSAA level.
I even managed to run 64xMSAA + SSAA on my Quadro which made some pretty sleek looking games. They were very cinematic though.
With this, it properly fixes up SSAA and MSAA support in GLES as well. Before they were broken when stereo rendering was enabled.
Now in GLES they can properly support MSAA and also stereo rendering with MSAA enabled(with proper extensions).
Baten Kaitos allocates its XFBs from a tagged heap
structure. With the old calculation, too many lines
were being written so the tag of the allocation
after the XFB was being corrupted. Fixes crash
mentioned in this comment:
https://code.google.com/p/dolphin-emu/issues/detail?id=7734#c6
Samsung updated the video drivers on the SGS6 which introduced a bug when disabling vsync.
Both the driver versions are r5p0, but the md5sums of the blob differ.
To work around the issue, make sure to never disable vsync by calling eglSwapInterval.
We can't actually determine the driver version on Android yet.
So until the driver version lands that displays the driver version string in the GL_VERSION string
we will need to keep this workaround enabled at all times, which is a bit annoying.
Current mali drivers return the video driver version in one of the EGL strings you can query.
The issue with that is that Android eats all of those strings, so we can't query it.
Their new driver that supports GLES3.1 + AEP has issues with it.
At the very least they don't implement all of the geometry shader features fully which causes shader linker issues when we attempt to use them.
I don't have a device so I can't fully test, so until I do I'm going to blanket disable the whole thing.
The Star Wars games really push the hardware to its limits, which can cause the shaders that are produced to be 18kb or more.
Double our maximum shader size to compensate.
Fixes issue #8860
The last heuristic wasn't quite smart enough and had a few
false positives in Mario Kart: Double Dash and Metroid prime 2.
Now we only activate if the game is rendering a 16:9
projection to a 4:3 viewport.
Someone suggested on IRC that we should make a database of memory
locations in GameCube games which contain the 'Widescreen' setting
so we can automatically detect if the game is in 4:3 or 16:9 mode.
But that's hardly optimal, when the game actually tells the gpu
what aspect ratio to render in. 10 min and 6 lines of code later,
this is the result. Not only does it detect the correct aspect ratio
it does so on the fly.
I'm a little suprised nobody thought about doing this before.
When calculating the size of the undisplayed margin in the case where
fbWidth != fbStride for RealXFB for displaying in the output window,
we do not scale by IR - RealXFB is implicitly 1x.
This bug has been reported to IMGTec at https://pvrsupport.imgtec.com/ticket/472
The basic idea of the bug is that if you're doing a bitwise and of a constant value vector with a constant scalar value, this causes PowerVR's shader
compiler to fail out with a very non-descriptive message.
Working around the issue by making the value a vector that it is being masked by.
In particular this fixes the 6666 colour format
We were loading from the wrong location and it was causing /terrible/ colour changes.
This also fixes a bug in the all the colour formats(except 888) where the unaligned path was loading in to the wrong register.
- Fixes remaining lighting issues (Mario Tennis, etc)
- Apply same fixes to Software Renderer
- Corrected zero length light direction vector to resolve with normal direction (essentially becomes LIGHTDIF_NONE which was what I was after)
The new implementation has 3 options:
SyncGpuMaxDistance
SyncGpuMinDistance
SyncGpuOverclock
The MaxDistance controlls how many CPU cycles the CPU is allowed to be in front
of the GPU. Too low values will slow down extremly, too high values are as
unsynchronized and half of the games will crash.
The -MinDistance (negative) set how many cycles the GPU is allowed to be in
front of the CPU. As we are used to emulate an infinitiv fast GPU, this may be
set to any high (negative) number.
The last parameter is to hack a faster (>1.0) or slower(<1.0) GPU. As we don't
emulate GPU timing very well (eg skip the timings of the pixel stage completely),
an overclock factor of ~0.5 is often much more accurate than 1.0
This fixes issue 6563:
https://code.google.com/p/dolphin-emu/issues/detail?id=6563
This PR adds a 2nd map to texture cache, which uses the hash as key. Cache entries from this new map are used only if the address matches or if the texture was fully hashed. This restriction avoids false positive cache hits. This results in a possible situation where safe texture cache accuracy could be faster than the fast one.
Small textures means up to 1KB for fast texture cache accuracy, 4KB for medium, and all textures for safe accuracy.
Since this adds a small overhead to all texture cache handling, some regression testing would be nice. Games, which use a lot of textures the same time, should be affected the most.
I tried to change messages that contained instructions for users,
while avoiding messages that are so technical that most users
wouldn't understand them even if they were in the right language.
Address static memory relative to a base register, analog to what we're
doing with PPCSTATE in the CPU JIT. This allows executable memory for
the vertex loader JIT to be allocated anywhere, not just within 2 GiB of
static data.
Fixes issue 8180.
Yet another story of games loading weird shit into registers.
For some reason, Burnout 2 would (in rare situations) load invalid
addresses into cp_state.array_bases. What would the real hardware
do in this situation? Who knows, Burnout 2 doesn't actually enable
the vertex array with the invalid address so nothing kinky happens.
But dolphin tries to optimise things and starts using the address
as soon as it is loaded into memory. This causes GetPointer (which is
now much more vocal) to throw an error.
The Fix: We don't call GetPointer until we are sure the vertex array
has been enabled.
- FileSearch is now just one function, and it converts the original glob
into a regex on all platforms rather than relying on native Windows
pattern matching on there and a complete hack elsewhere. It now
supports recursion out of the box rather than manually expanding
into a full list of directories in multiple call sites.
- This adds a GCC >= 4.9 dependency due to older versions having
outright broken <regex>. MSVC is fine with it.
- ScanDirectoryTree returns the parent entry rather than filling parts
of it in via reference. The count is now stored in the entry like it
was for subdirectories.
- .glsl file search is now done with DoFileSearch.
- IOCTLV_READ_DIR now uses ScanDirectoryTree directly and sorts the
results after replacements for better determinism.
Through just returning the last written value sounds better, this crashes Paper Mario.
In my opinion, gfx issues are fine on older GPUs, but crashes should not happen.
There is no nice way to correctly "detect" the "used" memory, so we just say
we're fine to use 50% of the physical memory for custom textures.
This will fix out-of-memory crashes, but we still might run into swapping issues.
This was causing a race condition where the "absurdly large aux buffer"
panic alert would be triggered in the last bit of fifo processing on the
CPU thread in deterministic mode (i.e. netplay). SyncGPU is supposed to
move the auxiliary queue data to the beginning of the containing buffer
so we don't have to deal with wraparound; if GpuRunningState is false,
however, it just returns, because it's set to false by another thread -
thus it doesn't know whether RunGpuLoop is still executing (in which
case it can't just reset the pointers, because it may still be using the
buffer) or not (in which case the condition variable it normally waits
for to avoid the previous problem will never be signaled). However,
SyncGPU's caller PushFifoAuxBuffer wasn't aware of this, so if the
buffer was filling at just the right time, it'd stay full and that
function would complain that it was about to overflow it. Similar
problem with ReadDataFromFifoOnCPU afaik. Fix this by returning early
from those as well; other callers of SyncGPU should be safe. A
*slightly* cleaner alternative would be giving the CPU thread a way to
tell when RunGpuLoop has actually exited, but whatever, this works.
This drops the "feature" to load level 0 from the custom texture
and all other levels from the native one if the size matches.
But in my opinion, when a custom texture only provide one level,
no more should be used at all.
A number of games make an EFB copy in I4/I8 format, then use it as a
texture in C4/C8 format. Detect when this happens, and decode the copy on
the GPU using the specified palette.
This has a few advantages: it allows using EFB2Tex for a few more games,
it, it preserves the resolution of scaled EFB copies, and it's probably a
bit faster.
D3D only at the moment, but porting to OpenGL should be straightforward..
The obvious question here is, why does it matter if we round or truncate?
The key is that GC/Wii does fixed-point interpolation, where PC GPUs do
floating-point interpolation. Discarding fractional bits makes the conversion
from floating-point to fixed point give more consistent results.
I'm not confident this is really the right fix, or that my explanation is
completely correct; ideally, we don't want to depend on floating-point
interpolation at all.
This is the same trick which is used for Metroid's fonts/texts, but for all textures. If 2 different textures at the same address are loaded during the same frame, create a 2nd entry instead of overwriting the existing one. If the entry was overwritten in this case, there wouldn't be any caching, which results in a big performance drop.
The restriction to textures, which are loaded during the same frame, prevents creating lots of textures when textures are used in the regular way. This restriction is new. Overwriting textures, instead of creating new ones is faster, if the old ones are unlikely to be used again.
Since this would break efb copies, don't do it for efb copies.
Castlevania 3 goes from 80 fps to 115 fps for me.
There might be games that need a higher texture cache accuracy with this, but those games should also see a performance boost from this PR.
Some games, which use paletted textures, which are not efb copies, might be faster now. And also not require a higher texture cache accuracy anymore. (similar sitation as PR https://github.com/dolphin-emu/dolphin/pull/1916)
In nearly all direct loadstore cases we can use unscaled loadstores.
Still have a fallback in case we hit a situation that we /can't/ do a unscaled loadstore.
When enabled, the silent option will avoid popping up dialog boxes for
overwrite confirmation or codec selection. The codec selection defaults to
uncompressed RGB.
This is required for FifoCI on Windows which needs to drive Dolphin from the
command line exclusively.
We want to move the vertex by 1/12 pixel, but the old code
did miss the perspective division. So by multiplying with pos.w,
the position is moved correctly after the perspective division.
On D3D, we read from the depth buffer using the format
DXGI_FORMAT_R24_UNORM_X8_TYPELESS (essentially, the "r" component contains
the depth, and the other components contain nothing).
Well, that's not strictly true, but trying to memcpy between two buffers
using different row lengths and different strides is at minimum extremely
unintuitive.
This changes the behavior if both texture are available. The old code did
try to load the modfied texID, the new code tries the unmodified texID first.
- Calculate ZSlope every flush but only set PixelShader Constant on Reset Buffer when zfreeze
- Fixed another Pixel Shader bug in D3D that was giving me grief
Results are still not correct, but things are getting closer.
* Don't cull CULLALL primitives so early so they can be used as reference
planes.
* Convert CalculateZSlope to screenspace coordinates.
* Convert Pixelshader to screenspace coordinates (instead of worldspace
xy coordinates, which is totally wrong)
* Divide depth by 2^24 instead of clamping to 0.0-1.0 as was done
before.
Progress:
* Rouge Squadron 2/3 appear correct in game (videos in rs2 save file
selection are missing)
* Shadows draw 100% correctly in NHL 2003.
* Mario golf menu renders correctly.
* NFS: HP2, shadows sometimes render on top of car or below the road.
* Mario Tennis, courts and shadows render correctly, but at wrong depth
* Blood Omen 2, doesn't work.
Based on the feedback from pull request #1767 I have put in most of
degasus's suggestions in here now.
I think we have a real winner here as moving the code to
VertexManagerBase for a function has allowed OGL to utilize zfreeze now
:)
Correct use of the vertex pointer has also corrected most of the issue
found in pull request #1767 that JMC47 stated. Which also for me now
has Mario Tennis working with no polygon spikes on the characters
anymore! Shadows are still an issue and probably in the other games
with shadow problems. Rebel Strike also seems better but random skybox
glitches can show up.
Initial port of original zfreeze branch (3.5-1729) by neobrain into
most recent build of Dolphin.
Makes Rogue Squadron 2 very playable at full speed thanks to recent core
speedups made to Dolphin. Works on DirectX Video plugin only for now.
Enjoy! and Merry Xmas!!