Prior to this change, it's possible for m_wake_me_up_again to be used
while it's in an uninitialized state from the exposed API.
e.g.
- Using SetEnable after construction would perform an uninitialized read.
- Using PushEvent would perform an uninitialized read by way of operator |=.
internally, an uninitialized read can happen if PullEventsInternal() is
executed before other functions.
Just to avoid the whole possibility of performing uninitialized reads,
we just give the class member a default value of false.
This way, if we load a UID cache where the data was incomplete (e.g.
Dolphin crashed), we don't lose the existing UIDs which were previously
at the beginning.
This will create a merge conflict if two PRs try to increment the
cache version at the same time, which makes it noticeable that the
PR that gets merged last needs to increment the cache version again.
We already use this for savestates and the game list cache.
Minimizes repetition.
std::minmax_element can be used for the 256 * 2 case, as it's only performing byte comparisons
and thus, there will always be an element smaller than 0xffff, so it doesn't need to be included
in the set of compared values.
Skip ubershader mode works the same as hybrid ubershaders in that the
shaders are compiled asynchronously. However, instead of using the
ubershader to draw the object, it skips it entirely until the
specialized shader is made available.
This mode will likely result in broken effects where a game creates an
EFB copy, and does not redraw it every frame. Therefore, it is not a
recommended option, however, it may result in better performance on
low-end systems.
This wouldn't be much of a data reader if it can't access the
read-only data pointer in read-only contexts. Especially if it
can get a writable equivalent in contexts that aren't read-only.
It's questionable to not return a reference to the instance being
assigned to. It's also quite misleading in terms of expected behavior
relative to everything else. This fixes it to make it consistent with
other classes.
Allows the default constructor to be defaulted and ensures the default
values are associated with the member variables directly.
Also corrects a prefixed underscore in the two parameter constructor.
Given that this only contains functions from the VideoBackendBase class,
it makes more sense to move these to the relevant cpp file to keep them
all together.
This is only ever memset to zero and never used again.
This also gets rid of an instance of undefined behavior considering the
draft standard for C++17 (N4659) states at [dcl.type.cv] paragraph 5:
"
The semantics of an access through a volatile glvalue are implementation-defined.
If an attempt is made to access an object defined with a volatile-qualified type
through the use of a non-volatile glvalue, the behavior is undefined.
"
Currently, when immediately compile shaders is not enabled, the
ubershaders will be placed before any specialized shaders in the compile
queue in hybrid ubershaders mode. This means that Dolphin could
potentially use the ubershaders for a longer time than it would have if
we blocked startup until all shaders were compiled, leading to a drop in
performance.
- In D3D, shaders could be compiled on the main thread, blocking
startup.
- Reduced the latency between a pipeline being requested and used in all
backends in hybrid ubershader mode, when no shader stages were present.
- Fixed a case where async compilation could cause the same UID to be
appended multiple times to the UID cache.
- Fix incorrect number of threads being used when immediately compile
shaders was enabled.
Lowest hanging fruit I could find with a profiler.
Not sure this stuff actually needs to be done, but assuming it is, why
not do it quickly? 10x faster, goes from 1% CPU to 0.09%.
This enables shaders to be compiled while the game is starting, instead
of blocking startup. If a shader is needed before it is compiled,
emulation will block.
As these are stored in a map, operator< will become a hot function when
doing lookups, which happen every frame. std::tie generated a rather
large function here with quite a few branches.
tl;dr: This PR speedups dolphin on mobiles with the Mali GPU and ES 3.2
drivers by a factor of 10 by using the method with the biggest overhead.
Please keep care not to buy this shit!
The ARM driver team seems to care very well about their customers. But
bad luck, users and open source developers are *not* their customers. So
even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support
The bad point, they neither implement any of the other common ways to
stream dynamic content in unextented GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag
- They don't support on-device buffer updates and just stall with
glBufferSubData
It seems like no benchmark is using any dynamic content - and like no
customer cares about anything but benchmarks, or users...
We have a flag to disable the glBufferSubData way, this PR adds the flag
to also disable the unsychronized mapping way. The second one is
available since their ES 3.2 update, but slow as hell.
So how to continue? The last remaining technical way to stream dynamic
content at all is to alloc a new buffer per draw call with glBufferData.
This is very gross, but still a factor 10 speedup compared to stalling
the GPU. Small tests shows that you can expect another 3-5 times speedup
with EXT_buffer_data, so Mali would be on pair with Adreno here. So if
you have bought such a device unfortunately, please try to make noise on
your vendor forums/support and ask for this extension. If you are going
to buy a new mobile, I'd recormend to avoid *any* mobile with a Mali GPU
in it.
We now differentiate between a resize event and surface change/destroyed
event, reducing the overhead for resizes in the Vulkan backend. It is
also now now safe to change the surface multiple times if the video thread
is lagging behind.
The option is named DisableCopyToVRAM under the Hacks section in
GFX.ini. It is intentionally not exposed to the GUI, as users should not
need to use it under normal circumstances. The main use is debugging
issues in the EFB-to-RAM shaders.
The console appears to behave against standard IEEE754 specification
here, in particular around how NaNs are handled. NaNs appear to have no
effect on the result, and are treated the same as positive or negative
infinity, based on the sign bit.
However, when the result would be NaN (inf - inf, or (-inf) - (-inf)),
this results in a completely fogged color, or unfogged color
respectively. We handle this by returning a constant zero for the A
varaible, and positive or negative infinity for C depending on the sign
bits of the A and C registers. This ensures that no NaN value is passed
to the GPU in the first place, and that the result of the fog
calculation cannot be NaN.
Otherwise we might get UB if the value we cast is larger than the
max value of the underlying type that the compiled picked for the enum.
I haven't done any extensive check through Dolphin to find cases
of this, I'm just fixing the cases I already know of.
Tested on a linux Intel Skylake integrated graphics with
blend_func_extended force-disabled, as it's the only platform I have
that doesn't crash with ubershaders and supports fb_fetch
It seems it doesn't like modifying inout variables in place - so instead
use a temporary for ocol0/ocol1 and only write them once at the end of
the shader
GLES doesn't support C-style array initialisers, so stuff like:
Type var[2] = {
VALUE_A,
VALUE_B
};
isn't supported in GLES (it was added in
GL_ARB_shading_language_420pack).
The texture conversion shader used this, so would fail to compile on
GLES.
HLSL does not define roundEven(), only round(). This means that the
output may differ slightly for OpenGL vs Direct3D. However, it ensures
consistency across OpenGL drivers, as round() in GLSL can go either way.
This fixes the rendering of the scan visor in Metroid Prime 2: Echoes,
as seen in https://fifoci.dolphin-emu.org/dff/mp2-scanner/
The alpha channel was off-by-one on Ivy Bridge due to the rounding
after multiplication with colmat. This commit removes this matrix
altogether in most cases, making them simple GLSL swizzles.
This will generate one shader per copy format. For now, it is the same
shader with the colmat hard coded. So it should already improve the GPU
performance a bit, but a rewrite of the shader generator is suggested.
Half of the patch is done by linkmauve1:
VideoCommon: Reorganise the shader writes.
Also skips swapping the window system buffers in headless mode, as there
may not be a surface which can be swapped in the first place. Instead,
we call glFlush() at the end of a frame in this case.
Cel-damage uses the color from the lighting stage of the vertex pipeline
as texture coordinates, but sets numColorChans to zero.
We now calculate the colors in all cases, but override the color before
writing it from the vertex shader if numColorChans is set to a lower value.
This is already initialized in the class definition. This would
previously cause a -Wreorder warning on macOS, as m_config is
defined after m_currently_mapped.
We need this because VS currently doesn't consider
std::is_trivially_copyable<typename
std::remove_volatile<SCPFifoStruct>::type>::value
to be true and because no compiler should consider it
to be true if we replace the volatiles with atomics.
Some lines of code in Dolphin just plainly grabbed the value of
g_ActiveConfig.iEFBScale, which resulted in Auto being treated as
0x rather than the actual automatically selected scale.
There was a race condition between the video thread and the host thread,
if corrections need to be made by VerifyValidity(). Briefly, the config
will contain invalid values. Instead, pause emulation first, which will
flush the video thread, update the config and correct it, then resume
emulation, after which the video thread will detect the config has
changed and act accordingly.
Calling vkCmdClearAttachments with a partial rect, or specifying a
render area in a render pass with the load op set to clear can cause the
GPU to lock up, or raise a bounds violation. This only occurs on MSAA
framebuffers, and it seems when there are multiple clears in a single
command buffer. Worked around by back to the slow path (drawing quads)
when MSAA is enabled.
Currently, this is only the logic op bit, but this will be extended to
the framebuffer fetch/blend modes. In the future, when/if we move to
VideoCommon pipelines, this state will be part of the pipeline UID
anyway, and we can mask it out in the backend by using a two-level map,
so the shaders/programs are shared.
This optimisation doesn't work on PowerVR's Vulkan implementation. We
(incorrectly) disallow Framebuffer objects to be used with a different
load or store op than that which they were created with, despite the
spec allowing such.
This fixes the windwaker intro "smearing"
Modernizes the arrays and makes future simplifications possible (e.g. usages within the software renderer).
It also makes cases where we use array->pointer decay explicit.
The class NonCopyable is, like the name says, supposed to disallow
copying. But should it allow moving?
For a long time, NonCopyable used to not allow moving. (It declared
a deleted copy constructor and assigment operator without declaring
a move constructor and assignment operator, making the compiler
implicitly delete the move constructor and assignment operator.)
That's fine if the classes that inherit from NonCopyable don't need
to be movable or if writing the move constructor and assignment
operator by hand is fine, but that's not the case for all classes,
as I discovered when I was working on the DirectoryBlob PR.
Because of that, I decided to make NonCopyable movable in c7602cc,
allowing me to use NonCopyable in DirectoryBlob.h. That was however
an unfortunate decision, because some of the classes that inherit
from NonCopyable have incorrect behavior when moved by default-
generated move constructors and assignment operators, and do not
explicitly delete the move constructors and assignment operators,
relying on NonCopyable being non-movable.
So what can we do about this? There are four solutions that I can
think of:
1. Make NonCopyable non-movable and tell DirectoryBlob to suck it.
2. Keep allowing moving NonCopyable, and expect that classes that
don't support moving will delete the move constructor and
assignment operator manually. Not only is this inconsistent
(having classes disallow copying one way and disallow moving
another way), but deleting the move constructor and assignment
operator manually is too easy to forget compared to how tricky
the resulting problems are.
3. Have one "MovableNonCopyable" and one "NonMovableNonCopyable".
It works, but it feels rather silly...
4. Don't have a NonCopyable class at all. Considering that deleting
the copy constructor and assignment operator only takes two lines
of code, I don't see much of a reason to keep NonCopyable. I
suppose that there was more of a point in having NonCopyable back
in the pre-C++11 days, when it wasn't possible to use "= delete".
I decided to go with the fourth one (like the commit title says).
The implementation of the commit is fairly straight-forward, though
I would like to point out that I skipped adding "= delete" lines
for classes whose only reason for being uncopyable is that they
contain uncopyable classes like File::IOFile and std::unique_ptr,
because the compiler makes such classes uncopyable automatically.
The switch statements in these functions appear to get transformed into
an if..else chain on NVIDIA's OpenGL/Vulkan drivers, resulting in lower
performance than the D3D counterparts. Transforming the switch into a
binary tree of ifs can increase performance by up to 20%.
Settings that come from the SYSCONF are now included in Dolphin's
config system as part of the base layer. They are handled in a
special way compared to other settings to make sure they are only
loaded from and saved to the SYSCONF (to avoid different, possibly
contradicting sources of truth).
This was breaking Metroid Prime 2: Echoes’s scanner in some rooms, such
as the one in https://fifoci.dolphin-emu.org/dff/mp2-scanner/
It was found on Ivy Bridge on Mesa, the alpha value read back from the
EFB was off-by-4 in multiple objects, which was a conversion error
because int4() is equivalent to floor() and the value wasn’t always
higher.
This is supposed to get efb2tex to the same texture as efb2ram, by applying the related efb copies as updates after each other, in the order of their creation.
Improve bookkeeping around formats. Hopefully make code less confusing.
- Rename TlutFormat -> TLUTFormat to follow conventions.
- Use enum classes to prevent using a Texture format where an EFB Copy format
is expected or vice-versa.
- Use common EFBCopyFormat names regardless of depth and YUV configurations.
This was mainly included for debugging, but could end up being confusing
for users, as well as polluting the GL program cache with a mix of uber
and specialized shaders if the option was changed.
These vertex formats enable all attributes. Inactive attributes are set
to offset=0, and the smallest type possible. This "optimization" stops
the NV compiler from generating variants of vertex shaders.
This is a remake of https://github.com/dolphin-emu/dolphin/pull/3749
Full credit goes to phire.
Old message:
"If none of the texture registers have changed and TMEM hasn't been invalidated or changed in other ways, we can blindly reuse the old texture cache entries without rehashing.
Not only does this fix the bloom effect in Spyro: A Hero's Tail (The game abused texture cache) but it will also provide speedups for other games which use the same texture over multiple draw calls, especially when safe texture cache is in use."
Changed the pr per phire's instructions to only return the current texture(s) if none of the texture registers were changed. If any texture register was changed, fall back to the default hashing and rebuilding textures from memory.
Some code was calling more than one of these functions in a row
(in particular, FileUtil.cpp itself did it a lot...), which is
a waste since it's possible to call stat a single time and then
read all three values from the stat struct. This commit adds a
File::FileInfo class that calls stat once on construction and
then lets Exists/IsDirectory/GetSize be executed very quickly.
The performance improvement mostly matters for functions that
can be handling a lot of files, such as File::ScanDirectoryTree.
I've also done some cleanup in code that uses these functions.
For instance, some code had checks like !Exists() || !IsDirectory(),
which is functionally equivalent to !IsDirectory(), and some
code was using File::GetSize even though there was an IOFile
object that the code could call GetSize on.
This fixes the global-static fifo object causing infinite hangs in some
cases. Notably, failure to initialize a graphics backend would result in
BlockingLoop::Prepare being called but never executing Run(), leaving the
object in a bad state.
This code hadn't been touched since 2010. Nowadays, the panic alert
setting is loaded by ConfigManager and applied in UICommon.
VideoConfig has no business messing with it.
Use @Orphis's FindFFmpeg module from ppsspp:
2149d3db7f
From that commit:
> This new module should be able to handle both libraries in the regular
> paths and fallback to pkg-config.
> It is also able to find dynamic libraries, not just static libraries.
> It will generate imported targets with the name FFmpeg::<lib> that you
> can use in your scripts.
This removes the need for multiple texture files to store the mipmap
chain for a texture. As many mipmaps will be loaded as are present in
the DDS file, and any remaining mipmaps will fall back to the old
behavior.
This is because we re-use BlendingState for our internal drawing (e.g.
RasterFont) and for these shaders, we can't assume the presence of a
second color output.
Currently, we use the alpha channel from the EFB even if the current
format does not include an alpha channel. Now, the alpha channel is set
to 1 if the format does not have an alpha channel, as well as truncating
to 5/6 bits per channel. This matches the EFB-to-texture behavior.
Some widescreen hacks (see below) properly force anamorphic output, but
don't make the last projection in a frame 16:9, so Dolphin doesn't
display it correctly.
This changes the heuristic code to assume a frame is anamorphic based on
the total number of vertex flushes in 4:3 and 16:9 projections that
frame. It also adds a bit of "aspect ratio inertia" by making it harder
to switch aspect ratios, which takes care of aspect ratio flickering
that some games / widescreen hacks would be susceptible with the new
logic.
I've tested this on SSX Tricky's native anamorphic support, Tom Clancy's
Splinter Cell (it stayed in 4:3 the whole time), and on the following
widescreen hacks for which the heuristic doesn't currently work:
Paper Mario: The Thousand-Year Door (Gecko widescreen code from Nintendont)
C202F310 00000003
3DC08042 3DE03FD8
91EEF6D8 4E800020
60000000 00000000
04199598 4E800020
C200F500 00000004
3DE08082 3DC0402B
61CE12A2 91CFA1BC
60000000 387D015C
60000000 00000000
C200F508 00000004
3DE08082 3DC04063
61CEE8D3 91CFA1BC
60000000 7FC3F378
60000000 00000000
The Simpsons: Hit & Run (AR widescreen code from the wiki)
04004600 C002A604
04004604 C09F0014
04004608 FC002040
0400460C 4082000C
04004610 C002A608
04004614 EC630032
04004618 48220508
04041A5C 38600001
04224344 C002A60C
04224B1C 4BDDFAE4
044786B0 3FAAAAAB
04479F28 3FA33333
Fixes bug #10183 [0] introduced by 3bd184a / PR #4467 [1].
TextureCacheBase was no longer calling `entry->Load` for custom textures
since the compute shader decoding logic was added. This adds it back in.
It also slightly restructures the decoding if-group to match the one
below, which I think makes the logic more obvious.
(recommend viewing with `git diff -b` to ignore the indentation changes)
[0]: https://bugs.dolphin-emu.org/issues/10183
[1]: https://github.com/dolphin-emu/dolphin/pull/4467
New Super Mario Bros on PAL still renders at 60 fps, but skips every 5th XFB copy.
So our detection of "per frame" fails, and we require twice the amound of texture objects.
But our pool frees unused textures after 3 frames, so half of them needs to be reallocated
every few frames.
This commit removes the lock for render targets. It was introduced to not update a texture
while it is still in use. But render targets aren't updated while rendering, so this
lock isn't needed. Non-rendertarget textures however aren't as dynamic, so the lock should
have no performance update.
This stops the virtual method call from within the Renderer constructor.
The initialization here for GL had to be moved to VideoBackend, as the
Renderer constructor will not have been executed before the value is
required.
This moves all the byte swapping utilities into a header named Swap.h.
A dedicated header is much more preferable here due to the size of the
code itself. In general usage throughout the codebase, CommonFuncs.h was
generally only included for these functions anyway. These being in their
own header avoids dumping the lesser used utilities into scope. As well
as providing a localized area for more utilities related to byte
swapping in the future (should they be needed). This also makes it nicer
to identify which files depend on the byte swapping utilities in
particular.
Since this is a completely new header, moving the code uncovered a few
indirect includes, as well as making some other inclusions unnecessary.
Using the AVCodecContext contained in AVStream for muxing is officially
discouraged[1] and AVStream::codec was deprecated in favor of
AVStream::codecpar in libavformat 57.33.100 / 57.5.0.
1: [FFmpeg-cvslog] lavf: replace AVStream.codec with AVStream.codecpar: https://ffmpeg.org/pipermail/ffmpeg-cvslog/2016-April/099152.html
Keeps associated data together. It also eliminates the possibility of out
parameters not being initialized properly. For example, consider the
following example:
-- some FramebufferManager implementation --
void FBMgrImpl::GetTargetSize(u32* width, u32* height) override
{
// Do nothing
}
-- somewhere else where the function is used --
u32 width, height;
framebuffer_manager_instance->GetTargetSize(&width, &height);
if (texture_width != width) <-- Uninitialized variable usage
{
...
}
It makes it much more obvious to spot any initialization issues, because
it requires something to be returned, as opposed to allowing an
implementation to just not do anything.
From what I can tell, the emulated GPU places (0,0) at the lower left of
the image, and we were generating texture coordinates so that (0,0) was
at the upper-left in the expansion geometry shader, causing textures
used by point sprites to be flipped vertically.
Fixes the upside-down A button in Mario Golf.
These provide the same semantics, however aggregate initialization
doesn't force the structs to be trivially copyable. memset, on the other
hand, does.
Making changes to ConfigManager.h has always been a pain, because
it means rebuilding half of Dolphin, since a lot of files depend on
and include this header.
However, it turns out some includes are unnecessary. This commit
removes ConfigManager includes from files which don't contain
SConfig or GPUDeterminismMode or GPU_DETERMINISM (which means the
ConfigManager include is not used).
(I've also had to get rid of some indirect includes.)
This is mainly for potential Android fifoci usage, and thus is not
exposed anywhere in the UI. To enable, set DumpFramesAsImages under
Settings in GFX.ini.
This increase the performance of good backends a bit, but slows down the bads one a lot.
Let's fix those backends instead of forcing stupid memcpy in the common code.
This used an invalid pointer, which was only valid within AddFrame.
This drops a feature which shall dump the last frame as it might was dropped before.
A good implementation however should "overwrite" the last frame if the time matches.
But this needs to delay every frame a bit.
Commit 4969415 modified calls to GetInterpolationQualifier, but mistakenly changed the order of some boolean parameters: GetInterpolationQualifier(true, false) was changed to GetInterpolationQualifier(…, false, true).
Should fix#9783.
There's no official implementation of the Vulkan API,
and Dolphin currently isn't set-up to work with the
single, commercially-available third-party implementation.
This is an issue because a driver may have to maintain two copies of a
texture if it batches all uploads together at the start of a frame.
In the Vulkan backend, we do something similar to avoid breaking out of a
render pass to copy a texture from the streaming buffer to the destination
image.
This was causing issues in the sms-bubbles fifolog, where an EFB copy to
the same address of a previously-used texture caused the previous texture
to be re-used again for a different image later on in the frame, causing
the original contents to be discarded.
Some OSD messages were displayed in RenderBase.cpp using global variables and some code duplicated
in OnScreeDisplay.cpp.
Now all messages are displayed using functions in the OSD namepace.
* OSDChoice and OSDTime global variables are gone
* All OSD logic is kept at the same place
* All messages are properly aligned
* Clean characters for all OSD messages
Original commit:
commit f0ec61c057
Author: Aestek <thib.gilles@gmail.com>
Date: Sun Aug 7 16:08:41 2016 +0200
This is needed because for some reason the WSI for NV Vulkan drivers
doesn't return VK_ERROR_OUT_OF_DATE_KHR, so there is no other way to know
that a resize has occured apart from polling, which is a poor solution for
X11 (since it is blocking).
Some games will set q to a different value than 1.0 through
texture matrix manipulations. It seems the console will still
do the division in that case.
Some compilers don't have an automatic abs() overload for floats.
Doesn't really matter if they use the integer variant here, but
it's better to be explicit about the fact that we're using floats.
It's not available in OpenGL ES and officially it's not supported on OpenGL 3.0/3.1.
Fallback to old depth range code if there is no method to disable depth clipping.
It's more important to have correct clipping than to have accurate depth values.
Inaccurate depth values can be fixed by slow depth.
Replaces old and simple usages of std::atomic<bool> with Common::Flag
(which was introduced after the initial usage), so it's clear that
the variable is a flag and because Common::Flag is well tested.
This also replaces the ready logic in WiimoteReal with Common::Event
since it was basically just unnecessarily reimplementing Common::Event.
* Focus "Hash Code" / "IP address" text box by default in "Connect"
* Focus game list in "Host" tab
* RETURN keypress now host/join depending on selected tab
* Remember last hosted game
* Remove PanicAlertT:
* Simply log message to netplay window
* Remove them when they are useless
* Show some netplay message in OSD
* Chat messages
* Pad buffer changes
* Desync alerts
* Stop the game consistently when another player disconnects / crashes
* Prettify chat textbox
* Log netplay ping to OSD
Join scenario:
* Copy netplay code
* Open netplay
* Paste code
* Press enter
Host scenario:
* Open netplay
* Go to host tab
* Press enter
Note: It's not 100% perfect, as some of the GPU capablities leak into the
pixel shader UID.
Currently our UIDs don't get exported, so there is no issue. But someone
might want to fix this in the future.
As much as possible, the asserts have been moved out of the GetUID
function. But there are some places where asserts depend on variables
that aren't stored in the shader UID.
Bug Fix: It was theoretically possible for a shader with depth writes
disabled to map to the same UID as a shader with late depth
writes.
No known test cases trigger this.
Bug Fix: The normal stage UIDs were randomly overwriting indirect
stage texture map UID fields. It was possible for multiple
shaders with diffrent indirect texture targets to map to
the same UID.
Once again, it dpesn't look like this bug was ever triggered.
This frees up 21 bits and allows us to shorten the UID struct by an entire
32 bits.
It's not strictly needed (as it's encoded into the length) but I added a
bit for per-pixel lighiting to make my life easier in the following
commits.
The only code which touches xfmem is code which writes directly into
uid_data.
All the rest now read their parameters out of uid_data.
I also simplified the lighting code so it always generated seperate
codepaths for alpha and color channels instead of trying to combine
them on the off-chance that the same equation works for all 4 channels.
As modern (post 2008) GPUs generally don't calcualte all 4 channels
in a single vector, this optimisation is pointless. The shader compiler
will undo it during the GLSL/HLSL to IR step.
Bug Fix: The about optimisation was also broken, applying the color light
equation to the alpha light channel instead of the alpha light
euqation. But doesn't look like anything trigged this bug.
Fixes a major preformance regression in Skies of Arcadia during
battle transisions.
I had plans for a more advanced version of this code after 5.0,
but here is a minimal implemenation for now.
Using glMapBufferRange to read back the contents of the SSBO is extremely
slow on NVIDIA drivers. This is more noticeable at higher internal
resolutions. Using glGetBufferSubData instead does not seem to exhibit
this slowdown.
This is an oversight from pr https://github.com/dolphin-emu/dolphin/pull/3266 . Thanks to degasus for pointing this out.
It's possible that MAX_TEXTURE_BINARY_SIZE can be optimised, but i wanted to play it safe considering the 5.0 stable release.
Drivers that don't support GL_ARB_shading_language_420pack require that
the storage qualifier be specified even when inside an interface block.
AMD's driver throws a compile error when "centroid in/out" is used within
an interface block.
Our previous behavior was to include the storage qualifier regardless, but
this wasn't working on AMD, therefore we should check for the presence of
the extension and include based on this, instead.
I'm not entirely sure what is happening, but this optimisation is causing an issue in Sonic Riders: Zero Gravity. Apparently the issue would also be fixed by PR#3747, but this PR should also fix similar issues.
Games that use partial updates might get slower with this, so some performance regression testing would be nice. Games like New Super Mario Bros, RS2, Zelda TP and Silent Hill. Testing with high graphics settings makes sense, since this would mostly end up in more work for the GPU.
The D3D backend was always forcing Anisotropic filtering when that is enabled regardless of how the game chose to configure the texture filtering registers; this causes the same issues as "Force Filtering" without Anisotropy, such as causing game UI elements to no longer line up adjacent correctly. Historically, OpenGL's Anisotropy support has always worked "better" than D3D's due to seeming to not have this problem; unfortunately, OpenGL's Anisotropy specification only gives GL_LINEAR based filtering modes defined behavior, with only the mipmap setting being required to be considered. Some OpenGL implementations were implicitly disabling Anisotropy when the min/mag filters were set to GL_NEAREST, but this behavior is not required by the spec so cannot be relied on.
- remove an outdated comment about the efb to ram and scaled efb restriction
- when upscaling efb copies, mark the new texture as efb copy
- dx12 fixes for the src box, especially the number of layers for 3D
OS X's shader compiler has a bug with interface blocks where interface block members don't properly inherit the layout qualifier from the interface
block.
Work around this limitation by explicitly stating the layout qualifier on both the interface block and every single member inside of that block.
As confirmed by a hardware test if we are using the texgen type of COLOR_STRGBC0/STRGBC1 then it sets the texture coordinates to those values
regardless of what the input form or source row is.
Thanks to Ornox for testing again
Removes a couple asserts in the vertex shader gen when dealing with the input form.
Typically input form ABC1 is used, so it'll pull in the first three elements and always set the fourth to 1.0
The other input form available is AB11, which sets the last two components to 1.0 (Theoretically).
No titles actually use this input form that we know of except for Project M, but it can have some fairly drastic visual differences.
Confirmed correct by hardware test
This should get Donkey Kong Country Returns characters to be as broken as they should be. They will be fixed in a later pr.
Expected result is:
efbtex: characters are always flickering or invisible, no matter what scaling or IR setting
efb2ram: characters are always working properly at 1xIR, no matter what scaling or IR setting
Fast depth is now more accurate than slow depth and should always be used.
The option will be kept in a different form as it is still used as a hack to fix some games.
Also, the slow depth code path will still be relied upon by cards that don't support GL_ARB_clip_control.
CBoot::BootUp() did call CoreTiming::Advance which itself blocks on the GPU,
but the GPU thread wasn't started already. This commit moves the SyncGPU
initialization into the Fifo.cpp file and call it after BootUp().
Setting this is not required anymore as of commit 40cf1bbacc622 of
FFmpeg.
For users of older versions of the libavcodec library we guard the
change with an #if.
(x % y) is not defined in GLSL when sign(x) != sign(y).
This also has the added benefit of behaving the same as sampler wrapping modes, in regards to negative inputs.
This was causing crashes/driver resets when odd-dimension textures were
being loaded, due to the size we were uploading being larger than the size
of the higher-level texture calculated by the runtime.
People who make texture packs usually release them using a specific ID
(for instance SX4E01). Users who have a different version of the game
(like the PAL version SX4P01) then need to rename the custom texture
folder to match. This is a lot simpler than renaming every texture file,
as was required with the old texture format, but it's still something
that users can forget to do. To make that unnecessary, this change makes
it possible to use three-character region-free IDs for custom texture
folders, similarly to how game INIs can use three-character IDs. Once
most people have updated to Dolphin versions that include this change,
those who make texture packs will be able to name them with
three-character IDs, removing the need for users to rename anything.
We don't throttle by frames, we throttle by coretiming speed.
So looking up VI for calculating the speed was just very wrong.
The new ini option is a float, 1.0f for fullspeed.
In the GUI, percentual values are used.
This fixes the crashes occuring at startup with a non-empty shader cache.
Because LinearDiskCache reads/writes to the storage of ShaderUid, ShaderUid must be trivially copyable.
Additionally, adds a static assert to LinearDiskCache to ensure this doesn't happen in the future.
The initialization of ShaderUid data has been moved to the code generation functions, so the above condition holds true.
This reverts commit 81414b4fa2, reversing
changes made to b926061f64.
Conflicts:
Source/Core/DolphinWX/Frame.cpp
Source/Core/VideoCommon/VideoConfig.cpp
Source/Core/VideoCommon/VideoConfig.h
Approximately three or four times now, the issue of pointers being
in an inconsistent state been an issue in the video backend renderers
with regards to tripping up other developers.
Global (ugh) resources are put into a unique_ptr and will always have a
well-defined state of being - null or not null