Adding AmdPowerXpressRequestHighPerformance
This will allow AMD drivers to detect the request to use the dGPU instead of the iGPU on compatible hybrid graphics systems.
Reference: https://community.amd.com/thread/169965
Fixes https://bugs.dolphin-emu.org/issues/12245.
I considered making a change to DolphinQt instead of
the core, but then additional effort would've been
required to add the same fix to the Android GUI once
we start using the new config system there.
This is now unused. Seems like it was an improper fix
(there would be a race if saving the screenshot took longer
than 2 seconds) back when it was used too.
Other than the controller settings and JIT debug settings,
these are the only settings which were defined in Java code
but not defined in the new config system in C++. (There are
still a lot of settings that are defined in the new config
system but not yet saveable in the new config system, though.)
While manually capturing constexpr variables used in lambda
expressions does work, it's really easy to forget doing so since
we don't have a Windows CMake builder and the workaround isn't
necessary anywhere else. Fortunately, MSVC has a flag that fixes
the constexpr capture behavior, so let's use that instead.
These are only ever used with ShaderCode instances and nothing else.
Given that, we can convert these helper functions to expect that type of
object as an argument and remove the need for templates, improving
compiler throughput a marginal amount, as the template instantiation
process doesn't need to be performed.
We can also move the definitions of these functions into the cpp file,
which allows us to remove a few inclusions from the ShaderGenCommon
header. This uncovered a few instances of indirect inclusions being
relied upon in other source files.
One other benefit is this allows changes to be made to the definitions
of the functions without needing to recompile all translation units that
make use of these functions, making change testing a little quicker.
Moving the definitions into the cpp file also allows us to completely
hide DefineOutputMember() from external view, given it's only ever used
inside of GenerateVSOutputMembers().
A very trivial conversion, this simply converts calls to Write over to
WriteFmt and adjusts the formatting specifiers as necessary.
This also allows the const char* parameters to become std::string_view
instances, allowing for ease of use with other string types.
Lambda expressions with uncaptured constants were leading to errors,
and there were also some warnings about deprecated functions
(QFontMetrics::width and inet_ntoa).
It actually maps to postMtxInfo, not posMtxInfo (which isn't a thing).
This is especially confusing because there *are* position matrices (as
opposed to post-transform matrices).
It used to be the case that frame advance skipped duplicate frames
(i.e. it would take 30 frame advances to get through one second
of emulated time in a 30 fps game), but this broke in 9c5c3c0.
Skipping duplicate frames making TASing less annoying.
Hardware tests have shown that if the number of texgens/channels do not
match, you get garbage rendering. Presumably because the output
registers from the XF stage are fed into the incorrect input registers
for TEV/BP.
Currently, this causes Dolphin to crash/generate invalid shaders with an
assertion failure in the hardware backends. Instead, we log an error.
Perhaps in the future we should just spit out all texgens/colors anyway
from both stages, and let cross-stage optimization take care of DCE'ing
it away. But doing so would require changing the UIDs and invalidating
everyone's shader caches.
It seems that the newer version of fmt gets tripped up by bitfields
within structs. However, we can just specify the intended type where
necessary to get around this.
Migrates the shader generator off the use of a global array, eliminating
the use of some global state. This also allows us to move the shader
generation over to using fmt in a subsequent change.
Currently, we do not display every second frame in 25fps/30fps games
which run to vsync. This improves performance as there's less rendering
for the GPU to perform, but when combined with vsync, could cause frame
pacing issues.
This commit adds an option to force every frame generated by the console
to be displayed to the host, which may improve pacing for these games.
The frame number is incremented before the first frame is swapped out.
Fixes ffmpeg creating invalid video files on output if the emulator only
runs for a single frame, e.g. FifoCI.
Now that we have an actual interface to manage things, we can stop
duplicating the calls to to the pixel shader manager and remove the
need to remember to actually do so when disabling or enabling the
bounding box.
Rather than expose the bounding box members directly, we can instead
provide an interface for code to use. This makes it nicer to transition
from global data, as the interface function names are already in
place.
Now that we've extracted all of the stateless functions that can be
hidden, it's time to make the index generator a regular class with
active data members.
This can just be a member that sits within the vertex manager base
class. By deglobalizing the state of the index generator we also get rid
of the wonky dual-initializing that was going on within the OpenGL
backend.
Since the renderer is always initialized before the vertex manager, we
now only call Init() once throughout the execution lifecycle.
We can use if constexpr with the template functions that pass in a
non-type template parameter, allowing the removal of branches that
aren't taken at compile time.
Compilers will generally do this by default, however, we now give a
gentle prodding to the compiler if this would otherwise not be the case.
These don't rely on any of the static members within the IndexGenerator
class, so we can make all of these functions fully internal to the
translation unit.
We can make use of if constexpr in several scenarios here to allow
compilers to exise the relevant code paths out.
Technically a decent compiler would do this already, but now we can give
compilers a little more nudging here in the event that isn't the case.
cmd2 is a u32, so any bitwise arithmetic on it with a type of the same
size or smaller will result in a u32 value. This is also implicitly
converted to an unsigned type in the if statement as well, given that
size_t * int -> size_t.
This is just more explicit about the operations occurring and also
likely silences a sign conversion warning.
We only use these string streams to output into a final std::string
instance, we don't read into types with them. Because of this, we can
just make use of std::ostringstream, rather than the fully-fledged
std::stringstream.
No behavioral change. This is intended to make the transition to fmt
less noisy in subsequent changes by combining insertions of multiple
string literals into one where applicable.
Begins the conversion of the shader generators over to using fmt
formatting specifiers.
This also has a benefit over the older StringFromFormat-based API in
that all formatted data is appended to the existing buffer rather than
creating a completely separate string and then appending it to the
internal string buffer.
Migrates most of VideoCommon over to using fmt, with the exception being
the shader generator code. The shader generators are quite large and
have more corner cases to deal with in terms of conversion (shaders have
braces in them, so we need to make sure to escape them).
Because of the large amount of code that would need to be converted, the
conversion of VideoCommon will be in two parts:
- This change (which converts over the general case string formatting),
- A follow up change that will specifically deal with converting over
the shader generators.
OpenGL doesn't render to a 2-layer backbuffer like D3D/Vulkan for quad-buffered
stereo, instead drawing twice with the eye selected by glDrawBuffer()
(see OGL::Renderer::RenderXFBToScreen).
This is a giant hack which was previously removed because it causes
broken rendering. However, it seems that some devices still do not
support logical operations (looking at you, Adreno/Mali). Therefore, for
a handful of cases where the hack actually makes things slightly better,
we can use it.
... but not without spamming the log with warnings. With my warning
message PR, we can inform the users before emulation starts anyway.
Same behavior without hardcoding the type of the mutex within the lock
guards. This means the type of the mutex would be able to be changed
without needing to also change all occurrences lock guards are used.
Allows callers to std::move strings into the functions (or automatically
assume the move constructor/move assignment operator for rvalue
references, potentially avoiding copies altogether.
This header doesn't actually make use of MathUtil.h within itself, so
this can be removed. Many other source files used VideoCommon.h as an
indirect include to include MathUtil.h, so these includes can also be
adjusted.
While we're at it, we can also migrate valid inclusions of VideoCommon.h
into cpp files where it can feasibly be done to minimize propagating it
via other headers.
This isn't used anywhere, so it can be removed. This also potentially
fixes an underlying compilation error waiting to happen, given DECSTAT
could have potentially been used, someone disables statistics (for
whatever reason), then gets a compilation error due to the #else case
not containing an empty definition of DECSTAT.
Makes the global variable follow our convention of prefixing g_ on
global variables to make it obvious in surrounding code that it's not a
local variable.
Rather than making Statistics' member functions operate on the global
variable instance of itself, we can make these functions member
functions and operate on a by-instance state, removing the direct
dependency on the global variable itself.
This also makes for less reading, as there's no need to repeat "stats."
for all variable accesses.
Normalizes all variables related to statistics so that they follow our
coding style.
These are relatively low traffic areas, so this modification isn't too
noisy.
Now that the floating point members are assigned in bulk, we can remove
their setter macro. While we're at it, we can also remove the setter for
unsigned int, given it's not used.
ImGui::Text() assumes that the incoming text is intended to be
formatted, but we don't actually use it to format anything. We can be
explicit by using the relevant function.
This also has a plus of not needing to go through the formatter itself,
but the gains from that are probably minimal.
This fixes the problem where OBS game capture only grabs the region
inside an ImGui window whenever one is open, when using the OpenGL
backend. Shouldn't have any negative effects, as the scissor would've
been something completely arbitrary anyways.
This may affect other capture software that uses the same hooking
method, but I've only tested OBS.
Previously, this array potentially wouldn't be placed within the
read-only segment, since it wasn't marked const. We can make the lookup
table const, along with any other nearby variables.
Avoids dragging in IniFile, EXI device and SI device headers in this header which is
quite widely used throughout the codebase.
This also uncovered a few cases where indirect inclusions were being
relied upon, which this also fixes.
Since C++17, non-member std::size() is present in the standard library
which also operates on regular C arrays. Given that, we can just replace
usages of ArraySize with that where applicable.
In many cases, we can just change the actual C array ArraySize() was
called on into a std::array and just use its .size() member function
instead.
In some other cases, we can collapse the loops they were used in, into a
ranged-for loop, eliminating the need for en explicit bounds query.
We can use u32 instead of unsigned int to shorten up these definitions
and make them much nicer to read.
While we're at it, change the size array to house u32 elements
to match the return value of the function.
We can use u32 instead of unsigned int to shorten up these definitions
and make them much nicer to read.
While we're at it, change the size array to house u32 elements to match
the return value of the function.
This is only ever used to retrieve a raw view of the given UID data
structure, however it's already valid C++ to retrieve a char/unsigned
char view of an object for bytewise inspection.
u8 maps to unsigned char on all platforms we support, so we can just do
this directly with a reinterpret cast, simplifying the overall
interface.
Zero-initialization zeroes out all members and padding bits, so this is
safe to do. While we're at it, also add static assertions that enforce
the necessary requirements of a UID type explicitly within the ShaderUid
class.
This way, we can remove several memset calls around the shader
generation code that makes sure the underlying UID data is zeroed out.
Now our ShaderUid class enforces this for us, so we don't need to care about
it at the usage sites.
Greatly simplifies the overall interface when it comes to compiling
shaders. Also allows getting rid of a std::string overload of the same
name. Now std::string and const char* both go through the same function.
Makes VertexLoader_Normal completely stateless, eliminating the need for
an Init() function, and by extension, also gets rid of the need for the
FifoAnalyzer to have an Init() function.
Many of the arrays defined within this file weren't declared as
immutable, which can inhibit the strings being put into the read-only
segment. We can declare them constexpr to make them immutable.
While we're at it, we can use std::array, to allow bounds conditional
bounds checking with standard libraries. The declarations can also be
shortened in the future when all platform toolchain versions we use
support std::array deduction guides. Currently macOS and FreeBSD
builders fail on them.
Ensures that the destruction logic is kept local to the translation unit
(making it nicer when it comes to forward declaring non-trivial types).
It also doesn't really do much to define it in the header.
Given we're simply storing the std::string into a deque. We can emplace
it and move it. Completely avoiding copies with the current usage of the
function.
If bounding box is enabled when a UID cache is created, then later disabled,
we shouldn't emit the bounding box portion of the shader.
Fixes pipeline creation errors on D3D12 backend for this case.
Since the copy X and Y coordinates/sizes are 10-bit, the game can configure a
copy region up to 1024x1024. Hardware tests have found that the number of bytes
written does not depend on the configured stride, instead it is based on the
size registers, writing beyond the length of a single row. The data written
for the pixels which lie outside the EFB bounds does not wrap around instead
returning different colors based on the pixel format of the EFB.
This suggests it's not based on coordinates, but instead on memory addresses.
The effect of a within-bounds size but out-of-bounds offset
(e.g. offset 320,0, size 640,480) are the same.
As it would be difficult to emulate the exact behavior of out-of-bounds reads,
instead of writing the junk data, we don't write anything to RAM at all for
over-sized copies, and clamp to the EFB borders for over-offset copies.
When looking into the Faron Woods fifolog, I noticed this code was quite
high in the profile (~10%). Clearing 4096 entries from the vector isn't
needed every draw, we only need to do this when the cache was actually
valid in the first place.
Should provide a slight general performance boost.
This was causing the depth buffer to be discarded as well, which
has an effect on mobiles (doesn't get loaded into tile memory).
If we find this is hindering performance (remember, the EFB is
only a 640x528 texture), it may be worth changing the interface to
support discarding only the colour buffer.
for this kind of footage carrying alpha information makes no sense, and it additionally complicates things by hugely damaging compatibility of the resulting video. after this change alone the video becomes compatible with VfW/WinAPI and tools that rely on it (avisynth, virtualdub).
fixes https://bugs.dolphin-emu.org/issues/11141 and https://bugs.dolphin-emu.org/issues/10193
Unnecessary since b93b7ec. It was needed before that commit becase
RenderBase.cpp only was checking the value of aspect_mode, not
suggested_aspect_mode.
This way we don't end up with artifacts of the EFB's alpha values in
frame dumps. XFB copies loaded from RAM also set the alpha to 1, so this
will match.
Due to the current design, any of the GL state can be mutated after
calling this function, so we can't assume that the tracked state will
match if we call SetPipeline() after ResetAPIState().
Since we use the common pipelines here and draw vertices if a batch is
currently being built by the vertex loader, we end up trampling over its
pointer, as we share the buffer with the loader, and it has not been
unmapped yet. Force a pipeline flush to avoid this.
It doesn't feel great to let the value from a previous emulation session
linger around considering that the GC aspect ratio heuristic can use
the previous value of m_aspect_wide when calculating m_aspect_wide.
The Metal shader compiler fails to compile the atomic instructions
when operating on individual components of a vector. Spltting it
into four variables shouldn't make any difference for other
platforms, as they are accessed independently.
The path to the MoltenVK library can be specified by the
LIBMOLTENVK_PATH environment variable, otherwise it assumes it is
located in the application bundle's Contents/MacOS directory.
floatindex is clamped to the range [0, 9]. For non-negative numbers
floor() is equivalent to trunc(). Truncation happens implicitly when
converting to uint, so the floor() is unnecessary.
This avoids out-of-bounds warnings when replaying FIFO captures.
The value of XF_REGS_SIZE is written into the DFF header and we only
read the min of XF_REGS_SIZE and the header value, so this change is
backward compatible and doesn't break forward compatibility for old
Dolphin versions either.
The current approach results in the UI thread creating a graphics device
whilst the core is running, leading to races on function pointers, and
potentially crashing.
This fixes severe image flickering in some cutscenes of Twin Snakes. The game appears to sometimes load a previously made XFB copy as a texture before it is actually rendered to the screen, which we took as an invitation to invalidate the XFB copy.
If, for whatever reason, the XFB has to be loaded from console memory, it's possible that the texture is returned at native resolution instead of EFB-scaled resolution. In this case, our xfb_rect.right adjustment must also happen at native resolution instead of scaled resolution.
This was causing a warning in the shader compiler, as the rgb components
were not initialized. Which shouldn't be an issue, as the rgb is not
used in the blend equation, only the alpha. However, the lack of
initialization causes crashes in Intel's D3D shader compiler, so we'll
play nice and initialize all the channels.
Our usage of glFinish() can cause driver crashes and/or lockups.
Please note that this disables the background shader compilation (i.e.
all shaders will be compiled on boot). There is no way around this.
AVFormatContext::filename was deprecated in lavf 58.7.100 in favor
of AVFormatContext::url. Instead of adding version-checking logic,
just use the passed-in dump path instead.
We want this setting to invalidate the cache because it may affect the appearance of textures in the rendered scene, therefore one would expect changing it while the game is running to have the expected effect immediately.
This no longer converts from sRGB to linear for the reference mip
downsample - even if the original mipmap creation tool used an sRGB
colorspace (which isn't really guaranteed, and may even change per
game), this is a "fast" heuristic that's only an estimate anyway.
The average diff is also now stored in a u64, avoiding floating point
calculations in the per-pixel hot loop.
This should speed up the detection significantly, hopefully fixing
jank when loading in new textures.
Makes the values strongly-typed and gets more identifiers out of the
global namespace.
We are forced to use anything that is not "None" to mean none, because
X11 is garbage in that it has:
\#define None 0L
Because clearly no one else will ever want to use that identifier for
anything in their own code (and is why you should prefix literally
any and all preprocessor macros you expose to library users in public
headers).
This is much better as prefixed double underscores are reserved for the
implementation when it comes to identifiers. Another reason its better,
is that, on Windows, where __forceinline is a compiler built-in, with
the previous define, header inclusion software that detects unnecessary
includes will erroneously flag usages of Compiler.h as unnecessary
(despite being necessary on other platforms). So we define a macro
that's used by Windows and other platforms to ensure this doesn't
happen.
Instead of globbing things under an ambiguous Common.h header, move
compiler-specifics over to Compiler.h. This gives us a dedicated home
for anything related to compilers that we want to make functional across
all compilers that we support.
This moves us a little closer to eliminating Common.h entirely.
Fairly trivial to resolve, we just initialize the std::array with two
sets of braces (one set to create the array, the other to start and end the
aggregate data that we'll end up returning)
Coherent mappings have a lower overhead and less GL codes.
So enables coherent mapping by default for all drivers.
Both Qualcomm and ARM performs very bad with explicit flushing, so this change helps them as well.
AFAIK there was one GPU generation which was slower on coherent mapping: nvidia tesla
So Geforce 200 and 300 series should be tested with this PR before merging.
As this was last tested many years ago, this issue might have been fixed as well.
Those GPUs are close to 10 years old and not supported any more by nvidia.
D3D11 cannot handle block compressed textures where the first mip level
is not a multiple of the block size. The simple fix for texture pack
authors: leave these textures uncompressed. You can still use a .dds
container.
Using 8-bit integer math here lead to precision loss for depth copies,
which broke various effects in games, e.g. lens flare in MK:DD.
It's unlikely the console implements this as a floating-point multiply
(fixed-point perhaps), but since we have the float round trip in our
EFB2RAM shaders anyway, it's not going to make things any worse. If we
do rewrite our shaders to use integer math completely, then it might be
worth switching this conversion back to integers.
However, the range of the values (format) should be known, or we should
expand all values out to 24-bits first.
Also ensure that all members of the class are initialized on
construction as well. Previously the bool indicating if options are
dirty wouldn't be initialized, which could be read uninitialized if an
instance was constructed and then IsDirty() is called.
Given this is a base class, we should clearly state what the parameters
to the functions in its exposed interface actually mean or represent.
This avoids needing to hunt for the definition of the functions in cpp
files.
While we're at it, normalize said parameter names so they follow our
naming guidelines.
Hopefully this better matches the user's view of a texture - as large changes in
colour should be weighted higher than lots of very small changes
Note: This likely invalidates the current heuristic threshold default
This makes it possible to use enums as the config type.
Default values are now clearer and there's no need for casts
when calling Config::Get/Set anymore.
In order to add support for enums, the common code was updated to
handle enums by using the underlying type when loading/saving settings.
A copy constructor is also provided for conversions from
`ConfigInfo<Enum>` to `ConfigInfo<underlying_type<Enum>>`
so that enum settings can still easily work with code that doesn't care
about the actual enum values (like Graphics{Choice,Radio} in DolphinQt2
which only treat the setting as an integer).
Also move it to MathUtils where it belongs with the rest of the
power-of-two functions. This gets rid of pollution of the current scope
of any translation unit with b<value> macros that aren't intended to be
used directly.
Also makes y_scale a dynamic parameter for EFB copies, as it doesn't
make sense to keep it as part of the uid, otherwise we're generating
redundant shaders.
Prior to this change, it's possible for m_wake_me_up_again to be used
while it's in an uninitialized state from the exposed API.
e.g.
- Using SetEnable after construction would perform an uninitialized read.
- Using PushEvent would perform an uninitialized read by way of operator |=.
internally, an uninitialized read can happen if PullEventsInternal() is
executed before other functions.
Just to avoid the whole possibility of performing uninitialized reads,
we just give the class member a default value of false.