Performance optimization, along with making the code a little
neater. Saves us from performing a single -> double -> single
conversion when calling UpdateFPRFSingle.
If we already have to use a GPR, we might as well take advantage
of the nice immediate encodings provided by GPR ORR. This is
faster, smaller, and saves a register.
The same kind of change as the changes made in the previous
commit, but this change is more involved, in particular because
of how SyncProgramsJobService was using display names as keys.
Now that DOL and ELF files are assigned game IDs, all games have
game IDs. (Unless you intentionally craft an ISO file that has
the first bytes set to null, but if you do that I think you can
live with Dolphin creating a file in GameSettings called ".ini")
Some of the code used when the carry flag is known to be a
constant value is really not much better than just setting
the carry flag and then using the normal code, and with how
rarely this code runs, it isn't well tested either.
Might as well get rid of some of this code and simplify things.
These optimizations were already present, but only when d == a. They
also make sense when this condition does not hold.
- imm == 0
Before:
41 BB 00 00 00 00 mov r11d,0
45 2B DF sub r11d,r15d
After:
45 8B DF mov r11d,r15d
41 F7 DB neg r11d
- imm == -1
Before:
41 BD FF FF FF FF mov r13d,0FFFFFFFFh
44 2B EE sub r13d,esi
0F 93 45 68 setae byte ptr [rbp+68h]
After:
44 8B EE mov r13d,esi
41 F7 D5 not r13d
C6 45 68 01 mov byte ptr [rbp+68h],1
Without this, the code added in ac28b89 misbehaves and considers
AArch64 netplay clients to not have hardware FMA support, telling
all clients to disable FMA support, which causes a desync between
x64 and AArch64 due to JitArm64 not being able to disable FMA support.
Fixes a regression from 5.0-12066, where setting the GFXBackend variable
to one other than the current global backend would crash Dolphin upon
launching the game.
fcmpX only updates the FPCC bits, not the C bit.
This was already correctly implemented in the interpreter.
Not known to affect any games, but affects a hardware test.
This fixes bounding box shaders failing to compile under Vulkan, due to
differences between GLSL and HLSL in the return value of vector
comparisons and what types these functions accept. I included all() for
the sake of completeness.
At higher resolutions, our bounding box dimensions end up being
slightly larger than original hardware in some cases. This is not
necessarily wrong, it's just an artifact of rendering at a higher
resolution, due to bringing out detail that wouldn't have appeared on
original hardware. It causes a texel to fall partially on what would
have been a single pixel at native resolution, resulting in the
coordinates getting bumped up to the next valid value. In many cases,
these slightly larger bounding boxes are perfectly fine, as games don't
hard-code expected dimensions. It is problematic in Paper Mario TTYD
though, for a somewhat complicated reason.
Paper Mario TTYD frequently uses EFB copies to pre-render a bunch of
animation frames for a character sprite (especially in Chapter 2), so
that it can then render 100 or more of them without bringing the
GameCube to its knees. Based on my observation, the game seems to set
aside a region of memory to store these EFB copies. This region is
obviously fairly small, as the GameCube only has 24MB of RAM. There are
2 rooms in Chapter 2 where you fight a horde of as many as 100 Jabbies,
which are also rendered using EFB copies, so in this room the game ends
up making 130(!) EFB copies just for Puni and Jabbi sprites. This seems
to nearly fill the region of memory it set aside for them.
Unfortunately, our slightly larger bounding boxes at higher resolutions
results in overflowing this memory, causing very strange behavior. Some
EFB copies partially overlap game state, resulting in reading it as a
garbage RGB5A3 texture that constantly changes. Others apparently
somehow trigger a corner case in our persistent buffer mapping, causing
them to partially overwrite earlier EFB copies.
What this change does is only include the screen coordinates that align
with the equivalent native resolution pixel centers, which generally
results in the bounding boxes being more in line with original
hardware. It isn't perfect, but it's enough to fix Paper Mario TTYD's
Jabbi rooms by avoiding the buffer overflow. Notably, it is more
accurate at odd resolutions than at even resolutions. Native resolution
is completely unaffected by this change, as should be the case. This
change may also have a small positive impact on shader performance at
higher resolutions, as there will be less atomic operations performed.
Not doing this can cause desyncs when TASing. (I don't know
how common such desyncs would be, though. For games that
don't change rounding modes, they shouldn't be a problem.)
When I added the software FMA path in 2c38d64 and made us use
it when determinism is enabled, I was assuming that either the
performance impact of software FMA wouldn't be too large or CPUs
that were too old to have FMA instructions were too slow to run
Dolphin well anyway. This was wrong. To give an example, the
netplay performance went from 60 FPS to 30 FPS in one case.
This change makes netplay clients negotiate whether FMA should
be used. If all clients use an x64 CPU that supports FMA, or
AArch64, then FMA is enabled, and otherwise FMA is disabled.
In other words, we sacrifice accuracy if needed to avoid massive
slowdown, but not otherwise. When not using netplay, whether to
enable FMA is simply based on whether the host CPU supports it.
The only remaining case where the software FMA path gets used
under normal circumstances is when an input recording is created
on a CPU with FMA support and then played back on a CPU without.
This is not an especially common scenario (though it can happen),
and TASers are generally less picky about performance and more
picky about accuracy than other users anyway.
With this change, FMA desyncs are avoided between AArch64 and
modern x64 CPUs (unlike before 2c38d64), but we do get FMA
desyncs between AArch64 and old x64 CPUs (like before 2c38d64).
This desync can be avoided by adding a non-FMA path to JitArm64 as
an option, which I will wait with for another pull request so that
we can get the performance regression fixed as quickly as possible.
https://bugs.dolphin-emu.org/issues/12542
Back when I wrote this code, I believe I set it to use a custom path
so that the cache would end up in a directory which Android considers
to be a cache directory. But nowadays the directory which Dolphin's
C++ code considers to be the cache directory is such a directory,
so there's no longer any reason to override the default path.
progressMessage can have the invalid value of 0. That
progressMessage was being used for the thread name was
a typo anyway – it's supposed to use progressTitle.
this prevented some devices from being recreated correctly, as they were exclusive (e.g. DInput Joysticks)
This is achieved by calling Settings::ReleaseDevices(), which releases all the UI devices shared ptrs.
If we are the host (Qt) thread, DevicesChanged() is now called in line, to avoid devices being hanged onto by the UI.
For this, I had to add a method to check whether we are the Host Thread to Qt.
Avoid calling ControllerInterface::RefreshDevices() from the CPU thread if the emulation is running
and we manually refresh devices from Qt, as that is not necessary anymore.
Refactored the way IOWindow lists devices to make it clearer and hold onto disconnected devices.
There were so many issues with the previous code:
-Devices changes would not be reflected until the window was re-opened
-If there was no default device, it would fail to select the device at index 0
-It could have crashed if we had 0 devices
-The default device was not highlighted as such
This helps us keeping the most important devices (e.g. Mouse and Keyboard) on the top
of the list of devices (they still are on all OSes supported by dolphin
and to make hotplug devices like DSU appear at the bottom.
-Fix Add/Remove/Refresh device safety, devices could be added and removed at the same time, causing missing or duplicated devices (rare but possible)
-Fix other devices population race conditions in ControllerInterface
-Avoid re-creating all devices when dolphin is being shut down
-Avoid re-creating devices when the render window handle has changed (just the relevantr ones now)
-Avoid sending Devices Changed events if devices haven't actually changed
-Made most devices populations will be made async, to increase performance and avoid hanging the host or CPU thread on manual devices refresh
A "devices changed" callback could have ended up waiting on another thread that was also populating devices
and waiting on the previous thread to release the callbacks mutex.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
Add external include paths to ExternalIncludePath instead of
AdditionalIncludeDirectories. msbuild appends these paths to
EXTERNAL_INCLUDE env var, which is passed to /external:env:.
Specify /external:W0 and /external:templates-, with override for
DolphinQt for the template flag, since Qt 5.15.0 causes some warnings
in qmap.h