When I added the software FMA path in 2c38d64 and made us use
it when determinism is enabled, I was assuming that either the
performance impact of software FMA wouldn't be too large or CPUs
that were too old to have FMA instructions were too slow to run
Dolphin well anyway. This was wrong. To give an example, the
netplay performance went from 60 FPS to 30 FPS in one case.
This change makes netplay clients negotiate whether FMA should
be used. If all clients use an x64 CPU that supports FMA, or
AArch64, then FMA is enabled, and otherwise FMA is disabled.
In other words, we sacrifice accuracy if needed to avoid massive
slowdown, but not otherwise. When not using netplay, whether to
enable FMA is simply based on whether the host CPU supports it.
The only remaining case where the software FMA path gets used
under normal circumstances is when an input recording is created
on a CPU with FMA support and then played back on a CPU without.
This is not an especially common scenario (though it can happen),
and TASers are generally less picky about performance and more
picky about accuracy than other users anyway.
With this change, FMA desyncs are avoided between AArch64 and
modern x64 CPUs (unlike before 2c38d64), but we do get FMA
desyncs between AArch64 and old x64 CPUs (like before 2c38d64).
This desync can be avoided by adding a non-FMA path to JitArm64 as
an option, which I will wait with for another pull request so that
we can get the performance regression fixed as quickly as possible.
https://bugs.dolphin-emu.org/issues/12542
Back when I wrote this code, I believe I set it to use a custom path
so that the cache would end up in a directory which Android considers
to be a cache directory. But nowadays the directory which Dolphin's
C++ code considers to be the cache directory is such a directory,
so there's no longer any reason to override the default path.
this prevented some devices from being recreated correctly, as they were exclusive (e.g. DInput Joysticks)
This is achieved by calling Settings::ReleaseDevices(), which releases all the UI devices shared ptrs.
If we are the host (Qt) thread, DevicesChanged() is now called in line, to avoid devices being hanged onto by the UI.
For this, I had to add a method to check whether we are the Host Thread to Qt.
Avoid calling ControllerInterface::RefreshDevices() from the CPU thread if the emulation is running
and we manually refresh devices from Qt, as that is not necessary anymore.
Refactored the way IOWindow lists devices to make it clearer and hold onto disconnected devices.
There were so many issues with the previous code:
-Devices changes would not be reflected until the window was re-opened
-If there was no default device, it would fail to select the device at index 0
-It could have crashed if we had 0 devices
-The default device was not highlighted as such
This helps us keeping the most important devices (e.g. Mouse and Keyboard) on the top
of the list of devices (they still are on all OSes supported by dolphin
and to make hotplug devices like DSU appear at the bottom.
-Fix Add/Remove/Refresh device safety, devices could be added and removed at the same time, causing missing or duplicated devices (rare but possible)
-Fix other devices population race conditions in ControllerInterface
-Avoid re-creating all devices when dolphin is being shut down
-Avoid re-creating devices when the render window handle has changed (just the relevantr ones now)
-Avoid sending Devices Changed events if devices haven't actually changed
-Made most devices populations will be made async, to increase performance and avoid hanging the host or CPU thread on manual devices refresh
A "devices changed" callback could have ended up waiting on another thread that was also populating devices
and waiting on the previous thread to release the callbacks mutex.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
The SDK seems to write "default" bounding box values before every draw
(1023 0 1023 0 are the only values encountered so far, which happen to
be the extents allowed by the BP registers) to reset the registers for
comparison in the pixel engine, and presumably to detect whether GX has
updated the registers with real values. Handling these writes and
returning them on read when bounding box emulation is disabled or
unsupported, even without computing real values from rendering, seems
to prevent games from corrupting memory or crashing.
This obviously does not fix any effects that rely on bounding box
emulation, but having the game not clobber its own code/data or just
outright crash is a definite improvement.
-Reworked thread waits to never hang the Host thread for more than a really small time
(e.g. when disabling DSU its thread now closes almost immediately)
-Improve robustness when a large amount of devices are connected
-Add devices disconnection detection (they'd stay there forever until manually refreshed)
We also need to ensure the the CPU does not receive stale values
which have been updated by the GPU. Apparently the buffer here
is not coherent on NVIDIA drivers. Not sure if this is a driver
bug/spec violation or not, one would think that
glGetBufferSubData() would invalidate any caches as needed, but
this path is only used on NVIDIA anyway, so it's fine. A point
to note is that according to ARB_debug_report, it's moved from
video to host memory, which would explain why it needs the
cache invalidate.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Looks like the option was added to the Wx UI at commit 198d3b69, which
was a few months after the advancedWidget was originally ported from
Wx to Qt, but before anyone was actually using Qt.
When determinism is enabled, we either want all CPUs to use FMA or
we want no CPUs to use FMA. Until now, Jit64 has been been doing
the latter. However, this is inaccurate behavior, all CPUs since
Haswell support FMA, and getting JitArm64 to match the exact
inaccurate rounding used by Jit64 would be a bit annoying. This
commit switches us over to using FMA on all CPUs when determinism
is enabled, with older CPUs calling the std::fma function.
MAX_XFB_WIDTH/HEIGHT are the largest XFB sizes seen in practice, but do not make sense to use for the backbuffer size, which should be the size of the window. The old code created screenshots with a size of 720x540 on NTSC games when "Dump Frames at Internal Resolution" is unchecked; now, the window size is used.
Adds a new PlatformID for universal builds. This will allow single architecture
builds to be updated through the single architecture path, and universal builds
to be updated with universal builds.
-add a way to reset their value (from the mappings UI)
-fix "memory leak" where they would never be cleaned,
one would be created every time you wrote a character after a "$"
-fix ability to create variables with an empty string by just writing "$" (+added error for it)
-Add $ operator to the UI operators list, to expose this functionality even more
This fixes a nasty issue where you can change the Dual Core setting
during emulation, if it has been overridden by GameINI or NetPlay, by
simply changing any of the non-disabled settings. This is because
changing any of the settings will write all of them to the config.
This issue is particularly nasty because managing to disable Dual Core
during emulation, and then stopping it, results in the emulator core
being totally deadlocked. It's impossible to recover from this state,
and Dolphin will remain as a zombie process on the system, consuming
resources and holding locks, until forcibly killed.