These optimizations were already present, but only when d == a. They
also make sense when this condition does not hold.
- imm == 0
Before:
41 BB 00 00 00 00 mov r11d,0
45 2B DF sub r11d,r15d
After:
45 8B DF mov r11d,r15d
41 F7 DB neg r11d
- imm == -1
Before:
41 BD FF FF FF FF mov r13d,0FFFFFFFFh
44 2B EE sub r13d,esi
0F 93 45 68 setae byte ptr [rbp+68h]
After:
44 8B EE mov r13d,esi
41 F7 D5 not r13d
C6 45 68 01 mov byte ptr [rbp+68h],1
Without this, the code added in ac28b89 misbehaves and considers
AArch64 netplay clients to not have hardware FMA support, telling
all clients to disable FMA support, which causes a desync between
x64 and AArch64 due to JitArm64 not being able to disable FMA support.
Fixes a regression from 5.0-12066, where setting the GFXBackend variable
to one other than the current global backend would crash Dolphin upon
launching the game.
fcmpX only updates the FPCC bits, not the C bit.
This was already correctly implemented in the interpreter.
Not known to affect any games, but affects a hardware test.
This fixes bounding box shaders failing to compile under Vulkan, due to
differences between GLSL and HLSL in the return value of vector
comparisons and what types these functions accept. I included all() for
the sake of completeness.
At higher resolutions, our bounding box dimensions end up being
slightly larger than original hardware in some cases. This is not
necessarily wrong, it's just an artifact of rendering at a higher
resolution, due to bringing out detail that wouldn't have appeared on
original hardware. It causes a texel to fall partially on what would
have been a single pixel at native resolution, resulting in the
coordinates getting bumped up to the next valid value. In many cases,
these slightly larger bounding boxes are perfectly fine, as games don't
hard-code expected dimensions. It is problematic in Paper Mario TTYD
though, for a somewhat complicated reason.
Paper Mario TTYD frequently uses EFB copies to pre-render a bunch of
animation frames for a character sprite (especially in Chapter 2), so
that it can then render 100 or more of them without bringing the
GameCube to its knees. Based on my observation, the game seems to set
aside a region of memory to store these EFB copies. This region is
obviously fairly small, as the GameCube only has 24MB of RAM. There are
2 rooms in Chapter 2 where you fight a horde of as many as 100 Jabbies,
which are also rendered using EFB copies, so in this room the game ends
up making 130(!) EFB copies just for Puni and Jabbi sprites. This seems
to nearly fill the region of memory it set aside for them.
Unfortunately, our slightly larger bounding boxes at higher resolutions
results in overflowing this memory, causing very strange behavior. Some
EFB copies partially overlap game state, resulting in reading it as a
garbage RGB5A3 texture that constantly changes. Others apparently
somehow trigger a corner case in our persistent buffer mapping, causing
them to partially overwrite earlier EFB copies.
What this change does is only include the screen coordinates that align
with the equivalent native resolution pixel centers, which generally
results in the bounding boxes being more in line with original
hardware. It isn't perfect, but it's enough to fix Paper Mario TTYD's
Jabbi rooms by avoiding the buffer overflow. Notably, it is more
accurate at odd resolutions than at even resolutions. Native resolution
is completely unaffected by this change, as should be the case. This
change may also have a small positive impact on shader performance at
higher resolutions, as there will be less atomic operations performed.
Not doing this can cause desyncs when TASing. (I don't know
how common such desyncs would be, though. For games that
don't change rounding modes, they shouldn't be a problem.)
When I added the software FMA path in 2c38d64 and made us use
it when determinism is enabled, I was assuming that either the
performance impact of software FMA wouldn't be too large or CPUs
that were too old to have FMA instructions were too slow to run
Dolphin well anyway. This was wrong. To give an example, the
netplay performance went from 60 FPS to 30 FPS in one case.
This change makes netplay clients negotiate whether FMA should
be used. If all clients use an x64 CPU that supports FMA, or
AArch64, then FMA is enabled, and otherwise FMA is disabled.
In other words, we sacrifice accuracy if needed to avoid massive
slowdown, but not otherwise. When not using netplay, whether to
enable FMA is simply based on whether the host CPU supports it.
The only remaining case where the software FMA path gets used
under normal circumstances is when an input recording is created
on a CPU with FMA support and then played back on a CPU without.
This is not an especially common scenario (though it can happen),
and TASers are generally less picky about performance and more
picky about accuracy than other users anyway.
With this change, FMA desyncs are avoided between AArch64 and
modern x64 CPUs (unlike before 2c38d64), but we do get FMA
desyncs between AArch64 and old x64 CPUs (like before 2c38d64).
This desync can be avoided by adding a non-FMA path to JitArm64 as
an option, which I will wait with for another pull request so that
we can get the performance regression fixed as quickly as possible.
https://bugs.dolphin-emu.org/issues/12542
Back when I wrote this code, I believe I set it to use a custom path
so that the cache would end up in a directory which Android considers
to be a cache directory. But nowadays the directory which Dolphin's
C++ code considers to be the cache directory is such a directory,
so there's no longer any reason to override the default path.
progressMessage can have the invalid value of 0. That
progressMessage was being used for the thread name was
a typo anyway – it's supposed to use progressTitle.
this prevented some devices from being recreated correctly, as they were exclusive (e.g. DInput Joysticks)
This is achieved by calling Settings::ReleaseDevices(), which releases all the UI devices shared ptrs.
If we are the host (Qt) thread, DevicesChanged() is now called in line, to avoid devices being hanged onto by the UI.
For this, I had to add a method to check whether we are the Host Thread to Qt.
Avoid calling ControllerInterface::RefreshDevices() from the CPU thread if the emulation is running
and we manually refresh devices from Qt, as that is not necessary anymore.
Refactored the way IOWindow lists devices to make it clearer and hold onto disconnected devices.
There were so many issues with the previous code:
-Devices changes would not be reflected until the window was re-opened
-If there was no default device, it would fail to select the device at index 0
-It could have crashed if we had 0 devices
-The default device was not highlighted as such
This helps us keeping the most important devices (e.g. Mouse and Keyboard) on the top
of the list of devices (they still are on all OSes supported by dolphin
and to make hotplug devices like DSU appear at the bottom.
-Fix Add/Remove/Refresh device safety, devices could be added and removed at the same time, causing missing or duplicated devices (rare but possible)
-Fix other devices population race conditions in ControllerInterface
-Avoid re-creating all devices when dolphin is being shut down
-Avoid re-creating devices when the render window handle has changed (just the relevantr ones now)
-Avoid sending Devices Changed events if devices haven't actually changed
-Made most devices populations will be made async, to increase performance and avoid hanging the host or CPU thread on manual devices refresh
A "devices changed" callback could have ended up waiting on another thread that was also populating devices
and waiting on the previous thread to release the callbacks mutex.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
Add external include paths to ExternalIncludePath instead of
AdditionalIncludeDirectories. msbuild appends these paths to
EXTERNAL_INCLUDE env var, which is passed to /external:env:.
Specify /external:W0 and /external:templates-, with override for
DolphinQt for the template flag, since Qt 5.15.0 causes some warnings
in qmap.h
The SDK seems to write "default" bounding box values before every draw
(1023 0 1023 0 are the only values encountered so far, which happen to
be the extents allowed by the BP registers) to reset the registers for
comparison in the pixel engine, and presumably to detect whether GX has
updated the registers with real values. Handling these writes and
returning them on read when bounding box emulation is disabled or
unsupported, even without computing real values from rendering, seems
to prevent games from corrupting memory or crashing.
This obviously does not fix any effects that rely on bounding box
emulation, but having the game not clobber its own code/data or just
outright crash is a definite improvement.
-Reworked thread waits to never hang the Host thread for more than a really small time
(e.g. when disabling DSU its thread now closes almost immediately)
-Improve robustness when a large amount of devices are connected
-Add devices disconnection detection (they'd stay there forever until manually refreshed)
We also need to ensure the the CPU does not receive stale values
which have been updated by the GPU. Apparently the buffer here
is not coherent on NVIDIA drivers. Not sure if this is a driver
bug/spec violation or not, one would think that
glGetBufferSubData() would invalidate any caches as needed, but
this path is only used on NVIDIA anyway, so it's fine. A point
to note is that according to ARB_debug_report, it's moved from
video to host memory, which would explain why it needs the
cache invalidate.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Looks like the option was added to the Wx UI at commit 198d3b69, which
was a few months after the advancedWidget was originally ported from
Wx to Qt, but before anyone was actually using Qt.
This reduces the build time for incremental builds from about
2 minutes to about 20 seconds. Most people never run the
unit tests on Android anyway (I'm not aware of anyone other
than me ever having done it).
When determinism is enabled, we either want all CPUs to use FMA or
we want no CPUs to use FMA. Until now, Jit64 has been been doing
the latter. However, this is inaccurate behavior, all CPUs since
Haswell support FMA, and getting JitArm64 to match the exact
inaccurate rounding used by Jit64 would be a bit annoying. This
commit switches us over to using FMA on all CPUs when determinism
is enabled, with older CPUs calling the std::fma function.
Any local references get cleaned up when returning to the JVM,
but some of the functions in AndroidCommon return to C++ rather
than the JVM, and functions with loops risk running into the
limit of how many simultaneous local references are allowed.
MAX_XFB_WIDTH/HEIGHT are the largest XFB sizes seen in practice, but do not make sense to use for the backbuffer size, which should be the size of the window. The old code created screenshots with a size of 720x540 on NTSC games when "Dump Frames at Internal Resolution" is unchecked; now, the window size is used.
Adds a new PlatformID for universal builds. This will allow single architecture
builds to be updated through the single architecture path, and universal builds
to be updated with universal builds.
-add a way to reset their value (from the mappings UI)
-fix "memory leak" where they would never be cleaned,
one would be created every time you wrote a character after a "$"
-fix ability to create variables with an empty string by just writing "$" (+added error for it)
-Add $ operator to the UI operators list, to expose this functionality even more
This fixes a nasty issue where you can change the Dual Core setting
during emulation, if it has been overridden by GameINI or NetPlay, by
simply changing any of the non-disabled settings. This is because
changing any of the settings will write all of them to the config.
This issue is particularly nasty because managing to disable Dual Core
during emulation, and then stopping it, results in the emulator core
being totally deadlocked. It's impossible to recover from this state,
and Dolphin will remain as a zombie process on the system, consuming
resources and holding locks, until forcibly killed.
Added RAII wrapper around the the JITPageWriteEnableExecuteDisable() and
JITPageWriteDisableExecuteEnable() to make it so that it is harder to forget to
pair the calls in all code branches as suggested by leoetlino.
Removed the unavailable CPU core dialog box that asked users to change their
selected CPU core to one that is available. Instead, Dolphin now just overrides
the core to the default, and logs that it performed the override.
In MacOS 11.2 mprotect can no longer change the access protection settings of
pages that were previously marked as executable to anything but PROT_NONE. This
commit works around this new restriction by bypassing the mprotect based write
protection and instead relying on the write protection provided by MAP_JIT.
Analytics:
- Incorporated fix to allow the full set of analytics that was recommended by
spotlightishere
BuildMacOSUniversalBinary:
- The x86_64 slice for a universal binary is now built for 10.12
- The universal binary build script now can be configured though command line
options instead of modifying the script itself.
- os.system calls were replaced with equivalent subprocess calls
- Formatting was reworked to be more PEP 8 compliant
- The script was refactored to make it more modular
- The com.apple.security.cs.disable-library-validation entitlement was removed
Memory Management:
- Changed the JITPageWrite*Execute*() functions to incorporate support for
nesting
Other:
- Fixed several small lint errors
- Fixed doc and formatting mistakes
- Several small refactors to make things clearer
This commit adds support for compiling Dolphin for ARM on MacOS so that it can
run natively on the M1 processors without running through Rosseta2 emulation
providing a 30-50% performance speedup and less hitches from Rosseta2.
It consists of several key changes:
- Adding support for W^X allocation(MAP_JIT) for the ARM JIT
- Adding the machine context and config info to identify the M1 processor
- Additions to the build system and docs to support building universal binaries
- Adding code signing entitlements to access the MAP_JIT functionality
- Updating the MoltenVK libvulkan.dylib to a newer version with M1 support
Shouldn't have any behaviour change for regular usage as both masks are 32MB
by default.
But fixes theoretical buffer overrun when memory size override is used.
The interpreter implementation of fctiwx was treating rounding
mode 0 as "round to nearest, ties towards zero", which is not
an actual IEEE-754 rounding mode. The IBM document mentioned
in a comment at the top of the function, on the other hand,
treats rounding mode 0 as "round to nearest, ties to even",
which makes more sense.
This fixes one of JMC's console-recorded F-Zero GX replays on
JitArm64. (JitArm64 uses an interpreter fallback for fctiwx.)
The GC/Wii GPU rasterizes in 2x2 pixel groups, so bounding box values
will be rounded to the extents of these groups, rather than the exact
pixel. To account for this, we'll round the top/left down to even and
the bottom/right up to odd. I have verified that the values resulting
from this change exactly match a real Wii.
Any file which includes scmrev.h must be rebuilt when scmrev.h
is regenerated. By not including scmrev.h from any file other
than Version.cpp, incremental builds become a little faster.
When trying to do a small optimization in 8a0f5ea, I failed to
take into account that WeakFlush and FlushOne update m_query_count.
Only D3D11 and OGL had this problem, not D3D12 and Vulkan.
I was going to rename this to getSysDirectory to make it clearer
what the returned path actually is, but it turns out we're not
actually using this for anything anymore.
The Java implementation of getting the list of post-processing
shaders only looked in the Sys folder and not the User folder.
This could be fixed in the Java implementation, but it's
simpler to just call the C++ implementation instead.
Sorry, the fix I made to the empty string in a29660a was not
actually sufficient, as DolphinQt will call tr on the string
regardless of whether it's marked with _trans. The proper fix
is to use nullptr, which DolphinQt has a special check for.
Sending an empty string to the translation system will not
result in getting an empty string back, but rather a description
of the currently loaded translations file. So empty strings
should not be marked as translatable.
Also adding some i18n comments and rewording a string I thought
was hard to understand.
Center @2x and android banners, and adjust @1x banner to improve
consistency with other resolutions while maintaining sharpness.
Images created by MayImilae
casting a value to a u32 when it's originally an int, and it's exposed as int to users,
could end up in cases where a negative number would result as a positive one.
This doesn't really affect the value range of the attachment enum,
still I think the code was wrong.
Heavily tested.
If the path $(QtIncludeDir) has a space in it Visual Studio interprets
the first part as the full path and chokes on the second part. Quote the
path to fix the problem.
Settings.SECTION_INI_ANDROID and Settings.SECTION_BINDINGS
both have the value "Android", but we only want the former
to be marked as being handled by the new config system.
This change fixes a problem where controller settings were
not being properly saved to Dolphin.ini.
The STL has everything we need nowadays.
I have tried to not alter any behavior or semantics with this
change wherever possible. In particular, WriteLow and WriteHigh
in CommandProcessor retain the ability to accidentally undo
another thread's write to the upper half or lower half
respectively. If that should be fixed, it should be done in a
separate commit for clarity. One thing did change: The places
where we were using += on a volatile variable (not an atomic
operation) are now using fetch_add (actually an atomic operation).
Tested with single core and dual core on x86-64 and AArch64.
NumericSettings support a max, so let's use it.
It might not do much now, but the max and min values will be used to give visual feeback
in the UI in one of my upcoming input PRs
The control expression editor allows line breaks, but the serialization was
losing anything after the first line break (/r /n).
Instead of opting to encode them and decode them on serialization
(which I tried but was not safe, as it would lose /n written in the string by users),
I opted to replace them with a space.
and replacing it with a ":" prefix. Also remove white spaces and \n \t \r.
bugfix: fix EmulatedController::GetStateLock() not being aquired when reading the
expression reference
bugfix: MappingButton::UpdateIndicator() calling State(0) on outputs, breaking ongoing
rumbles if a game was running
Improvement: make expressions previews appear in Italic if they failed to parse correctly
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
This change should have no behavioral differences itself, but allows for changing the behavior of out of bounds tex coord indices more easily in the next commit. Without this change, returning tex0 for out of bounds cases and then applying the fixed-point logic would use the wrong tex dimension info (tex0 with I_TEXDIMS[1] or such), which is inaccurate.
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
Co-authored-by: Pokechu22 <Pokechu022@gmail.com>
Since the description updating is tied to the selection changing on the detail list, and the detail list is recreated on each object change, behavior was somewhat broken. Clearing the list changed the current row to zero, but nothing else (particularly m_object_data_offsets) had been updated, so the description was not necessarily correct (this is easier to observe now since the vertex data is at the end, so it's easier to get different lengths of register updates). Furthermore, subsequent clears did not update the current row since there was no visible selection, so it only changed the description once. The current row is now always set to zero, which forces an update (and also scrolls the list back to the top). The presence of FRAME_ROLE and OBJECT_ROLE are also checked so that the description is cleared if no object is selected.
- Only one search result is generated per command/line, even if there are multiple matches in that line.
- Pressing enter on the edit field begins a search, just like clicking the begin button.
- The next and previous buttons are disabled until a search is begun.
- The search results are cleared when changing objects or frames.
- The previous button once again works (a regression from the previous commit), and the register updates and graphics data for the correct object are searched.
- currentRow() never returns -1, so checking that is unnecessary (and misleading).
- The 'Invalid search parameters (no object selected)' previously never showed up before because FRAME_ROLE is present if and only if OBJECT_ROLE is present.
This way, it can be focused with the render window behind it, instead of having the main window show up and cover the render window. This is useful for adjusting the object range, among other things.
If the number of objects varied, this would result in either missing objects on some frames, or too many objects on some frames; the latter case could cause crashes. Since it used the current frame to get the count, if the FIFO is started before the FIFO analyzer is opened, then the current frame is effectively random, making it hard to reproduce consistently.
This issue has existed since the FIFO analyzer was implemented for Qt.
The 'zero frames in the range' check can be removed because now there is always at least 1 frame; of course that might be the same frame over and over again, but that's still useful for e.g. Free Look (and the 1 frame repeating effect already occurred when frame count was exclusive).
A single object can be selected instead of 2 (it was already inclusive internally), and the maximum value is the highest number of objects in any frame (minus 1) to reduce jank when multiple frames are being played back.
Now that this is only called when playback actually starts (and not on unpausing), this change makes the experience a bit better (no more missing objects from not having reset the from object after changing FIFOs).
It is no longer relevant for the current set of loaders after 7030542546. If it becomes relevant again, a static function named IsUsable or IsCompatibleWithCurrentMachine or something would be a better approach.
In 5a1a642, I explicitly set android:debuggable="true" for Dolphin.
(By default, it's set for debug builds but not release builds.)
The reason I made the change is because debuggable must be set to
true in order to allow adb backup to be used with apps which target
Android 12. We have no reason to want to stop users from debugging
Dolphin and certainly no reason to stop users from using adb backup
(especially not after forced storage is going to force us to store
the User folder in app-specific external storage!), but,
it turns out that setting debuggable to true is forbidden by
Google Play "for security reasons". Go figure. The beta build
which was tagged earlier today was rejected because of this.
By taking advantage of three-operand IMUL, we can eliminate a MOV
instruction. This is a small code size win. However, due to IMUL sign
extending the immediate value to 64 bits, we can only apply this when
the magic number's most significant bit is zero.
To ensure this can actually happen, we also minimize the magic number by
checking for trailing zeroes.
Example (Unsigned division by 18)
Before:
41 BE E4 38 8E E3 mov r14d,0E38E38E4h
4D 0F AF F5 imul r14,r13
49 C1 EE 24 shr r14,24h
After:
4D 69 F5 39 8E E3 38 imul r14,r13,38E38E39h
49 C1 EE 22 shr r14,22h
This isn't entirely necessary, as they are interpreted as barewords expressions,
but it's still nicer to have by default. And my upcoming input changes will
always put `` around single letter inputs.
-Add pause state to FPSCounter.
-Add ability to have more than one "OnStateChanged" callback in core.
-Add GetActualEmulationSpeed() to Core. Returns 1 by default. It's used by my input PRs.
The SaveToSYSCONF call in BootManager.cpp was unintentionally
overriding the temporary NAND set by the preceding
InitializeWiiRoot call. Fixes
https://bugs.dolphin-emu.org/issues/12500.
Verifying a Wii game creates an instance of IOS, and Dolphin
can't handle more than one instance of IOS at the same time.
Properly supporting it is probably more effort than it's worth.
Fixes https://bugs.dolphin-emu.org/issues/12494.
Avoids the need to copy the *.mo files manually *and* more importantly
this ensures that the mo files are always recreated if the build
output directory is cleared.
Update references was failing to update the references, causing input to stay nullptr and crashing.
I fixed the case that triggered that, though also added checks against nullptrs for safety.
(cherry picked from commit 4bdcf707555a5568eddff957fa3604975ffb6ed7)
I think the AArch64 JIT has come far enough that it doesn't have to
be called experimental anymore.
I'm also labeling the x86-64 JIT as x86-64 for consistence with the
AArch64 JIT. This will especially be helpful if we start supporting
AArch64 on macOS, as AArch64 macOS can run both the x86-64 JIT and
the AArch64 JIT depending on whether you enable Rosetta 2.
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
Preparation for following commits.
This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.
Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix
other games that have problems with float/paired instructions
in JitArm64, but I haven't tested any.
-They might have never drawn if DrawMessages wasn't called before they actually expired
-Their fade was wrong if the duration of the message was less than the fade time
This makes them much more useful for debugging, I know there might be other means
of debugging like logs and imgui, but this was the simplest so that's what I used.
If you want to print the same message every frame, but with a slightly different value
to see the changes, it now work.
To compensate for the fact that they are now always rendered once,
so on start up a lot of old messages (printed while the emulation was off) could show up,
I've added a "drop" time, which means if a msg isn't rendered for the first
time within that time, it will be dropped and never rendered.
When the interpreter writes to a discarded register, its type
must be changed so that it is no longer considered discarded.
Fixes a 62ce1c7 regression.
We normally check for division by zero to know if we should set the
destination register to zero with a XOR. However, when the divisor and
destination registers are the same the explicit zeroing can be omitted.
In addition, some of the surrounding branching can be simplified as
well.
Before:
45 85 FF test r15d,r15d
75 05 jne normal_path
45 33 FF xor r15d,r15d
EB 0C jmp done
normal_path:
B8 5A 00 00 00 mov eax,5Ah
99 cdq
41 F7 FF idiv eax,r15d
44 8B F8 mov r15d,eax
done:
After:
45 85 FF test r15d,r15d
74 0C je done
B8 5A 00 00 00 mov eax,5Ah
99 cdq
41 F7 FF idiv eax,r15d
44 8B F8 mov r15d,eax
done:
Division by a power of two can be slightly improved when the
destination and dividend registers are the same.
Before:
8B C6 mov eax,esi
85 C0 test eax,eax
8D 70 03 lea esi,[rax+3]
0F 49 F0 cmovns esi,eax
C1 FE 02 sar esi,2
After:
85 F6 test esi,esi
8D 46 03 lea eax,[rsi+3]
0F 48 F0 cmovs esi,eax
C1 FE 02 sar esi,2
Repeated erase() + iteration on a std::multimap is extremely slow.
Slow enough that it causes a 7 second long stutter during some
transitions in F-Zero X (a N64 VC game that triggers many, many icache
invalidations).
And slow enough that JitBaseBlockCache::DestroyBlock shows up on a
flame graph as taking >50% of total CPU time on the CPU-GPU thread:
https://i.imgur.com/vvqiFL6.png
This commit optimises those block link queries by replacing the
std::multimap (which is typically implemented with red-black trees)
with hash tables.
Master: https://i.imgur.com/vvqiFL6.png / 7s stutters
(starting from 5.0-2021 and with branch following disabled)
This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which
is pretty close to 5.0 stable. (5.0-2021 introduced the performance
regression and it is especially noticeable when branch following
is disabled, which is the case for all N64 VC games since 5.0-8377.)
VideoCommon: Change the type of BPMemory.scissorOffset to 10bit signed: S32X10Y10
VideoBackends: Fix Software Clipper.PerspectiveDivide function, use BPMemory.scissorOffset instead of hard code 342
Oversight from #9545, which moved the "new game has been loaded" logic
to a separate OnNewTitleLoad function that has to be called explicitly
*after* a title has loaded.
Coupled with the commit that makes Dolphin not clobber 0x1800-0x3000
when using MIOS, this fixes Wind Waker and other MIOS-patched games
when they are launched from the System Menu.
MIOS puts patch data in low MEM1 (0x1800-0x3000) for its own use.
Overwriting data in this range can cause the IPL to crash when
launching games that get patched by MIOS.
See https://bugs.dolphin-emu.org/issues/11952 for more info.
Not applying the Gecko HLE patches means that Gecko codes will not work
under MIOS, but this is better than the alternative of having specific
games crash.
This particular range is kind of bizarre, and would only interpret
interleave mode 2 as a valid mode, while rejecting interleave mode 1 and
the extension byte mode.
As far as I know, based off the information on Wiibrew, we should be
considering all three values within this range as valid.
texture serialization and deserialization used to involve many memory
allocations and deallocations, along with many copies to and from
those allocations. avoid those by reserving a memory region inside the
output and writing there directly, skipping the allocation and copy to
an intermediate buffer entirely.
We don't actually need to do this until we bump targetSdkVersion
to Android 12 (which we can't do yet since we're not compatible
with scoped storage), but I figured I'd get it out of the way early.
Not tested on Android 12, but tested to not break stuff on Android 10.
This adds a CMake option (DOLPHIN_DEFAULT_UPDATE_TRACK) to allow
configuring SCM_UPDATE_TRACK_STR. This is needed to enable auto-updates
in Windows CMake builds by default.
fmt/format.h is included in the PCH, so we need to make sure fmt is
actually in the include path.
Not sure how Visual Studio + CMake manages to build without this.
This adds a function to get the emulated or real Bluetooth device for
an active emulation instance. This lets us deduplicate all the
`ios->GetDeviceByName("/dev/usb/oh1/57e/305")` calls that are currently
scattered in the codebase and ensures Bluetooth passthrough is being
handled correctly.
This also fixes the broken check in WiimoteCommon::UpdateSource.
There was a confusion between "emulated Bluetooth" (as opposed to
"real Bluetooth" aka Bluetooth passthrough) and "emulated Wiimote".
[conv.fpint]/1:
> A prvalue of a floating-point type can be converted to a prvalue of
> an integer type. The conversion truncates; that is, the fractional
> part is discarded. The behavior is undefined if the truncated value
> cannot be represented in the destination type.
Specifically, 'Scooby-Doo! Mystery Mayhem', 'Scooby-Doo! Unmasked', 'Ed, Edd n Eddy: The Mis-Edventures', and the Wii version of 'Happy Feet'.
The JIT cache causes problems with emulated icache invalidation in these games, resulting in areas failing to load.
This avoids some warnings, which were originally fixed by ignoring loads with a value of zero (see 636bedb207 / #3242).
Note that FifoCI will report some changes, but only on the first frame; these seem to be timing related as they don't happen if a different write is used to replace skipped ones.
They appear to relate to perf queries, and combining them with truely unknown commands would probably hide useful information. Furthermore, 0x20 is issued by every title, so without this every title would be recorded as using an unknown command, which is very unhelpful.
The swaps are confusing and don't accomplish much.
It was originally written like this:
u32 pte = bswap(*(u32*)&base_mem[pteg_addr]);
then bswap was changed to Common::swap32, and then the array access
was replaced with Memory::Read_U32, leading to the useless swaps.
While 6xx_pem.pdf §7.6.1.1 mentions that the number of trailing
zeros in HTABORG must be equal to the number of trailing ones
in the mask (i.e. HTABORG must be properly aligned), this is actually
not a hard requirement. Real hardware will just OR the base address
anyway. Ignoring SDR changes would lead to incorrect emulation.
Logging a warning instead of dropping the SDR update silently is a
saner behaviour.
debaf63fe8 moved the "Sonic epsilon hack"
to vertex shaders. However, it was only done for targets with depth
clamping. If this is not available, for example the target is OpenGL ES,
the Sonic problem appears (https://bugs.dolphin-emu.org/issues/11897).
A version of the "Sonic epsilon hack" is added for targets without
depth clamping.
This changes FileSystemProxy::Open to return a file descriptor wrapper
that will ensure the FD is closed when it goes out of scope.
By using such a wrapper we make it more difficult to forget to close
file descriptors.
This fixes a leak in ReadBootContent. I should have added such a class
from the beginning... In practice, I don't think this would have caused
any obvious issue because ReadBootContent is only called after an IOS
relaunch -- which clears all FDs -- and most titles do not get close
to the FD limit.
JitArm64::DoJit contains a check where it prints a warning and tries
to pause emulation if instructed to compile code at address 0. I'm
assuming this was done in order to provide a nicer error behavior
in cases where PC was accidentally set to null. Unfortunately, it
has started causing us problems recently, as 688bd61 writes and runs
some code at address 0 to simulate the PPC being held in reset.
What makes this worse is that calling Core::SetState from the CPU
thread is actually not allowed and will cause a deadlock instead of
the intended behavior. I don't believe there is anything on a real
console that would stop you from executing code at address 0 (as
long as the MMU has been set up to allow it), and Jit64::DoJit
doesn't contain any check like this, so let's remove the check.
Many Android users want to disable SyncOnSkipIdle as a performance
hack, to the point where it's often suggested as something to
paste into Dolphin.ini (if not to use a fork). If adding it as
a setting in the GUI gives us an opportunity to explain what the
setting actually does and stops people from pasting stuff they
don't understand into INI files, I think it can be worth adding
despite how it can make games unstable. It not being in the GUI
doesn't seem to be stopping people from disabling it anyway.
The added setting in the GUI is a three-way setting called
"Synchronize GPU Thread" with the following alternatives:
"Never": SyncGPU = False, SyncOnIdleSkip = False
"On Idle Skipping": SyncGPU = False, SyncOnIdleSkip = True
"Always": SyncGPU = True, SyncOnIdleSkip = True
See PR 8203 for background on the game INI deletion prompt.
It's been almost two years since PR 8203 was merged, so you
would think that people are no longer creating game INIs that
contain a copy of every global setting, right? Unfortunately,
MMJ was forked not too long before that and never backported the
change, so right now there's a not insignificant number of people
online posting game INIs full of this garbage for others to use.
One thing that's been missing from the game INI deletion prompt
is a description of what the problem with having tons of extra
lines in a game INI actually is. This change adds that, in the
the hope that it will make people ignore the warning less often.
This option does in fact not enable and disable logging as a whole.
You can get logs through logcat regardless of this setting.
Also taking the opportunity to remove the reference to
the "dolphin-emu" folder name since we will no longer be
using that folder once scoped storage is applied to Dolphin.
This commit adds a new "discarded" state for registers.
Discarding a register is like flushing it, but without
actually writing its value back to memory. We can discard
a register only when it is guaranteed that no instruction
will read from the register before it is next written to.
Discarding reduces the register pressure a little, and can
also let us skip a few flushes on interpreter fallbacks.
The output of instructions like fabsx and ps_sel is store-safe
if and only if the relevant inputs are. The old code was always
marking the output as store-safe if the output was a single,
and never otherwise.
Also, the old code was treating the output of psq_l/psq_lu as
store-safe, which seems incorrect (if dequantization is disabled).
This improves the speed of verifying Wii WIA/RVZ files.
For me, the verification speed for LZMA2-compressed files
has gone from 11-12 MiB/s to 13-14 MiB/s.
One thing VolumeVerifier does to achieve parallelism is to
compute hashes for one chunk of data while reading the next
chunk of data. In master, when reading data from a Wii
partition, each such chunk is 32 KiB. This is normally fine,
but with WIA and RVZ it leads to rather lopsided read times
(without the compute times being lopsided): The first 32 KiB
of each 2 MiB takes a long time to read, and the remaining
part of the 2 MiB can be read nearly instantly. (The WIA/RVZ
code has to read the entire 2 MiB in order to compute hashes
which appear at the beginning of the 2 MiB, and then caches
the result afterwards.) This leads to us at times not doing
much reading and at other times not doing much computation.
To improve this, this change makes us use 2 MiB chunks
instead of 32 KiB chunks when reading from Wii partitions.
(block = 32 KiB, group = 2 MiB)
This can't actually happen in practice due to how WAD files work,
but it's very easy to add support for thanks to the last commit,
so we might as well add support for it.
The performance gains of doing this aren't too important since you
normally wouldn't run into any disc image that has overlapping blocks
(which by extension means overlapping partitions), but this change also
lets us get rid of things like VolumeVerifier's mutex that used to
exist just for the sake of handling overlapping blocks.
Panic alerts in DiscIO can potentially be very annoying since
large amounts of them can pop up when loading the game list
if you have some particularly weird files in your game list.
This was a much bigger problem back in 5.0 with its
"Tried to decrypt data from a non-Wii volume" panic alert, but
I figured I would take it all the way and remove the remaining
panic alerts that can show up when loading the game list.
I have exempted uses of ASSERT/ASSERT_MSG since they indicate
a bug in Dolphin rather than a malformed file.
If we know at compile time that the PPC carry flag definitely
has a certain value, we can bake that value into the emitted code
and skip having to read from PPCState.
GameFileCacheService.startRescan (in MainPresenter.onResume)
does nothing if called before GameFileCacheService.startLoad.
Fixes a 3f71c36 regression where already added games would not
show up after app launch under specific circumstances.
Unfortunately the loading indicator still doesn't show up
during a rescan initiated by app launch, but that would
be more annoying to fix, so I will leave it for now.
When a save state is loaded, the IOS device serving bluetooth
is cast as BluetoothEmuDevice. If, however, a real Wiimote
with BT passthrough is used, this caused the game to crash.
Now the proper device class is used.
At a first glance it may look like a part of the code I added to
srawx in efeda3b has a bug when a == s. The code actually happens
to work correctly, but in the interest of making the code easier
to reason about, I'd like to change the way it's implemented. This
change should improve the pipelining a little in the a == s case too.
Fix Gamelist context menu item 'Open Containing Folder' opening wrong
target on Windows when game parent folder is [foobar] and grandparent
folder contains file [foobar].bat or [foobar].exe
Add trailing directory separator to parent folder path to force Windows
to interpret path as directory.
Fixes https://bugs.dolphin-emu.org/issues/12411
21c152f added a small hack to DVDInterface to keep WBFS and CISO
files working with Nintendo's "Error #001" anti-piracy check.
Unfortunately I don't think it's possible to support WBFS and
CISO without any kind of hack or heuristic, but what we can do
is replace the 21c152f hack (which applies regardless of file
format) with a hack that only is active when using WBFS or CISO.
This change is similar to 2a5a399, but the disc size is
calculated in a different way.
Add ! before unused variables to 'use' them.
Ubuntu-x64 emits warnings for unused variables because gcc decides
it should ignore the void cast around them. See thread for discussion:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425
Loop index int i was being compared against GetControllerCount() which
returned a size_t. This was the only place GetControllerCount() was
called from so the change of return type doesn't disturb anything else.
Changing the loop index to size_t wouldn't work as well since it's
passed into GetController(), which takes an int and is called from many
places, so it would need a cast anyway on an already busy line.
...and let's optimize a divisor of 2 ever so slightly for good measure.
I wouldn't have bothered, but most GameCube games seem to hit this on
launch.
- Division by 2
Before:
41 BE 02 00 00 00 mov r14d,2
41 8B C2 mov eax,r10d
45 85 F6 test r14d,r14d
74 0D je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0E jne normal_path
41 83 FE FF cmp r14d,0FFFFFFFFh
75 08 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
44 8B F0 mov r14d,eax
EB 07 jmp done
normal_path:
99 cdq
41 F7 FE idiv eax,r14d
44 8B F0 mov r14d,eax
done:
After:
45 8B F2 mov r14d,r10d
41 C1 EE 1F shr r14d,1Fh
45 03 F2 add r14d,r10d
41 D1 FE sar r14d,1