std::function is internally allowed to allocate, and these functions
aren't being stored anywhere (only called), so we can freely get rid
of some minor overhead here by passing by reference.
This change also creates aliases for the functions, so that there isn't
a lot of visual noise when reading the function signatures.
We don't need to enforce the use of std::string instances with
AddSetting(). We can accept views and only construct one string,
rather than three temporaries.
The gate size is 79.37125 by default, and the step size is 0.5. Android
throws an exception if we try to show the slider with the value set to
something that isn't divisible by the step size. To avoid this problem,
round the value.
Internal details: The large region is split into individual same-sized blocks of memory. On creation, we allocate a single block of memory that will always remain zero, and map that into the entire memory region. Then, the first time any of these blocks is written to, we swap the mapped zero block out with a newly allocated block of memory. On clear, we swap back to the zero block and deallocate the data blocks. That way we only actually allocate one zero block as well as a handful of real data blocks where the JitCache actually writes to.
Allows the Coil memory cache to use up to 90% of the application's available memory. Previously this could cause problems with reloading images in very large libraries of games.
Combined with the previous commits, this finally fixes the bug where
Dolphin had a chance of crashing if you returned to it after Android
killed the Dolphin process.
This way, we ensure that game INI settings are properly applied. I don't
think we actually expose the affected settings on a per-game basis in
the UI, but still.
The active challenges, aka the primed achievements, are displayed on screen as a series of icons in the bottom right corner of the screen via OnScreenUI.
This way the Settings class doesn't contain a hardcoded reference to
a specific setting. And Settings.loadSettings no longer calls
getBoolean, which is a step towards fixing the crash when recreating
EmulationActivity after process death.
Emulation state changed signals also update the wiimote connection. The signal here is only needed for wiimote connects/disconnects.
Can fix erroneous debugger behavior during booting, as dolphin will sometimes incorrectly report the state as paused, which leads the debugger widgets to run when they shouldn't.
When an achievement is "primed", a challenge is active, for example completing a portion of the game in under a time limit or without taking damage or using certain items. This is stored in a map in the Achievement Manager (and removed when the achievement is unprimed) so a later commit can display it on screen.
The Disabled state sits between Game Closed and completely Shutdown - stronger than the former, as it refuses to let a game be opened again until AchievementManager is restored (which only happens upon a fresh core boot) but it isn't completely shut down and will still allow the player to be logged in and access the achievement settings and their (global) achievement header.
This change splits LoadGameAsync into three methods: two HashGame methods to accept either a string filepath or a volume, and a common LoadGameSync that both HashGames call to actually process the code. In the process, some minor cleanup, and the hash resolution now takes place on the work queue asynchronously.
Normally we only flush registers right at the end of each PPC
instruction. However, for PPC instructions that use a lot of registers
one at a time, it's beneficial to do this flushing work in the middle
of the instruction instead, reducing the risk of register starvation
and improving pipelining.
This should have been done when rebasing 6cc4f593e5 after the merge of
3a00ff625e. There are no correctness implications as far as I know,
only very minor performance implications.
This worked correctly on the JIT vertex loaders, and for the equivalent case with texture coordinates. I'm not aware of any games this affects (but libogc does mention a semi-related scenario at 6bc0317c7d/gc/ogc/gx.h (L1855-L1857).)
Jimmie Johnson's Anything with an Engine is known to use texture coordinate 7 (and only texture coordinate 7) in some cases. There are a lot of possible edge-cases, so this test brute-forces all combinations with coordinates 0, 1, and 2.
This test fails with the non-JIT vertex loader due to an issue fixed in a later commit in this PR. (Note that the non-JIT vertex loader is only used on machines where no JIT is available or if COMPARE_VERTEXLOADERS is enabled in VertexLoaderBase.cpp.)
This adds the actual switch to turn on Hardcore Mode to the settings tab of the Achievements dialog. It is accompanied by a large tooltip warning explaining what it does and when it can be enabled.
The switch is only enabled to be turned on when no game is running, so that games are started in hardcore mode and can only be loaded via the console's memory card, as in the original hardware. Hardcore may be turned off while a game is running, but cannot be turned back on until the game is disabled.
The toggle trigger for hardcore mode also automatically disables the settings that are not allowed during hardcore mode.
Finally, the original flag in AchievementSettingsWidget to set whether things are enabled in hardcore mode (primarily Leaderboards) is replaced with the actual Hardcore Mode setting.
Play Input Recording would potentially unlock achievements without any player input and needs to be disabled. If a recording is already playing, hardcore mode cannot be enabled.
The player getting a better view of their surroundings than the game would normally allow could possibly give the player an advantage over the original hardware, so Freelook is disabled in hardcore mode. To do this, I disable the config flag for Freelook when it is accessed, to make sure that it is disabled whether it was enabled before or after hardcore mode was enabled.
Memory patches would be an easy way to manipulate the memory needed to calculate achievement logic, so they must be disabled. Riivolution patches that do not affect memory are allowed, as they will be hashed with the game file.
Debug Mode gives players direct read and write access to memory, which could be used to completely manipulate RetroAchievements logic and therefore is not allowed in hardcore mode.
Slowing down the emulator would artificially improve reaction times and is therefore disallowed in hardcore mode. Speeds faster than 1x are allowed, but any speed below 1x is scaled back up to 1x.
Frame advancing is easily exploitable for slowing down a game and artificially improving reaction times and is not allowed in RetroAchievements hardcore mode.
While saving states is allowed (especially for the purpose of debugging), RetroAchievements does not allow loading saved states when hardcore mode is on.
This widget will be used in several places to notify the player that a feature has been disabled because hardcore mode is on. It includes a button to open the Achievement Settings so that Hardcore Mode may be turned off. Also included is the framework required to open AchievementsWindow specifically on the Settings tab.
Hardcore Mode is a RetroAchievements feature for enabling as close to original hardware as possible, to keep a fair, challenging, and competitive playing field for achievements (which get tallied differently and emphasized more in hardcore) and leaderboards (where it is mandatory) at the cost of several common emulator features that provide advantages, such as state loading and slower emulation speeds.
This commit just adds the flag to the AchievementSettings, with more to come.
Dolphin would crash when running with a fastmem arena but without
fastmem. This regression was caused by merging 9192128c50 without
adapting it after the merge of 0606433404.
This has the same effect in the end, but in my opinion, doing it this
way makes it more clear for the reader why we can read from ppcState at
JIT time, something that makes no sense for everything else in ppcState.
By making the JIT cache check if the current state of MMCR0 and MMRC1
matches the state they had at the time the JIT block was compiled, we
solve a correctness issue (marked in a comment as a speed hack).
Not known to affect any games.
This must have broken in a rebase of one of my recently merged PRs.
Dolphin still worked correctly with this bug, for two reasons:
1. Most AArch64 users are not on Windows, and therefore normally do have
the entry points map.
2. When the bug was triggered, Dolphin would fall back to the slower
path rather than crashing.
Instead of shifting left by 1, we can first shift right by 2 and then
left by 3. This is both faster and smaller, because we get the right
shift for free with the masking and the left shift for free with the
address calculation. It also happens to match the pseudocode more
closely, which is always nice for readability.
This slightly improves instruction-level parallelism in Jit64's slow
dispatcher by shifting the PC left instead of the MSR.
In the past, this also enabled an optimization in JitArm64's fast path
where we could use LDP to load normalEntry and msrBits in one
instruction, but this was superseded by fd9c970.
Just to make the code easier to understand at a glance. I especially
found it a bit annoying to reason about whether callee-saved registers
like W28 were being used because we needed a callee-saved register or
just for no reason in particular.
X8 and up is what compilers normally use when they're not register
starved.
AArch64's handling of NaNs in arithmetic instructions matches PowerPC's
as long as no more than one of the operands is NaN. If we know that all
inputs except the last input are non-NaN, we can therefore skip checking
the last input. This is an optimization that in principle only works for
non-SIMD operations, but ps_sumX effectively is non-SIMD as far as the
arithmetic part of it is concerned, so we can use it there too.
FL_EVIL is only used for blocking instructions from being reordered.
There are three types of instructions which have FL_EVIL set:
1. CR operations. The previous commits improved our CR analysis
and removed FL_EVIL from these instructions.
2. Load/store operations. These are always blocked from reordering
due to always having canCauseException set.
3. isync. I don't know if we actually need to prevent reordering
around this one, since as far as I know we only do reorderings
that are guaranteed to not change the behavior of the program.
But just in case, I've renamed FL_EVIL to FL_NO_REORDER instead of
removing it entirely, so that it can be used for this instruction.
Other than the CR instructions, which we now analyze properly,
all the covered instructions are not integer operations and also
have either FL_ENDBLOCK or FL_EVIL set, so there are two other
checks in CanSwapAdjacentOps that will reject them.
This brings the analysis done for condition registers
more in line with the analysis done for GPRs and FPRs.
This gets rid of the old wantsCR member, which wasn't actually
used anyway. In case someone wants it again in the future, they
can compute the bitwise inverse of crDiscardable.
Instead of materializing the quiet bit in a register and ORing the NaN
with it, we can perform an arithmetic operation on the NaN. This is a
cycle or two slower on some CPUs in cases where generating the quiet bit
pipelined well, but this is farcode that rarely runs, so instruction
fetch latency is the bigger concern. And for non-SIMD cases, we also
save a register.
By using MOVI2R+MOVI2R+CSEL in the zero case instead of doing bitwise
operations on the output of the other MOVI2R+MOVI2R+CSEL, we avoid using
BFI, an instruction that takes two cycles on most CPUs. The instruction
count is the same and the pipelining should be at least equally good.
This is a little trick I came up with that lets us restructure our float
classification code so we can exit earlier when the float is normal,
which is the case more often than not.
First we shift left by 1 to get rid of the sign bit, and then we count
the number of leading sign bits. If the result is less than 10 (for
doubles) or 7 (for floats), the float is normal. This is because, if the
float isn't normal, the exponent is either all zeroes or all ones.
Not sure if this was causing correctness issues – it depends on whether
the HLE code was actually reading the discarded registers – but it was
at least causing annoying assert messages in one piece of homebrew.
Nowadays, basically everything except for controller config is handled
by the new config system. Instead of enumerating the systems that are,
let's enumerate the systems that aren't.
I've intentionally not included Config::System::Session in the new list.
While it isn't intended to be saved, it is a setting that's fully
handled by the new config system. See
https://github.com/dolphin-emu/dolphin/pull/9804#discussion_r648949686.
This is used as a base pointer inside CustomPipelineAction, so this
should probably really have a virtual destructor to ensure derived
objects are torn down properly.
The point of farcode is to provide a separate location for code that
rarely runs, so that it doesn't pollute the icache. Taking a conditional
branch is something that happens very often, so the code for that
shouldn't be in farcode.
Bug: https://bugs.dolphin-emu.org/issues/13404
On macOS 13.6 / Intel HD 5000, Dolphin crashes with this message:
> -[MTLIGAccelDevice setShouldMaximizeConcurrentCompilation:]: unrecognized selector
This should be available on all macOS 13.3+ systems – but when using OCLP drivers,
some devices use an older version of Metal.framework, which doesn't expose the selector.
This concerns Intel Ivy Bridge, Haswell and Nvidia Kepler when using OCLP on macOS 13.3
or newer.
(See
34676702f4/docs/PATCHEXPLAIN.md?plain=1#L354C1-L354C83)
As the behavior is an optional optimization anyway, perform a dynamic
detection to avoid crashing if the feature is not available.
Some state changes are meant to be near instantanoues, before switching to something else. By reporting ithe instant switch, the UI will flicker between states (pause/play button) and the debugger will unnecessarily update. Skipping the callback avoids these issues.
This was implemented to prevent UI flickering due to the state rapidly switching between pause/play. Recently, it has been causing issues with debugger windows, which update during frame advance.
This way, the number of loop iterations is equal to the number of set
bits in the bitset rather than the number of bits in the bitset,
which is a good improvement because usually very few bits are set.
This caused us to update the indirect texture information in shaders more often than we needed to, which probably doesn't matter in practice since it's only used in ubershaders and copyyscale and stride are generally only updated before EFB/XFB copies, which generally will have other changes afterwards.
As far as I can tell, it has nothing to do with the mipmap/half_scale functionality, but does change based on the width of the destination texture (and the destination texture is half the width if half_scale is set). The comment that was there (which dates back to the initial megacommit) seems to not have accounted for the width aspect; it was first used as an actual stride in bbbe898839 (the first commit that used it at all).
The move assignment operator for a class is implicitly deleted when the
class has a non-static reference data member, which is true of
WiiSocket's m_socket_manager member.
Explicitly declaring the operator as default generates a
-Wdefaulted-function-deleted warning on Clang.
Delete the move constructor as well for consistency.
Fix -WSwitch warning about unhandled enum value SDL_NUM_LOG_PRIORITIES.
log_level is initialized to LNOTICE right before the switch statement so
this doesn't cause any behavior changes.
Use RunOnCPUThread instead of RunAsCPUThread in BeginRecordingInput.
Most OpenGL functions require an OpenGL context to have been created on
that thread before calling the function; when that isn't the case they
return invalid results which can cause crashes when passed into other
functions.
Dolphin creates the OpenGL context in the EmuThread which then becomes
either the CPU-GPU thread or the Video thread for single and dual core
respectively. OpenGL functions must therefore be called from that
thread.
Movie::BeginRecordingInput is called from the Host thread and runs a
block of code which ultimately creates a savestate, which in turn embeds
the framebuffer which requires calling various OpenGL functions.
In single core the use of RunAsCPUThread leads to this all happening on
the Host thread, eventually leading to invalid OpenGL calls and a crash.
In Dual core the crash is avoided because VideoBackendBase::DoState uses
the AsyncRequests::DO_SAVE_STATE event which causes VideoCommon_DoState
and its subsequent OpenGL calls to safely run on the Video thread.
This commit uses RunOnCPUThread instead of RunAsCPUThread, which causes
the subsequent code to run on the CPU-GPU thread in single core which
has the valid OpenGL context and so doesn't crash.
This makes it so that if you just want to reload the current style (eg. on program start, or in response to a system event), you don't need to know the name of the currently selected user style. It's also more consistent with the way the 'userstyle/enabled' flag works.
Before dbf5dca, the dirty flag had no meaning for an immediate value,
so we made sure to always set the dirty flag when switching a register
from Immediate to Register. But after dbf5dca, that is no longer the
case. If an immediate is marked as not dirty, we can keep the register
marked as not dirty after materializing the value. This way we skip
having to write it back to ppcState later.
Without this change, non-dirty immediates don't actually get flushed.
This can be a problem if we for instance are flushing all registers in
order to execute an interpreter fallback. If that interpreter fallback
writes to a register that contained a non-dirty immediate, the JIT will
keep using the old value instead of loading the updated value.
This required a change in the denormal path where, instead of
subtracting 11 before shifting left, we shift left immediately and then
shift right by 11. This shouldn't affect performance.
Instead of combining X2 (the exponent) and X3 (the mantissa) using an
ORR instruction, we can read X1, which already contains both.
This requires us to reconstruct X1 in the denormal path, but that's
an acceptable price.
Dolphin's JITs have a minor terminology problem: The term "fastmem" can
refer to either the system of switching between a fast path and a slow
path using backpatching, or to the fast path itself. To hopefully make
things clearer, I'm adding some new terms, defining the old and new
terms as follows:
Fastmem: The system of switching from a fast path to a slow path by
backpatching when an invalid memory access occurs.
Fast access: A code path that accesses guest memory without calling C++
code.
Slow access: A code path that accesses guest memory by calling C++ code.