Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
Because of the previous commit, this is needed to stop DolphinQt from
forgetting that the user pressed ignore whenever any part of the config
is changed.
This commit also changes the behavior a bit on DolphinQt: "Ignore for
this session" now applies to the current emulation session instead of
the current Dolphin launch. This matches how it already worked on
Android, and is in my opinion better because it means the user won't
lose out on important panic alerts in a game becase they played another
game first that had repeated panic alerts that they wanted to ignore.
For Android, this commit isn't necessary, but it makes the code cleaner.
Per https://en.cppreference.com/w/cpp/preprocessor/replace#.23_and_.23.23_operators the `##` behavior is a nonstandard extension; this extension seems to be supported by all compilers we care about, but IntelliSense in visual studio doesn't correctly handle it, resulting in false errors in the IDE (but not when compiling).
Per https://en.cppreference.com/w/cpp/preprocessor/replace#Function-like_macros C++20 introduced a workaround, where `__VA_OPT__(, )` generates a comma if and only if `__VA_ARGS__` is non-empty.
This PR replaces all occurrences, with the exception of Externals, DSPSpy (which is not likely to be edited in MSVC and does not target C++20 currently), and JitArm64_Integer.cpp (which uses `Function(__VA_ARGS__)`, and thus does not ever need a comma).
walking the zip prevents minizip from re-reading the same
data repeatedly from the actual backing filesystem.
also improves most usages of minizip to allow for >4GB,
files altho we probably don't need it
This was causing a bug in the rounding of paired single multiplication
operands. If Force25BitPrecision was called for quad registers, the
element size of its ADD instruction would get treated as if it was 16
instead of the intended 64, which would cause the result of the
calculation to be incorrect if the carry had to pass a 16-bit boundary.
Fixes one of the two bugs reported in
https://bugs.dolphin-emu.org/issues/12998.
Heavily simplify logical immediate encoding.
This is based on the observation that if a valid repeating element
exists, it repeats through `value`. Thus it does not matter which
one you analyse. Thus we skip over the least significent element
if LSB = 1 by masking it out with `inverse_mask_from_trailing_ones`,
to avoid the degenerate case of a stretch of 1 bits going 'round
the end' of the word.
Ninja puts way more effort into compiling targets in parallel, and
ignores dependenceis until link time.
So we need to jump though hoops to force ninja to compile
pch.cpp before any targets which depend on the PCH.
I have no idea why cmake supports PUBLIC on target_sources,
but it does. It causes all targets that depend on this target
to try and include the files in their sources.
Except it doesn't take paths into account, so it breaks. Mabye
it would work if you used an abolute source? But I'm not sure
there is a sane usecase.
These messages hid other, more important, ones often. I have left AttemptMaxTimesWithExponentialDelay and GetSysDirectory/SetSysDirectory as info, since those are called infrequently and can be useful to the end-user.
Fixes https://bugs.dolphin-emu.org/issues/12827.
A description of what was going wrong:
JitArm64::Init first calls CodeBlock::AllocCodeSpace, after which
CodeBlock and Arm64Emitter consider us to have 96 MB of code space
available. JitArm64::Init then calls AddChildCodeSpace, which is
supposed to take 64 MiB of that space and give it to m_far_code.
CodeBlock's view of how much space there is gets updated from 96 MiB
to 32 MiB, but due to the missing call, Arm64Emitter keeps thinking
that it has 96 MiB of space available.
The last thing JitArm64::Init does is to call ResetFreeMemoryRanges.
This function asks Arm64Emitter how much code space is available and
stores a range of that size in m_free_ranges_near, meaning that
m_free_ranges_near ends up being backed by both nearcode and farcode!
This is a ticking time bomb; as soon as we grab memory from
m_free_ranges_near which is backed by farcode, we're in trouble.
The crash I ran into in my testing was caused by fastmem code being
allocated in farcode (our backpatch handler only handles accesses made
from nearcode), but you may as well get errors caused by code intended
for nearcode overwriting code intended for farcode or vice versa.
So why did NBA Live 2005 crash when most games had no problems,
and why was the bug bisected to the commit that increased the size
of far code from 16 MiB to 64 MiB? Well, as long as we're only
using the first 32 MiB of the big 96 MiB range, everything works.
What happens with NBA Live 2005 (I have not investigated exactly
through what mechanism this happens) is that at some point the range
in m_free_ranges_near gets split into two ranges, one which is
backed by nearcode and one which is backed by farcode. Dolphin
prefers to select the biggest range available (we don't want to
pick a tiny 1 KiB range that may not be able to fit the whole block
we're about to emit, after all), and after increasing the size of
farcode to 64 MiB, farcode is bigger than nearcode.
Fixes a crash that could occur if the static constructor function for
the MainSettings.cpp TU happened to run before the variables in
Common/Version.cpp are initialised. (This is known as the static
initialisation order fiasco.)
By using wrapper functions, those variables are now guaranteed to be
constructed on first use.
Using unsigned char* or signed char* results in a deprecation warning, which is treated as an error. It needs to be casted to regular char* for it to work.
This format string is by definition dynamic and can't be checked at compile time. There are other similar strings in the log handler and in asserts, but they use vformat and thus don't need fmt::runtime. We might be able to do a similar thing where the untranslated string is compile-time checked, but FmtFormatT is used in so few places that I don't want to handle that in this PR.
HRWrap now allows HRESULT to be formatted, giving useful information beyond "it failed" or a hex code that isn't obvious to most users. This commit does not add any uses of it, though.