This second stack leads to JNI problems on Android, because ART fetches
the address and size of the original stack using pthread functions
(see GetThreadStack in art/runtime/thread.cc), and (presumably) treats
stack addresses outside of the original stack as invalid. (What I don't
understand is why some JNI operations on the CPU thread work fine
despite this but others don't.)
Instead of creating a second stack, let's borrow the approach ART uses:
Use pthread functions to find out the stack's address and size, then
install guard pages at an appropriate location. This lets us get rid
of a workaround we had in the MsgAlert function.
Because we're no longer choosing the stack size ourselves, I've made some
tweaks to where the put the guard pages. Previously we had a stack of
2 MiB and a safe zone of 512 KiB. We now accept stacks as small as 512 KiB
(used on macOS) and use a safe zone of 256 KiB. I feel like this should
be fine, but haven't done much testing beyond "it seems to work".
By the way, on Windows it was already the case that we didn't create
a second stack... But there was a bug in the implementation!
The code for protecting the stack has to run on the CPU thread, since
it's the CPU thread's stack we want to protect, but it was actually
running on EmuThread. This commit fixes that, since now this bug
matters on other operating systems too.
This very much isn't a build configuration that we're going to ship,
but I want to be able to tell people that they can build it on their
own if they really want to see how terribly it performs :)
Just like before, you'll need to edit two lines in app/build.gradle to
define ENABLE_GENERIC=ON and actually enable armeabi-v7a if you want an
armeabi-v7a build. This commit just fixes some compilations errors that
crop up if you do so.
This fixes a problem I was having where using frame advance with the
debugger open would frequently cause panic alerts about invalid addresses
due to the CPU thread changing MSR.DR while the host thread was trying
to access memory.
To aid in tracking down all the places where we weren't properly locking
the CPU, I've created a new type (in Core.h) that you have to pass as a
reference or pointer to functions that require running as the CPU thread.
While the NV extension is totally fine, the KHR extension should be able to support more hardware.
For NVIDIA, the hardware either supports both or neither, it just needs a driver from the last two years.
For AMD, the drivers from late 2022-12 seems to bring support for the KHR extension.
For Intel, the KHR is also supported for some years.
- Cancel doesn't shut down anymore.
Allowing it to be used multiple times thoughout the life of
the WorkQueue
- Remove Clear, so we only have Cancel semantics
- Add IsCancelling so work items can abort early if cancelling
- Replace m_cancelled and m_thread.joinable() guars with m_shutdown.
- Rename Flush to WaitForCompletion (As it's ambiguous if a function
called flush should be blocking or not)
- Add documentation
A lot of the remaining complexity in Renderer is the massive Swap function
which tries to handle a bunch of FrameBegin/FrameEnd events.
Rather than create a new place for it. This event system will try
to distribute it all over the place
Macros that expand to include the standard define macro are undefined.
This is pretty trivial to fix. We can just do the test and then define
the name itself if it's true, rather than making the set of definition
checks the macro itself.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
Because of the previous commit, this is needed to stop DolphinQt from
forgetting that the user pressed ignore whenever any part of the config
is changed.
This commit also changes the behavior a bit on DolphinQt: "Ignore for
this session" now applies to the current emulation session instead of
the current Dolphin launch. This matches how it already worked on
Android, and is in my opinion better because it means the user won't
lose out on important panic alerts in a game becase they played another
game first that had repeated panic alerts that they wanted to ignore.
For Android, this commit isn't necessary, but it makes the code cleaner.
Per https://en.cppreference.com/w/cpp/preprocessor/replace#.23_and_.23.23_operators the `##` behavior is a nonstandard extension; this extension seems to be supported by all compilers we care about, but IntelliSense in visual studio doesn't correctly handle it, resulting in false errors in the IDE (but not when compiling).
Per https://en.cppreference.com/w/cpp/preprocessor/replace#Function-like_macros C++20 introduced a workaround, where `__VA_OPT__(, )` generates a comma if and only if `__VA_ARGS__` is non-empty.
This PR replaces all occurrences, with the exception of Externals, DSPSpy (which is not likely to be edited in MSVC and does not target C++20 currently), and JitArm64_Integer.cpp (which uses `Function(__VA_ARGS__)`, and thus does not ever need a comma).
walking the zip prevents minizip from re-reading the same
data repeatedly from the actual backing filesystem.
also improves most usages of minizip to allow for >4GB,
files altho we probably don't need it
This was causing a bug in the rounding of paired single multiplication
operands. If Force25BitPrecision was called for quad registers, the
element size of its ADD instruction would get treated as if it was 16
instead of the intended 64, which would cause the result of the
calculation to be incorrect if the carry had to pass a 16-bit boundary.
Fixes one of the two bugs reported in
https://bugs.dolphin-emu.org/issues/12998.
Heavily simplify logical immediate encoding.
This is based on the observation that if a valid repeating element
exists, it repeats through `value`. Thus it does not matter which
one you analyse. Thus we skip over the least significent element
if LSB = 1 by masking it out with `inverse_mask_from_trailing_ones`,
to avoid the degenerate case of a stretch of 1 bits going 'round
the end' of the word.
Ninja puts way more effort into compiling targets in parallel, and
ignores dependenceis until link time.
So we need to jump though hoops to force ninja to compile
pch.cpp before any targets which depend on the PCH.
I have no idea why cmake supports PUBLIC on target_sources,
but it does. It causes all targets that depend on this target
to try and include the files in their sources.
Except it doesn't take paths into account, so it breaks. Mabye
it would work if you used an abolute source? But I'm not sure
there is a sane usecase.
These messages hid other, more important, ones often. I have left AttemptMaxTimesWithExponentialDelay and GetSysDirectory/SetSysDirectory as info, since those are called infrequently and can be useful to the end-user.
Fixes https://bugs.dolphin-emu.org/issues/12827.
A description of what was going wrong:
JitArm64::Init first calls CodeBlock::AllocCodeSpace, after which
CodeBlock and Arm64Emitter consider us to have 96 MB of code space
available. JitArm64::Init then calls AddChildCodeSpace, which is
supposed to take 64 MiB of that space and give it to m_far_code.
CodeBlock's view of how much space there is gets updated from 96 MiB
to 32 MiB, but due to the missing call, Arm64Emitter keeps thinking
that it has 96 MiB of space available.
The last thing JitArm64::Init does is to call ResetFreeMemoryRanges.
This function asks Arm64Emitter how much code space is available and
stores a range of that size in m_free_ranges_near, meaning that
m_free_ranges_near ends up being backed by both nearcode and farcode!
This is a ticking time bomb; as soon as we grab memory from
m_free_ranges_near which is backed by farcode, we're in trouble.
The crash I ran into in my testing was caused by fastmem code being
allocated in farcode (our backpatch handler only handles accesses made
from nearcode), but you may as well get errors caused by code intended
for nearcode overwriting code intended for farcode or vice versa.
So why did NBA Live 2005 crash when most games had no problems,
and why was the bug bisected to the commit that increased the size
of far code from 16 MiB to 64 MiB? Well, as long as we're only
using the first 32 MiB of the big 96 MiB range, everything works.
What happens with NBA Live 2005 (I have not investigated exactly
through what mechanism this happens) is that at some point the range
in m_free_ranges_near gets split into two ranges, one which is
backed by nearcode and one which is backed by farcode. Dolphin
prefers to select the biggest range available (we don't want to
pick a tiny 1 KiB range that may not be able to fit the whole block
we're about to emit, after all), and after increasing the size of
farcode to 64 MiB, farcode is bigger than nearcode.
Fixes a crash that could occur if the static constructor function for
the MainSettings.cpp TU happened to run before the variables in
Common/Version.cpp are initialised. (This is known as the static
initialisation order fiasco.)
By using wrapper functions, those variables are now guaranteed to be
constructed on first use.
Using unsigned char* or signed char* results in a deprecation warning, which is treated as an error. It needs to be casted to regular char* for it to work.
This format string is by definition dynamic and can't be checked at compile time. There are other similar strings in the log handler and in asserts, but they use vformat and thus don't need fmt::runtime. We might be able to do a similar thing where the untranslated string is compile-time checked, but FmtFormatT is used in so few places that I don't want to handle that in this PR.
HRWrap now allows HRESULT to be formatted, giving useful information beyond "it failed" or a hex code that isn't obvious to most users. This commit does not add any uses of it, though.
Now, enums are properly displayed, and BitFieldArray is also displayed nicely. Signed values also work correctly, and 1-bit fields are not treated as bools unless the bitfield is explicitly marked as a bool.
PNG_FORMAT_RGB and PNG_COLOR_TYPE_RGB both evaluate to 2, but PNG_FORMAT_RGBA evaluates to 3 while PNG_COLOR_TYPE_RGBA evaluates to 6; the bit indicating a palette is 1 while the bit indicating alpha is 4.