Unlike PCs, Android doesn't really have any input method (not counting
touch) that can reasonably be expected to exist on most devices.
Because of this, I don't think shipping with a default mapping for the
buttons and sticks of GameCube controllers and Wii Remotes makes sense.
I would however like to ship default mappings for a few things:
1. Mapping the Wii Remote's accelerometer and gyroscope to the device's
accelerometer and gyroscope. This functionality is useful mainly
for people who use the touchscreen, but can also be useful when
using a clip-on controller. The disadvantage of having this mapped
by default is that games disable pointer input if the accelerometer
reports that the Wii Remote is pointed at the ceiling.
2. Mapping GC keyboards for use with a physical keyboard, like on PC.
After all, there's no other way of mapping them that makes sense.
3. Mapping rumble to the device's vibrator.
Aside from the GC keyboards, this approach is effectively the same as
what we were doing before the input overhaul.
This is a battery-saving measure. Whether a sensor should be suspended
is determined in the same way as whether key events and motion events
should be handled by the OS rather than consumed by Dolphin.
When Android presents an input event to an app, it wants the app to
return true or false depending on whether the app handled the event or
not. If the event wasn't handled by the app, it will be passed on to
the system, which may decide to take an action depending on what kind
of input event it is. For instance, if a B button press is passed on to
the system, it will be turned into a Back press. But if an R1 press is
passed on to the system, nothing in particular happens.
It's important that we get this return value right in Dolphin. For
instance, the user generally wouldn't want a B button press to open
the EmulationActivity menu, so B button presses usually shouldn't be
passed on to the system - but volume button presses usually should be
passed on to the system, since it would be hard to adjust the volume
otherwise. What ButtonManager did was to pass on input events that are
for a button which the user has not mapped, which I think makes sense.
But exactly how to implement that is more complicated in the new input
backend than in ButtonManager, because now we have a separation between
the input backend and the code that keeps track of the user's mappings.
What I'm going with in this commit is to treat an input as mapped if
it has been polled recently. In part I chose this because it seemed
like a simple way of implementing it that wouldn't cause too many
layering violations, but it also has two useful side effects:
1. If a controller is not being polled (e.g. GameCube controllers in
Wii games that don't use them), its mappings will not be considered.
2. Once sensor input is implemented in the Android input backend,
we will be able to use this "polled recently" tracking to power down
the sensors at times when the game is using a Wii Remote reporting
mode that doesn't include motion data. (Assuming that the sensor
inputs only are mapped to Wii Remote motion controls, that is.)
Android doesn't let us poll inputs whenever we want. Instead, we
listen to input events (activities will have to forward them to the
input backend), and store the received values in atomic variables
in the Input classes. This is similar in concept to how ButtonManager
worked, but without its homegrown second input mapping system.
ButtonManager is very different from how a normal input backend works,
and is making it hard for us to improve controller support on Android.
The following commits will add a new input backend in its place.
When that setting is enabled, m_xfb_entry is initially not present (during the phase where a shader compilation progress bar would be shown). The main path checks for m_xfb_entry, but the software renderer fallback path didn't.
Fixes another aspect of https://bugs.dolphin-emu.org/issues/13172.
Before, it used a fallback where it returned a default object, where the width and height were set to 0. Presenter::Initialize() used GetSurfaceInfo to set the backbuffer size, then used that size when initializing the on-screen UI (even for the software renderer, where the on-screen UI isn't currently present), which meant that ImGui got a window size of 0 and thus resulted in a failed assertion.
Although BindBackbuffer checks for size changes, it doesn't help because ImGui has already been initialized, and the size hasn't actually changed since initialization occured.
Fixes one aspect of https://bugs.dolphin-emu.org/issues/13172.
This second stack leads to JNI problems on Android, because ART fetches
the address and size of the original stack using pthread functions
(see GetThreadStack in art/runtime/thread.cc), and (presumably) treats
stack addresses outside of the original stack as invalid. (What I don't
understand is why some JNI operations on the CPU thread work fine
despite this but others don't.)
Instead of creating a second stack, let's borrow the approach ART uses:
Use pthread functions to find out the stack's address and size, then
install guard pages at an appropriate location. This lets us get rid
of a workaround we had in the MsgAlert function.
Because we're no longer choosing the stack size ourselves, I've made some
tweaks to where the put the guard pages. Previously we had a stack of
2 MiB and a safe zone of 512 KiB. We now accept stacks as small as 512 KiB
(used on macOS) and use a safe zone of 256 KiB. I feel like this should
be fine, but haven't done much testing beyond "it seems to work".
By the way, on Windows it was already the case that we didn't create
a second stack... But there was a bug in the implementation!
The code for protecting the stack has to run on the CPU thread, since
it's the CPU thread's stack we want to protect, but it was actually
running on EmuThread. This commit fixes that, since now this bug
matters on other operating systems too.
I also changed LoadConfig, but that change doesn't affect correctness,
it's only so it looks neat by matching SaveConfig.
This bug was added in 18a4afb053, the
commit that introduced DefaultValue::Disabled. The bug can't actually be
triggered in master, but it can be triggered in the Android input
overhaul PR.
The HLSL compiler incorrectly decides isnan can't be true, so this
workaround was originally added in 52c82733 but lost during the
conversion to SPIR-V.
This broke formatting the system memory; see https://bugs.dolphin-emu.org/issues/13176. After calling ticket.DeleteTicket(), ticket.m_bytes was 0-length, but calling ticket.IsV1Ticket() still attempted to read from m_bytes.
This was introduced in 2fd9852ca8, although it didn't actually cause a crash until 929fba08e7.
When faced with this error, users often don't try disabling dual core,
even though the error message suggests it. Perhaps the message is just
too long and lists too many things?
To try to improve the situation, I'm rewording the message and making it
say different things depending on what settings you are using.
The LEA that the signal handler is trying to undo the effects of is a
32-bit instruction, and the value in the register prior to the LEA is
also 32-bit, so the signal handler should use a 32-bit write.
(Actually, in the end this doesn't really matter, because the first
instruction that reads this value after backpatching is also a 32-bit
instruction...)
This resulted in the labels being solid black even when audio stretching is disabled the first time the settings are opened, but then properly being greyed out after changing a setting (even the audio backend or DSP emulation engine, not just whether audio stretching is enabled).
This very much isn't a build configuration that we're going to ship,
but I want to be able to tell people that they can build it on their
own if they really want to see how terribly it performs :)
Just like before, you'll need to edit two lines in app/build.gradle to
define ENABLE_GENERIC=ON and actually enable armeabi-v7a if you want an
armeabi-v7a build. This commit just fixes some compilations errors that
crop up if you do so.
Before these changes you could tell Dolphin to convert a game file into the same format it is already in, leading to the FileDialog using the input path as the default destination path
An unsuspecting user could then click Save and Dolphin would try to convert the input file by writing the destination file on top of it... leading to an I/O error and the input file being entirely removed
This fixes a regression from 592ba31. When `a` was a constant 0 and `b`
was a non-constant 0x80000000, the 32-bit negation operation would
overflow, causing an incorrect result. The sign extension needs to happen
before the negation to avoid overflow.
Note that I can't merge the SXTW and NEG into one instruction.
NEG is an alias for SUB with the first operand being set to ZR,
but "SUB (extended register)" treats register 31 as SP instead of ZR.
I've also changed the order for the case where `a` is a constant
0xFFFFFFFF. I don't think the order actually affects correctness here,
but let's use the same order for all the cases since it makes the code
easier to reason about.
This works around an Intel driver bug where, on D3D12 only, dual-source blending behaves incorrectly if the second source is unused on. This bug is visible in skyboxes in Super Mario Sunshine, which first draw clouds and sun flare in greyscale and then draw the sky afterwards with a source factor of 1 and a dest factor of 1-src_color (this results in the clouds being tinted blue). This process is done on an RGB888 framebuffer, so alpha update is disabled. (Color update is enabled; note that if you look at this in Dolphin's fifo analyzer, it won't be enabled because they use the BP mask functionality to only change the blending functions and not alpha/color update, for whatever reason.)
The previous code only updated the PLRU on cache misses, which made it so that the least recently inserted cache block was evicted, instead of the least recently used/hit one.
This regressed in 9d39647f9e (part of #11183, but it was fine in e97d380437), although beforehand it was only implemented for the instruction cache, and the instruction cache hit extremely infrequently when the JIT or cached interpreter is in use, which generally keeps it from behaving correctly (the pure interpreter behaves correctly with it).
I'm not aware of any games that are affected by this, though I did not do extensive testing.
Previously we would only backpatch overflowed address calculations
if the overflow was 0x1000 or less. Now we can handle the full 2 GiB
of overflow in both directions.
I'm also making equivalent changes to JitArm64's code. This isn't because
it needs it – JitArm64 address calculations should never overflow – but
because I wanted to get rid of the 0x100001000 inherited from Jit64 that
makes even less sense for JitArm64 than for Jit64.
This avoids a pseudo infinite loop where CodeWidget::UpdateCallstack
would lock the CPU in order to read the call stack, causing the CPU to
call Host_UpdateDisasmDialog because it's transitioning from running to
pausing, causing Host::UpdateDisasmDialog to be emitted, causing
CodeWidget::Update to be called, once again causing
CodeWidget::UpdateCallstack to be called, repeating the cycle.
Dolphin didn't go completely unresponsive during this, because
Host_UpdateDisasmDialog schedules the emitting of Host::UpdateDisasmDialog
to happen on another thread without blocking, but it was stopping certain
operations like exiting emulation from working.
This fixes a problem I was having where using frame advance with the
debugger open would frequently cause panic alerts about invalid addresses
due to the CPU thread changing MSR.DR while the host thread was trying
to access memory.
To aid in tracking down all the places where we weren't properly locking
the CPU, I've created a new type (in Core.h) that you have to pass as a
reference or pointer to functions that require running as the CPU thread.
This code doesn't need to be portable (since the goal is to have a smaller offset for x64 codegen), so if it's not supported there are other problems. Similar code exists in e.g. DSP.cpp.
While the NV extension is totally fine, the KHR extension should be able to support more hardware.
For NVIDIA, the hardware either supports both or neither, it just needs a driver from the last two years.
For AMD, the drivers from late 2022-12 seems to bring support for the KHR extension.
For Intel, the KHR is also supported for some years.
Frame duplicate detection was inverted. Huge problem for 60fps games
where it would see all frames as "duplicates" and nothing would ever be
presented.
The logging was broken in 958cbf38a4 (DSPTool doesn't use dolphin's logging system, so it just produced nothing; the same thing affected comparing before 693a29f8ce).
AssemblerError::LabelAlreadyExists (previously ERR_LABEL_EXISTS) simply was never used.
- Cancel doesn't shut down anymore.
Allowing it to be used multiple times thoughout the life of
the WorkQueue
- Remove Clear, so we only have Cancel semantics
- Add IsCancelling so work items can abort early if cancelling
- Replace m_cancelled and m_thread.joinable() guars with m_shutdown.
- Rename Flush to WaitForCompletion (As it's ambiguous if a function
called flush should be blocking or not)
- Add documentation
CMakeSettings.json is a Visual Studio only extention to cmake that isn't
supported anywhere else. Not even Visual Studio Code.
So we set CMAKE_PREFIX_PATH inside DolphinQt's CMakeLists.txt instead.
A lot of the remaining complexity in Renderer is the massive Swap function
which tries to handle a bunch of FrameBegin/FrameEnd events.
Rather than create a new place for it. This event system will try
to distribute it all over the place
Almost all the virtual functions in Renderer are part of dolphin's
"graphics api abstraction layer", which has slowly formed over the
last decade or two.
Most of the work was done previously with the introduction of the
various "AbstractX" classes, associated with texture cache cleanups
and implementation of newer graphics APIs (Direct3D 12, Vulkan, Metal).
We are simply taking the last step and yeeting these functions out
of Renderer.
This "AbstractGfx" class is now completely agnostic of any details
from the flipper/hollywood GPU we are emulating, though somewhat
specialized.
(Will not build, this commit only contains changes outside VideoBackends)
Texture cache occasionally mutates textures for efficiency.
Which is awkward if we want to borrow those textures from texture cache
to do something else, such as a graphics debugger, or async presentation
on another thread.
Content locking provides a way to signal that the contents of a texture
cache entry should not change. Texture cache will be forced to use
alternative strategies.
The whole ownership model was getting a bit of a mess, with a some
of special cases to deal with. And I'm planning to make it even more
complex in the future.
So here is some upfront work to convert it over to reference counted
pointers.
Macros that expand to include the standard define macro are undefined.
This is pretty trivial to fix. We can just do the test and then define
the name itself if it's true, rather than making the set of definition
checks the macro itself.
This reverts commit e140516130. This assert triggers for AX and AXWii uCode games (including the Wii System Menu) for various addresses that seem to be 4-byte aligned. Worse still, if the DSP thread is in use (i.e. for DSP LLE recompiler, but not for DSP LLE interpreter), Dolphin completely hangs after the panic alert. Perhaps the data DMA has fewer restrictions compared to the instruction DMA?
The change to DSPTool (e391a28102) has not been reverted, as it still fixes broken behavior for DSPSpy at -O0 on real hardware.
Gives two members without explicit initialization default values to be
consistent with the rest of the class and also ensuring deterministic
values on construction.
Though less important compared to #11470, save states also show the full path in the OSD message and could potentially dox a streamer who is playing in Dolphin. This is a simple fix - it removes the path from the message and only displays the file name.
Capitalize Skylander in tr strings
Lint and validation method fixes
Proper Attach and Change Interface method
Re-jig code to exit early and read easier
I haven't tested this extensively on real hardware, but I do know that bad things happen if the address isn't properly aligned, and libogc says it should be 32-byte aligned.
Turns out all OSD messages, every single one, are written to the titlebar. We've just never seen them because the FPS is in the title bar and it replaces it in a fraction of a second. This was only visible when saving savestates because it halts emulation for a moment while writing.
This is dumb, let's not do that anymore.
A few weeks ago, a vtuber tweeted that they had to remove a vod of their stream because Dolphin Emulator showed some personal information during the steam, and left a warning to everyone else that Dolphin shows the account name of the computer. And yea, we do, we show the full directory of the memory card every time a memory card is written, and due to mandatory Microsoft account nonsense, that is very likely to contain someone’s real name.
Fortunately this is very easy for us to solve. This change simply removes the filename from wrote memory card contents string. That’s it. All functionality of the wrote memory card OSD message remains the same, it just doesn’t say where the memory card is anymore.
There are lots of other potential solutions to this but after talking on IRC it seems the simplest one is the best.
Skylander code tidy ups
Convert c array to std::array and fix comments
Formatting fixes/review changes
Variable comment
Migrate portal to System Impl and code tidy ups
Use struct
Restore review changes
Minor fix to schedule transfer method
Change descriptors to hex and fix comments
Ported the code from RPCS3, with improvements made to the handling of control messages and audio transfers, Co-Authored with @mandar1jn
Missing new line chars
Co-Authored-By: mandar1jn <49076509+mandar1jn@users.noreply.github.com>
We've decided this track will never be used in the future. Releases will
continue using the "beta" branch internally, though we'll have the
user-visible strings use a different name instead.
(Note: Dolphin provided builds have always defaulted to 'beta' as the
auto-update track, so anyone who set 'stable' did so manually.)
This should be a fairly easy merge, assuming I didn’t mess anything up. TL:DR no one uses it and it’s not great.
Boot from DVD Backup is an ancient feature with origins in the Megacommit. Back then, GameCube and Wii games were quite large relative to drives of the time. For example, in 2008, the most common hard drive sizes were 320GB and 512GB. On the 320GB drive I personally had at the time, as little as 42 Wii ISOs could have filled it entirely! And that’s ignoring any other files one might want to put onto a drive. Backup DVDs allowed users to burn relatively cheap DVD media and store their GameCube and Wii dumps in a Dolphin accessible way that didn’t eat into their precious HDD space. It had compromises, even then, but in 2008… I mean honestly users probably wouldn’t even notice those compromises with how Dolphin barely even worked at all back then.
Obviously, today the storage space concerns are not as big of an issue. According to seagate the average hard drive it sells today is 8TB. For typical laptops purchased now, the -minimum- selection for storage is usually 1TB. You can even buy a name brand 4TB external hard drive for $100. GC and Wii ISOs are not as big as they once were, relatively anyway. Plus flash drives and SD cards are super cheap and way faster than disc drives ever were. For anyone that has limited drive space, removable flash media can fulfill this offloading role far better than backup DVD media ever could.
Also no one has DVD drives anymore. That’s kind of an important detail.
But to see if Booting from DVD Backup even still worked, I decided to give it a try. I have an ASUS BW-16D1HT, a badass Bluray XL reading and burning drive, connected to my Windows 11 Threadripper 5975WX machine. A super fast drive on a super fast machine is as good as it possibly can get for this feature. So I bought a spindle of DVD-Rs, burned a couple of discs and gave it a try. Surprisingly, it does still work. However, as expected, it introduces a lot of stuttering. Testing Prime 1 and Prime 3, in both games stuttering was introduced whenever the DVD Drive had to suddenly seek. Spikes of 50ms occurred constantly, but I observed 150ms and even over 1000ms stutters! The worst was a three second stutter, when loading Elysia in Prime 3. I could even hear the stutters - any time the drive suddenly made a harsh seeking noise, the game would have a hard stutter. It worked but, it has some serious compromises.
Boot from DVD Backup isn’t great, using removable flash media or external hard drives is a FAR better option for anyone with limited storage space today, and no one can even use this feature anymore because their computers don’t even have disc drives. It’s time for Boot from DVD Backup to go!
So I did my best on the cleanup but I’m bound to have left some bits. Especially in translation - I didn’t get any warnings or anything there that could help point me to where to clean that up. Please review!
See the comment added by this commit. We were previously guarding against
overshooting in address calculations, but not against undershooting.
Perhaps someone assumed that the displacement of an x86 loadstore was
treated as unsigned?
Note: While the comment says we can undershoot by up to 2 GiB, in
practice Jit64 as it currently behaves won't actually undershoot by more
than 0x8000 if my analysis is correct. But address space is cheap, so
let's guard the full 2 GiB.
This is the behavior in the x64 and ARM64 vertex loaders. I don't know if it makes sense (the whole skipped vertex system seems jank, but several games behave incorrectly without it).
This generated a warning on GCC about the operation being potentially undefined (-Wsequence-point). I'm not sure if that was actually the case, but either way it is a mistake.
Before, it was also compiled on ARM builds, but since it was unused it wasn't linked (and thus its dependency on the nonexistent x64Emitter didn't cause any link issues).
Fixes incorrect logspam when the buffer needed to be reset on flushes (which we already were doing, but 52feed04db moved it to after the check was made). This is https://bugs.dolphin-emu.org/issues/10312.
I also converted it to an assert, as if this does happen, things are going to render incorrectly, so we want to make it obvious.
This was added in #10394 for both the hardware and software backends to work around an issue with Mario Kart Wii, Fortune Street, and Baten Kaitos. However, it seems like the software renderer handles blending well enough that we don't need this (and in any case, it's easy to change blending in the software renderer).
Some experimentation with #11387 (not pushed) showed that the software renderer's logic would also produce correct results on the hardware backends with this hack removed, but would require fbfetch (currently); if a better solution is found the hack can also be removed from the hardware backends.
Otherwise, texelFetch() will use an out-of-bounds layer for game textures (that have 1 layer; EFB copies have 2 layers in stereoscopic 3D mode), which is undefined behavior (often resulting in a black image). The fast texture sampling path uses texture(), which always clamps (see https://www.khronos.org/opengl/wiki/Array_Texture#Access_in_shaders), so it was unaffected by this difference.
The former is deprecated and pretty much all modern drivers
support VK_EXT_debug_utils.
Android drivers dont support it. On those drivers,
we use the implementation provided by the validation layers.
Plus two miscellaneous debugger features that I found along the way when
reading Jit64's code for comparison: bJITNoBlockLinking and tracing.
Fixes https://bugs.dolphin-emu.org/issues/13127.
Small optimization. By not calling WriteExit, the block linking system
never finds out about the exit we're doing, saving us from having to
disable block linking.
We should expose Enable Controller Input and the turbo settings for
GBA just like we do for GameCube controllers and Wii Remotes.
I just forgot about it when implementing the GBA TAS input window.
Previously, if a user on Windows launched Dolphin from the command line
and specified a path to an M3U file and included backslashes in this path,
Dolphin would fail to resolve relative paths in the M3U file.
The calculation of each address in lmw/stmw currently has a dependency
on the calculation of the previous address. By removing this dependency,
the host CPU should be able to pipeline the loads/stores better. The cost
we pay for this is up to one extra register and one extra MOV instruction
per guest instruction, but often nothing.
Making EmitBackpatchRoutine support using any register as the address
register would let us get rid of the MOV, but I consider that to be too
big of a task to do in one go at the same time as this.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
This reverts commit 351d095fff.
In hindsight, my attempted optimization messes with the return
predictor, unlike real tail calls. So I think it does more bad than
good.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
Use: callstack(0x80000000).
!callstack(value) works as a 'does not contain'.
Add strings to expr.h conditionals.
Use quotations: callstack("anim") to check symbols/name.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
This affected the memory and registers widgets (and possibly others). I'm pretty sure it regressed in 5f629abd8b.
The SetCodeVisible line is a new fix, but the equivalent already existed in the memory widget.
The call to analyzer.Analyze breaks when it attempts to read an instruction, as it eventually tries to read memory when Memory::m_pRAM is nullptr. Trying to read when execution is not paused in general seems like a bad idea (especially as analyzer.Analyze uses PowerPC::TryReadInstruction which can update icache - this is probably still a problem).
Operations that have two operands and can't generate a default NaN,
i.e. addition and subtraction, already have the desired NaN handling
on x86. We just need to make sure to not reverse the operands.
This fixes ps_sum0/ps_sum1 outputting NaNs in cases where they shouldn't.
(HandleNaNs assumes that a NaN in a ps0 input always results in a NaN in
the ps0 output, and correspondingly for ps1.)
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This regressed in 0a906f553f, I think (though I haven't confirmed it). Mario Tennis and Luigi's Mansion both use these for some reason (as far as I can tell, the data isn't actually used; it's just extra data included for no reason)
DataReader is generally jank - it has a start and end pointer, but the end pointer is generally not used, and all of the vertex loaders mostly bypassed it anyways.
Wrapper code (the vertex loaer test, as well as Fifo.cpp and OpcodeDecoding.cpp) still uses it, as does the software vertex loader (which is not a subclass of VertexLoader). These can probably be eliminated later.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
When emulated GBAs were added to Dolphin, it was possible to control them
using the GC TAS input window. (Z was mapped to Select.) Unaware of this,
I broke the functionality in b296248.
To make it possible to control emulated GBAs using TAS input again,
I'm adding a proper TAS input window for GBAs, with a real Select button
and no analog controls.
0e02ddcf52 removed separate logic for tiled versus non-tiled EFB peek caches, and as part of that made it so that color peeks updated the frame access mask even when a non-tiled cache is in use. However, the same change was not made for depth peeks. I'm not sure if this affected anything in practice.
`ImGui::GetIO` performs an assertion that a context exists, and if one doesn't then things will likely crash. Unfortunately this crash is hard to consistently reproduce.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
[ VUID-VkDescriptorPoolCreateInfo-maxSets-00301 ] Object 0:
handle = 0x7f1,b8d,3cd,e70, type = VK_OBJECT_TYPE_DEVICE; |
MessageID = 0xa1,70e,236 | vkCreateDescriptorPool():
pCreateInfo->maxSets is not greater than 0.
The Vulkan spec states: maxSets must be greater than 0
BindFramebuffer depends on the pipeline which might not be set yet.
That's why the framebuffer dirty flag exists in the first place.
I assume BindFramebuffer was called directly here, in order to handle
the texture state transitions necessary for DiscardResource.
The state is tracked anyway, so we can just issue those transitions there
too and defer binding the actual framebuffer.
Fixes an issue in Zelda Twilight Princess with EFB depth peeks.
Dolphin would bind a frame buffer which doesn't have an integer format
descriptor for the color target before binding the new pipeline.
So it would accidentally use the 0 descriptor.
Debug layer error:
D3D12 ERROR: ID3D12CommandList::OMSetRenderTargets:
Specified CPU descriptor handle ptr=0x0000000000000000 does not refer to
a location in a descriptor heap. pRenderTargetDescriptors[0] is the issue.
[ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
Fixes the following error in the D3D12 debug layer:
D3D12 WARNING: ID3D12Device::CreateCommittedResource:
Ignoring InitialState D3D12_RESOURCE_STATE_UNORDERED_ACCESS.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
[ STATE_CREATION WARNING #1328: CREATERESOURCE_STATE_IGNORED]
Fixes the following error in the D3D12 debug layer:
D3D12 ERROR: ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart:
GetGPUDescriptorHandleForHeapStart is invalid to call on a descriptor
heap that does not have DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE set.
If the heap is not supposed to be shader visible, then
GetCPUDescriptorHandleForHeapStart would be the appropriate method
to call. That call is valid both for shader visible and non shader
visible descriptor heaps.
[ STATE_GETTING ERROR #1315: DESCRIPTOR_HEAP_NOT_SHADER_VISIBLE]
When searching for a disc where the revision doesn't match any disc in
the datfile, the loop would never get to the part where serials_exist is
set to true, leading to a bogus error message.
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
I originally added these in 2b1d1038a6, for both the TPipelineFunction and the size. The size was moved into the header in fdcd2b7d00 (making the size functions obsolete), but it seems that the functions themselves are no longer needed now.
I think I didn't use this approach before because it would have required ComponentFormatTable and ComponentCountRow to be templated, which would end up resulting in lines that were too long and thus wrapped in awkward places. (I *think* they didn't get inferred properly.) Now that we only need TPipelineFunction, the templating is not needed, and this ends up being a more readable version of the version with the wrapper functions.