This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
When emulated GBAs were added to Dolphin, it was possible to control them
using the GC TAS input window. (Z was mapped to Select.) Unaware of this,
I broke the functionality in b296248.
To make it possible to control emulated GBAs using TAS input again,
I'm adding a proper TAS input window for GBAs, with a real Select button
and no analog controls.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component (for instance, if float XYZ positions are used, and the stride was set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), it would be missing the last 8 bytes of the final element in the array. Or, if stride was set to 0, then no bytes would be recorded at all (though that's not a useful configuration so it's unlikely to actually exist).
I'm not aware of any games affected by this issue.
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). This uses NormalIndex3, and the index for the normal vector (generally 0x02XX or 0x01XX) there is always lower than the tangent or binormal (generally 0x07XX). Other games seem to usually have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games will use the same type for all vertex components (either Index8 or Index16 or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
fregsIn will include FD for double-precision instructions, since for
dependency tracking purposes the instruction does read the upper
half of FD. This is not what we want in HandleNaNs.
The consequence of this bug is that if an instruction was supposed to
output a NaN and FD happens to contain a NaN and FD happens to be the
same register as an unused register in the instruction encoding, the
NaN in FD could get used as the output instead of the correct NaN.
This isn't known to affect any games, which isn't especially surprising
considering that there's only one game that needs AccurateNaNs anyway.
Jumping to `dispatcher` requires first subtracting the downcount,
otherwise `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
base is an unsigned variable, so we can make things little more
consistent by making the loop index unsigned so we aren't doing bit
arithmetic with signed types.
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET - which I'm not sure
if any CPUs support - but it doesn't hurt anything, so I might as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
This lets the TAS input code use a higher-level interface for
overriding inputs instead of having to fiddle with raw bits.
WiiTASInputWindow in particular was messy with how much
controller code it had to re-implement.
Fixes a Rogue Squadron II regression from 9d73583.
This set_dirty stuff is pretty tricky to reason about. I thought I
was clever when coming up with set_dirty, but maybe I was too clever
for my own good...
In case the register we're binding is the same as the immediate register,
we should fetch the immediate before calling BindToRegister. The way
the register cache currently works, calling GetImm after BindToRegister
actually does work, but it's better to not rely on it.
Tested on an official DOL-014 (251 blocks) memory card by executing the
0xf4 command on a card with content along its entire length and then
dumping the whole card: it reads as 0xff all the way through.
Therefor, the current implementation is already consistent with hardware.
Texture dumping can already be done using VideoCommon's system (and in fact the same setting already enabled *both* of these). Dumping objects/TEV stages/texture fetches doesn't currently have an equivalent, but could be added to the FIFO player instead.
A (partial) port of #9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient
or no code to be generated.
When a guest register is an immediate, it may be necessary to move this
value into a register. This is handled by gpr.R(), which lacks context
on how the register will be used. This leads to cases where the
immediate is written to a register, only for it to be overwritten. Take
for example this code generated by srwx:
0x5280031b mov w27, #0x18
0x53187edb lsr w27, w22, #24
gpr.BindToRegister() does have this context through the do_load
parameter, but didn't handle immediates. By adding this logic, we can
intelligently skip the write when do_load is false.
Fixes https://bugs.dolphin-emu.org/issues/13017. With uCode switching, the existing instance of AXUCode is re-activated when GBAUCode is done, but if the state remains as WaitingForNextTask, it won't be able to do anything. Instead, it needs to be in WaitingForCmdListSize.
(When the AX uCode is resumed, startpc is set to 0x0030, at least for 0x07f88145; this is the same location as MAIL_RESUME jumps to, so DSP_RESUME should be sent when the resuming happens; that's already handled by AXUCode::Update.)
dir_path is used by PanicAlertFormatT, which prior to PR 10209 used a
lambda. Before c++20, referring to structured bindings in lambda captures
was forbidden. The problem is now doubly fixed, so put the structured
binding back in.
Fixes the Dolphin bug mentioned in
https://github.com/dolphin-emu/hwtests/issues/45.
Because this doesn't fix any observed behavior in games (no, 1080°
Avalanche isn't affected), I haven't implemented this in the JITs,
so as to not cause unnecessary performance degradations.
This command does not upload the MAIN buffers to CPU memory. This was
functionally fixed in f11a40f858 without
updating the comments and variable names.
Previously, we had WBFS and CISO which both returned an upper bound
of the size, and other formats which returned an accurate size. But
now we also have NFS, which returns a lower bound of the size. To
allow VolumeVerifier to make better informed decisions for NFS, let's
use an enum instead of a bool for the type of data size a blob has.
For a few years now, I've been thinking it would be nice to make Dolphin
support reading Wii games in the format they come in when you download
them from the Wii U eShop. The Wii U eShop has some good deals on Wii
games (Metroid Prime Trilogy especially is rather expensive if you try
to buy it physically!), and it's the only place right now where you can
buy Wii games digitally.
Of course, Nintendo being Nintendo, next year they're going to shut down
this only place where you can buy Wii games digitally. I kind of wish I
had implemented this feature earlier so that people would've had ample
time to buy the games they want, but... better late than never, right?
I used MIT-licensed code from the NOD library as a reference when
implementing this. None of the code has been directly copied, but
you may notice that the names of the struct members are very similar.
c1635245b8/lib/DiscIONFS.cpp
Needed for the next commit. NFS disc images are hashed but not encrypted.
While we're at it, also get rid of SupportsIntegrityCheck.
It does the same thing as old IsEncryptedAndHashed and new HasWiiHashes.
CARDUCode, GBAUCode, and INITUCode previously didn't have an implementation of it. In practice it's unlikely that this caused an issue, since these uCodes are only active for a few frames at most, but now that GBAUCode doesn't have global state, we can implement it there. I also implemented it for CARDUCode, although our CARDUCode implementation does not have all states handled yet - this is simply future-proofing so that when the card uCode is properly implemented, the save state version does not need to be bumped. INITUCode does not have any state to save, though.
The accuracy improvements are:
* The request mail must be 0xabba0000 exactly; both the low and high parts are checked
* The address is masked with 0x0fffffff
* Before, the global state meant that after the GBA uCode had been used once, it would accept 0xcdd1 commands immediately. Now, it only accepts them after execution has finished.
These lookup tables total 4 megabytes, and contain data that's entirely redundant to the actual cache state (as part of an optimization, though I'm not sure whether the optimization actually is useful). This change instead recomputes these lookup tables when loading the state (which involves filling the lookup table with a marker (0xff), and then setting the 128 * 8 valid entries (1 kilobyte)).
Before, we used a replace hook and didn't write anything there. Now, we write a BLR instruction to immediately return, and then use a start hook. This makes the behavior a bit clearer (though it shoudln't matter in practice).
All of our BBA options are technically built in, so it made the BBA
Built In option kind of confusing as to what it did. So rename it to
BBA HLE to make it more clear what it is doing and why it doesn't need a
TAP.
I'm not sure what the XMM0 check was supposed to be, but the 0xCC008000 one is for the fifo and is handled elsewhere now (look for `optimizeGatherPipe`).
We currently have two different code paths for initializing controllers:
Either the frontend (DolphinQt) can do it, or if the frontend doesn't do
it, the core will do it automatically when booting. Having these two
paths has caused problems in the past due to only one frontend being
tested (see de7ef47548). I would like to get rid of the latter path to
avoid further problems like this.
The movie config layer is not active for recording, only playback. Thus, recording ends up stuck with default SYSCONF settings.
The fix is simply to add in the movie config layer when recording. The way it's done is a bit hacky, but seems to work.
This also changes the behavior for the invalid gamma value, which was confirmed to behave the same as 2.2.
Note that currently, the gamma value is only used for XFB copies, even though hardware testing indicates it also works for EFB copies. This will be changed in a later commit.
Before, Free Look would accept background input by default, which means it was easy to accidentally move the camera while typing in another window. (This is because HotkeyScheduler::Run sets the input gate to `true` after it's copied the hotkey state, supposedly for other threads (though `SetInputGate` uses a `thread_local` variable so I'm not 100% sure that's correct) and for the GBA windows (which always accept unfocused input, presumably because they won't be focused normally).
If a 64-bit register is passed to WriteConditionalExceptionExit,
the LDR instruction in it will read too much data. This seems
to be harmless right now, but causes problem in one of my PRs.
This should reduce (but not completely eliminate) gradual audio desyncs in dumps. This also allows for accurate sample rates for the GameCube.
Completely eliminating gradual audio desyncs will require resampling to an integer sample rate, as nothing seems to support a non-integer sample rate.
These values were obtained by setting a breakpoint at a game's entry point, and then observing the register values with Dolphin's register widget.
There are other registers that aren't handled by this PR, including CR, XER, SRR0, SRR1, and "Int Mask" (as well as most of the GPRs). They could be added in a later PR if it turns out that their values matter, but probably most of them don't.
This fixes Datel titles booting with the IPL skipped (see https://bugs.dolphin-emu.org/issues/8223), though when booted this way they are currently missing textures. Due to somewhat janky code, Datel overwrites the syscall interrupt handler and then immediately triggers it (with the `sc` instruction) before they restore the correct one. This works on real hardware due to icache, and also works in Dolphin when the IPL runs due to icache, but prior to this change `HID0.ICE` defaulted to 0 so icache was not enabled when the IPL was skipped.
DSPHLE::Initialize sets the halt and init bits to true (i.e. m_dsp_control.Hex starts as 0x804), which is reasonable behavior (this is the state the DSP will be in when starting a game from the IPL, as after `__OSStopAudioSystem` the control register is 0x804).
However, CMailHandler::m_halted defaults to false, and we only call CMailHandler::SetHalted in DSPHLE::DSP_WriteControlRegister when m_dsp_control.DSPHalt changes, so since DSPHalt defaults to true, if the first thing that happens is writing true to DSPHalt, we won't properly halt the mail handler.
Now, we call CMailHandler::SetHalted on startup. This fixes Datel titles when the IPL is skipped with DSP HLE (though this configuration only works once https://bugs.dolphin-emu.org/issues/8223 is fixed).
This fixes booting Datel titles with DSPHLE (see https://bugs.dolphin-emu.org/issues/12943). Datel messed up their DSP initialization code, so it only works by receiving a mail later on, but if halting isn't implemented then it receives the mail too early and hangs.
It's cleared whenever the uCode changes, so there's no reason to clear it in a destructor or during initialization.
I've also renamed it to ClearPending.
The # option means that 0x is prepended already, so the old code resulted in 0x0xDEADBEEF instead of the intended 0xDEADBEEF. WriteMailboxLow was already correct.
Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least.
devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: 8a65c85c9b
This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
For instance, ending with 0x009e (which you can do with CW 0x009e) indicates a LRI $ac0.m instruction, but there is no immediate value to load, so before whatever garbage in memory existed after the end of the file was used.
The bounds-checking also previously assumed that IRAM or IROM was being used, both of which were exactly 0x1000 long.
X30 is used in fewer situations than the comment was claiming.
(I think that when I wrote the comment I was counting the use of X30
as a temp variable in the slowmem code as clobbering X30, but that
happens after pushing X30, so it doesn't actually get clobbered.)
This is used when fastmem isn't available. Instead of always falling
back to the C++ code in MMU.cpp, the JIT translates addresses on its
own by looking them up in a table that Dolphin constructs. This is
slower than fastmem, but faster than the old non-fastmem code.
This is primarily useful for iOS, since that's the only major platform
nowadays where you can't reliably get fastmem. I think it would make
sense to merge this feature to master despite this, since there's
nothing actually iOS-specific about the feature. It would be of use
for me when I have to disable fastmem to stop Android Studio from
constantly breaking on segfaults, for instance.
Co-authored-by: OatmealDome <julian@oatmealdome.me>
Instead, saturate in OpReadRegister, as all uses of OpReadRegisterAndSaturate called OpReadRegister for other registers (and there isn't anything that writes to $ac0.m or $ac1.m without saturation).
There were 3 bugs here:
- The input register for the full register wasn't actually being used; it was read into RCX but RCX wasn't used by Update_SR_Register16_OverS32 (except as a scratch register). The way the DSP LLE recompiler uses registers is in general confusing, so this commit changes a few uses to have a variable for the register being used, to make code a bit more readable. (Default parameter values were also removed so that they needed to be explicitly specified).
- Update_SR_Register16 was doing a 64-bit test, when it should have been doing a 16-bit test. For the most part this doesn't matter due to sign-extension, but it does come up with e.g. `ORI` or `ANDI`.
- Update_SR_Register16_OverS32 did the over s32 check, and then called Update_SR_Register16. Update_SR_Register16 masks $sr with ~SR_CMP_MASK, clearing the over s32 bit. Now the over s32 check is performed after calling Update_SR_Register16 (without masking a second time). No official uCode cares about the over s32 bit.
We don't have anything called $amD, though we do have $acsD. However, these instructions affect flags based on the whole accumulator, so it's better to just use $acD.
For more information, ApplyWriteBackLog, WriteToBackLog, and ZeroWriteBackLog were added in b787f5f8f7 and the explanatory comment was added in fd40513fed, although it did not mention the specific instructions that could trigger this edge case. The statements about which registers can be written by main opcodes and extension opcodes are based on my own checking of all instructions in the manual.
It's been unused since DolphinWX was removed in 44b22c90df. Prior to that, it was used in Source/Core/DolphinWX/NetPlay/NetWindow.cpp. But the new equivalent in Source/Core/DolphinQt/NetPlay/NetPlayDialog.cpp uses NetPlayClient::GetPlayers instead. Stringifying (or creating a table, as is done now) should be done by the UI in any case.
Among other things, this trims trailing newline characters. Before (on windows) the \r would corrupt the output and make them very hard to understand (as the error message would be drawn over the code line, but part of the code line would peek out from behind it).
Page faults should only occur on architectures that support exception
handlers, so skip the test on other architectures to avoid spurious test
failures.
On GameCube, a ramp bit has no effect if its corresponding channel is
inactive. On Wii however, enabling just the ramp implicitly also enables
the channel. AXSetVoiceMix() never does that, so this commit should have
no impact on games unless they fiddle with the mixer control value
directly.
This refactorization is done just to match the order that I made
WriteToHardware use in 543ed8a. For WriteToHardware, it's important that
things like MMIO and gather pipe are handled before we reach a special
piece of code that only should get triggered for writes that hit memory
directly, but for ReadFromHardware we don't have any code like that.
This fixes a problem where Dolphin could crash if a non-translated
read crossed the end of a physical memory region.
The same change was applied to WriteToHardware in ecbce0a.