Gives default values to two members that lacked explicit initializers,
for consistency with the rest of the class and to ensure deterministic
values on construction.
Though less important compared to #11470, save states also show the full path in the OSD message and could potentially dox a streamer who is playing in Dolphin. This is a simple fix - it removes the path from the message and only displays the file name.
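As a minimal sketch of the idea (not Dolphin's actual code), the file name can be split off a full path before it is shown. `FileNameOnly` is a hypothetical helper; it treats both `/` and `\` as separators so that Windows-style paths are covered too.

```cpp
#include <string>

// Hypothetical helper: returns only the last path component so that
// user-specific directories never reach the on-screen message.
std::string FileNameOnly(const std::string& path)
{
  const std::size_t pos = path.find_last_of("/\\");
  return pos == std::string::npos ? path : path.substr(pos + 1);
}
```

A path without any separator is returned unchanged.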
Capitalize Skylander in tr strings
Lint and validation method fixes
Proper Attach and Change Interface method
Re-jig code to exit early and read easier
I haven't tested this extensively on real hardware, but I do know that bad things happen if the address isn't properly aligned, and libogc says it should be 32-byte aligned.
Turns out all OSD messages, every single one, are written to the titlebar. We've just never seen them because the FPS is in the title bar and it replaces it in a fraction of a second. This was only visible when saving savestates because it halts emulation for a moment while writing.
This is dumb, let's not do that anymore.
A few weeks ago, a VTuber tweeted that they had to remove a VOD of their stream because Dolphin Emulator showed some personal information during the stream, and left a warning to everyone else that Dolphin shows the account name of the computer. And yeah, we do: we show the full directory of the memory card every time a memory card is written, and due to mandatory Microsoft account nonsense, that is very likely to contain someone’s real name.
Fortunately this is very easy for us to solve. This change simply removes the filename from the "wrote memory card contents" string. That’s it. All functionality of the OSD message remains the same; it just doesn’t say where the memory card is anymore.
There are lots of other potential solutions to this but after talking on IRC it seems the simplest one is the best.
Skylander code tidy ups
Convert C array to std::array and fix comments
Formatting fixes/review changes
Variable comment
Migrate portal to System Impl and code tidy ups
Use struct
Restore review changes
Minor fix to schedule transfer method
Change descriptors to hex and fix comments
Ported the code from RPCS3, with improvements made to the handling of control messages and audio transfers. Co-authored with @mandar1jn.
Missing new line chars
Co-Authored-By: mandar1jn <49076509+mandar1jn@users.noreply.github.com>
See the comment added by this commit. We were previously guarding against
overshooting in address calculations, but not against undershooting.
Perhaps someone assumed that the displacement of an x86 loadstore was
treated as unsigned?
Note: While the comment says we can undershoot by up to 2 GiB, in
practice Jit64 as it currently behaves won't actually undershoot by more
than 0x8000 if my analysis is correct. But address space is cheap, so
let's guard the full 2 GiB.
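To illustrate why undershooting is possible: an x86-64 load/store displacement is a signed 32-bit value that is sign-extended before being added to the base. The sketch below models only that arithmetic. `EffectiveAddress` is a hypothetical helper for illustration, not a function in Jit64.

```cpp
#include <cstdint>

// Models x86-64 addressing: the 32-bit displacement is sign-extended to
// 64 bits, so a negative displacement yields an address below the base.
std::uint64_t EffectiveAddress(std::uint64_t base, std::int32_t displacement)
{
  return base + static_cast<std::uint64_t>(static_cast<std::int64_t>(displacement));
}
```

With a base of 0x100000000, a displacement of -0x8000 lands at 0xFFFF8000, and the most negative displacement (INT32_MIN) lands exactly 2 GiB below the base, which is the worst case the guard region covers.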
Plus two miscellaneous debugger features that I found along the way when
reading Jit64's code for comparison: bJITNoBlockLinking and tracing.
Fixes https://bugs.dolphin-emu.org/issues/13127.
Small optimization. By not calling WriteExit, the block linking system
never finds out about the exit we're doing, saving us from having to
disable block linking.
Previously, if a user on Windows launched Dolphin from the command line
and specified a path to an M3U file and included backslashes in this path,
Dolphin would fail to resolve relative paths in the M3U file.
The calculation of each address in lmw/stmw currently has a dependency
on the calculation of the previous address. By removing this dependency,
the host CPU should be able to pipeline the loads/stores better. The cost
we pay for this is up to one extra register and one extra MOV instruction
per guest instruction, but often nothing.
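A rough model of the two address schemes, written as plain C++ rather than emitted machine code. The function names are hypothetical, and the sketch only shows where the dependency sits; it says nothing about how the JIT allocates registers. lmw/stmw transfer consecutive 32-bit words, so the addresses are 4 bytes apart.

```cpp
#include <cstdint>
#include <vector>

// Dependent form: address i is computed from address i - 1.
std::vector<std::uint32_t> ChainedAddresses(std::uint32_t base, std::uint32_t count)
{
  std::vector<std::uint32_t> result;
  std::uint32_t address = base;
  for (std::uint32_t i = 0; i < count; ++i)
  {
    result.push_back(address);
    address += 4;  // the next access has to wait for this add
  }
  return result;
}

// Independent form: every address is the base plus a constant offset,
// so no access waits on the address of the one before it.
std::vector<std::uint32_t> IndependentAddresses(std::uint32_t base, std::uint32_t count)
{
  std::vector<std::uint32_t> result;
  for (std::uint32_t i = 0; i < count; ++i)
    result.push_back(base + 4 * i);
  return result;
}
```

Both forms produce the same addresses; only the dependency structure differs.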
Making EmitBackpatchRoutine support using any register as the address
register would let us get rid of the MOV, but I consider that to be too
big of a task to do in one go at the same time as this.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
This reverts commit 351d095fff.
In hindsight, my attempted optimization messes with the return
predictor, unlike real tail calls. So I think it does more harm than
good.
Use: callstack(0x80000000).
!callstack(value) works as a 'does not contain'.
Add strings to expr.h conditionals.
Use quotations: callstack("anim") to check symbols/name.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
Operations that have two operands and can't generate a default NaN,
i.e. addition and subtraction, already have the desired NaN handling
on x86. We just need to make sure to not reverse the operands.
This fixes ps_sum0/ps_sum1 outputting NaNs in cases where they shouldn't.
(HandleNaNs assumes that a NaN in a ps0 input always results in a NaN in
the ps0 output, and correspondingly for ps1.)
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
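The address split itself is just masking. Below is a minimal sketch of it, assuming 4 KiB pages as used by ADRP; `PageSplit` and `SplitAddress` are hypothetical names, not the emitter's API. The sketch also ignores the scaling and alignment rules that LDR/STR immediate offsets have for each access size.

```cpp
#include <cstdint>

struct PageSplit
{
  std::uint64_t page;    // address with the low 12 bits cleared
  std::uint32_t offset;  // low 12 bits, a candidate LDR/STR immediate offset
};

// Splits an address into the part a page-granular instruction can
// materialize and the remainder that can ride along in the load/store.
PageSplit SplitAddress(std::uint64_t address)
{
  constexpr std::uint64_t kOffsetMask = 0xFFF;
  return {address & ~kOffsetMask, static_cast<std::uint32_t>(address & kOffsetMask)};
}
```

Adding `page` and `offset` back together always gives the original address.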
When emulated GBAs were added to Dolphin, it was possible to control them
using the GC TAS input window. (Z was mapped to Select.) Unaware of this,
I broke the functionality in b296248.
To make it possible to control emulated GBAs using TAS input again,
I'm adding a proper TAS input window for GBAs, with a real Select button
and no analog controls.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
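The special cases described in the surrounding commits can be modelled in plain C++ as a classification of the immediate followed by the matching shift/add/sub identity. This is only a sketch of the arithmetic. `MulPlan`, `PlanMultiply` and `ApplyPlan` are hypothetical names, and the order of the checks is an assumption, not taken from the JIT.

```cpp
#include <cstdint>

enum class MulStrategy { Copy, ShiftLeft, AddShifted, NegShifted, SubShifted, GenericMul };

struct MulPlan
{
  MulStrategy strategy;
  unsigned shift;
};

static bool IsPow2(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

static unsigned Log2(std::uint32_t v)
{
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// Picks a strategy for multiplying a 32-bit register by `imm`.
MulPlan PlanMultiply(std::uint32_t imm)
{
  if (imm == 0)
    return {MulStrategy::GenericMul, 0};  // zero is not covered by these commits
  if (imm == 1)
    return {MulStrategy::Copy, 0};                  // MOV, or nothing at all
  if (IsPow2(imm))
    return {MulStrategy::ShiftLeft, Log2(imm)};     // LSL
  if (IsPow2(imm - 1))
    return {MulStrategy::AddShifted, Log2(imm - 1)};  // ADD x, x, x, LSL #n
  if (IsPow2(0u - imm))
    return {MulStrategy::NegShifted, Log2(0u - imm)};  // NEG x, x, LSL #n
  if (IsPow2(1u - imm))
    return {MulStrategy::SubShifted, Log2(1u - imm)};  // SUB x, x, x, LSL #n
  return {MulStrategy::GenericMul, 0};              // MOV + MUL
}

// Evaluates the plan the way the single instruction would.
std::uint32_t ApplyPlan(MulPlan plan, std::uint32_t x, std::uint32_t imm)
{
  switch (plan.strategy)
  {
  case MulStrategy::Copy:       return x;
  case MulStrategy::ShiftLeft:  return x << plan.shift;
  case MulStrategy::AddShifted: return x + (x << plan.shift);
  case MulStrategy::NegShifted: return 0u - (x << plan.shift);
  case MulStrategy::SubShifted: return x - (x << plan.shift);
  case MulStrategy::GenericMul: break;
  }
  return x * imm;
}
```

For example, a multiplier of 0x41 maps to `AddShifted` with a shift of 6, and -3 maps to `SubShifted` with a shift of 2, matching the disassembly shown in those commits.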
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component. For instance, if float XYZ positions are used and the stride is set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), the recording would be missing the last 8 bytes of the final element in the array. Or, if stride were set to 0, no bytes would be recorded at all (though that's not a useful configuration, so it's unlikely to actually exist).
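The two failure cases can be checked with a small sketch. The replacement formula is not quoted in this message, so `CoveringArraySize` below is an assumption: one formula that always includes the final element in full. Both function names are hypothetical.

```cpp
#include <cstddef>

// Old calculation: assumes each element occupies exactly `stride` bytes.
std::size_t OldArraySize(std::size_t stride, std::size_t max_index)
{
  return stride * (max_index + 1);
}

// Assumed alternative: the last element starts at stride * max_index and
// extends for the full element size, whatever the stride is.
std::size_t CoveringArraySize(std::size_t stride, std::size_t max_index,
                              std::size_t element_size)
{
  return stride * max_index + element_size;
}
```

With a stride of 4, a 12-byte element and a maximum index of 9, the old calculation gives 40 bytes while the last element ends at byte 48, which is the 8-byte shortfall described above. With a stride equal to the element size, both calculations agree.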
I'm not aware of any games affected by this issue.
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). This uses NormalIndex3, and the index for the normal vector (generally 0x02XX or 0x01XX) there is always lower than the tangent or binormal (generally 0x07XX). Other games seem to usually have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games will use the same type for all vertex components (either Index8 or Index16 or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
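As a scalar model of what a rounding shift buys here: URSHR shifts right while rounding to nearest, so following it with a left shift by the same amount clears the low bits and rounds in two steps. The shift amount of 28 is an assumption derived from the name (a 52-bit double mantissa reduced to 24 explicit bits plus the implicit one). The pairing with a left shift and the function name are likewise assumptions, not quoted from the commit.

```cpp
#include <cstdint>

// Models URSHR #28 followed by SHL #28 on the raw bits of a double.
std::uint64_t RoundTo25BitMantissa(std::uint64_t bits)
{
  // URSHR computes (bits + 2^27) >> 28 without losing the carry. Adding
  // bit 27 to the already-shifted value gives the same result without
  // overflowing 64 bits.
  const std::uint64_t rounded = (bits >> 28) + ((bits >> 27) & 1);
  return rounded << 28;
}
```

A value with bit 27 set rounds up into bit 28, and anything below that is truncated.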
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
fregsIn will include FD for double-precision instructions, since for
dependency tracking purposes the instruction does read the upper
half of FD. This is not what we want in HandleNaNs.
The consequence of this bug is that if an instruction was supposed to
output a NaN and FD happens to contain a NaN and FD happens to be the
same register as an unused register in the instruction encoding, the
NaN in FD could get used as the output instead of the correct NaN.
This isn't known to affect any games, which isn't especially surprising
considering that there's only one game that needs AccurateNaNs anyway.
Jumping to `dispatcher` requires first subtracting the downcount,
otherwise `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
base is an unsigned variable, so we can make things a little more
consistent by making the loop index unsigned so we aren't doing bit
arithmetic with signed types.
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET. I'm not sure
whether any CPUs can fuse those, but it doesn't hurt anything, so I might as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
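The bit trick can be seen on raw double bit patterns. In this sketch the default NaN is taken to be the positive quiet NaN with an empty payload, 0x7FF8000000000000. `QuietenNaN` is a hypothetical name, and the function is only meaningful when the input is already a NaN.

```cpp
#include <cstdint>

// Positive quiet NaN with a zero payload: exponent all ones, quiet bit set.
constexpr std::uint64_t kDefaultNaN = 0x7FF8000000000000ULL;

// For a NaN input, the exponent bits are already all ones, so the OR can
// only change the quiet bit. Sign and payload are preserved.
std::uint64_t QuietenNaN(std::uint64_t nan_bits)
{
  return nan_bits | kDefaultNaN;
}
```

A signaling NaN such as 0x7FF0000000000001 becomes 0x7FF8000000000001, while a quiet NaN such as 0xFFF8000000000123 comes back unchanged.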
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
This lets the TAS input code use a higher-level interface for
overriding inputs instead of having to fiddle with raw bits.
WiiTASInputWindow in particular was messy with how much
controller code it had to re-implement.
Fixes a Rogue Squadron II regression from 9d73583.
This set_dirty stuff is pretty tricky to reason about. I thought I
was clever when coming up with set_dirty, but maybe I was too clever
for my own good...
In case the register we're binding is the same as the immediate register,
we should fetch the immediate before calling BindToRegister. The way
the register cache currently works, calling GetImm after BindToRegister
actually does work, but it's better to not rely on it.
Tested on an official DOL-014 (251 blocks) memory card by executing the
0xf4 command on a card with content along its entire length and then
dumping the whole card: it reads as 0xff all the way through.
Therefore, the current implementation is already consistent with hardware.
Texture dumping can already be done using VideoCommon's system (and in fact the same setting already enabled *both* of these). Dumping objects/TEV stages/texture fetches doesn't currently have an equivalent, but could be added to the FIFO player instead.
A (partial) port of #9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient
or no code to be generated.
When a guest register is an immediate, it may be necessary to move this
value into a register. This is handled by gpr.R(), which lacks context
on how the register will be used. This leads to cases where the
immediate is written to a register, only for it to be overwritten. Take
for example this code generated by srwx:
0x5280031b mov w27, #0x18
0x53187edb lsr w27, w22, #24
gpr.BindToRegister() does have this context through the do_load
parameter, but didn't handle immediates. By adding this logic, we can
intelligently skip the write when do_load is false.