I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component (for instance, if float XYZ positions are used, and the stride was set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), it would be missing the last 8 bytes of the final element in the array. Or, if stride was set to 0, then no bytes would be recorded at all (though that's not a useful configuration so it's unlikely to actually exist).
I'm not aware of any games affected by this issue.
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). This uses NormalIndex3, and the index for the normal vector (generally 0x02XX or 0x01XX) there is always lower than the tangent or binormal (generally 0x07XX). Other games seem to usually have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games will use the same type for all vertex components (either Index8 or Index16 or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.