RCOpArg::ExtractWithByteOffset is only used in one place: a special case
of rlwinmx. ExtractWithByteOffset first stores the value of the
specified register into m_ppc_state (unless it's already there), and
then returns an offset into m_ppc_state. Our use of this function has
two undesirable properties (except in the trivial case `offset == 0`):
1. ExtractWithByteOffset calls StoreFromRegister without going through
any of the usual functions. This violated an assumption I made when
working on my constant propagation PR and led to a hard-to-find bug.
2. If the specified register is in a host register and is dirty,
ExtractWithByteOffset will store its value to m_ppc_state even when
it's unnecessary. In particular, this always happens when rlwinmx
uses the same register as input and output, since rlwinmx always
allocates a host register for the output and marks it as dirty.
Since ExtractWithByteOffset is only used in one place, I figure we might
as well inline it there. This commit does that, and also alters
rlwinmx's logic so the special case code is only triggered when the
input is already in m_ppc_state.
Input in `m_ppc_state`, before (11 bytes):
mov esi, dword ptr [rbp-104]
mov dword ptr [rbp-104], esi
movzx esi, byte ptr [rbp-101]
Input in `m_ppc_state`, after (5 bytes):
movzx esi, byte ptr [rbp-101]
Input in host register, before (8 bytes):
mov dword ptr [rbp-104], esi
movzx esi, byte ptr [rbp-101]
Input in host register, after (3 bytes):
shr edi, 0x18
Not sure if the behavior I'm implementing here is what real hardware
does, but since this is a buffer overflow, I'd like to get it fixed
quickly. Hardware verification can happen later.
https://bugs.dolphin-emu.org/issues/13506
Having to look up macros that are defined elsewhere makes the code
harder to reason about. The macros don't remove enough repetition to
justify their existence in my opinion.
When the divisor is a constant value, we can emit more efficient code.
For powers of two, we can use bit shifts. For other values, we can
instead use a multiplication by magic constant method.
- Example 1 - Division by 16 (power of two)
Before:
mov w24, #0x10 ; =16
udiv w27, w25, w24
After:
lsr w27, w25, #4
- Example 2 - Division by 10 (fast)
Before:
mov w25, #0xa ; =10
udiv w27, w26, w25
After:
mov w27, #0xcccd ; =52429
movk w27, #0xcccc, lsl #16
umull x27, w26, w27
lsr x27, x27, #35
- Example 3 - Division by 127 (slow)
Before:
mov w26, #0x7f ; =127
udiv w27, w27, w26
After:
mov w26, #0x408 ; =1032
movk w26, #0x8102, lsl #16
umaddl x27, w27, w26, x26
lsr x27, x27, #38
This implements the GameCube modem adapter. This implementation is stable but not perfect; it drops frames if the receive FIFO length is exceeded. This is probably due to the unimplemented interrupt mentioned in the comments. If the tapserver end of the connection is aware of this limitation, it's easily circumvented by lowering the MTU of the link, but ideally this wouldn't be necessary.
This has been tested with a couple of different versions of Phantasy Star Online, including Episodes 1 & 2 Trial Edition. The Trial Edition is the only version of the game that supports the Modem Adapter and not the Broadband Adapter, which is what made this commit necessary in the first place.
This expands the tapserver BBA interface to be available on all platforms. tapserver itself is still macOS-only, but newserv (the PSO server) is not, and it can directly accept local and remote tapserver connections as well. This makes the tapserver interface potentially useful on all platforms.