Shifting zero by any amount always gives zero.
Before:
41 B9 00 00 00 00 mov r9d,0
41 8B CF mov ecx,r15d
49 C1 E1 20 shl r9,20h
49 D3 F9 sar r9,cl
49 C1 E9 20 shr r9,20h
After:
Nothing, register is set to constant zero.
Before:
41 B8 00 00 00 00 mov r8d,0
41 8B CF mov ecx,r15d
49 C1 E0 20 shl r8,20h
49 D3 F8 sar r8,cl
41 8B C0 mov eax,r8d
49 C1 E8 20 shr r8,20h
44 85 C0 test eax,r8d
0F 95 45 58 setne byte ptr [rbp+58h]
After:
C6 45 58 00 mov byte ptr [rbp+58h],0
Occurs a bunch of times in Super Mario Sunshine. Since this is an
arithmetic shift a similar optimization can be done for constant -1
(0xFFFFFFFF), but I couldn't find any game where this happens.
Shifting zero by any amount always gives zero.
Before:
41 BF 00 00 00 00 mov r15d,0
8B CF mov ecx,edi
49 D3 E7 shl r15,cl
45 8B FF mov r15d,r15d
After:
Nothing, register is set to constant zero.
All games I've tried hit this optimization on launch. In Soul Calibur II
it occurs very frequently during gameplay.
Much like we did for srawx. This was already implemented on JitArm64.
Before:
B8 00 00 00 00 mov eax,0
8B F0 mov esi,eax
C1 E8 1F shr eax,1Fh
23 C6 and eax,esi
D1 FE sar esi,1
88 45 58 mov byte ptr [rbp+58h],al
After:
C6 45 58 00 mov byte ptr [rbp+58h],0
If both input registers hold known values at compile time, we can just
calculate the result on the spot.
Code has mostly been copied from JitArm64 where it had already been implemented.
Before:
BF FF FF FF FF mov edi,0FFFFFFFFh
8B C7 mov eax,edi
C1 FF 10 sar edi,10h
C1 E0 10 shl eax,10h
85 F8 test eax,edi
0F 95 45 58 setne byte ptr [rbp+58h]
After:
C6 45 58 01 mov byte ptr [rbp+58h],1
More efficient code can be generated if the shift amount is known at
compile time. We can once again take advantage of shifts with the shift
amount in an 8-bit immediate to eliminate ECX as a scratch register,
reducing register pressure and removing the occasional spill. We can
also do 32-bit shifts instead of 64-bit operations.
We recognize four distinct cases:
- The special case where we're dealing with the PowerPC's quirky shift
amount masking. If the shift amount is a number from 32 to 63, all
bits are shifted out and the result it either all zeroes or all ones.
Before:
B9 F0 FF FF FF mov ecx,0FFFFFFF0h
8B F7 mov esi,edi
48 C1 E6 20 shl rsi,20h
48 D3 FE sar rsi,cl
8B C6 mov eax,esi
48 C1 EE 20 shr rsi,20h
85 F0 test eax,esi
0F 95 45 58 setne byte ptr [rbp+58h]
After:
8B F7 mov esi,edi
C1 FE 1F sar esi,1Fh
0F 95 45 58 setne byte ptr [rbp+58h]
- The shift amount is zero. Not calculation needs to be done, just clear
the carry flag.
Before:
B9 00 00 00 00 mov ecx,0
49 C1 E5 20 shl r13,20h
49 D3 FD sar r13,cl
41 8B C5 mov eax,r13d
49 C1 ED 20 shr r13,20h
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]
After:
C6 45 58 00 mov byte ptr [rbp+58h],0
- The carry flag doesn't need to be computed. Just do the arithmetic
shift.
Before:
B9 02 00 00 00 mov ecx,2
48 C1 E7 20 shl rdi,20h
48 D3 FF sar rdi,cl
48 C1 EF 20 shr rdi,20h
After:
C1 FF 02 sar edi,2
- The carry flag must be computed. In addition to the arithmetic shift,
we do a shift to the left and and them together to know if any ones
were shifted out. It's still better than before, because we can do
32-bit shifts.
Before:
B9 02 00 00 00 mov ecx,2
49 C1 E5 20 shl r13,20h
49 D3 FD sar r13,cl
41 8B C5 mov eax,r13d
49 C1 ED 20 shr r13,20h
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]
After:
41 8B C5 mov eax,r13d
41 C1 FD 02 sar r13d,2
C1 E0 1E shl eax,1Eh
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]
More efficient code can be generated if the shift amount is known at
compile time. Similar optimizations were present in JitArm64 already,
but were missing in Jit64.
- By using an 8-bit immediate we can eliminate the need for ECX as a
scratch register, thereby reducing register pressure and occasionally
eliminating a spill.
Before:
B9 18 00 00 00 mov ecx,18h
41 8B F7 mov esi,r15d
48 D3 E6 shl rsi,cl
8B F6 mov esi,esi
After:
41 8B CF mov ecx,r15d
C1 E1 18 shl ecx,18h
- PowerPC has strange shift amount masking behavior which is emulated
using 64-bit shifts, even though we only care about a 32-bit result.
If the shift amount is known, we can handle this special case
separately, and use 32-bit shift instructions otherwise. We also no
longer need to clear the upper 32 bits of the register.
Before:
BE F8 FF FF FF mov esi,0FFFFFFF8h
8B CE mov ecx,esi
41 8B F4 mov esi,r12d
48 D3 E6 shl rsi,cl
8B F6 mov esi,esi
After:
Nothing, register is set to constant zero.
- A shift by zero becomes a simple MOV.
Before:
BE 00 00 00 00 mov esi,0
8B CE mov ecx,esi
41 8B F3 mov esi,r11d
48 D3 E6 shl rsi,cl
8B F6 mov esi,esi
After:
41 8B FB mov edi,r11d
More efficient code can be generated if the shift amount is known at
compile time. Similar optimizations were present in JitArm64 already,
but were missing in Jit64.
- By using an 8-bit immediate we can eliminate the need for ECX as a
scratch register, thereby reducing register pressure and occasionally
eliminating a spill.
Before:
B9 18 00 00 00 mov ecx,18h
45 8B C1 mov r8d,r9d
49 D3 E8 shr r8,cl
After:
45 8B C1 mov r8d,r9d
41 C1 E8 18 shr r8d,18h
- PowerPC has strange shift amount masking behavior which is emulated
using 64-bit shifts, even though we only care about a 32-bit result.
If the shift amount is known, we can handle this special case
separately, and use 32-bit shift instructions otherwise.
Before:
B9 F8 FF FF FF mov ecx,0FFFFFFF8h
45 8B C1 mov r8d,r9d
49 D3 E8 shr r8,cl
After:
Nothing, register is set to constant zero.
- A shift by zero becomes a simple MOV.
Before:
B9 00 00 00 00 mov ecx,0
45 8B C1 mov r8d,r9d
49 D3 E8 shr r8,cl
After:
45 8B C1 mov r8d,r9d
The enumerated LOG_TYPE "OSREPORT" is currently used in both EXI_DeviceIPL.cpp and HLE_OS.cpp. In many games, the multitude of game functions detected by HLE_OS.cpp for OSREPORT logging results in poor log readability. This Pull Request remedies that by adding a new enumerated LOG_TYPE "OSREPORT_HLE" for log usage in HLE_OS.cpp.
In the future, further changing how logging in HLE_OS.cpp works may be desirable. As it is, game functions are detected that send a single character to the log. This is a major source of poor readability.
Introduces the system class that will eventually contain all relevant
system state, as opposed to everything being distributed all over the
place as global variables.
Throughout the codebase we have code that from its interface-view, does
not actually require its dependencies to be described in the interface,
and we routinely run into issues with initialization where we sometimes
make use of a facility before it's been initialized, which leads to
annoying to debug cases, because the reader needs to run through the
codebase and see what order things get initialized in, and how they're
being used. This is particularly a frequent issue in the video code.
Further, we also have a lot of code that makes use of file-scope
variables (many of which are non-trivial), which must all be default
initialized before the application can actually enter main(). While this
may not be a huge issue in itself, some of these are allocating, which
means that the application may need to use memory that it otherwise
wouldn't need to (e.g. when a game isn't running, this excess memory is
being used).
Being able to wrap all these subsystems into objects would be nicer,
since they can be constructed when they're actually needed. Them being
objects also means we can better express dependencies on subsystems as
types directly in the interface, making them explicit to the reader
instead of a change randomly blowing up, said reader inspecting it, and finding
out that something needed to be initialized beforehand. With the global
turned into a function parameter, the dependency is explicit and they
know just by reading it, that the given subsystem needs to be in a valid
state before calling the function.
For a prior example of an emulator that has moved to this model, see
yuzu, which has been migrated off of global variables all over the place
and replaced with a system instance (which has now reached the stage,
where the singleton can be removed).
We want to clear/memset the padding bytes, not just each member,
so using assignment or {} initialization is not an option.
To silence the warnings, cast the object pointer to u8* (which is not
undefined behavior) to make it explicit to the compiler that we want
to fill the object representation.
TunTap has recently become unmaintained, and it seems Apple wants developers to move away from kexts in general. TunTap currently takes some finagling to work on Catalina, and it may not work at all on Big Sur, necessitating a non-kext-based solution. Fortunately, fake Ethernet devices were introduced in Sierra and can be used similarly to tap adapters. This commit adds a new type of BBA interface implementation which uses fake Ethernet devices via tapserver (https://github.com/fuzziqersoftware/tapserver) to communicate with the host. This implementation was tested with PSO Episodes I & II, which can successfully connect to a private server running locally.
This implementation is only available on macOS, since that's the only place it's needed - Windows/Linux/Unix are unaffected by TunTap being deprecated.
Make sure m_is_populating_devices is true when a WM_INPUT_DEVICE_CHANGE
event is received directly on the ciface thread, so that callbacks do
not occur while removing devices. This breaks a hold-and-wait deadlock
between the ciface thread and the CPU thread when using emulated
Wiimotes.
Co-authored-by: brainleq <brainleq@users.noreply.github.com>
Co-authored-by: oldmud0 <oldmud0@users.noreply.github.com>
If we want to enable codes in the default game INIs,
we should have some way for users to disable them.
This commit accomplishes that by adding a *_Disabled
section corresponding to each *_Enabled section.
In order to reach the middle guard (at m_stack_base + GUARD_OFFSET)
before the bottom guard (at m_stack_base), the stack pointer
must start at an address which is higher than the middle guard.
It also didn't make sense that we were allocating memory
and then not using the top part of it.
Nintendo's official title installation code and ES both only look at
content IDs but we should probably check for content hashes in addition
to checking for IDs for at least two reasons:
1. Some of the installed contents could be corrupted -- this cannot be
easily detected without checking hashes.
2. Some mod distributors do not bother to update content IDs, which
means that installing updates from the UI would not actually
update the installed game. This is confusing for users.
To keep the existing semantic (for IOS especially), the new content
hash checks are opt-in for callers of GetStoredContentsFromTMD.
This commit changes WiiUtils's WAD installation logic to enable
the content hash checks.
Adds a flag to File::Delete and File::DeleteDir functions to control
whether a console warning is emitted when the file or directory doesn't
exist. The flag is optional and true by default to match current behavior.
Makes File::DeleteDir return true when attempting to delete a
nonexistent path.
The purpose of DeleteDir is to ensure the path doesn't exist after the
call, which is better reflected by the new return value. Additionally,
none of the current callers actually check the return value so this
won't break any existing code.
Fixes https://bugs.dolphin-emu.org/issues/12327.
When we started using fmt in CheckExternalExceptions, JitArm64
mysteriously stopped working even though the code path where
fmt was used never was reached. This is because the compiler
added a function prologue and epilogue to set up the stack,
since the code path that used fmt required the use of the stack.
However, the breakage didn't actually have anything to do
with the usage of the stack in itself, but rather with the
compiler's insertion of a stack canary. In the function
epilogue, a cmp instruction was inserted to check that the
stack canary had not been overwritten during the execution
of the function. This cmp instruction overwriting the status
flags ended up having a disastrous side effect once execution
returned to code emitted by JitArm64::WriteExceptionExit.
JitArm64's dispatcher contains a branch to the "do_timing"
code which is intended to be taken if the PPC downcount is
negative. However, the dispatcher doesn't update the status
flags on its own before this conditional branch, but rather
expects the calling code to have set them as a side effect
of DoDownCount. The root cause of our bug was that
JitArm64::WriteExceptionExit was calling DoDownCount before
Check(External)Exceptions instead of after.