Commit Graph

155 Commits

Author SHA1 Message Date
JosJuice b5c5371848 Arm64Emitter: Don't optimize ADD to MOV for SP
Unlike ADD (immediate), MOV (register) treats SP as ZR. Therefore the
ADDI2R optimization that was added in 67791d227c can't optimize ADD to
MOV when exactly one of the registers is SP.

There currently isn't any code in Dolphin that calls ADDI2R with
parameters that would trigger this case.
2024-02-06 21:58:07 +01:00
JosJuice d8c78f2a92 JitArm64: Fix the "do nothing" cases of ANDI2R and friends
So somehow I forgot that AArch64 uses three-operand encoding...

Fixes a regression from 6303416201 which manifested in various ways,
such as incorrect rendering of the Wind Waker title screen.
2023-12-21 20:51:32 +01:00
JosJuice dc60bc5f1e JitArm64: Improve codegen in ANDI2R and friends
The codegen for the functions themselves, not for the emitted code.

This seems to save 32 bytes per function. We also get rid of the oddity
we had before where ANDI2R would do masking for 32-bit operations but
the other functions wouldn't.
2023-12-17 18:13:32 +01:00
JosJuice a8e1e1ae48 JitArm64: Optimize additional cases of ANDI2R and friends
Now we'll never need a scratch register for values that are all zeroes
or all ones.
2023-12-17 18:13:32 +01:00
JosJuice 6303416201 JitArm64: Optimize ANDI2R and friends to no-ops when possible
This optimizes rlwnmx with mask == 0xFFFFFFFF.
2023-12-17 18:13:30 +01:00
JosJuice e0eb4ef5bc JitArm64: Use enum class for LogicalImm size parameter
This should prevent issues like the one fixed in the previous commit
from happening again.
2023-12-16 16:48:26 +01:00
JosJuice 67791d227c JitArm64: Add special zero case to ADDI2R
This normally doesn't reduce the instruction count, but is nonetheless
useful on CPUs that can do 0-cycle moves.
2023-12-01 21:31:11 +01:00
JosJuice 25ffb0dbfc JitArm64: Mask input to 32-bit ADDI2R
In case the input was a s32 that got sign extended as part of conversion
to u64.
2023-12-01 21:26:37 +01:00
JosJuice c248a69268 JitArm64: Add utility for calling a function with arguments
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
2023-11-01 19:01:58 +01:00
JosJuice 6e88c44d5d Move SmallVector to Common
We had one implementation of this type of data structure in Arm64Emitter
and one in VideoCommon. This moves the Arm64Emitter implementation to
its own file and adds begin and end functions to it, so that VideoCommon
can use it.

You may notice that the license header for the new file is CC0. I wrote
the Arm64Emitter implementation of SmallVector, so this should be no
problem.
2023-08-22 13:19:49 +02:00
Lioncash 784a216927 Common/MathUtil: Move IntLog2 into MathUtil namespace
Gets this out of the global namespace.
2023-04-15 03:35:05 -04:00
JosJuice b5b8871bce Arm64Emitter: Fix SHRN/SHRN2
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.

We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
2022-12-10 11:20:23 +01:00
JosJuice 06e60ac327 JitArm64: Implement accurate NaNs
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
2022-12-03 19:41:32 +01:00
JosJuice f45d3a6a2c JitArm64: Optimize ps_mergeXX
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
   a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
2022-11-26 18:14:58 +01:00
JosJuice 4dbf0b8e90 JitArm64: Reimplement Force25BitPrecision
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
2022-10-22 10:03:52 +02:00
JosJuice 84375a91d9 Arm64Emitter: Combine immh and immb for Emit(Scalar)ShiftImm
This simplifies the callers of EmitShiftImm and EmitScalarShiftImm.
2022-10-19 20:20:39 +02:00
Pokechu22 a34d5e5960 Arm64Emitter: Add additional alignment assertions
Before, unaligned values would be silently ignored in most cases.
2022-09-18 23:33:24 -07:00
JosJuice 52661dcc76 Arm64Emitter: Fix encoding of size for ADD (vector)
This was causing a bug in the rounding of paired single multiplication
operands. If Force25BitPrecision was called for quad registers, the
element size of its ADD instruction would get treated as if it was 16
instead of the intended 64, which would cause the result of the
calculation to be incorrect if the carry had to pass a 16-bit boundary.

Fixes one of the two bugs reported in
https://bugs.dolphin-emu.org/issues/12998.
2022-08-05 21:49:28 +02:00
Pokechu22 1a92699455 Cast to int for enums that are not formattable 2022-01-13 11:11:08 -08:00
Pokechu22 558de04cfc Common/Assert: Actually use the ASSERT_MSG's log type parameter
Since it was unused, nonexistent values were used in a few places.  I've replaced them.
2022-01-09 12:44:14 -08:00
Pokechu22 44e93e91d7 Common/Assert: Switch to fmt 2022-01-09 12:43:11 -08:00
Pokechu22 2025763420 Treewide: Adjust order of includes 2021-12-10 14:49:57 -08:00
JMC47 e5a4a86672
Merge pull request #10055 from JosJuice/jitarm64-reuse-memory
JitArm64: Codegen space reuse
2021-11-20 17:35:24 -05:00
Merry 7c2b09e156 Arm64Emitter: Add FRINTI instruction 2021-11-06 19:15:26 +00:00
JosJuice 44beaeaff5 Arm64Emitter: Check end of allocated space when emitting code
JitArm64 port of 5b52b3e.
2021-10-13 21:52:16 +02:00
JosJuice 09cdb076a3 JitArm64: divwx - Optimize constant dividend
When the dividend is known at compile time, we can eliminate some
of the branching and precompute the result for the overflow case.
2021-08-26 14:50:01 +02:00
JosJuice a90b0a1c93 JitArm64: Implement mtfsfx
The sixth and final part of implementing the FPSCR system register
instructions.
2021-07-31 23:50:20 +02:00
JosJuice 8af5095ff4 JitArm64: Stop using hand-encoded logical immediates 2021-07-12 22:25:49 +02:00
JosJuice 10861ed8ce JitArm64: Turn IsImmLogical into a constexpr constructor 2021-07-10 20:31:28 +02:00
JosJuice ab024b218e JitArm64: Accept LogicalImm struct as bitwise inst parameter 2021-07-10 20:13:11 +02:00
JosJuice cbbd3d3956 Arm64Emitter: Fix 64-bit TBZ/TBNZ encoding
We haven't actually used 64-bit TBZ/TBNZ anywhere in Dolphin, so
this mistake hasn't broken anything, but let's fix it regardless.
2021-07-07 12:21:07 +02:00
Pierre Bourdon e149ad4f0a
treewide: convert GPLv2+ license info to SPDX tags
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
2021-07-05 04:35:56 +02:00
JosJuice e0c81ae54a JitArm64: Fix MSVC warnings 2021-05-28 15:34:08 +02:00
Skyler Saleh 948764d37b Apple M1: Build, Analytics, and Memory Management
Analytics:
- Incorporated fix to allow the full set of analytics that was recommended by
  spotlightishere

BuildMacOSUniversalBinary:
- The x86_64 slice for a universal binary is now built for 10.12
- The universal binary build script now can be configured though command line
  options instead of modifying the script itself.
- os.system calls were replaced with equivalent subprocess calls
- Formatting was reworked to be more PEP 8 compliant
- The script was refactored to make it more modular
- The com.apple.security.cs.disable-library-validation entitlement was removed

Memory Management:
- Changed the JITPageWrite*Execute*() functions to incorporate support for
  nesting

Other:
- Fixed several small lint errors
- Fixed doc and formatting mistakes
- Several small refactors to make things clearer
2021-05-22 15:25:17 -07:00
Skyler Saleh 4ecb3084b7 Apple M1 Support for MacOS
This commit adds support for compiling Dolphin for ARM on MacOS so that it can
run natively on the M1 processors without running through Rosseta2 emulation
providing a 30-50% performance speedup and less hitches from Rosseta2.

It consists of several key changes:

- Adding support for W^X allocation(MAP_JIT) for the ARM JIT
- Adding the machine context and config info to identify the M1 processor
- Additions to the build system and docs to support building universal binaries
- Adding code signing entitlements to access the MAP_JIT functionality
- Updating the MoltenVK libvulkan.dylib to a newer version with M1 support
2021-05-22 15:25:17 -07:00
Mai M 1054abc9cc
Merge pull request #9712 from JosJuice/jitarm64-fmul-rounding
JitArm64: Fix fmul rounding issues
2021-05-20 10:25:02 -04:00
JosJuice 11be2314fe JitArm64: Fix fmul rounding issues
This is a port of 4f18f60 to JitArm64.
2021-05-15 23:27:34 +02:00
JosJuice 85226e09f0 JitArm64: Implement fres 2021-05-15 19:16:32 +02:00
JosJuice 749db94dec Arm64Emitter: Implement more variants of FMOV 2021-05-13 10:13:59 +02:00
JosJuice 1d106ceaf5 JitArm64: Optimize ConvertSingleToDouble, part 2
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
2021-04-25 15:56:19 +02:00
JosJuice a45a0a2066
Merge pull request #9494 from Dentomologist/convert_arm64reg_to_enum_class
Arm64Gen: Convert ARM64Reg to enum class
2021-03-17 00:05:23 +01:00
Dentomologist f0f206714f Arm64Gen: Convert ARM64Reg to enum class
Most changes are just adding ARM64Reg:: in front of the constants.
2021-03-13 10:10:59 -08:00
Dentomologist 686314b548 Arm64Gen: Move constant and make constexpr
Namespace-scope variable was only used in one function so move it there
2021-03-07 10:09:59 -08:00
Dentomologist dffcbcc6c4 Arm64Gen: Remove unused constant 2021-03-07 10:09:59 -08:00
JosJuice 1e500d96b0 JitArm64: Workaround for GCC ICE 2021-02-15 23:46:08 +01:00
JosJuice 9ad4f724e4 Arm64Emitter: Use ORR in MOVI2R 2021-02-13 21:04:13 +01:00
JosJuice 0d5ed06daf Arm64Emitter: Improve MOVI2R
More or less a complete rewrite of the function which aims
to be equally good or better for each given input, without
relying on special cases like the old implementation did.

In particular, we now have more extensive support for
MOVN, as mentioned in a TODO comment.
2021-02-13 20:23:03 +01:00
JosJuice 4e107935ac Arm64Emitter: Allow specifying 21th bit of ADRP imm 2021-02-13 11:33:27 +01:00
JosJuice d226b8f825 Arm64Emitter: Remove optimize parameter from MOVI2R
I don't really see the use of this. (Maybe in the past it
was used for when we need a constant number of instructions
for backpatching? But we don't use MOVI2R for that now.)
2021-02-13 11:33:27 +01:00
MerryMage 8aa2013a2d Arm64Emitter: Add additional assertions to BFI/UBFIZ 2021-01-31 12:04:57 +00:00