Commit Graph

38489 Commits

Author SHA1 Message Date
Bram Speeckaert 274e34ddf1 JitArm64: MultiplyImmediate - Handle -(2^n) + 1
Let's take advantage of ARM64's input register shifting one last time,
shall we?

Before:
0x1280005b   mov    w27, #-0x3
0x1b1b7f18   mul    w24, w24, w27

After:
0x4b180b18   sub    w24, w24, w24, lsl #2
2022-11-02 21:53:19 +01:00
Bram Speeckaert 7073a135c6 JitArm64: MultiplyImmediate - Handle -(2^n)
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.

Before:
0x128001f7   mov    w23, #-0x10
0x1b1a7efa   mul    w26, w23, w26
0x93407f58   sxtw   x24, w26

After:
0x4b1a13fa   neg    w26, w26, lsl #4
0x93407f58   sxtw   x24, w26
2022-11-02 21:53:19 +01:00
Bram Speeckaert 1c87f040a3 JitArm64: mulli - Only allocate reg when necessary
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.

Before:

0x52800659   mov    w25, #0x32
0x1b197f5b   mul    w27, w26, w25

After:
0x5280065b   mov    w27, #0x32
0x1b1b7f5b   mul    w27, w26, w27
2022-11-02 21:53:19 +01:00
Bram Speeckaert 20dd5cadab JitArm64: MultiplyImmediate - Add comments 2022-11-02 21:53:17 +01:00
Bram Speeckaert c349875cdc JitArm64: MultiplyImmediate - Handle 2^n + 1
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.

Before:
0x52800838   mov    w24, #0x41
0x1b187f7b   mul    w27, w27, w24

After:
0x0b1b1b7b   add    w27, w27, w27, lsl #6
2022-11-02 21:52:44 +01:00
Bram Speeckaert 3aaf1a2b8b JitArm64: MultiplyImmediate - Handle 2^n
Turn multiplications by a power of two into bitshifts.

Before:
0x52800817   mov    w23, #0x40
0x1b167ef6   mul    w22, w23, w22

After:
0x531a66d6   lsl    w22, w22, #6
2022-11-02 21:52:37 +01:00
Markus Wick 0210d115c2
Merge pull request #11227 from JosJuice/jitarm64-mmio-clobber
JitArm64: Move MMIO handler result before popping stack
2022-11-02 10:19:22 +01:00
Jordan Woyak 168a49c87f ControllerInterface: DSU InputBackend implementation. 2022-11-01 21:59:09 -05:00
Jordan Woyak 2e5cd5d519 ControllerInterface: evdev InputBackend implementation. 2022-11-01 21:59:08 -05:00
Jordan Woyak 44a4573303 ControllerInterface: Add InputBackend interface and SDL implementation. 2022-11-01 21:59:08 -05:00
Bram Speeckaert f25611f388 JitArm64: MultiplyImmediate - Handle 1
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.

Before:
0x52800038   mov    w24, #0x1
0x1b1a7f1b   mul    w27, w24, w26

After:
0x2a1a03fb   mov    w27, w26

Before:
0x52800039   mov    w25, #0x1
0x1b1a7f3a   mul    w26, w25, w26

After:
Nothing!
2022-11-01 21:13:45 +01:00
Bram Speeckaert 51cb918aa5 JitArm64: MultiplyImmediate - Handle 0
Multiplication by zero always gives zero.

Before:
0x52800019   mov    w25, #0x0
0x1b197f5b   mul    w27, w26, w25

After:
Nothing!
2022-11-01 21:13:38 +01:00
Bram Speeckaert 080513284c JitArm64: mullwx - Use MultiplyImmediate 2022-11-01 19:05:33 +01:00
Bram Speeckaert 53a8cd1563 JitArm64: mulli - Use MultiplyImmediate 2022-11-01 19:04:50 +01:00
Bram Speeckaert 4aa0c0133a JitArm64: Introduce MultiplyImmediate
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
2022-11-01 19:01:38 +01:00
JosJuice dc046a2470
Merge pull request #11237 from t895/grid-fix
Android: Add more game grid sizes for long displays
2022-11-01 19:00:53 +01:00
Bram Speeckaert d0de68c41b JitArm64: cmp - Optimize general case
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.

Before:
0x93407f59   sxtw   x25, w26
0x93407ebb   sxtw   x27, w21
0xcb1b033b   sub    x27, x25, x27

After:
0x93407f5b   sxtw   x27, w26
0xcb35c37b   sub    x27, x27, w21, sxtw
2022-11-01 12:21:24 +01:00
Bram Speeckaert ae6ce1df48 Arm64Emitter: Add ArithOption with ExtendSpecifier
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.

This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
2022-11-01 12:15:56 +01:00
Bram Speeckaert 82f22cdfa1 JitArm64: cmp - Optimize a == -1 case
By explicitly handling this, we can avoid materializing -1 in a
register and generate more efficient code by taking advantage of -x ==
~x + 1.

Before:
0x12800015   mov    w21, #-0x1
0x93407eb9   sxtw   x25, w21
0x93407ef8   sxtw   x24, w23
0xcb180338   sub    x24, x25, x24

After:
0x2a3703f8   mvn    w24, w23
0x93407f18   sxtw   x24, w24
2022-11-01 12:00:32 +01:00
Bram Speeckaert 592ba31e22 JitArm64: cmp - Optimize a == 0 case
By explicitly handling this, we can avoid materializing zero in a
register and generate more efficient code altogether.

Before:
0x52800016   mov    w22, #0x0
0xb94093b5   ldr    w21, [x29, #0x90]
0x93407ed7   sxtw   x23, w22
0x93407eb9   sxtw   x25, w21
0xcb1902f9   sub    x25, x23, x25

After:
0xb94093b7   ldr    w23, [x29, #0x90]
0x4b1703f9   neg    w25, w23
0x93407f39   sxtw   x25, w25
2022-11-01 11:52:00 +01:00
Bram Speeckaert f5e7e70cc5 JitArm64: cmp - Refactor 2022-11-01 11:47:17 +01:00
Bram Speeckaert dbb8f588c7 JitArm64: cmpl - Optimize a == 0 case
By explicitly handling this, we can avoid materializing zero in a
register.

Before:
0x52800019   mov    w25, #0x0
0xb94087b6   ldr    w22, [x29, #0x84]
0xcb16033b   sub    x27, x25, x22

After:
0xb94087b9   ldr    w25, [x29, #0x84]
0xcb1903fb   neg    x27, x25
2022-11-01 11:27:45 +01:00
Dentomologist 7cd08fde75 Updater: Add/clarify error messages 2022-10-31 23:36:07 -07:00
Dentomologist 2808db7f2f FileUtil: Return success bool from CopyDir 2022-10-31 23:33:02 -07:00
JMC47 5488d3b125
Merge pull request #11239 from Tilka/ogl
OGL: fix compute shader labels
2022-11-01 02:18:37 -04:00
Tilka 12b204d92a
Merge pull request #11241 from shuffle2/revert-11234-updater
Revert "MacUpdater: test that os version check is working"
2022-11-01 02:04:41 +00:00
shuffle2 111e965c7e
Revert "MacUpdater: test that os version check is working" 2022-10-31 18:53:22 -07:00
Tilka b182abe0ae
Merge pull request #11234 from shuffle2/updater
MacUpdater: test that os version check is working
2022-11-01 01:28:20 +00:00
Tillmann Karras 22eb7e6645 OGL: use already known object label lengths
Passing -1 means the driver has to call strlen().
2022-11-01 01:10:03 +00:00
Tillmann Karras 4b8fe959d4 OGL: fix compute shader labels
This fixes GL_INVALID_VALUE errors when using GPU texture decoding.
2022-11-01 01:04:46 +00:00
Admiral H. Curtiss 4955af5e27
Merge pull request #11238 from K0bin/zero-descriptor-size
VideoBackends:Vulkan: Fix 0 size descriptor pools
2022-11-01 00:19:47 +01:00
Robin Kertels f5fecaf964
VideoBackends:Vulkan: Fix 0 size descriptor pools
[ VUID-VkDescriptorPoolCreateInfo-maxSets-00301 ] Object 0:
handle = 0x7f1,b8d,3cd,e70, type = VK_OBJECT_TYPE_DEVICE; |
MessageID = 0xa1,70e,236 | vkCreateDescriptorPool():
pCreateInfo->maxSets is not greater than 0.
The Vulkan spec states: maxSets must be greater than 0
2022-10-31 22:41:16 +01:00
Charles Lombardo 349b16aa55 Android: Add more game grid sizes for long displays 2022-10-31 13:11:17 -04:00
JosJuice d3718b1b81 Translation resources sync with Transifex 2022-10-30 22:39:29 +01:00
Shawn Hoffman 7cc8e37aee MacUpdater: test that os version check is working
Adds a key to Info.plist with default value to test
Updater - this commit is intended to be reverted
2022-10-30 13:19:43 -07:00
JMC47 969309c457
Merge pull request #11220 from shuffle2/macversion
MacUpdater: check os version
2022-10-30 15:19:55 -04:00
Shawn Hoffman 089886a6f8 MacUpdater: check os version 2022-10-30 12:04:57 -07:00
JMC47 2f80928be3
Merge pull request #11216 from JMC47/OwlsofGaINI
Disable "Force Texture Filtering" in Owls of Ga'Hoole
2022-10-30 13:36:46 -04:00
JMC47 f277a921a9
Merge pull request #11231 from shuffle2/updater
windows: Rename: use std::filesystem::rename for posix behavior
2022-10-30 13:32:10 -04:00
JMC47 950e1f94dc
Merge pull request #11185 from TryTwo/PR_MemoryWidget_Address_Input_History
MemoryWidget: Make search address a combobox that holds address history.
2022-10-30 04:21:14 -04:00
TryTwo 053320b7cf MemoryWidget: Make search address a combobox that holds address history.
Always update the combobox when a new target address is sent.
2022-10-29 22:41:30 -07:00
Shawn Hoffman 68875dc06b MacUpdater: add version info to Updater.app too 2022-10-29 20:32:59 -07:00
Shawn Hoffman 836bc74b2d windows: Rename: use std::filesystem::rename for posix behavior 2022-10-29 19:57:26 -07:00
Pokechu22 fe559f3ed3 VideoCommon/Statistics: Require semicolons after statistics macros
This is clearer and reduces IntelliSense problems.
2022-10-29 15:39:41 -07:00
Admiral H. Curtiss 0628794cb6
Merge pull request #11226 from K0bin/d3d12-fix
VideoBackends:D3D12: Fix hang in Twilight Princess
2022-10-30 00:24:16 +02:00
Robin Kertels a07ee729e5
VideoBackends:D3D12: Defer binding framebuffer in SetAndDiscardFramebuffer
BindFramebuffer depends on the pipeline which might not be set yet.
That's why the framebuffer dirty flag exists in the first place.
I assume BindFramebuffer was called directly here, in order to handle
the texture state transitions necessary for DiscardResource.
The state is tracked anyway, so we can just issue those transitions there
too and defer binding the actual framebuffer.

Fixes an issue in Zelda Twilight Princess with EFB depth peeks.
Dolphin would bind a frame buffer which doesn't have an integer format
descriptor for the color target before binding the new pipeline.
So it would accidentally use the 0 descriptor.

Debug layer error:
D3D12 ERROR: ID3D12CommandList::OMSetRenderTargets:
Specified CPU descriptor handle ptr=0x0000000000000000 does not refer to
a location in a descriptor heap. pRenderTargetDescriptors[0] is the issue.
[ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
2022-10-29 23:41:32 +02:00
Robin Kertels a6aa651291
VideoBackends:D3D12: Use COMMON as initial state for default heap buffer
Fixes the following error in the D3D12 debug layer:

D3D12 WARNING: ID3D12Device::CreateCommittedResource:
Ignoring InitialState D3D12_RESOURCE_STATE_UNORDERED_ACCESS.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
[ STATE_CREATION WARNING #1328: CREATERESOURCE_STATE_IGNORED]
2022-10-29 23:39:32 +02:00
Robin Kertels 22fecb41fc
VideoBackends:D3D12: Don't query GPU descriptor handle for non-shader visible heap
Fixes the following error in the D3D12 debug layer:

D3D12 ERROR: ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart:
GetGPUDescriptorHandleForHeapStart is invalid to call on a descriptor
heap that does not have DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE set.
If the heap is not supposed to be shader visible, then
GetCPUDescriptorHandleForHeapStart would be the appropriate method
to call. That call is valid both for shader visible and non shader
visible descriptor heaps.
[ STATE_GETTING ERROR #1315: DESCRIPTOR_HEAP_NOT_SHADER_VISIBLE]
2022-10-29 23:39:27 +02:00
Tilka a1e41f305e
Merge pull request #11229 from Tilka/verifier
VolumeVerifier: fix bogus "serial/version missing" error
2022-10-29 21:45:52 +01:00
Tillmann Karras cacdd18ca0 VolumeVerifier: fix bogus "serial/version missing" error
When searching for a disc where the revision doesn't match any disc in
the datfile, the loop would never get to the part where serials_exist is
set to true, leading to a bogus error message.
2022-10-29 21:32:57 +01:00