Otherwise, texelFetch() will use an out-of-bounds layer for game textures (that have 1 layer; EFB copies have 2 layers in stereoscopic 3D mode), which is undefined behavior (often resulting in a black image). The fast texture sampling path uses texture(), which always clamps (see https://www.khronos.org/opengl/wiki/Array_Texture#Access_in_shaders), so it was unaffected by this difference.
The former is deprecated and pretty much all modern drivers
support VK_EXT_debug_utils.
Android drivers dont support it. On those drivers,
we use the implementation provided by the validation layers.
Plus two miscellaneous debugger features that I found along the way when
reading Jit64's code for comparison: bJITNoBlockLinking and tracing.
Fixes https://bugs.dolphin-emu.org/issues/13127.
Small optimization. By not calling WriteExit, the block linking system
never finds out about the exit we're doing, saving us from having to
disable block linking.
We should expose Enable Controller Input and the turbo settings for
GBA just like we do for GameCube controllers and Wii Remotes.
I just forgot about it when implementing the GBA TAS input window.
Previously, if a user on Windows launched Dolphin from the command line
and specified a path to an M3U file and included backslashes in this path,
Dolphin would fail to resolve relative paths in the M3U file.
The calculation of each address in lmw/stmw currently has a dependency
on the calculation of the previous address. By removing this dependency,
the host CPU should be able to pipeline the loads/stores better. The cost
we pay for this is up to one extra register and one extra MOV instruction
per guest instruction, but often nothing.
Making EmitBackpatchRoutine support using any register as the address
register would let us get rid of the MOV, but I consider that to be too
big of a task to do in one go at the same time as this.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
This reverts commit 351d095fff.
In hindsight, my attempted optimization messes with the return
predictor, unlike real tail calls. So I think it does more bad than
good.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
Use: callstack(0x80000000).
!callstack(value) works as a 'does not contain'.
Add strings to expr.h conditionals.
Use quotations: callstack("anim") to check symbols/name.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
This affected the memory and registers widgets (and possibly others). I'm pretty sure it regressed in 5f629abd8b.
The SetCodeVisible line is a new fix, but the equivalent already existed in the memory widget.
The call to analyzer.Analyze breaks when it attempts to read an instruction, as it eventually tries to read memory when Memory::m_pRAM is nullptr. Trying to read when execution is not paused in general seems like a bad idea (especially as analyzer.Analyze uses PowerPC::TryReadInstruction which can update icache - this is probably still a problem).
Operations that have two operands and can't generate a default NaN,
i.e. addition and subtraction, already have the desired NaN handling
on x86. We just need to make sure to not reverse the operands.
This fixes ps_sum0/ps_sum1 outputting NaNs in cases where they shouldn't.
(HandleNaNs assumes that a NaN in a ps0 input always results in a NaN in
the ps0 output, and correspondingly for ps1.)
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This regressed in 0a906f553f, I think (though I haven't confirmed it). Mario Tennis and Luigi's Mansion both use these for some reason (as far as I can tell, the data isn't actually used; it's just extra data included for no reason)
DataReader is generally jank - it has a start and end pointer, but the end pointer is generally not used, and all of the vertex loaders mostly bypassed it anyways.
Wrapper code (the vertex loaer test, as well as Fifo.cpp and OpcodeDecoding.cpp) still uses it, as does the software vertex loader (which is not a subclass of VertexLoader). These can probably be eliminated later.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
When emulated GBAs were added to Dolphin, it was possible to control them
using the GC TAS input window. (Z was mapped to Select.) Unaware of this,
I broke the functionality in b296248.
To make it possible to control emulated GBAs using TAS input again,
I'm adding a proper TAS input window for GBAs, with a real Select button
and no analog controls.
0e02ddcf52 removed separate logic for tiled versus non-tiled EFB peek caches, and as part of that made it so that color peeks updated the frame access mask even when a non-tiled cache is in use. However, the same change was not made for depth peeks. I'm not sure if this affected anything in practice.
`ImGui::GetIO` performs an assertion that a context exists, and if one doesn't then things will likely crash. Unfortunately this crash is hard to consistently reproduce.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
[ VUID-VkDescriptorPoolCreateInfo-maxSets-00301 ] Object 0:
handle = 0x7f1,b8d,3cd,e70, type = VK_OBJECT_TYPE_DEVICE; |
MessageID = 0xa1,70e,236 | vkCreateDescriptorPool():
pCreateInfo->maxSets is not greater than 0.
The Vulkan spec states: maxSets must be greater than 0
BindFramebuffer depends on the pipeline which might not be set yet.
That's why the framebuffer dirty flag exists in the first place.
I assume BindFramebuffer was called directly here, in order to handle
the texture state transitions necessary for DiscardResource.
The state is tracked anyway, so we can just issue those transitions there
too and defer binding the actual framebuffer.
Fixes an issue in Zelda Twilight Princess with EFB depth peeks.
Dolphin would bind a frame buffer which doesn't have an integer format
descriptor for the color target before binding the new pipeline.
So it would accidentally use the 0 descriptor.
Debug layer error:
D3D12 ERROR: ID3D12CommandList::OMSetRenderTargets:
Specified CPU descriptor handle ptr=0x0000000000000000 does not refer to
a location in a descriptor heap. pRenderTargetDescriptors[0] is the issue.
[ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
Fixes the following error in the D3D12 debug layer:
D3D12 WARNING: ID3D12Device::CreateCommittedResource:
Ignoring InitialState D3D12_RESOURCE_STATE_UNORDERED_ACCESS.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
[ STATE_CREATION WARNING #1328: CREATERESOURCE_STATE_IGNORED]
Fixes the following error in the D3D12 debug layer:
D3D12 ERROR: ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart:
GetGPUDescriptorHandleForHeapStart is invalid to call on a descriptor
heap that does not have DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE set.
If the heap is not supposed to be shader visible, then
GetCPUDescriptorHandleForHeapStart would be the appropriate method
to call. That call is valid both for shader visible and non shader
visible descriptor heaps.
[ STATE_GETTING ERROR #1315: DESCRIPTOR_HEAP_NOT_SHADER_VISIBLE]
When searching for a disc where the revision doesn't match any disc in
the datfile, the loop would never get to the part where serials_exist is
set to true, leading to a bogus error message.
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.