Sometimes a gecko code would have a title long enough to appear over the checkbox. This is now prevented by marking the text's boundary a 16dp before the start of the checkbox.
`ImGui::GetIO` performs an assertion that a context exists, and if one doesn't then things will likely crash. Unfortunately this crash is hard to consistently reproduce.
When inflating this layout, the layout inflater doesn't expect a View and rather a descendant of ViewGroup. This resulted in a crash which is resolved by using a FrameLayout instead.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
ARM64 can do perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
[ VUID-VkDescriptorPoolCreateInfo-maxSets-00301 ] Object 0:
handle = 0x7f1,b8d,3cd,e70, type = VK_OBJECT_TYPE_DEVICE; |
MessageID = 0xa1,70e,236 | vkCreateDescriptorPool():
pCreateInfo->maxSets is not greater than 0.
The Vulkan spec states: maxSets must be greater than 0
BindFramebuffer depends on the pipeline which might not be set yet.
That's why the framebuffer dirty flag exists in the first place.
I assume BindFramebuffer was called directly here, in order to handle
the texture state transitions necessary for DiscardResource.
The state is tracked anyway, so we can just issue those transitions there
too and defer binding the actual framebuffer.
Fixes an issue in Zelda Twilight Princess with EFB depth peeks.
Dolphin would bind a frame buffer which doesn't have an integer format
descriptor for the color target before binding the new pipeline.
So it would accidentally use the 0 descriptor.
Debug layer error:
D3D12 ERROR: ID3D12CommandList::OMSetRenderTargets:
Specified CPU descriptor handle ptr=0x0000000000000000 does not refer to
a location in a descriptor heap. pRenderTargetDescriptors[0] is the issue.
[ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
Fixes the following error in the D3D12 debug layer:
D3D12 WARNING: ID3D12Device::CreateCommittedResource:
Ignoring InitialState D3D12_RESOURCE_STATE_UNORDERED_ACCESS.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
[ STATE_CREATION WARNING #1328: CREATERESOURCE_STATE_IGNORED]
Fixes the following error in the D3D12 debug layer:
D3D12 ERROR: ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart:
GetGPUDescriptorHandleForHeapStart is invalid to call on a descriptor
heap that does not have DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE set.
If the heap is not supposed to be shader visible, then
GetCPUDescriptorHandleForHeapStart would be the appropriate method
to call. That call is valid both for shader visible and non shader
visible descriptor heaps.
[ STATE_GETTING ERROR #1315: DESCRIPTOR_HEAP_NOT_SHADER_VISIBLE]
When searching for a disc where the revision doesn't match any disc in
the datfile, the loop would never get to the part where serials_exist is
set to true, leading to a bogus error message.
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
I originally added these in 2b1d1038a6, for both the TPipelineFunction and the size. The size was moved into the header in fdcd2b7d00 (making the size functions obsolete), but it seems that the functions themselves are no longer needed now.
I think I didn't use this approach before because it would have required ComponentFormatTable and ComponentCountRow to be templated, which would end up resulting in lines that were too long and thus wrapped in awkward places. (I *think* they didn't get inferred properly.) Now that we only need TPipelineFunction, the templating is not needed, and this ends up being a more readable version of the version with the wrapper functions.
This adds PR 10890's new setting to the Android GUI. I'm curious to see
if any Android users might get a performance improvement from it.
Due to how our settings work on Android, I haven't implemented disabling
the checkbox when the graphics backend doesn't support both GS and VS
for point/line expansion, but I don't think that's critical to have.
The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component (for instance, if float XYZ positions are used, and the stride was set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), it would be missing the last 8 bytes of the final element in the array. Or, if stride was set to 0, then no bytes would be recorded at all (though that's not a useful configuration so it's unlikely to actually exist).
I'm not aware of any games affected by this issue.
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). This uses NormalIndex3, and the index for the normal vector (generally 0x02XX or 0x01XX) there is always lower than the tangent or binormal (generally 0x07XX). Other games seem to usually have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games will use the same type for all vertex components (either Index8 or Index16 or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
`count` is the number of stereo samples to write (where each stereo sample is two shorts), while `BUFFER_SIZE` is the size of the buffer in shorts. So `count` needs to be multiplied by `2`, not `BUFFER_SIZE`. Also, when this check was failed, the previous code just clobbered whatever was past the end of the buffer after logging the warning, which corrupted `basename`, eventually resulting in Dolphin crashing.
This affected Datel's Wii-compatible Action Replay, which uses a block size of 2298, or 18384 stereo samples, which is 36768 shorts, which is bigger than the buffer size of 32768. (However, the previous commit means that only one block is transfered at a time, eliminating this issue; fixing the bounds check is just a general safety thing instead of an actual bugfix now.)
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
fregsIn will include FD for double-precision instructions, since for
dependency tracking purposes the instruction does read the upper
half of FD. This is not what we want in HandleNaNs.
The consequence of this bug is that if an instruction was supposed to
output a NaN and FD happens to contain a NaN and FD happens to be the
same register as an unused register in the instruction encoding, the
NaN in FD could get used as the output instead of the correct NaN.
This isn't known to affect any games, which isn't especially surprising
considering that there's only one game that needs AccurateNaNs anyway.
Jumping to `dispatcher` requires first subtracting the downcount,
otherwise `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
Nothing currently uses it. It could theoretically be replaced with fmt support, but I don't think the LOG_VULKAN_ERROR macro is that useful and it'd be better to replace it with regular logging instead.
base is an unsigned variable, so we can make things little more
consistent by making the loop index unsigned so we aren't doing bit
arithmetic with signed types.
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
Rather than makring some parts of VertexLoaderManager dirty in some places and some in others, do it all in VideoState. Also, since CPState no longer contains pointers/non-CP data after d039b1bc0d, we can just use p.Do on it instead of manually saving each field.
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET - which I'm not sure
if any CPUs support - but it doesn't hurt anything, so I might as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
This is the first step of getting rid of the controller indirection
on Android. (Needing a way for touch controls to provide input
to the emulator core is the reason why the controller indirection
exists to begin with as far as I understand it.)
This lets the TAS input code use a higher-level interface for
overriding inputs instead of having to fiddle with raw bits.
WiiTASInputWindow in particular was messy with how much
controller code it had to re-implement.
Fixes a Rogue Squadron II regression from 9d73583.
This set_dirty stuff is pretty tricky to reason about. I thought I
was clever when coming up with set_dirty, but maybe I was too clever
for my own good...
In case the register we're binding is the same as the immediate register,
we should fetch the immediate before calling BindToRegister. The way
the register cache currently works, calling GetImm after BindToRegister
actually does work, but it's better to not rely on it.
Avoid waiting for earlier submissions when we flush more often.
The vertex manager will flush more often if the game accesses the EFB
on the CPU, to give the GPU a head start.
The screen real-estate is already reserved, the values are dumped and
restored by the on-DSP code, why not make something out of these values ?
Allows following:
- where exactly send_back was called from ($st1)
- the boundaries and progress of the innermost BLOOP{,I} ($st0, 2 and 3)
up to send_back's call
Before, only the symbols box would update. However, if you edit the symbol of a function in the call stack (which seems like something that would happen reasonably often while debugging), the call stack would be out of date until it was updated by clicking on it. Callers and calls were more of an edge case; for them to be out of date, you would need to right-click on an instruction in a function other than the one containing the currently-selected instruction (though it would also affect recursive functions).
Tested on an official DOL-014 (251 blocks) memory card by executing the
0xf4 command on a card with content along its entire length and then
dumping the whole card: it reads as 0xff all the way through.
Therefor, the current implementation is already consistent with hardware.
Compared to the previous solution of using big `synchronized` blocks,
this makes GameFileCacheManager's executor thread release and re-lock
the lock when possible, giving the GUI thread a chance to do a
(comparatively) quick getOrAdd call if it needs to.
This reverts commit fb265b610d.
The optimization in that commit is safe when the executor thread is
writing and the GUI thread is reading, but I had failed to take into
account that it's unsafe when the GUI thread is writing and the executor
thread is reading. (The native UpdateAdditionalMetadata function loops
through m_cached_files, which is unsafe if another thread is adding
elements to m_cached_files simultaneously.)
Losing out on this optimization isn't too bad, because
719930bb39 makes it very unlikely that
both threads will want the lock at the same time.
Texture dumping can already be done using VideoCommon's system (and in fact the same setting already enabled *both* of these). Dumping objects/TEV stages/texture fetches doesn't currently have an equivalent, but could be added to the FIFO player instead.
A (partial) port of #9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient
or no code to be generated.
When a guest register is an immediate, it may be necessary to move this
value into a register. This is handled by gpr.R(), which lacks context
on how the register will be used. This leads to cases where the
immediate is written to a register, only for it to be overwritten. Take
for example this code generated by srwx:
0x5280031b mov w27, #0x18
0x53187edb lsr w27, w22, #24
gpr.BindToRegister() does have this context through the do_load
parameter, but didn't handle immediates. By adding this logic, we can
intelligently skip the write when do_load is false.
Because of the previous commit, this is needed to stop DolphinQt from
forgetting that the user pressed ignore whenever any part of the config
is changed.
This commit also changes the behavior a bit on DolphinQt: "Ignore for
this session" now applies to the current emulation session instead of
the current Dolphin launch. This matches how it already worked on
Android, and is in my opinion better because it means the user won't
lose out on important panic alerts in a game becase they played another
game first that had repeated panic alerts that they wanted to ignore.
For Android, this commit isn't necessary, but it makes the code cleaner.
This currently fails for direct with NormalIndex3 enabled (see https://bugs.dolphin-emu.org/issues/12952). The goal of this test is to be able to confidently say that that bug has been fixed.
We have one that does a similar thing, but only to measure speed and uses indices. This one verifies accuracy (and uses the largest possible input size by using direct components).
Not doing so produces a warning in clang:
ISO C++20 considers use of overloaded operator '!=' (with operand types
'Metal::DepthStencilSelector' and 'Metal::DepthStencilSelector') to be
ambiguous despite there being a unique best viable function with
non-reversed arguments
The underlying reason for this warning is an incorrect method signature.
* 'hangle' was a typo
* Light colors include an alpha value, so they should be 8 characters, not 6
* The XF command format adds 1 to the count internally (so 0 is one word), but we need to subtract that back to produce a valid command
* XFMEM_POSTMATRICES was calculating the row by subtracting XFMEM_POSMATRICES (POS vs POST), resulting in incorrect row numbering
It stores both the konst selection value for alpha and color channels (for two tev stages per ksel), and half of a swap table row (there are 4 total swap tables, which can be used for swizzling the rasterized color and the texture color, and indices selecting which tables to use are stored per tev stage in the alpha combiner). Since these are indexed very differently, the old code was hard to follow.
If GameFile.getCustomCoverPath returns a mangled URI, we need to
unmangle it before passing it to Picasso, since Picasso has no
concept of Dolphin's mangled URIs.
The masking was incorrect. This affects the main menu of The Last Avatar, though that menu also relies on copy filter functionality that is not correctly handled in the software renderer so the difference is not obvious; that game shuffles textures across all indices for some reason, so this issue would presumably result in subtle flickering.
Per https://en.cppreference.com/w/cpp/preprocessor/replace#.23_and_.23.23_operators the `##` behavior is a nonstandard extension; this extension seems to be supported by all compilers we care about, but IntelliSense in visual studio doesn't correctly handle it, resulting in false errors in the IDE (but not when compiling).
Per https://en.cppreference.com/w/cpp/preprocessor/replace#Function-like_macros C++20 introduced a workaround, where `__VA_OPT__(, )` generates a comma if and only if `__VA_ARGS__` is non-empty.
This PR replaces all occurrences, with the exception of Externals, DSPSpy (which is not likely to be edited in MSVC and does not target C++20 currently), and JitArm64_Integer.cpp (which uses `Function(__VA_ARGS__)`, and thus does not ever need a comma).
Fixes https://bugs.dolphin-emu.org/issues/13017. With uCode switching, the existing instance of AXUCode is re-activated when GBAUCode is done, but if the state remains as WaitingForNextTask, it won't be able to do anything. Instead, it needs to be in WaitingForCmdListSize.
(When the AX uCode is resumed, startpc is set to 0x0030, at least for 0x07f88145; this is the same location as MAIL_RESUME jumps to, so DSP_RESUME should be sent when the resuming happens; that's already handled by AXUCode::Update.)
In the past, directory initialization could fail for two reasons:
The user was rejecting the storage permission, or external storage
wasn't mounted. With the introduction of scoped storage, the first of
these two couldn't happen anymore; if the user rejects the storage
permission, we just use the app-specific directory instead of the
dolphin-emu directory.
By making it so Dolphin force quits if external storage isn't mounted,
we can get rid of our code for handling retrying directory initialization
after it fails. I think this slight hit to UX is worth it considering
that basically nobody has an Android device with detachable primary
external storage anymore. And the UX hit is very small; the user just has
to manually open the app again after remounting external storage. The
toast about external storage not being mounted will still be displayed.
The recent merge of the splash screen PR may have made it so that the
code for handling directory initialization failing doesn't work anymore.
To be completely honest, I'm not sure how to even test this in 2022.
walking the zip prevents minizip from re-reading the same
data repeatedly from the actual backing filesystem.
also improves most usages of minizip to allow for >4GB,
files altho we probably don't need it
dir_path is used by PanicAlertFormatT, which prior to PR 10209 used a
lambda. Before c++20, referring to structured bindings in lambda captures
was forbidden. The problem is now doubly fixed, so put the structured
binding back in.
Fixes the Dolphin bug mentioned in
https://github.com/dolphin-emu/hwtests/issues/45.
Because this doesn't fix any observed behavior in games (no, 1080°
Avalanche isn't affected), I haven't implemented this in the JITs,
so as to not cause unnecessary performance degradations.
This was causing a bug in the rounding of paired single multiplication
operands. If Force25BitPrecision was called for quad registers, the
element size of its ADD instruction would get treated as if it was 16
instead of the intended 64, which would cause the result of the
calculation to be incorrect if the carry had to pass a 16-bit boundary.
Fixes one of the two bugs reported in
https://bugs.dolphin-emu.org/issues/12998.
This command does not upload the MAIN buffers to CPU memory. This was
functionally fixed in f11a40f858 without
updating the comments and variable names.
Prior to 7854bd7109, this was used by the debugger for the OpenGL and D3D9 plugins to control logging (via PRIM_LOG and INFO_LOG/DEBUG_LOG in VideoCommon code; PRIM_LOG was changed in 77215fd27c), and also framedumping (removed in 64927a2f81 and 2d8515c0cf), shader dumping (removed in 2d8515c0cf and this commit), and texture dumping (removed in 54aeec7a8f). Apart from shader dumping, all of these features have modern alternatives, and shader source code can be seen in RenderDoc if "Enable API Validation Layers" is checked (which also enables source attachment), so there's no point in keeping this around.
Previously, we had WBFS and CISO which both returned an upper bound
of the size, and other formats which returned an accurate size. But
now we also have NFS, which returns a lower bound of the size. To
allow VolumeVerifier to make better informed decisions for NFS, let's
use an enum instead of a bool for the type of data size a blob has.
For a few years now, I've been thinking it would be nice to make Dolphin
support reading Wii games in the format they come in when you download
them from the Wii U eShop. The Wii U eShop has some good deals on Wii
games (Metroid Prime Trilogy especially is rather expensive if you try
to buy it physically!), and it's the only place right now where you can
buy Wii games digitally.
Of course, Nintendo being Nintendo, next year they're going to shut down
this only place where you can buy Wii games digitally. I kind of wish I
had implemented this feature earlier so that people would've had ample
time to buy the games they want, but... better late than never, right?
I used MIT-licensed code from the NOD library as a reference when
implementing this. None of the code has been directly copied, but
you may notice that the names of the struct members are very similar.
c1635245b8/lib/DiscIONFS.cpp
Needed for the next commit. NFS disc images are hashed but not encrypted.
While we're at it, also get rid of SupportsIntegrityCheck.
It does the same thing as old IsEncryptedAndHashed and new HasWiiHashes.
This normalization was added in 02ac5e95c8, and changed to use floats in 4bf031c064. The conversion to floats means that sometimes there is insufficient precision for the normalization process, which results in values of NaN or infinity. Performing the whole process with doubles prevents that, but games also sometimes set the values to NaN or infinity directly (possibly accidentally due to the values not being initialized due to them not being used in the current configuration?).
The version of Mesa currently in use on FifoCI (20.3.5) has issues with NaN. Although this bug has been fixed (b3f3287eac in 21.2.0), FifoCI is stuck with the older version.
This change may or may not be incorrect, but it should result in the same behavior as already present in Dolphin, while working around the Mesa bug.
CARDUCode, GBAUCode, and INITUCode previously didn't have an implementation of it. In practice it's unlikely that this caused an issue, since these uCodes are only active for a few frames at most, but now that GBAUCode doesn't have global state, we can implement it there. I also implemented it for CARDUCode, although our CARDUCode implementation does not have all states handled yet - this is simply future-proofing so that when the card uCode is properly implemented, the save state version does not need to be bumped. INITUCode does not have any state to save, though.
The accuracy improvements are:
* The request mail must be 0xabba0000 exactly; both the low and high parts are checked
* The address is masked with 0x0fffffff
* Before, the global state meant that after the GBA uCode had been used once, it would accept 0xcdd1 commands immediately. Now, it only accepts them after execution has finished.
* moves dolphin-specific settings out of Base.props
* creates exports.props for externals, allowing to easily import
individual Externals
* corrects some cruft that accumulated and probably contributed
to msbuild overbuilding