This commit adds a new "discarded" state for registers.
Discarding a register is like flushing it, but without
actually writing its value back to memory. We can discard
a register only when it is guaranteed that no instruction
will read from the register before it is next written to.
Discarding reduces the register pressure a little, and can
also let us skip a few flushes on interpreter fallbacks.
The output of instructions like fabsx and ps_sel is store-safe
if and only if the relevant inputs are. The old code was always
marking the output as store-safe if the output was a single,
and never otherwise.
Also, the old code was treating the output of psq_l/psq_lu as
store-safe, which seems incorrect (if dequantization is disabled).
This improves the speed of verifying Wii WIA/RVZ files.
For me, the verification speed for LZMA2-compressed files
has gone from 11-12 MiB/s to 13-14 MiB/s.
One thing VolumeVerifier does to achieve parallelism is to
compute hashes for one chunk of data while reading the next
chunk of data. In master, when reading data from a Wii
partition, each such chunk is 32 KiB. This is normally fine,
but with WIA and RVZ it leads to rather lopsided read times
(without the compute times being lopsided): The first 32 KiB
of each 2 MiB takes a long time to read, and the remaining
part of the 2 MiB can be read nearly instantly. (The WIA/RVZ
code has to read the entire 2 MiB in order to compute hashes
which appear at the beginning of the 2 MiB, and then caches
the result afterwards.) This leads to us at times not doing
much reading and at other times not doing much computation.
To improve this, this change makes us use 2 MiB chunks
instead of 32 KiB chunks when reading from Wii partitions.
(block = 32 KiB, group = 2 MiB)
This can't actually happen in practice due to how WAD files work,
but it's very easy to add support for thanks to the last commit,
so we might as well add support for it.
The performance gains of doing this aren't too important since you
normally wouldn't run into any disc image that has overlapping blocks
(which by extension means overlapping partitions), but this change also
lets us get rid of things like VolumeVerifier's mutex that used to
exist just for the sake of handling overlapping blocks.
Panic alerts in DiscIO can potentially be very annoying since
large amounts of them can pop up when loading the game list
if you have some particularly weird files in your game list.
This was a much bigger problem back in 5.0 with its
"Tried to decrypt data from a non-Wii volume" panic alert, but
I figured I would take it all the way and remove the remaining
panic alerts that can show up when loading the game list.
I have exempted uses of ASSERT/ASSERT_MSG since they indicate
a bug in Dolphin rather than a malformed file.
If we know at compile time that the PPC carry flag definitely
has a certain value, we can bake that value into the emitted code
and skip having to read from PPCState.
When a save state is loaded, the IOS device serving bluetooth
is cast as BluetoothEmuDevice. If, however, a real Wiimote
with BT passthrough is used, this caused the game to crash.
Now the proper device class is used.
At a first glance it may look like a part of the code I added to
srawx in efeda3b has a bug when a == s. The code actually happens
to work correctly, but in the interest of making the code easier
to reason about, I'd like to change the way it's implemented. This
change should improve the pipelining a little in the a == s case too.
Fix Gamelist context menu item 'Open Containing Folder' opening wrong
target on Windows when game parent folder is [foobar] and grandparent
folder contains file [foobar].bat or [foobar].exe
Add trailing directory separator to parent folder path to force Windows
to interpret path as directory.
Fixes https://bugs.dolphin-emu.org/issues/12411
21c152f added a small hack to DVDInterface to keep WBFS and CISO
files working with Nintendo's "Error #001" anti-piracy check.
Unfortunately I don't think it's possible to support WBFS and
CISO without any kind of hack or heuristic, but what we can do
is replace the 21c152f hack (which applies regardless of file
format) with a hack that only is active when using WBFS or CISO.
This change is similar to 2a5a399, but the disc size is
calculated in a different way.
Add ! before unused variables to 'use' them.
Ubuntu-x64 emits warnings for unused variables because gcc decides
it should ignore the void cast around them. See thread for discussion:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425
Loop index int i was being compared against GetControllerCount() which
returned a size_t. This was the only place GetControllerCount() was
called from so the change of return type doesn't disturb anything else.
Changing the loop index to size_t wouldn't work as well since it's
passed into GetController(), which takes an int and is called from many
places, so it would need a cast anyway on an already busy line.
...and let's optimize a divisor of 2 ever so slightly for good measure.
I wouldn't have bothered, but most GameCube games seem to hit this on
launch.
- Division by 2
Before:
41 BE 02 00 00 00 mov r14d,2
41 8B C2 mov eax,r10d
45 85 F6 test r14d,r14d
74 0D je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0E jne normal_path
41 83 FE FF cmp r14d,0FFFFFFFFh
75 08 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
44 8B F0 mov r14d,eax
EB 07 jmp done
normal_path:
99 cdq
41 F7 FE idiv eax,r14d
44 8B F0 mov r14d,eax
done:
After:
45 8B F2 mov r14d,r10d
41 C1 EE 1F shr r14d,1Fh
45 03 F2 add r14d,r10d
41 D1 FE sar r14d,1
Add a function to calculate the magic constants required to optimize
signed 32-bit division.
Since this optimization is not exclusive to any particular architecture,
JitCommon seemed like a good place to put this.
Zero divided by any number is still zero. For whatever reason, this case
shows up frequently too.
Before:
B8 00 00 00 00 mov eax,0
85 F6 test esi,esi
74 0C je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0C jne normal_path
83 FE FF cmp esi,0FFFFFFFFh
75 07 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
8B F8 mov edi,eax
EB 05 jmp done
normal_path:
99 cdq
F7 FE idiv eax,esi
8B F8 mov edi,eax
done:
After:
Nothing!
When the dividend is known at compile time, we can eliminate some of the
branching and precompute the result for the overflow case.
Before:
B8 54 D3 E6 02 mov eax,2E6D354h
85 FF test edi,edi
74 0C je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0C jne normal_path
83 FF FF cmp edi,0FFFFFFFFh
75 07 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
8B F8 mov edi,eax
EB 05 jmp done
normal_path:
99 cdq
F7 FF idiv eax,edi
8B F8 mov edi,eax
done:
After:
85 FF test edi,edi
75 04 jne normal_path
33 FF xor edi,edi
EB 0A jmp done
normal_path:
B8 54 D3 E6 02 mov eax,2E6D354h
99 cdq
F7 FF idiv eax,edi
8B F8 mov edi,eax
done:
Fairly common with constant dividend of zero. Non-zero values occur
frequently in Ocarina of Time Master Quest.
Whether the custom RTC setting is enabled shouldn't in itself
affect determinism (as long as the actual RTC value is properly
synced). Alters the logic added in 4b2906c.
I'm not entirely certain that this is correct, but the current
code doesn't really make sense to me... If we need to force the
RTC bias to 0 when custom RTC is enabled, why don't we need to
do it when custom RTC is disabled? The code for getting the
host system's current time doesn't contain any special handling
for the guest's RTC bias as far as I can tell.
The loop in WIARVZFileReader::Chunk::Read could terminate
prematurely if the size argument was smaller than the size
of an exception list which had only been partially loaded.
Fixes issue 11393.
The problem is that left and top make no sense for a width by height array; they only make sense in a larger array where from which a smaller part is extracted. Thus, the overall size of the array is provided to CopyRegion in addition to the sub-region. EncodeXFB already handles the extraction, so CopyRegion's only use there is to resize the image (and thus no sub-region is provided).
BPMEM_TEV_COLOR_ENV + 6 (0xC6) was missing due to a typo. BPMEM_BP_MASK (0xFE) does not lend itself well to documentation with the current FIFO analyzer implementation (since it requires remembering the values in BP memory) but still shouldn't be treated as unknown. BPMEM_TX_SETMODE0_4 and BPMEM_TX_SETMODE1_4 (0xA4-0xAB) were missing entirely.
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
Additionally, VCacheEnhance has been added to UVAT_group1. According to YAGCD, this field is always 1.
TVtxDesc also now has separate low and high fields whose hex values correspond with the proper registers, instead of having one 33-bit value. This change was made in a way that should be backwards-compatible.
The PPC is supposed to be held in reset when another version of IOS is
in the process of being launched for a PPC title launch.
Probably doesn't matter in practice, though the inaccuracy was
definitely observable from the PPC.
We should only try to load a symbol map for the new title *after* it
has been loaded into memory, not before. Likewise for applying HLE
patches and loading new custom textures.
In practice, loading/repatching too early was only a problem for
titles that are launched via ES_Launch. This commit fixes that.
The extra IPC ack is triggered by a syscall that is invoked in ES's
main function; the syscall literally just sets Y2, IX1 and IX2 in
HW_IPC_ARMCTRL -- there is no complicated ack queue or anything.
Low MEM1 is cleared by IOS before all the other constants are written.
This will overwrite the Gecko code handler but it should be fine
because HLE::Reload (which will set up the code handler hook again)
will be called after a title change is detected.
The Host constructor sets a callback on a lambda that in turn calls
Host_UpdateDisasmDialog. Since that function is not a member function
capturing this is unnecessary.
Fixes -Wunused-lambda-capture warning on freebsd-x64.
When reading a reply from a message sent to the data socket there is
the possibility that the other side gets sent multiple messages
before replying to any of them, which can lead to multiple replies
sent in a row. Though this only happens when things time out, it's
quite possible for these timeouts to happen or build up over time,
especially when initiating the connection.
This change makes sure to flush any pending bytes that have not been
read yet out of the socket after a successful POLL reply is received,
since that is the most common time when backups occur, and as well as
using the exact number of bytes in an expected reply, to ensure
the received data and the message it's replying to do not get out of
sync.
The result of calls to PPCSTATE_OFF_PS0/1 were being cast to u32 and
passed to functions expecting s32 parameters. This changes the casts
to s32 instead.
One location was missing a cast and generated a warning with VS which
is now fixed.
Added `ToggleBreakPoint` to both interface BreakPoints/MemChecks. this would allow us to toggle the state of the breakpoint.
Also the TMemCheck::is_ranged is not longer serialized to string, since can be deduce by comparing the TMemCheck::start_address and TMemCheck::end_address
DualShock UDP Client is the only place in the code that assumed OnConfigChanged()
is called at least once on startup or it won't load up the setting, so I took care of that
This was caused, because we were saving the `break_on_hit` flag with the letter `p`. Then while loading the breakpoints, we read the flag with the letter `b`, resulting in the `break_on_hit` flag being always false
Filesystem accesses aren't magically faster when they are done by ES,
so this commit changes our content wrapper IPC commands to take FS
access times and read operations into account.
This should make content read timings a lot more accurate and closer
to console. Note that the accuracy of the timings are limited to the
accuracy of the emulated FS timings, and currently performance
differences between IOS9-IOS28 and newer IOS versions are not emulated.
Part 1 of fixing https://bugs.dolphin-emu.org/issues/11346
(part 2 will involve emulating those differences)
This makes it more convenient to emulate timings for IPC commands that
perform internal IOS <-> IOS IPC, for example ES relying on FS
for filesystem access.
According to hwtests, older versions of IOS are slower at performing
various filesystem operations:
https://docs.google.com/spreadsheets/d/1OKo9IUuKCrniz4m0kYIaMP_qFtOCmAzHZ_zAmobvBcc/edit
(courtesy of JMC)
A quick glance at IOS9 reveals that older versions of IOS have a
simplistic implementation of memcpy that does not optimize large copies
by copying 16 bytes or 32 bytes per chunk, which makes cached reads
and writes noticeably slower -- the difference was significant enough
that the OoT speedrunning community noticed that IOS9 (the IOS that
is used for the OoT VC title) was slower.
More or less a complete rewrite of the function which aims
to be equally good or better for each given input, without
relying on special cases like the old implementation did.
In particular, we now have more extensive support for
MOVN, as mentioned in a TODO comment.
Instead of constructing IPCCommandResult with static member functions
in the Device class, we can just add the relevant constructors to the
reply struct itself. Makes more sense than putting it in Device
when the struct is used in the kernel code and doesn't use any Device
specific members...
This commit also changes the IPC command handlers to return an optional
IPCCommandResult rather than an IPCCommandResult. This removes the need
for a separate boolean that indicates whether the "result" is actually
a reply, and also avoids the need to set dummy result values and ticks.
It also makes it really obvious which commands can result in no reply
being generated.
Finally, this commit renames IPCCommandResult to IPCReply since the
struct is now only used for actual replies. This new name is less
verbose in my opinion.
The diff is quite large since this touches every command handler, but
the only functional change is that I fixed EnqueueIPCReply to
take a s64 for cycles_in_future to match IPCReply.
PrepareForState is now unnecessary with the new implementation of
HostFileSystem::DoState, which does what the old implementation
(CWII_IPC_HLE_Device_FileIO::PrepareForState) used to do.
I don't really see the use of this. (Maybe in the past it
was used for when we need a constant number of instructions
for backpatching? But we don't use MOVI2R for that now.)
Now that the ES class (now called ESDevice) and the ES namespace do
not conflict anymore, "IOS::" can be dropped in a lot of cases.
This also removes "IOS::HLE::" for code that is already in that
namespace. Some of those names used to be explicitly qualified
only for historical reasons.
There are no functional changes.
Some of the device names can be ambiguous and require fully or partly
qualifying the name (e.g. IOS::HLE::FS::) in a somewhat verbose way.
Additionally, insufficiently qualified names are prone to breaking.
Consider the example of IOS::HLE::FS:: (namespace) and
IOS::HLE::Device::FS (class). If we use FS::Foo in a file that doesn't
know about the class, everything will work fine. However, as soon as
Device::FS is declared via a header include or even just forward
declared, that code will cease to compile because FS:: now resolves
to Device::FS if FS::Foo was used in the Device namespace.
It also leads to having to write IOS::ES:: to access ES types and
utilities even for code that is already under the IOS namespace.
The fix for this is simple: rename the device classes and give them
a "device" suffix in their names if the existing ones may be ambiguous.
This makes it clear whether we're referring to the device class or to
something else.
This is not any longer to type, considering it lets us get rid of the
Device namespace, which is now wholly unnecessary.
There are no functional changes in this commit.
A future commit will fix unnecessarily qualified names.
According to the C standard, an offsetof expression must evaluate to an
address constant, otherwise it's undefined behavior.
Fixes https://bugs.dolphin-emu.org/issues/12409
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95942
There are still improper uses of offsetof (mostly in JitArm64) but
fixing that will take more effort since there's a PPCSTATE_OFF wrapper
macro that is sometimes used with non-array members and sometimes used
with arrays and variable indices... Let's keep that for another PR.
Fixes the expression window being spammed with the first entry in the
Operators or Functions select menus when scrolling the mouse wheel while
hovering over them.
Fixes https://bugs.dolphin-emu.org/issues/12405
The dolphin-redirect.php script seems to have been present since 2012
at least, but we accidentally stopped using it when the "open wiki"
feature was reimplemented in DolphinQt2 in 2016.
<@delroth> dolphin-redirect.php is slightly smarter and tries to find gameid aliases for e.g. same region
<@delroth> uh, I mean different region
PR 9262 added a bunch of Jit64 optimizations, some of
which were already in JitArm64 and some which weren't.
This change ports the latter ones to JitArm64.