According to hwtests, older versions of IOS are slower at performing
various filesystem operations:
https://docs.google.com/spreadsheets/d/1OKo9IUuKCrniz4m0kYIaMP_qFtOCmAzHZ_zAmobvBcc/edit
(courtesy of JMC)
A quick glance at IOS9 reveals that older versions of IOS have a
simplistic implementation of memcpy that does not optimize large copies
by copying 16 bytes or 32 bytes per chunk, which makes cached reads
and writes noticeably slower -- the difference was significant enough
that the OoT speedrunning community noticed that IOS9 (the IOS that
is used for the OoT VC title) was slower.
More or less a complete rewrite of the function which aims
to be equally good or better for each given input, without
relying on special cases like the old implementation did.
In particular, we now have more extensive support for
MOVN, as mentioned in a TODO comment.
Instead of constructing IPCCommandResult with static member functions
in the Device class, we can just add the relevant constructors to the
reply struct itself. Makes more sense than putting it in Device
when the struct is used in the kernel code and doesn't use any Device
specific members...
This commit also changes the IPC command handlers to return an optional
IPCCommandResult rather than an IPCCommandResult. This removes the need
for a separate boolean that indicates whether the "result" is actually
a reply, and also avoids the need to set dummy result values and ticks.
It also makes it really obvious which commands can result in no reply
being generated.
Finally, this commit renames IPCCommandResult to IPCReply since the
struct is now only used for actual replies. This new name is less
verbose in my opinion.
The diff is quite large since this touches every command handler, but
the only functional change is that I fixed EnqueueIPCReply to
take a s64 for cycles_in_future to match IPCReply.
PrepareForState is now unnecessary with the new implementation of
HostFileSystem::DoState, which does what the old implementation
(CWII_IPC_HLE_Device_FileIO::PrepareForState) used to do.
I don't really see the use of this. (Maybe in the past it
was used for when we need a constant number of instructions
for backpatching? But we don't use MOVI2R for that now.)
Now that the ES class (now called ESDevice) and the ES namespace do
not conflict anymore, "IOS::" can be dropped in a lot of cases.
This also removes "IOS::HLE::" for code that is already in that
namespace. Some of those names used to be explicitly qualified
only for historical reasons.
There are no functional changes.
Some of the device names can be ambiguous and require fully or partly
qualifying the name (e.g. IOS::HLE::FS::) in a somewhat verbose way.
Additionally, insufficiently qualified names are prone to breaking.
Consider the example of IOS::HLE::FS:: (namespace) and
IOS::HLE::Device::FS (class). If we use FS::Foo in a file that doesn't
know about the class, everything will work fine. However, as soon as
Device::FS is declared via a header include or even just forward
declared, that code will cease to compile because FS:: now resolves
to Device::FS if FS::Foo was used in the Device namespace.
It also leads to having to write IOS::ES:: to access ES types and
utilities even for code that is already under the IOS namespace.
The fix for this is simple: rename the device classes and give them
a "device" suffix in their names if the existing ones may be ambiguous.
This makes it clear whether we're referring to the device class or to
something else.
This is not any longer to type, considering it lets us get rid of the
Device namespace, which is now wholly unnecessary.
There are no functional changes in this commit.
A future commit will fix unnecessarily qualified names.
According to the C standard, an offsetof expression must evaluate to an
address constant, otherwise it's undefined behavior.
Fixes https://bugs.dolphin-emu.org/issues/12409
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95942
There are still improper uses of offsetof (mostly in JitArm64) but
fixing that will take more effort since there's a PPCSTATE_OFF wrapper
macro that is sometimes used with non-array members and sometimes used
with arrays and variable indices... Let's keep that for another PR.
Fixes the expression window being spammed with the first entry in the
Operators or Functions select menus when scrolling the mouse wheel while
hovering over them.
Fixes https://bugs.dolphin-emu.org/issues/12405
The dolphin-redirect.php script seems to have been present since 2012
at least, but we accidentally stopped using it when the "open wiki"
feature was reimplemented in DolphinQt2 in 2016.
<@delroth> dolphin-redirect.php is slightly smarter and tries to find gameid aliases for e.g. same region
<@delroth> uh, I mean different region
PR 9262 added a bunch of Jit64 optimizations, some of
which were already in JitArm64 and some which weren't.
This change ports the latter ones to JitArm64.
Let's reset m_last_used for each register that will be used
in an instruction before we start allocating any of them,
so that one of the earlier allocations doesn't spill a
register that we want in a later allocation. (We must still
also increment/reset m_last_used in R and RW, otherwise we
end up in trouble when emulating lmw/stmw since those access
more guest registers than there are available host registers.)
This should ensure that the asserts added earlier in this
pull request are never triggered.
If the register pressure is high when allocating registers,
Arm64FPRCache may spill a guest register which we are going to
allocate later during the current instruction, which has the
side effect of turning it into double precision. This will have
bad consequences if we are assuming that it is single precision,
so let's add some asserts to detect if that ever happens.
PR #2663 added a Jit64 implementation of dcbX and a fast path to skip JIT cache invalidation. Unfortunately, there is a mismatch between address spaces in this optimization. It tests the effective address (with the top 3 bits cleared) against the valid block bitset which is intended to be indexed by physical address. While this works in the common case, it fails (for example) when the effective address is in the 7E... region (a.k.a. "fake VMEM"). This may also fall apart under more complex memory mapping scenarios requiring full MMU emulation.
The good news is that even without this fast path, the underlying call to JitInterface::InvalidateICache() still does very little work in the common case. It correctly translates the effective address to a physical address which it tests against the valid block bitset, skipping invalidation if it is not necessary. As such, the cost of removing the fast path should not be too high.
The Jit64 implementation is retained, though all it does now is emit a call. This is marginally more efficient than simple interpreter fallback, which involves an extra call. The JitArm64 implementation has also been fixed.
The game Happy Feet is fixed by this change, as it loads code in the 7E... address region and depends upon JIT cache invalidation in reponse to dcbf.
https://bugs.dolphin-emu.org/issues/12133
At least on some CPUs (I found out about this from the
Arm Cortex-A76 Software Optimization Guide), using X30
with BLR is one cycle slower than using another register.
Check return value of calls to File::CreateTempDir() from CoreTiming,
FileSystem, and MMIO test classes to verify the test user directory
exists, and fail the tests otherwise.