When making 92d1d60, I checked whether the ~0x1f masking in dcbx
actually was necessary. I came to the conclusion that it wasn't,
so I removed it. However, I hadn't checked the second half of
InvalidateICache closely enough - the masking is actually needed.
This commit re-adds the masking, but this time in C++ code instead
of in jitted code in order to save icache. Though I suppose the
difference doesn't matter all that much, since this is in farcode
and all...
Hopefully fixes https://bugs.dolphin-emu.org/issues/12612.
This implements the behavior described in
https://bugs.dolphin-emu.org/issues/12565.
Thank you to eigenform, delroth, phire, marcan, segher, and Extrems
for all helping in one way or another with the efforts to reverse
engineer this behavior, and to Rylie for reporting the issue.
Write_U16_Swap leaves the upper 32 bits alone. Reimplementing this
correctly in the JIT would require more than one instruction,
so let's just call Write_U16_Swap instead, like Jit64 does.
One of the following commits will add emulation of a quirk
that only happens when writing to memory which is mapped as
write-through or cache-inhibited, so let's keep track of
which memory is mapped in this way.
This adds about a frame of latency, and since most games don't change
VI registers during scanout, we can get away with outputting the XFB at
the start of scanout. WWE Crush Hour is the (only currently known)
exception, which has flickering problems when doing it this way.
This adds a path to perform the output at the end of scanout, and gates
it behind an option which defaults to using the latency-reducing
pre-scanout path.
PR https://github.com/dolphin-emu/dolphin/pull/9700 removed spaces from within control names, which some user complained about, and their point of view is kind of understandable:
https://bugs.dolphin-emu.org/issues/12605
with this change, only spaces outside (between) control names are trimmed, which are the ones we wanted to trim in the first place.
This will still retain the major advantages from 9700.
Basically, "`Button 1` + `Button 2`" was showing as "`Button1`+`Button2`", while it will now show as "`Button 1`+`Button 2`".
Originally, 1479 (for example) would disassemble as `lsr $ACC0, #-7`. At some point (likely the conversion to fmt), this regressed to `lsr $ACC0, #4294967289`. Now, it disassembles as `lsr $ACC0, #7`.
The CPU-side AX library enables it by default and uses hardcoded parameters.
CMD_COMPRESSOR_TABLE_ADDR (0x0A) was incorrect. It's always a nop on the
GameCube and was probably confused with the Wii version.
It was believed that this only mattered when the rounding mode was
set to round to infinity, which games generally don't do, but it
can also affect the sign of the output when the inputs are all zero.
So it turns out you have to pass XMM0 as the clobber register
to HandleNaNs, because HandleNaNs uses BLENDVPD and BLENDVPD
implicitly uses XMM0, and nobody noticed when I broke this in
2c38d64 because nobody plays the one game that needs accurate NaNs.
This was added because YAGCD's info on MAXANISO (near TX_SETMODE0 in Section 5.11.1) claims it's the case, but Extrems says it does work. I haven't tested anything myself, and dolphin still does not actually implement anisotropic filtering based on this field.
This reverts commit 66b992cfe4.
A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
The workaround added in 30f9f31 caused a regression where Dolphin
incorrectly replaced runs of one byte with runs of another byte
when writing WIA and RVZ files. ReuseID::operator< was always
returning false unless the ReuseIDs being compared had different
partition keys, which caused std::map<ReuseID, GroupEntry>
to treat all ReuseIDs with the same partition key as equal.
This actually eliminates any setting pertaining to SD cards from the
NetPlay dialog, as it would effectively just be a duplicate of the
setting in the Wii pane, potentially causing confusion.
This also enables save data writing by default, as this is probably
what most players want, and should avoid them losing hours of progress
because they forgot to tick a checkbox.
This implementation is pretty efficient in my opinion. And "As
long as we aren't falling back to interpreter we're winning a lot"
applies to basically every instruction to some degree anyway.
The dcbz instruction needs to lock W30 so that the slowmem code will
push and pop it when calling into C++. Also, the slowmem code expects
that the address is present in W0, so replace the use of W0 as a scratch
register in the fastmem code with the now locked W30.
We currently have a bug when calling Arm64GPRCache::Flush with
FlushMode::MaintainState, zero free host registers, and at least
one guest register containing an immediate. We end up grabbing
a temporary register from the register cache in order to be
able to write the immediate to memory, but grabbing a temporary
register when there are zero free registers causes the least
recently used register to be flushed in a way which does not
maintain the state of the register cache.
To get around this, require callers to pass in a temporary
register in the GPR MaintainState case. In other cases,
passing in a temporary register is not required but can help
avoid spilling a register (if the caller already had a
temporary register at hand anyway, which in particular will
be the case in my upcoming memcheck pull request).
release-ubu-x64 currently fails with "sorry, unimplemented: non-trivial
designated initializers not supported". pr-ubu-x64 doesn't for some
reason, but we might as well remove the designated initializer.