Optimize division by a constant into multiplication. This method is
also used by GCC and LLVM.
We also add optimized paths for divisors 0, 1, and -1, because they
don't work using this method. They don't occur very often, but are
necessary for correctness.
Makes the enum strongly typed instead of interacting with a raw u32
value. While we're at it, we can add helpers to the NWC24Config to make
using code poke at the internals of the class a little bit less and also
make the querying a little nicer to read.
Currently we were using heap allocating maps that last for the entire
duration of the emulator running.
Given the size N of both of these maps are very small (< 20 elements),
we can just make use of an array of pairs and perform linear scans. This
is also fine, given this code isn't particularly "hot" either, so this
won't be run often.
It seems like we spend a lot of the game list scanning time in
updateAdditionalMetadata, which I suppose makes sense considering
how many different files that function attempts to open.
With the addition of just one little atomic operation, we can make
it safe to call updateAdditionalMetadata without holding a lock.
These functions don't touch any class state, so they can be turned into
internal helper functions.
While we're at it, we can move the enumerations as well.
Although it's not clear what the xA and xB conditions are intended to do, the pattern indicates that xB is the regular version and xA is the inverted version, so for consistency, IsConditionB should be the main function.
Since the merge of b24b79e, we've gotten reports that the
following games are broken on JitArm64:
* Sonic Heroes
* The SpongeBob SquarePants Movie
* Astérix & Obélix XXL
* The Incredibles: Rise of the Underminer
Disabling the register cache avoids the issue, so the cause
of the bug might not actually have anything to do with the
newly implemented instructions. Nevertheless, I don't want
to ship a beta with this problem present, so I would like to
disable these instructions for the time being.
HandleFastmemFault works correctly when faults only happen in
expected locations, but it does some things that are rather
dangerous for faults in unexpected locations, like decrementing
an iterator without checking whether it's equal to begin.
This change cleans up the logic by making m_fault_to_handler's
key be the end of the fastmem region instead of the start.
Hardware testing indicated that SRS uses a different list of registers than LRS (specifically, acS.h can be used with SRSH but not LRS, and SRS does not support AX registers, and there are 2 encodings that do nothing).
* DSP*Arithmetic: Fix grammar for ANDCF and ANDF
* DSP*Arithmetic: Fix registers used by MOVAX and MOV
* DSP*Branch: Fix documentation for JMPR
* DSP*Branch: Fix HALT encoding ("I think I saw a two")
* DSP*ExtOps: Fix 'LN encoding (The listed encoding was for 'L)
* DSP*ExtOps: Improve documentation for 'LD and 'LDAX
* DSPJitExtOps: Correct typo
* DSP*LoadStore: Remove obsolete comment about pc in SRS (This was fixed in 1419e7e5b2)
* DSP*LoadStore: Fix comments for LRR/SRR
* DSP*Misc: Improve documentation for SBCLR and SBSET
* DSP*Multiplier: Fix MULXAC encoding (The previous encoding was for MULXMVZ)
* DSP*Multiplier: Fix tabs in MULCAC and MULCMVZ (There are some other tabs in comments in the JIT, but these are the only ones that are in instruction comments instead of indicating the corresponding interpreter code. Those other comments can be corrected in a different PR, as they're not documentation related.)
* DSPJitMultiplier: Fix MULXMVZ typo
These instructions were already implememented by Dolphin, but never added to the manual. Extension instructions will be handled in a later commit, as wlil instructions that were not previously implememented by Dolphin.
We were using a "value" register to avoid clobbering physical_addr,
but this isn't actually needed anymore. The only bits we need from
physical_addr after we start clobbering it are bits 5-9, and
those bits are identical in effective_addr and physical_addr,
so we can read them from effective_addr instead.
Fixes https://bugs.dolphin-emu.org/issues/12620
The changed code did not match the corresponding code in VertexShaderGen. Some parts of the sky have 2 color channels in each vertex, while others only have 1, despite only color channel 0 being used and XFMEM_SETNUMCHAN being set to 1 for both of them. The old code (from #4601) caused channel 0 to be set to channel 1 if the vertex contained both color channels but the number of channels was set to 1, which is wrong.
Makes our conversions between the different signs explicit to indicate
that they're intentional and also silences compiler warnings when
compiling with sign conversion or stricter truncation warnings enabled.
The extension needs to happen in SetLongAcc, not GetLongAcc, as the extension needs to always be reflected in acS.h.
There is no functional difference with the write handler for acS.h, but it is more readable than 4 casts in a row.
`IsLess` would incorrectly return true if both `SR_OVERFLOW` and `SR_SIGN` are set, as `(sr & SR_OVERFLOW) != (sr & SR_SIGN)` becomes `SR_OVERFLOW != SR_SIGN` which is true as the two masks are different. This broke in e651592ef5.
This issue only affected the DSP LLE Interpreter, and not the DSP LLE JIT.
I've also included a simple test case for this. `ax0.l` (on the top left) is set to 0 if the instruction following `IFL` does not execute and to 1 if it is executed.
Retail-signed discs use the format: IOS56-64-v5661.wad
Debug-signed discs use the format: firmware.64.56.22.29.wad
Debug-signed discs usually have a 128 version of the firmware as well,
since some devkits have 128 MB MEM2. (Retail has 64 MB.)
I found it a little bit annoying that you can't start typing
the desired address immediately after opening the window.
Also getting rid of the window's ? button while I'm at it.
When you come across a cheat code in a place like the Dolphin
wiki, it's often posted like this:
$16:9 Widescreen
0441187C 3FE38E39
Sometimes users try to paste this in its entirety into the Code
field, which leads to Dolphin reporting an error on the first line.
I think it would be nice to make this a little smoother by having
Dolphin accept having a first line that starts with $.
For several reasons:
- It pegs the CPU at 95% for scanning even when Dolphin is idle
- WiimoteScannerHidapi works fine on macOS
- Less macOS code to maintain
When making 92d1d60, I checked whether the ~0x1f masking in dcbx
actually was necessary. I came to the conclusion that it wasn't,
so I removed it. However, I hadn't checked the second half of
InvalidateICache closely enough - the masking is actually needed.
This commit re-adds the masking, but this time in C++ code instead
of in jitted code in order to save icache. Though I suppose the
difference doesn't matter all that much, since this is in farcode
and all...
Hopefully fixes https://bugs.dolphin-emu.org/issues/12612.
This implements the behavior described in
https://bugs.dolphin-emu.org/issues/12565.
Thank you to eigenform, delroth, phire, marcan, segher, and Extrems
for all helping in one way or another with the efforts to reverse
engineer this behavior, and to Rylie for reporting the issue.
Write_U16_Swap leaves the upper 32 bits alone. Reimplementing this
correctly in the JIT would require more than one instruction,
so let's just call Write_U16_Swap instead, like Jit64 does.
One of the following commits will add emulation of a quirk
that only happens when writing to memory which is mapped as
write-through or cache-inhibited, so let's keep track of
which memory is mapped in this way.
This adds about a frame of latency, and since most games don't change
VI registers during scanout, we can get away with outputting the XFB at
the start of scanout. WWE Crush Hour is the (only currently known)
exception, which has flickering problems when doing it this way.
This adds a path to perform the output at the end of scanout, and gates
it behind an option which defaults to using the latency-reducing
pre-scanout path.
PR https://github.com/dolphin-emu/dolphin/pull/9700 removed spaces from within control names, which some user complained about, and their point of view is kind of understandable:
https://bugs.dolphin-emu.org/issues/12605
with this change, only spaces outside (between) control names are trimmed, which are the ones we wanted to trim in the first place.
This will still retain the major advantages from 9700.
Basically, "`Button 1` + `Button 2`" was showing as "`Button1`+`Button2`", while it will now show as "`Button 1`+`Button 2`".
Originally, 1479 (for example) would disassemble as `lsr $ACC0, #-7`. At some point (likely the conversion to fmt), this regressed to `lsr $ACC0, #4294967289`. Now, it disassembles as `lsr $ACC0, #7`.
The CPU-side AX library enables it by default and uses hardcoded parameters.
CMD_COMPRESSOR_TABLE_ADDR (0x0A) was incorrect. It's always a nop on the
GameCube and was probably confused with the Wii version.
It was believed that this only mattered when the rounding mode was
set to round to infinity, which games generally don't do, but it
can also affect the sign of the output when the inputs are all zero.
So it turns out you have to pass XMM0 as the clobber register
to HandleNaNs, because HandleNaNs uses BLENDVPD and BLENDVPD
implicitly uses XMM0, and nobody noticed when I broke this in
2c38d64 because nobody plays the one game that needs accurate NaNs.
This was added because YAGCD's info on MAXANISO (near TX_SETMODE0 in Section 5.11.1) claims it's the case, but Extrems says it does work. I haven't tested anything myself, and dolphin still does not actually implement anisotropic filtering based on this field.
This reverts commit 66b992cfe4.
A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
The workaround added in 30f9f31 caused a regression where Dolphin
incorrectly replaced runs of one byte with runs of another byte
when writing WIA and RVZ files. ReuseID::operator< was always
returning false unless the ReuseIDs being compared had different
partition keys, which caused std::map<ReuseID, GroupEntry>
to treat all ReuseIDs with the same partition key as equal.
This actually eliminates any setting pertaining to SD cards from the
NetPlay dialog, as it would effectively just be a duplicate of the
setting in the Wii pane, potentially causing confusion.
This also enables save data writing by default, as this is probably
what most players want, and should avoid them losing hours of progress
because they forgot to tick a checkbox.
This implementation is pretty efficient in my opinion. And "As
long as we aren't falling back to interpreter we're winning a lot"
applies to basically every instruction to some degree anyway.
The dcbz instruction needs to lock W30 so that the slowmem code will
push and pop it when calling into C++. Also, the slowmem code expects
that the address is present in W0, so replace the use of W0 as a scratch
register in the fastmem code with the now locked W30.
We currently have a bug when calling Arm64GPRCache::Flush with
FlushMode::MaintainState, zero free host registers, and at least
one guest register containing an immediate. We end up grabbing
a temporary register from the register cache in order to be
able to write the immediate to memory, but grabbing a temporary
register when there are zero free registers causes the least
recently used register to be flushed in a way which does not
maintain the state of the register cache.
To get around this, require callers to pass in a temporary
register in the GPR MaintainState case. In other cases,
passing in a temporary register is not required but can help
avoid spilling a register (if the caller already had a
temporary register at hand anyway, which in particular will
be the case in my upcoming memcheck pull request).
release-ubu-x64 currently fails with "sorry, unimplemented: non-trivial
designated initializers not supported". pr-ubu-x64 doesn't for some
reason, but we might as well remove the designated initializer.
This fixes various texture offsetting issues with negative texture coordinates (bringing the software renderer in line with the hardware renderers). It also handles the invalid wrap mode accurately (as was done for the hardware renderers in the previous commit). Lastly, it handles wrapping with non-power-of-2 texture sizes in a hardware-accurate way (which is somewhat broken looking, as games aren't supposed to use wrapping with non-power-of-2 sizes); this has not been done for the hardware renderers.