The # option means that 0x is prepended already, so the old code resulted in 0x0xDEADBEEF instead of the intended 0xDEADBEEF. WriteMailboxLow was already correct.
I don't know what happened here, unfortunately. The version of dsp_mixer.s added to libogc on Nov 14, 2008 in c76d8b851f uses andcf and jlz here, and the version we have matches the one from Feb 5, 2009 in ae5c3a5fb5 exactly (prior to the fixes in my previous commit). I can't see any reason why wait_for_dsp_mail would be changed like this.
ANDCF and ANDF were previously swapped and JNE/JEQ/JZR/JNZ became JNZ/JZ/JLNZ/JLZ on Apr 3, 2009 in 7c4e654253, corresponding to a change Hermes made on Nov 10, 2008 in 2cea6d99ad. But these predate the test being added.
The only other information I can find is that ASNDLIB 1.0 released on November 11, 2008, at https://web.archive.org/web/20120326145022/http://www.entuwii.net/foro/viewtopic.php?f=6&t=87 (but there aren't any surviving links from there).
This change makes assembling DSPTestText match DSPTestBinary, though HermesText doesn't yet match HermesBinary.
The test data was originally added on April 18, 2009 in e7e4ef4481. Then, set16 and set40 were swapped on April 22, 2009 89178f411c, which updated the DSPSpy version of dsp_code, but not the version in DSPTool used for testing. So, when the test was made, the assembled data matched the text, but a few days after it no longer did.
Similarly, on Jul 7, 2009 in 1654c582ab the conditional instructions were adjusted, and 0x1706 was changed from JRL to JRNC and 0x0297 was changed from JGE to JC.
For what it's worth, devkitPro made the same changes on May 31, 2010 in 8a65c85c9b and updated their version of the asnd ucode (which is this ucode) on June 11, 2011 in b1b8ecab3a (though this update also includes other feature changes). Note that at the time, they didn't reassemble the ucode unless they made changes to it; the assembled was stored in the repo until bfb705fe16~...d20f9bdcfb43260c6c759f4fb98d724931443f93.
This fixes the following failures with Hermes:
!! 0015 : 8e00 vs 8f00 - set16 vs set40
!! 016f : 8e00 vs 8f00 - set16 vs set40
and with Hermes:
!! 0014 : 8e00 vs 8f00 - set16 vs set40
!! 0063 : 8e00 vs 8f00 - set16 vs set40
!! 019b : 1706 vs 1701 - jrnc $AR0 vs jrl $AR0
!! 01bf : 0297 vs 0290 - jc 0x01dc vs jge 0x01dc
!! 01d2 : 0297 vs 0290 - jc 0x01dc vs jge 0x01dc
Hermes has the remaining failures:
!! 027b : 03c0 vs 03a0 - andcf $AC1.M, #0x8000 vs andf $AC1.M, #0x8000
!! 027d : 029d vs 0294 - jlz 0x027a vs jnz 0x027a
Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least.
devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: 8a65c85c9b
This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
For instance, ending with 0x009e (which you can do with CW 0x009e) indicates a LRI $ac0.m instruction, but there is no immediate value to load, so before whatever garbage in memory existed after the end of the file was used.
The bounds-checking also previously assumed that IRAM or IROM was being used, both of which were exactly 0x1000 long.
Spirv-cross's MSL codegen makes the amazing choice of compiling calls to inout functions as `State temp = s; call_function(temp); s = temp`. Not all Metal backends handle this mess well. In particular, it causes register spills on Intel, losing about 5% in performance.
X30 is used in fewer situations than the comment was claiming.
(I think that when I wrote the comment I was counting the use of X30
as a temp variable in the slowmem code as clobbering X30, but that
happens after pushing X30, so it doesn't actually get clobbered.)
This is used when fastmem isn't available. Instead of always falling
back to the C++ code in MMU.cpp, the JIT translates addresses on its
own by looking them up in a table that Dolphin constructs. This is
slower than fastmem, but faster than the old non-fastmem code.
This is primarily useful for iOS, since that's the only major platform
nowadays where you can't reliably get fastmem. I think it would make
sense to merge this feature to master despite this, since there's
nothing actually iOS-specific about the feature. It would be of use
for me when I have to disable fastmem to stop Android Studio from
constantly breaking on segfaults, for instance.
Co-authored-by: OatmealDome <julian@oatmealdome.me>