So somehow I forgot that AArch64 uses three-operand encoding...
Fixes a regression from 6303416201 which manifested in various ways,
such as incorrect rendering of the Wind Waker title screen.
The codegen for the functions themselves, not for the emitted code.
This seems to save 32 bytes per function. We also get rid of the oddity
we had before where ANDI2R would do masking for 32-bit operations but
the other functions wouldn't.
The alias for __m128i is typically something like:
typedef long long __m128i __attribute__((__vector_size__(16), __may_alias__));
and the part that ends up not getting preserved is the __may_alias__
attribute specifier.
So, in order to preserve that, we can just use a wrapper struct, so the
data type itself isn't being passed through the template.
We don't need to enforce the use of std::string instances with
AddSetting(). We can accept views and only construct one string,
rather than three temporaries.
Internal details: The large region is split into individual same-sized blocks of memory. On creation, we allocate a single block of memory that will always remain zero, and map that into the entire memory region. Then, the first time any of these blocks is written to, we swap the mapped zero block out with a newly allocated block of memory. On clear, we swap back to the zero block and deallocate the data blocks. That way we only actually allocate one zero block as well as a handful of real data blocks where the JitCache actually writes to.
This is a little trick I came up with that lets us restructure our float
classification code so we can exit earlier when the float is normal,
which is the case more often than not.
First we shift left by 1 to get rid of the sign bit, and then we count
the number of leading sign bits. If the result is less than 10 (for
doubles) or 7 (for floats), the float is normal. This is because, if the
float isn't normal, the exponent is either all zeroes or all ones.
With this, situations where multiple arguments need to be moved
from multiple registers become easy to handle, and we also get
compile-time checking that the number of arguments is correct.
This is needed so that the checks added in the previous commit will be
reevaluated if the value of m_enable_dcache changes.
JitArm64 was already recompiling its asm routines on cache clear by
necessity. It doesn't have the same setup as Jit64 where the asm
routines are in a separate region, so clearing the JitArm64 cache
results in the asm routines being cleared too.
This value is used in a multiplication. The result of this
multiplication is then subtracted from m_base. By negating m_dec, we are
free to use an addition instead.
On x64, this saves an instruction.
A deep-copy method CopyReader has been added to BlobReader (virtual) and all of its subclasses (override). This should create a second BlobReader to open the same set of data but with an independent read pointer so that it doesn't interfere with any reads done on the original Reader.
As part of this, IOFile has added code to create a deep copy IOFile pointer onto the same file, with code based on the platform in question to find the file ID from the file pointer and open a new one. There has also been a small piece added to FileInfo to enable a deep copy, but its only subclass at this time already had a copy constructor so this was relatively minor.