- Factor common work into a helper function.
- Replace the confusingly named "noProlog" with "rsp_alignment". Now that
32-bit x86 is no longer supported, we can simply specify it explicitly as
8 for clarity.
- Add an option to reserve additional frame space, which I'll need later.
- Revert a change magumagu made in March that replaced MOVAPD with
MOVUPD on account of 32-bit Windows, which is no longer supported. True,
recent processors apparently execute MOVAPD no faster than MOVUPD when the
pointer is in fact aligned, but there's no point using MOVUPD for
something that's guaranteed to be aligned...
(I discovered that GenFrsqrte and GenFres were incorrectly passing false
for noProlog: they are in fact functions without prologs, the original
meaning of the parameter. This is what broke the previous change, and it
is now fixed.)
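To make the frame arithmetic concrete, here is a minimal sketch of what a shared frame-size helper could compute. Only `rsp_alignment` comes from the text above. `CalculateFrameLayout`, `FrameLayout` and the other parameter names are hypothetical, and the layout is an assumption rather than the emitter's actual code: shadow space first, then the extra frame space, then the XMM save slots. The point it illustrates is that once the final RSP and every save-slot offset are multiples of 16, the XMM saves are guaranteed aligned and can use MOVAPD.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type; field names are illustrative only.
struct FrameLayout {
  std::size_t subtraction;  // bytes to SUB from RSP after the GPR pushes
  std::size_t xmm_offset;   // offset of the XMM save area from the final RSP
};

// Sketch of a shared frame-size helper (hypothetical, not the real emitter).
//   gpr_pushes:    number of 8-byte GPR PUSHes
//   xmm_saves:     number of 16-byte XMM registers to save
//   rsp_alignment: RSP modulo 16 before the pushes (8 on entry to a function
//                  without a prolog, because CALL pushed the return address)
//   extra_frame:   additional frame space requested by the caller
//   shadow:        ABI shadow space (0x20 on Win64, 0 on System V)
FrameLayout CalculateFrameLayout(std::size_t gpr_pushes, std::size_t xmm_saves,
                                 std::size_t rsp_alignment,
                                 std::size_t extra_frame, std::size_t shadow) {
  assert(shadow % 16 == 0);  // assumed: shadow space keeps 16-byte alignment
  // Layout from the final RSP upward:
  //   [shadow][extra, rounded up to 16][XMM saves][padding]
  const std::size_t extra_rounded = (extra_frame + 15) & ~std::size_t{15};
  const std::size_t base = shadow + extra_rounded + 16 * xmm_saves;
  // RSP modulo 16 after the pushes; unsigned wraparound is harmless mod 16.
  const std::size_t misalign = (rsp_alignment - 8 * gpr_pushes) & 15;
  // base is a multiple of 16, so adding `misalign` bytes of padding leaves
  // the final RSP, and therefore every XMM slot, 16-byte aligned.
  return {base + misalign, shadow + extra_rounded};
}
```

For example, a function without a prolog (`rsp_alignment` = 8) that pushes two GPRs, saves one XMM register and asks for 4 extra bytes on System V gets a 40-byte subtraction: 8 bytes of padding, 16 for the rounded extra space and 16 for the XMM slot.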
The special case is where the registers are actually to be swapped (i.e.
func(ABI_PARAM2, ABI_PARAM1)); this was previously impossible, but it
would be ugly not to handle it anyway.
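Moving two values into two parameter registers is a tiny parallel-move problem, because each destination must receive its source's *original* value. The sketch below shows one way to order the moves, with a single exchange for the swap case. The register enum, `Insn`, `ParallelMove2` and the `Run` simulator are hypothetical stand-ins for an emitter, not its real interface.

```cpp
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical register numbering, for illustration only.
enum Reg { R0, R1, R2, R3, kNumRegs };

struct Insn {
  enum Kind { MOV, XCHG } kind;
  Reg a, b;  // MOV: a <- b; XCHG: swap a and b
};

// Emit code for dst1 <- src1 and dst2 <- src2 as if both happened at once,
// i.e. each destination receives its source's original value.
// Assumes dst1 != dst2.
std::vector<Insn> ParallelMove2(Reg dst1, Reg src1, Reg dst2, Reg src2) {
  std::vector<Insn> out;
  if (dst1 == src2 && dst2 == src1) {
    // The special case: the two registers are to be swapped.
    out.push_back({Insn::XCHG, dst1, dst2});
    return out;
  }
  if (dst1 == src2) {
    // Writing dst1 first would clobber src2, so fill dst2 first.
    // (dst2 != src1 here, so that move cannot clobber src1.)
    out.push_back({Insn::MOV, dst2, src2});
    if (dst1 != src1) out.push_back({Insn::MOV, dst1, src1});
  } else {
    // dst1 is not a source of the second move, so this order is safe.
    if (dst1 != src1) out.push_back({Insn::MOV, dst1, src1});
    if (dst2 != src2) out.push_back({Insn::MOV, dst2, src2});
  }
  return out;
}

// Execute the emitted instructions on a simulated register file.
void Run(const std::vector<Insn>& code, std::array<int, kNumRegs>& regs) {
  for (const Insn& i : code) {
    if (i.kind == Insn::MOV)
      regs[i.a] = regs[i.b];
    else
      std::swap(regs[i.a], regs[i.b]);
  }
}
```

Ordering alone cannot resolve the swap, because whichever MOV goes first destroys the other source. That is why this case needs XCHG (or a scratch register).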
The new NOP emitter breaks when called with a negative count. As it
turns out, this did happen when deoptimizing 8-bit MOVs, because they are
only 4 bytes long and need no BSWAP.
Fixes issue 6990.
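One way to make a NOP emitter robust is to take a signed count and reject negative values explicitly, rather than letting a negative number wrap around to a huge unsigned one. The sketch below pads with the recommended multi-byte NOP sequences (1 to 9 bytes) listed in the Intel SDM's NOP instruction reference. `EmitNops` and its bool return are hypothetical. For illustration, it also assumes the patched-in code is a 5-byte rel32 CALL or JMP: that is longer than a 4-byte 8-bit MOV with no BSWAP, so the padding count comes out as -1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Recommended multi-byte NOP encodings, indexed by length - 1.
static const std::uint8_t kNops[9][9] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

// Hypothetical helper: append `count` bytes of NOP padding to `code`.
// The count is signed on purpose: a negative value means the replacement
// code is longer than the space it must fit in, which is a caller bug, so
// it is rejected (and nothing is written) instead of wrapping around.
bool EmitNops(std::vector<std::uint8_t>& code, int count) {
  if (count < 0) return false;
  while (count > 0) {
    const int len = count > 9 ? 9 : count;  // longest NOP first
    code.insert(code.end(), kNops[len - 1], kNops[len - 1] + len);
    count -= len;
  }
  return true;
}
```

With this shape, the 4-byte MOV case fails loudly at the call site instead of corrupting the code buffer.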
This uses a bit of templating to remove the CodeBlock code that was duplicated in each emitter header.
No functional change.
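The templating can be sketched as a class template that derives from whichever emitter it is given, so the buffer-management members are written once and reused by every backend. This is a minimal illustration under stated assumptions: the two emitter classes are hypothetical stand-ins, the member names are assumptions, and a `std::vector` replaces the executable memory a real JIT would allocate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two hypothetical emitters. Each encodes its own instructions and keeps a
// write pointer; neither manages the memory it writes into.
class X64EmitterSketch {
 public:
  void SetCodePtr(std::uint8_t* p) { code_ = p; }
  std::uint8_t* GetCodePtr() const { return code_; }
  void RET() { *code_++ = 0xC3; }  // x86-64 RET: 1 byte
 private:
  std::uint8_t* code_ = nullptr;
};

class Arm64EmitterSketch {
 public:
  void SetCodePtr(std::uint8_t* p) { code_ = p; }
  std::uint8_t* GetCodePtr() const { return code_; }
  void RET() {  // AArch64 RET (0xD65F03C0), stored little-endian: 4 bytes
    const std::uint32_t insn = 0xD65F03C0u;
    for (int i = 0; i < 4; ++i)
      *code_++ = static_cast<std::uint8_t>(insn >> (8 * i));
  }
 private:
  std::uint8_t* code_ = nullptr;
};

// Shared buffer management, written once and mixed into any emitter T.
template <class T>
class CodeBlock : public T {
 public:
  void AllocCodeSpace(std::size_t size) {
    region_.assign(size, 0);
    T::SetCodePtr(region_.data());
  }
  void ClearCodeSpace() {
    std::fill(region_.begin(), region_.end(), std::uint8_t{0});
    T::SetCodePtr(region_.data());
  }
  bool IsInSpace(const std::uint8_t* ptr) const {
    return ptr >= region_.data() && ptr < region_.data() + region_.size();
  }
  std::size_t GetSpaceLeft() const {
    return region_.size() -
           static_cast<std::size_t>(T::GetCodePtr() - region_.data());
  }
 private:
  std::vector<std::uint8_t> region_;
};
```

Usage is `CodeBlock<X64EmitterSketch>` or `CodeBlock<Arm64EmitterSketch>`: both get `AllocCodeSpace`, `GetSpaceLeft` and friends without either header repeating them.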
Our defines never made clear whether they meant 64-bit or x86_64.
This makes a clean split between bitness and architecture.
As a side effect, this commit also brings up support for compiling on aarch64.
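Here is a minimal sketch of keeping the two concerns apart. The architecture macros come only from compiler-predefined macros (`__x86_64__`/`_M_X64`, `__aarch64__`/`_M_ARM64`), and the bitness macro comes only from pointer width. The `SKETCH_*` names and the two functions are hypothetical, not the project's actual defines.

```cpp
#include <cassert>
#include <cstdint>

// Architecture: which instruction set we target. Derived only from
// compiler-predefined macros; says nothing about pointer width.
#if defined(__x86_64__) || defined(_M_X64)
#define SKETCH_ARCH_X86_64 1
#elif defined(__aarch64__) || defined(_M_ARM64)
#define SKETCH_ARCH_AARCH64 1
#endif

// Bitness: how wide a pointer is. Derived only from the pointer range;
// says nothing about the instruction set.
#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
#define SKETCH_BITS_64 1
#endif

constexpr const char* ArchName() {
#if defined(SKETCH_ARCH_X86_64)
  return "x86_64";
#elif defined(SKETCH_ARCH_AARCH64)
  return "aarch64";
#else
  return "other";
#endif
}

// Assumes only 32- and 64-bit targets.
constexpr int PointerBits() {
#if defined(SKETCH_BITS_64)
  return 64;
#else
  return 32;
#endif
}
```

Keeping them separate matters because neither implies the other. For instance, the x32 ABI defines `__x86_64__` while using 32-bit pointers, and aarch64 is 64-bit without being x86_64.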
Floating-point is complicated...
Some background: Denormals are floats that are too close to zero to be
stored in a normalized way (their exponent would need more bits). Since
they are stored unnormalized, they are hard to work with, even in
hardware. That's why both PowerPC and SSE can be configured to operate
in faster but non-standard-compliant modes in which these numbers are
simply rounded ('flushed') to zero.
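In single precision, a denormal is an encoding whose exponent field is all zeros but whose fraction is nonzero, and flushing replaces it with a zero of the same sign. The sketch below models that in software. `IsDenormal` and `FlushToZero` are illustrative helpers; they do not enable any hardware mode.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Raw bits of a float: sign (1) | exponent (8) | fraction (23).
std::uint32_t FloatBits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

// Denormal: exponent field all zeros, fraction nonzero. Its value is
// 0.fraction * 2^-126 rather than 1.fraction * 2^(exponent - 127).
bool IsDenormal(float f) {
  const std::uint32_t u = FloatBits(f);
  return (u & 0x7F800000u) == 0 && (u & 0x007FFFFFu) != 0;
}

// Software model of flush-to-zero: a denormal becomes a zero of the same sign.
float FlushToZero(float f) {
  return IsDenormal(f) ? std::copysign(0.0f, f) : f;
}
```

`std::fpclassify` reports `FP_SUBNORMAL` for exactly these values, which makes a convenient cross-check. The smallest one, `std::numeric_limits<float>::denorm_min()`, has bit pattern `0x00000001`.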
Internally, we do the same as the PowerPC CPU and store all floats in
double format. This means that for loading and storing singles we need a
conversion. The PowerPC CPU does this in hardware. We previously did
this using CVTSS2SD/CVTSD2SS. Unfortunately, these instructions are
considered arithmetic and therefore flush denormals to zero if non-IEEE
mode is active. This normally wouldn't be a problem, since the next
arithmetic floating-point instruction would do the same anyway, but as it
turns out, some games actually use floating-point instructions for
copying arbitrary data.
My idea for fixing this problem was to use x87 instructions, since the
x87 FPU never supported flush-to-zero and thus doesn't mangle denormals.
However, there is one more problem to deal with: SNaNs are automatically
converted to QNaNs (by setting the most-significant bit of the
fraction). I opted to fix this by manually resetting the QNaN bit of all
values with an all-1s exponent.
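The NaN encoding involved, and an integer-only alternative for the load direction, can be sketched as follows. This is a different technique from the x87 approach described above, shown only for comparison. It widens a single to a double using integer operations alone, so no control-register setting can flush denormals or quiet SNaNs. All names are hypothetical, and the store direction (double to single) is not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr std::uint32_t kExpMask = 0x7F800000u;   // 8-bit exponent field
constexpr std::uint32_t kFracMask = 0x007FFFFFu;  // 23-bit fraction field
constexpr std::uint32_t kQuietBit = 0x00400000u;  // most-significant fraction bit

constexpr bool IsNaN(std::uint32_t u) {
  return (u & kExpMask) == kExpMask && (u & kFracMask) != 0;
}
constexpr bool IsSNaN(std::uint32_t u) { return IsNaN(u) && (u & kQuietBit) == 0; }

// Models the conversion described above: a NaN gets its quiet bit set.
constexpr std::uint32_t Quiet(std::uint32_t u) { return IsNaN(u) ? (u | kQuietBit) : u; }

// Bit-exact single-to-double widening using integer operations only.
std::uint64_t SingleToDoubleBits(std::uint32_t f) {
  const std::uint64_t sign = static_cast<std::uint64_t>(f >> 31) << 63;
  const std::uint32_t exp = (f & kExpMask) >> 23;
  std::uint64_t frac = f & kFracMask;
  if (exp == 0xFF)  // Inf or NaN: copy the fraction verbatim, so an SNaN stays an SNaN
    return sign | (0x7FFull << 52) | (frac << 29);
  if (exp == 0) {
    if (frac == 0) return sign;  // +0 or -0
    // Denormal: value = 0.frac * 2^-126. Normalize it; the double exponent
    // range is wide enough that the result is always a normal double.
    int e = -126;
    while ((frac & 0x800000u) == 0) { frac <<= 1; --e; }
    frac &= kFracMask;  // drop the now-implicit leading 1
    return sign | (static_cast<std::uint64_t>(e + 1023) << 52) | (frac << 29);
  }
  // Normal number: rebias the exponent from 127 to 1023.
  return sign | (static_cast<std::uint64_t>(exp + (1023 - 127)) << 52) | (frac << 29);
}

std::uint64_t DoubleBits(double d) {
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

float FloatFromBits(std::uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}
```

For example, the SNaN `0x7F800001` widens to `0x7FF0000020000000`, whose quiet bit (bit 51) is still clear, whereas `Quiet(0x7F800001)` yields the QNaN `0x7FC00001`.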