Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background
Added recognition of impossible comparisons via NZM and optimize them away
Recognize (x + -y) and transform to (x - y) for constants
Recognize (~x ) + 1 and transform to -x
Check and transform comparisons if theyre semantically equal to others
Detect comparisons of single-bit values with their only possible non-zero value and transform to true/false tests
Transform ==0 to IS_FALSE, !=0 to IS_TRUE
Truncate to int8 if operand for IS_TRUE/IS_FALSE has a nzm of 1
Reduced code generated for SubDidCarry slightly
Add special case for InstrEmit_srawix if mask == 1
Cut down the code generated for trap instructions, instead of naive or'ing or compare results do a switch and select the best condition
Rerun simplification pass until no changes, as some optimizations will enable others to be done
Enable rel32 call optimization by default
Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.
According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).
Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do mov32 to 32 if detected to take advantage of implicit zero xt/reg renaming
Add helper function for skipping assignment defs in instr.
Add helper function for checking if an opcode is binary value type
Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later) .
List of optimizations:
If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context)
If masking a value that was or'ed against another check whether our mask only considers bits from one value or another. if so, change the operand to the OR input that actually matters
If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in change the opcode to SHIFT_LEFT
If masking against all ones, become an assign.
If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup)
If XOR against all ones, become a NOT
Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.