In the `Int8` case of `VECTOR_SH{R,L}_V128`, when all the values are the
same, then a single-instruction `gf2p8affineqb` can be emitted that does
an int8-based arithmetic-shift, utilizing GF(8) arithmetic.
More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/
Also fixes the iteration-type for when detecting if all of the simd
lanes are the same value(was iterating `u16` and not `u8`)
#1333 has been closed, so the README should reflect that.
This line can be replaced with another suggestion from the good-first-issue tag.
I suggest contributing to #1526 in README.md
[XAM] Improvements to XamUserReadProfileSettingsEx/
XamUserWriteProfileSettings.
- Unify X_USER_READ_PROFILE_SETTING and X_USER_WRITE_PROFILE_SETTING
into X_USER_PROFILE_SETTING.
- Clean up Setting serialization to use X_USER_PROFILE_SETTING_DATA
instead of manual buffer copying.
- Fix XamUserReadProfileSettingsEx case where user_index is non-zero
and xuids are being used.
- Skip unset settings in XamUserWriteProfileSettings_entry.
Rather than having a huge list of if-statements that all do the same
thing, this preprocessor allows a more concise pattern to detecting if
the emit-flag is enabled as well as the correlated Xbyak flag that it
needs to check for to before allowing the feature-flag to be emitted.
Also moved the AVX-check to the beginning to early-out rather than do a
bunch of wasted work only to find out last that the host doesn't even
support AVX.
In the `Int8` case of `VECTOR_SHA_V128`, when all the values are the same, then a single-instruction `gf2p8affineqb` can be emitted that does an int8-based arithmetic-shift, utilizing GF(8) arithmetic.
More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/
As of now(Dec 2021): Tremont(Lakefield), Jasper Lake, Ice lake, Tigerlake, and Rocket Lake support GNFI.
Rather than having a single bool to conditionally detect haswell-level
instruction features. The granularity is increased with a new
`x64_extension_mask` where individual features within the x64 backend
can be turned on or off in a bit-mask manner. Since we have an ARM
backend on the horizon, I've added this to the new `x64`
configuration-group rather than `CPU`. This new pattern will hopefully
allow for testing to be more targetted to certain processor features and
allows the user to determine if they want certain features to be enabled
or disabled(such as avoiding BMI2 on certain AMD processors due to
pdep/pext being incredibly slow). The default configuration is to detect
and utilize all available features.
This addresses a JIT-issue in the case that the `src1` and `dest`
register are both the same. This issue only happens in the "generic"
x86 path but not in the BMI1-accelerated path.
Thanks Rick for the extensive debugging help.
When `src1` and `dest` were the same, then the `addc` instruction at
`82099A08` in title `584108FF` might emit the following assembly:
```
.text:82099A08 andc r11, r10, r11
|
| Jitted
|
V
00000000A0011B15 mov rbx,r10
00000000A0011B18 not rbx
00000000A0011B1B and rbx,rbx
```
This was due to the src1 operand and the destination register being the
same, which used to call the "else" case in the x64 emitter when it
needs to be handled explicitly due to register aliasing/allocation.
Addresses issue #1945
The wheel shader in 4D530910 does vfetch_full to r0 with the index from r0.x, and then vfetch_mini.
Thanks @Gliniak for the finding :3
Also small formatting cleanup in commented-out code.
Fix the the case where src1 is constant and src2 is non-constant causing
an assert due to trying to call `.constant()` on the src2 operand.
Interfaces with an issue Gliniak was encountering where title `4D53082D`
encounters an assert. Also includes a BMI1-acceleration in the 64-bit
case where a temporary register is needed(the `and` x86 instruction only
supports immediate constants up to 32-bits).
Remove unneeded init:
As far as I can tell this is a leftover from the appveyor.yml Reference: https://www.appveyor.com/docs/appveyor-yml/
Cleanup command blocks:
cmd is the default shell, so it doesn't have to be specified.
Use proper multi-line syntax for install.
Make configuration into one line:
This reduces line count, but is mainly personal preference.
In the case of having two register operands for `AndNot`, the `andn` instruction can be used when the host supports `BMI1`. `andn` only supports 32-bit and 64-bit operands, so some register up-casting is needed.