xenia-canary

Commit Graph

Author	SHA1	Message	Date
Gliniak	0881725533	Merge remote-tracking branch 'GliniakRepo/const_prop_opcode_and_not' into canary_pr	2022-05-19 10:18:58 +02:00
Gliniak	75f0dfd6f3	Merge remote-tracking branch 'GliniakRepo/deleteFunctionsFromUnloadedModule' into canary_pr	2022-05-19 10:18:18 +02:00
Gliniak	5247220e73	Merge remote-tracking branch 'GliniakRepo/patchingSystem' into canary_pr	2022-05-19 10:01:33 +02:00
Gliniak	d78fd19ab4	Fixed incorrect hash generation + lint fixes	2022-04-29 20:33:21 +02:00
Wunkolo	be8b9c512f	[x64] Add GFNI optimization for SPLAT(int8) `pxor` is a zero-uop register-rename and `gf2p8affineqb dest, zero, int8` is a very quick single-instruction way to use affine galois transformations to fill a register with an immediate byte without touching memory.	2022-04-26 13:46:46 -05:00
Gliniak	67a0ccb7c0	[CPU] Unified assertions for unimplemented opcodes	2022-03-23 11:41:49 -05:00
Gliniak	0f2a7105b9	[CPU] Added constant propagation pass for: OPCODE_AND_NOT	2022-03-11 08:54:01 +01:00
Wunkolo	c1de37f381	[x64] Remove usage of `xbyak_bin2hex.h` C++ has had binary-literals since C++14. There is no need for these binary enum values from xbyak.	2022-03-08 12:18:58 -06:00
Wunkolo	f356cf5df8	[x64] Add `VECTOR_ROTATE_LEFT_I32` overflow-test Edit one of the lanes in this unit-test to be larger than the width of the element-size to ensure that this case is handled correctly. It should only mask the lower `log2(32)=5` bits of the input, causing `33`(`100001`) to be `1`(`000001`).	2022-03-08 12:18:58 -06:00
Wunkolo	337f0b2948	[x64] Add AVX512 optimization for `VECTOR_ROTATE_LEFT(Int32)` `vprolvd` is an almost 1:1 analog with this opcode and can be conditionally emitted when the host supports AVX512{F,VL}. Altivec docs say that `vrl{bhw}` masks the lower log2(n) bits of the element-size. [vprold](https://www.felixcloutier.com/x86/vprold:vprolvd:vprolq:vprolvq) modulos the shift-value by the element size in bits, which is the same as masking the lower log2(n) bits. So `vrlw` maps exactly to `vprold`.	2022-03-08 12:18:58 -06:00
Joel Linn	00e7de9297	[CPU] Improve vrsqrtefp accuracy	2022-02-16 17:09:28 -06:00
Joel Linn	d64848245d	[CPU] Improve vrefp accuracy	2022-02-16 17:09:28 -06:00
Wunkolo	ea992eda1f	[x64] Fix missing BMI2 emit-feature detection We only tested for BMI1 but not for BMI2, so we've been missing out on BMI2 performance gains for a little while. Oops.	2022-02-05 12:08:32 +03:00
TranzRail	1d51b574ec	[Kernel] Add PVR opcode (includes cvars support)	2022-01-29 02:44:55 -06:00
Wunkolo	24205ee860	[x64] Fix `VECTOR_SH{L,R,A}_V128(Int8)` masking [AltiVec](https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf) doc says that it just uses the lower `log2(n)` bits of the shift-amount rather than the whole element-sized value. So there is no need to handle an overflow. Also adjusts 64-bit literals to utilize the explicit `UINT64_C` type.	2022-01-29 02:39:34 -06:00
Wunkolo	f8350b5536	[x64] Add `VECTOR_SH{R,L}_I8_SAME_CONSTANT` unit test This is to target the new GFNI-based optimization for the Int8 case.	2022-01-29 02:39:34 -06:00
Wunkolo	bd9a290b30	[x64] Add `GFNI`-based optimization for `VECTOR_SH{R,L}_V128(Int8)` In the `Int8` case of `VECTOR_SH{R,L}_V128`, when all the values are the same, then a single-instruction `gf2p8affineqb` can be emitted that does an int8-based arithmetic-shift, utilizing GF(2^8) arithmetic. More info here: https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/ Also fixes the iteration type used when detecting whether all of the SIMD lanes hold the same value (was iterating `u16` and not `u8`)	2022-01-29 02:39:34 -06:00
Wunkolo	f7c14a089d	[x64] Add host-extension detection preprocessor Rather than having a huge list of if-statements that all do the same thing, this preprocessor allows a more concise pattern for detecting if the emit-flag is enabled as well as the correlated Xbyak flag that it needs to check for before allowing the feature-flag to be emitted. Also moved the AVX-check to the beginning to early-out rather than do a bunch of wasted work only to find out last that the host doesn't even support AVX.	2022-01-23 05:04:56 -06:00
Wunkolo	a9a365aa32	[x64] Add `GFNI`-based optimization for `VECTOR_SHA_V128(Int8)` In the `Int8` case of `VECTOR_SHA_V128`, when all the values are the same, then a single-instruction `gf2p8affineqb` can be emitted that does an int8-based arithmetic-shift, utilizing GF(2^8) arithmetic. More info here: https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/ As of now (Dec 2021): Tremont (Lakefield), Jasper Lake, Ice Lake, Tiger Lake, and Rocket Lake support GFNI.	2022-01-13 15:32:55 -06:00
Wunkolo	fba23e3e75	[x64] Add `kX64EmitGFNI` emitter feature-flag This determines support for the `gf2p8affineqb` instruction. Even though `GFNI` is typically found with AVX512-enabled chips, it _is_ possible for there to be a chip with `GFNI` but does not support `AVX` or `AVX2` of any sort. An example of this is Tremont(Lakefield) chips as well as Jasper Lake. `13df339fe7/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt (L1297-L1299)` `13df339fe7/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt (L1252-L1254)`	2022-01-13 15:32:55 -06:00
Wunkolo	5d1b53cd6f	[x64] Add `VECTOR_SHA_I8_SAME_CONSTANT` unit test This is to target the new GFNI-based optimization for the Int8 case.	2022-01-13 15:32:55 -06:00
Wunkolo	233ed107fe	[CPU] Remove `use_haswell_instructions` in favor of `x64_extension_mask` Rather than having a single bool to conditionally detect Haswell-level instruction features, the granularity is increased with a new `x64_extension_mask` where individual features within the x64 backend can be turned on or off in a bit-mask manner. Since we have an ARM backend on the horizon, I've added this to the new `x64` configuration-group rather than `CPU`. This new pattern will hopefully allow for testing to be more targeted to certain processor features and allows the user to determine if they want certain features to be enabled or disabled (such as avoiding BMI2 on certain AMD processors due to pdep/pext being incredibly slow). The default configuration is to detect and utilize all available features.	2022-01-11 03:57:32 -06:00
Wunkolo	37aa3d129c	[x64] Explicitly handle AND_NOT `dest == src1` This addresses a JIT-issue in the case that the `src1` and `dest` registers are the same. This issue only happens in the "generic" x86 path but not in the BMI1-accelerated path. Thanks Rick for the extensive debugging help. When `src1` and `dest` were the same, then the `andc` instruction at `82099A08` in title `584108FF` might emit the following assembly: ``` .text:82099A08 andc r11, r10, r11 \| \| Jitted \| V 00000000A0011B15 mov rbx,r10 00000000A0011B18 not rbx 00000000A0011B1B and rbx,rbx ``` This was due to the src1 operand and the destination register being the same, which used to call the "else" case in the x64 emitter when it needs to be handled explicitly due to register aliasing/allocation. Addresses issue #1945	2022-01-10 15:48:49 -06:00
Wunkolo	4303f6b200	[x64] Fix OPCODE_AND_NOT src1-constant case Fix the case where src1 is constant and src2 is non-constant causing an assert due to trying to call `.constant()` on the src2 operand. Interfaces with an issue Gliniak was encountering where title `4D53082D` encounters an assert. Also includes a BMI1-acceleration in the 64-bit case where a temporary register is needed (the `and` x86 instruction only supports immediate constants up to 32-bits).	2022-01-06 13:00:58 -06:00
Wunkolo	24d4e1e0e5	[x64] Add `BMI1`-based acceleration for `AndNot` In the case of having two register operands for `AndNot`, the `andn` instruction can be used when the host supports `BMI1`. `andn` only supports 32-bit and 64-bit operands, so some register up-casting is needed.	2022-01-04 16:16:49 -06:00
Wunkolo	3ab43d480d	[x64] Add `kX64EmitBMI1` feature-flag and detection The `BMI1 feature` fits into the current pattern of `use_haswell_instructions` as BMI1 was only introduced in haswell. Also moved the aliases to the end of the enum rather than interleave it with the bit definitions.	2022-01-04 16:16:49 -06:00
Wunkolo	0fdb855a11	[JIT, x64] Add and implement `OPCODE_AND_NOT` Verified the x64 implementation using `xenia-cpu-ppc-tests`.	2022-01-04 16:16:49 -06:00
Wunkolo	5317907523	[x64] Add `kX64EmitAVX512*` feature-flags Implements the detection of some baseline `AVX512` subsets and some common aliases into `X64EmitterFeatureFlags`. So far, `AVX512{F,VL,BW,DQ}` are the only subsets of `AVX512` that are detected with this PR since I anticipate these are the ones that will actually be used a lot in the x64 backend. Some aliases are also implemented such as `kX64EmitAVX512Ortho` which is `AVX512F` and `AVX512VL` combined which are the two subsets of AVX512 required to allow for `AVX512` operations upon `ymm` and `xmm` registers. These aliases can possibly be collapsed since we could just always require `AVX512VL` to be supported to allow for _any_ kind of `AVX512` to be used since we will practically always want to use `AVX512` on `xmm` registers at the very least as there is no use-case where we want to use the 512-bit `zmm` registers exclusively.	2022-01-02 11:52:31 -06:00
Wunkolo	1a8068b151	[Base] Add user-literals for several memory sizes Rather than using `n * 1024 * 1024`, this adds a convenient `_MiB`/`_KiB` user-literal to the new `literals.h` header to concisely describe units of memory in a much more readable way. Any other useful literals can be added to this header. These literals exist in the `xe::literals` namespace so they are opt-in, similar to `std::chrono` literals, and require a `using namespace xe::literals` statement to utilize it within the current scope. I've done a pass through the codebase to replace trivial instances of `1024 * 1024 * ...` expressions being used but avoided anything that added additional casting complexity from `size_t` to `uint32_t` and such to keep this commit concise.	2022-01-02 11:51:31 -06:00
Wunkolo	b64b4c6761	[x64] IsFeatureEnabled: Allow parallel feature checks Just checking if the resulting mask is non-zero means we cannot allow this function to check for multiple features in parallel. A hypothetical computer that supports FMA but not AVX2 will return `true` if you try to call `IsFeatureEnabled(kX64EmitFMA \| kX64EmitAVX2)`. We should make sure all the masked flags return `true` rather than check for non-zero. This is ramping up to allow for particular subsets of AVX512 to be checked for in parallel with a single function call.	2021-12-28 20:57:32 -06:00
Gliniak	371441ec3a	[XModule] Remove module and its functions while unloading	2021-12-27 09:18:44 +01:00
Triang3l	fdec0ab332	[Code] Make union usage more consistent	2021-11-03 20:45:09 +03:00
Triang3l	e720e0a540	[Code] Remove game names from code comments (most of at least)	2021-09-05 21:27:40 +03:00
Triang3l	6ce5330f5f	[UI] Loop thread to main thread WindowedAppContext	2021-08-28 19:38:24 +03:00
Triang3l	f540c188bf	[Lint] Revert incorrect clang-format changes	2021-08-26 21:18:18 +03:00
Triang3l	7edfdc2672	Merge branch 'master' into linux_windowing	2021-08-26 22:58:14 +03:00
gibbed	a3535be416	[CPU] Suppress C4065 warning in SyscallHandler.	2021-06-29 02:41:29 -05:00
gibbed	0cf4cab59b	[CPU] Add syscall handler.	2021-06-28 20:32:52 -05:00
gibbed	6c0d03fad3	[CPU] Reduce complexity of Value::Round.	2021-06-28 20:32:52 -05:00
Joel Linn	d87bf995e1	Satisfy linter Apply changes suggested by clang-format-12	2021-06-27 16:33:35 -05:00
emoose	c889a8af3f	[CPU] Load alt-title-ids XEX header into XexModule::opt_alternate_title_ids_	2021-06-25 23:48:25 -05:00
emoose	e3c14419f6	[CPU/XEX] Use correct size for XEXP-patched header buffer	2021-06-06 04:50:21 -05:00
Joel Linn	ceb382f8ec	Update Catch2 test framework - Use their main() method to fix command line options. Fix CLion testing - Change to correct tag syntax.	2021-06-02 22:28:43 -05:00
Joel Linn	fa7f292432	[CPU] `ResolveFunction()` Fix declaration mismatch	2021-06-02 22:28:43 -05:00
Joel Linn	5284075cf9	[CPU] Misc GCC build fixes	2021-06-02 22:28:43 -05:00
Joel Linn	86722be9ca	[Base] Make Arena alignment aware - Add align parameter - Templated Alloc() implicitly aligns type correctly - Rewind may leak padding that was added due to alignment	2021-06-02 22:28:43 -05:00
Joel Linn	a86d7173e1	Refactor FourCC magic uses - Use new fourcc_t type - Improves compiler compatibility by removing multi chars	2021-06-02 22:28:43 -05:00
Triang3l	ff23b1d9f9	[x64] LoadConstantXmm: Don't load -0 as +0 + cleanup	2021-05-16 14:26:57 +03:00
gibbed	38c3db1afb	[CPU/Kernel] Transparently byteswap xex2_version.	2021-05-01 17:29:05 -05:00
emoose	bb7c5b8266	[CPU/XEX] Move SecurityInfo conversion code to ReadSecurityInfo & call that during ApplyPatch	2021-01-31 23:18:54 -06:00
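
Note on commit b64b4c6761 (IsFeatureEnabled: Allow parallel feature checks): the difference between "mask is non-zero" and "all masked flags are set" can be sketched as below. This is only an illustration of the idea in the commit message. The flag values, the free-function form, and the names `IsAnyFeatureEnabled` and `DemoFmaWithoutAvx2` are assumptions made for this example and are not taken from the xenia source.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, chosen only for this example.
enum X64FeatureFlags : std::uint32_t {
  kX64EmitAVX2 = 1u << 0,
  kX64EmitFMA = 1u << 1,
};

// Non-zero test: true if ANY requested flag is enabled.
// This is the behaviour the commit message describes as the problem.
constexpr bool IsAnyFeatureEnabled(std::uint32_t enabled,
                                   std::uint32_t requested) {
  return (enabled & requested) != 0;
}

// All-bits test: true only if EVERY requested flag is enabled.
constexpr bool IsFeatureEnabled(std::uint32_t enabled,
                                std::uint32_t requested) {
  return (enabled & requested) == requested;
}

// The hypothetical host from the commit message: FMA but no AVX2.
inline void DemoFmaWithoutAvx2() {
  const std::uint32_t host = kX64EmitFMA;
  assert(IsAnyFeatureEnabled(host, kX64EmitFMA | kX64EmitAVX2));  // misleading
  assert(!IsFeatureEnabled(host, kX64EmitFMA | kX64EmitAVX2));    // intended
}
```

With the all-bits comparison, a single call can ask for several features at once, which is the use case the commit message gives for the AVX512 subsets.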

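Note on commits f356cf5df8 and 337f0b2948 (VECTOR_ROTATE_LEFT on Int32): the masking rule quoted there, where only the lower `log2(32)=5` bits of the rotate amount are used so that `33` behaves like `1`, can be sketched in scalar form as below. This is a minimal sketch of the arithmetic for one lane, not the emitter code. The function names `RotateLeft32` and `DemoRotateOverflow` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rotate one 32-bit lane left, using only the low 5 bits of the count.
constexpr std::uint32_t RotateLeft32(std::uint32_t value,
                                     std::uint32_t count) {
  const std::uint32_t n = count & 31u;  // 33 (0b100001) becomes 1 (0b000001)
  // The second mask keeps the right shift in range when n == 0.
  return (value << n) | (value >> ((32u - n) & 31u));
}

// A rotate amount wider than the element behaves like its masked value.
inline void DemoRotateOverflow() {
  assert(RotateLeft32(0x80000001u, 33u) == RotateLeft32(0x80000001u, 1u));
  assert(RotateLeft32(0x80000001u, 1u) == 0x00000003u);
}
```

Taking the count modulo the element width in bits and masking its low 5 bits give the same result for 32-bit lanes, which is the equivalence the second commit message relies on.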

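Note on commit 1a8068b151 (user-literals for memory sizes): opt-in `_KiB`/`_MiB` literals of the kind described there could look roughly like the sketch below. The namespace `demo::literals`, the `std::size_t` return type, and the function `DemoBufferSize` are assumptions for this example. The actual definitions in `literals.h` under `xe::literals` may differ.

```cpp
#include <cassert>
#include <cstddef>

namespace demo {
namespace literals {

// 1 KiB = 1024 bytes.
constexpr std::size_t operator""_KiB(unsigned long long n) {
  return static_cast<std::size_t>(n * 1024ull);
}

// 1 MiB = 1024 * 1024 bytes.
constexpr std::size_t operator""_MiB(unsigned long long n) {
  return static_cast<std::size_t>(n * 1024ull * 1024ull);
}

}  // namespace literals
}  // namespace demo

// The literals are only visible after an explicit using-directive,
// similar to the std::chrono literals mentioned in the commit message.
inline std::size_t DemoBufferSize() {
  using namespace demo::literals;
  return 16_MiB;  // instead of 16 * 1024 * 1024
}
```

Keeping the operators in their own namespace means code that does not opt in is unaffected by the new suffixes.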
1088 Commits