Commit Graph

5333 Commits

Author SHA1 Message Date
chrisps 08232de8cc
patch a mistake in NZM calculation for OPCODE_NOT 2022-06-26 09:30:56 -07:00
chss95cs@gmail.com 327cc9eff5 drastically reduce size of final generated code for rlwinm by adding special paths for rotations of 0, masks that discard the rotated bits and using And w/ UINT_MAX instead of truncate/zero extend
Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do mov32 to 32 if detected to take advantage of implicit zero xt/reg renaming

Add helper function for skipping assignment defs in instr.
Add helper function for checking if an opcode is binary value type
Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later) .
 List of optimizations:
  If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context)
  If masking a value that was or'ed against another check whether our mask only considers bits from one value or another. if so, change the operand to the OR input that actually matters
  If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in change the opcode to SHIFT_LEFT
  If masking against all ones, become an assign.
  If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup)
  If XOR against all ones, become a NOT
Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.
2022-06-25 09:58:13 -07:00
Gliniak 2b3686f0e9 [XAM] Set profile setting 'from' entry accordingly to setting existence 2022-06-24 10:10:52 +02:00
Gliniak ce3b159683 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 21:05:45 +02:00
Triang3l e9f129f67f [GPU] Safer and more correct depth bias conversion
Float24-as-float32 depth bias is now in the increments of 8, because conversion of the depth to float24 directly in the pixel shaders may destroy the bias qualitatively otherwise if it's too small.
2022-06-22 21:14:40 +03:00
Triang3l a7885ae1a4 [GPU] Fix CPU-side float24 conversion broken recently 2022-06-22 20:47:44 +03:00
Gliniak e7a122d943 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:18:13 +02:00
Triang3l cbf0476d42 [D3D12] Don't round float24 depth when it's known to be exact 2022-06-22 13:14:38 +03:00
Gliniak 83269315d8 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:06:42 +02:00
Triang3l 7869b080d3 [D3D12] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 12:53:09 +03:00
Gliniak 87fd772393 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-21 07:54:44 +02:00
chss95cs@gmail.com 549ee28a93 ome guest function calls can now be resolved and embedded directly in
the emitted asm as rel32 calls. Disabled by default, enabled via
resolve_rel32_guest_calls
detect whether cpu has fast jrcxz, fast loop/loope/loopne
much more thorough LoadConstantXMM
New cvar elide_e0_check that allows the backend to assume accesses via
the SP or TLS register will not cross into 0xe0 range
Add x64 codegen for Vector shift uint8
If has fast jrcxz use for some traptrue/breaktrue instructions
Use phat nops
Add cvar use_fast_dot_product, which uses a four instruction sequence
for both dot product instructions which ought to be equivalent. disabled
by default.
2022-06-20 15:08:18 -07:00
Triang3l e2f632f8fa [D3D12] Use udiv by constant tile size + minor transfer cleanup
Drivers compile that to a multiplication and a shift anyway.
2022-06-20 22:39:30 +03:00
Gliniak a4ff64c465 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-20 21:07:32 +02:00
Triang3l 207e11c8d2 [GPU] Separate range arguments for fixed16 RG and RGBA in GetResolveInfo
On Vulkan, when snorm16 in unsupported, these formats may be emulated as float16, which natively can represent a wide range of numbers including -32 to 32 with blending. However, R16G16_SNORM and R16G16B16A16_SNORM are two separate formats, which may have different support on the device.
2022-06-20 12:29:45 +03:00
Triang3l 3b4845511d [Vulkan] Don't require an explicit uint64_t cast for SetDeviceObjectName 2022-06-20 12:25:52 +03:00
Triang3l 67ff108f53 [Vulkan] Explain why CreateShaderModule takes uint32_t* [ci skip] 2022-06-20 12:22:41 +03:00
Triang3l b61953374e [GPU] Make resolve EDRAM binding DS 0 and rename it
Ordering the descriptor sets by the change frequency on Vulkan, in increasing order (the opposite of D3D12 root signatures). The EDRAM binding never changes there (always one storage buffer), while the destination buffer binding may become changeable in the future (to split dispatches if exceeding `maxStorageBufferRange`, for example).
2022-06-20 12:15:52 +03:00
Triang3l 9b83d3d0f4 [GPU] XeSL resolve shaders + host depth store width fix 2022-06-19 17:50:21 +03:00
Gliniak 1e369afa3d [Memory] Allocate system heap memory from bottom of heap last quarter
Aka. From 0x30000000
2022-06-17 22:23:39 +02:00
Gliniak 0b183a3582 Merge branch 'chris_cpu_changes' of https://github.com/Gliniak/xenia.git into canary_experimental 2022-06-17 14:04:58 +02:00
chrisps e4fd015886 Juicy optimization goodness 2022-06-17 14:03:24 +02:00
chss95cs@gmail.com 8a8ff6ae46 Reuse flag results in OPCODE_BRANCH_TRUE codegen if the preceding instruction was a comparison that already set the cpu flags 2022-06-17 11:13:49 +02:00
chss95cs@gmail.com 3675b3860a Add constant folding for OPCODE_ROTATE_LEFT 2022-06-17 11:12:49 +02:00
chrisps 3ad80810b5 Optimized CONVERT_I64_TO_F64 with neat overflow trick
Reduced instruction count from 11 to 8, eliminated a movq stall.
2022-06-17 11:10:48 +02:00
chrisps 9dfbef8acf Smaller ComputeMemoryAddress/Offset sequence
Replace a movzx after setae in both ComputeMemoryAddressOffset and ComputeMemoryAddress with a xor_ of eax prior to the cmp. This reduces the length in bytes of both sequences by 1, and should be a moderate ICache usage reduction thanks to the frequency of these sequences.
2022-06-17 11:10:27 +02:00
Gliniak c0483f8bee Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-17 10:58:15 +02:00
Triang3l 166be463be [XeSL] Metal Shading Language definitions 2022-06-16 21:39:16 +03:00
Gliniak e8aaddf4d5 Merge remote-tracking branch 'GliniakRepo/patchingSystem' into canary_experimental 2022-06-14 17:50:25 +02:00
Gliniak 91f43a374d Initial support for xex patching 2022-06-12 20:10:07 +02:00
Gliniak 945976a31d Added Premake Files For PatchingSystem 2022-06-12 19:58:12 +02:00
Triang3l 820b7ba217 [GPU] Fix GetActiveTextureHostSwizzle return type 2022-06-12 18:50:38 +03:00
Gliniak 90d67ac11c [Kernel] Return X_STATUS_END_OF_FILE for async file read when offset > file_size 2022-06-09 21:36:09 +02:00
Triang3l 78d1eb8bf8 [GPU] TextureCache::GetActiveTextureHostSwizzle 2022-06-09 21:34:21 +03:00
Gliniak d0175ddf2f [XAM] Cut handle mask from socket handles, added support for: NetDll_getsockopt
Only positive values should be interpreted as valid sockets!
2022-06-08 19:59:15 +02:00
Gliniak 25f3e16baa [Patcher] Fixed issue with incorrect patches endianness 2022-06-08 19:42:18 +02:00
Gliniak 0de0f40fb5 [XAM] Added stubs for:
- NetDll_XNetCreateKey
 - NetDll_XNetRegisterKey

This will allow certain games to run local multiplayer
For example PDZ Deathmatch mode
2022-06-07 20:46:47 +02:00
Triang3l 56f72da137 [GPU] More exact PWL texture/RT gamma conversion 2022-06-07 21:26:34 +03:00
Gliniak 916eb1b9bd [XAM] Scan every controller slot if provided flags contains USER_ANY flag 2022-06-07 15:52:41 +02:00
Margen67 5701823ccf Log title_name 2022-06-07 09:43:04 +02:00
jgoyvaerts 5296d2e91e Fix xenia.log file not always being created in the executable folder. 2022-06-07 09:41:52 +02:00
Gliniak c7da7e1999 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-02 22:19:43 +02:00
Triang3l 55a91afcc7 [D3D12] Don't decompress unaligned BC textures if supported 2022-06-02 22:48:03 +03:00
Triang3l 84fcd5defa [GPU] Fix resolve destination offset and extent calculation 2022-06-02 21:47:30 +03:00
Triang3l a9a072bf00 [GPU] Explain why a 32x32x4bpp linear texture takes 2 pages, not 1 [ci skip] 2022-06-01 13:00:23 +03:00
Triang3l 8bd244f277 [GPU] Better explanation for exact texture memory extent calculation [ci skip] 2022-06-01 12:55:16 +03:00
Gliniak 3169aa2ff3 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-01 08:45:21 +02:00
Triang3l d1ad10b98c [GPU] Primitive reset comment typo correction [ci skip] 2022-05-31 23:23:53 +03:00
Triang3l efd7ef212a [D3D12] 128 megatexel limit explanation based on the spec [ci skip] 2022-05-31 23:23:10 +03:00
Triang3l 25594c918c [GPU] Fix tiled texture memory extent calculation 2022-05-31 23:17:33 +03:00