Commit Graph

6997 Commits

Author SHA1 Message Date
Triang3l b1be33004a Merge branch 'master' into vulkan 2022-06-25 20:31:26 +03:00
Triang3l 4812b4ba8b [D3D12] Fix outdated color system constants comment [ci skip] 2022-06-25 20:31:05 +03:00
chss95cs@gmail.com 327cc9eff5 drastically reduce size of final generated code for rlwinm by adding special paths for rotations of 0, masks that discard the rotated bits and using And w/ UINT_MAX instead of truncate/zero extend
Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do mov32 to 32 if detected to take advantage of implicit zero xt/reg renaming

Add helper function for skipping assignment defs in instr.
Add helper function for checking if an opcode is binary value type
Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later) .
 List of optimizations:
  If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context)
  If masking a value that was or'ed against another check whether our mask only considers bits from one value or another. if so, change the operand to the OR input that actually matters
  If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in change the opcode to SHIFT_LEFT
  If masking against all ones, become an assign.
  If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup)
  If XOR against all ones, become a NOT
Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.
2022-06-25 09:58:13 -07:00
Triang3l 5dca11a892 [SPIR-V] Fix fetch constant LOD bias signedness 2022-06-25 16:33:35 +03:00
Triang3l d8b0227cbd [SPIR-V] Fix cubemap X axis 2022-06-25 16:25:29 +03:00
Triang3l fdcbf67623 [Vulkan] Enable VK_KHR_sampler_ycbcr_conversion 2022-06-25 15:46:02 +03:00
Triang3l 758db4ccb3 [Vulkan] Fix textures not loaded if using a shader for the first time 2022-06-25 15:15:06 +03:00
Triang3l 4db445c6f9 Merge branch 'master' into vulkan 2022-06-25 15:13:41 +03:00
Triang3l aa45d7b47d [D3D12] More descriptive pipeline creation call comment [ci skip] 2022-06-25 15:13:11 +03:00
Triang3l c37c05d189 [Vulkan] Remove an outdated fullscreen shader comment [ci skip] 2022-06-25 14:35:15 +03:00
Triang3l 4b4205ba00 [Vulkan] Frontbuffer presentation 2022-06-25 14:33:43 +03:00
Triang3l 3fc7d8753c Merge branch 'master' into vulkan 2022-06-24 23:38:04 +03:00
Triang3l f4a634c617 [XeSL] xesl_write*Store > xesl_*Store 2022-06-24 23:37:29 +03:00
Triang3l 7a4732e14f [GPU] XeSL swap shaders 2022-06-24 23:24:30 +03:00
Gliniak 2b3686f0e9 [XAM] Set profile setting 'from' entry accordingly to setting existence 2022-06-24 10:10:52 +02:00
Triang3l b7737d70ca [D3D12] Update RequestSwapTexture resource state comment [ci skip] 2022-06-23 22:59:53 +03:00
Gliniak ce3b159683 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 21:05:45 +02:00
Triang3l 227d495738 Merge branch 'master' into vulkan 2022-06-22 21:19:29 +03:00
Triang3l e9f129f67f [GPU] Safer and more correct depth bias conversion
Float24-as-float32 depth bias is now in the increments of 8, because conversion of the depth to float24 directly in the pixel shaders may destroy the bias qualitatively otherwise if it's too small.
2022-06-22 21:14:40 +03:00
Triang3l a7885ae1a4 [GPU] Fix CPU-side float24 conversion broken recently 2022-06-22 20:47:44 +03:00
Triang3l 4514050f55 [Vulkan] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 13:25:06 +03:00
Gliniak e7a122d943 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:18:13 +02:00
Triang3l 0d8bd0e0c6 Merge branch 'master' into vulkan 2022-06-22 13:15:50 +03:00
Triang3l cbf0476d42 [D3D12] Don't round float24 depth when it's known to be exact 2022-06-22 13:14:38 +03:00
Gliniak 83269315d8 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:06:42 +02:00
Triang3l 7869b080d3 [D3D12] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 12:53:09 +03:00
Gliniak 87fd772393 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-21 07:54:44 +02:00
Radosław Gliński 9a72d6ab05
Merge pull request #46 from chrisps/canary_experimental
ome guest function calls can now be resolved and embedded directly in
2022-06-21 00:35:40 +02:00
chss95cs@gmail.com 549ee28a93 ome guest function calls can now be resolved and embedded directly in
the emitted asm as rel32 calls. Disabled by default, enabled via
resolve_rel32_guest_calls
detect whether cpu has fast jrcxz, fast loop/loope/loopne
much more thorough LoadConstantXMM
New cvar elide_e0_check that allows the backend to assume accesses via
the SP or TLS register will not cross into 0xe0 range
Add x64 codegen for Vector shift uint8
If has fast jrcxz use for some traptrue/breaktrue instructions
Use phat nops
Add cvar use_fast_dot_product, which uses a four instruction sequence
for both dot product instructions which ought to be equivalent. disabled
by default.
2022-06-20 15:08:18 -07:00
Triang3l c0703e64db Merge branch 'master' into vulkan 2022-06-20 22:40:19 +03:00
Triang3l e2f632f8fa [D3D12] Use udiv by constant tile size + minor transfer cleanup
Drivers compile that to a multiplication and a shift anyway.
2022-06-20 22:39:30 +03:00
Triang3l 0dc480721f [Vulkan] Render target resolving 2022-06-20 22:29:07 +03:00
Triang3l c6ec6d8239 [Vulkan] Use UDiv/UMod by constant tile size + minor transfer cleanup
Drivers compile that to a multiplication and a shift anyway.
2022-06-20 22:24:07 +03:00
Gliniak a4ff64c465 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-20 21:07:32 +02:00
Triang3l 61c4c49d76 Merge branch 'master' into vulkan 2022-06-20 12:34:41 +03:00
Triang3l 207e11c8d2 [GPU] Separate range arguments for fixed16 RG and RGBA in GetResolveInfo
On Vulkan, when snorm16 in unsupported, these formats may be emulated as float16, which natively can represent a wide range of numbers including -32 to 32 with blending. However, R16G16_SNORM and R16G16B16A16_SNORM are two separate formats, which may have different support on the device.
2022-06-20 12:29:45 +03:00
Triang3l 3b4845511d [Vulkan] Don't require an explicit uint64_t cast for SetDeviceObjectName 2022-06-20 12:25:52 +03:00
Triang3l 67ff108f53 [Vulkan] Explain why CreateShaderModule takes uint32_t* [ci skip] 2022-06-20 12:22:41 +03:00
Triang3l b61953374e [GPU] Make resolve EDRAM binding DS 0 and rename it
Ordering the descriptor sets by the change frequency on Vulkan, in increasing order (the opposite of D3D12 root signatures). The EDRAM binding never changes there (always one storage buffer), while the destination buffer binding may become changeable in the future (to split dispatches if exceeding `maxStorageBufferRange`, for example).
2022-06-20 12:15:52 +03:00
Triang3l 1200b205cf Merge branch 'master' into vulkan 2022-06-19 17:52:28 +03:00
Triang3l 9b83d3d0f4 [GPU] XeSL resolve shaders + host depth store width fix 2022-06-19 17:50:21 +03:00
Gliniak 1e369afa3d [Memory] Allocate system heap memory from bottom of heap last quarter
Aka. From 0x30000000
2022-06-17 22:23:39 +02:00
Gliniak 0b183a3582 Merge branch 'chris_cpu_changes' of https://github.com/Gliniak/xenia.git into canary_experimental 2022-06-17 14:04:58 +02:00
chrisps e4fd015886 Juicy optimization goodness 2022-06-17 14:03:24 +02:00
chss95cs@gmail.com 8a8ff6ae46 Reuse flag results in OPCODE_BRANCH_TRUE codegen if the preceding instruction was a comparison that already set the cpu flags 2022-06-17 11:13:49 +02:00
chss95cs@gmail.com 3675b3860a Add constant folding for OPCODE_ROTATE_LEFT 2022-06-17 11:12:49 +02:00
chrisps 3ad80810b5 Optimized CONVERT_I64_TO_F64 with neat overflow trick
Reduced instruction count from 11 to 8, eliminated a movq stall.
2022-06-17 11:10:48 +02:00
chrisps 9dfbef8acf Smaller ComputeMemoryAddress/Offset sequence
Replace a movzx after setae in both ComputeMemoryAddressOffset and ComputeMemoryAddress with a xor_ of eax prior to the cmp. This reduces the length in bytes of both sequences by 1, and should be a moderate ICache usage reduction thanks to the frequency of these sequences.
2022-06-17 11:10:27 +02:00
Gliniak c0483f8bee Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-17 10:58:15 +02:00
Triang3l 166be463be [XeSL] Metal Shading Language definitions 2022-06-16 21:39:16 +03:00