Commit Graph

6997 Commits

Author SHA1 Message Date
Gliniak 57b514ea6a Removed (again) unnecessary include 2022-07-18 09:40:45 +02:00
Radosław Gliński 3757580f45
Merge pull request #52 from chrisps/canary_experimental
Fix previous batch of CPU changes
2022-07-18 09:20:35 +02:00
Gliniak fd78ab4dfc [Patcher] Allow loading patches from non-utf8 paths 2022-07-18 08:46:04 +02:00
chss95cs@gmail.com 11817f0a3b vshufps accident broke things, this fixes 2022-07-17 14:44:09 -07:00
Gliniak 6e1e62378f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-17 21:27:52 +02:00
Triang3l 14fdf4b270 [GPU] Up to 7x7 resolution scaling 2022-07-17 20:41:50 +03:00
chss95cs@gmail.com 3717167bbe Preload ThreeFloatMask in DOT_PRODUCT_3
Use shuffle_ps instead of broadcastss, broadcastss is slower on many intel and amd processors and encodes to the same number of bytes as shuffle_ps
Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence
For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code
Simplified code for PERMUTE
Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == ( x == 0)
Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch
Added Value::VisitValueOperands for iterating all Value operands an instruction has.
Add BackpropTruncations code to simplification_pass
Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field. Moved the thread_state field to the tail of PPCContext.
Moved membase to the tail of PPCContext, since now it is reloaded very infrequently.
Rearranged PPCContext so that the condition registers come first (most accesses to them cant get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently
accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn)
Use alignas(64) to ensure PPCContext's padding
Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and
reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there
Changed many wrong avx instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time
Added SimdDomain checking/deduction to X64Emitter.
Used SimdDomain code to fix a lot of float/int domain stalls

Use the low 32 bits of the context register instead of constant 0xE0000000 in ComputeAddress
Special path for SELECT_V128 with result of comparison that will use a blend instruction instead of and/or
Many HIR optimizations added in simp pass
A bunch of other stuff running out of time to write this msg
2022-07-17 09:52:40 -07:00
Triang3l e8652e544a [GPU] Translucent trace viewer controls 2022-07-17 17:29:41 +03:00
Triang3l 273a489e2a [Android] Exclude executables from app build 2022-07-17 17:11:48 +03:00
Triang3l 23410d012d [Android] Update the target SDK to 33, NDK, and Gradle 2022-07-17 17:03:55 +03:00
Triang3l 50946e5c5f [Android] Don't require the storage permission
GPU traces in the trace viewer are accessed via the Storage Access Framework, which doesn't require the storage permissions. Games will be accessed using the SAF too (directories containing games as document trees with persistable URI permissions). The storage permissions are also not required for accessing the application's external or internal storage directory that will contain the save files.
2022-07-17 16:58:40 +03:00
Triang3l 8948b2b557 [Android] GPU trace viewer and launcher activities 2022-07-17 16:58:25 +03:00
Triang3l 421c9a80c5 [Android] Add missing `final` to surface callbacks 2022-07-17 16:42:33 +03:00
Triang3l 25663827ba [GPU] Trace viewer Android content URI loading 2022-07-17 16:37:49 +03:00
Triang3l 624f2b2d9e [Base] Android content URI file memory mapping 2022-07-17 16:34:17 +03:00
Triang3l 93a7918025 [Base] Android content URI file descriptor opening 2022-07-17 16:25:58 +03:00
Triang3l 34a952d789 [Base] Wrap strdup and strcasecmp in xe:: functions 2022-07-17 16:14:29 +03:00
Radosław Gliński 23ca3725c4
Merge pull request #50 from chrisps/canary_experimental
Ton of cpu changes
2022-07-16 19:39:45 +02:00
chss95cs@gmail.com 6a612b4d34 remove useless tag field from hir::Value
pack local_slot and constant in hir::Value
Instead of loading membase at the start of every function, just load it in HostToGuestThunk
vzeroupper in GuestToHostThunk before calling host function, and in HostToGuestThunk after calling function to prevent AVX dirty state slowdowns. In the future, check if CPU implements AVX as 128x2 and skip if so (https://john-h-k.github.io/VexTransitionPenalties.html)
Remove useless save/restore of ctx pointer, nothing modifies it and it prevents cpus from doing cross-function memory renaming (https://www.agner.org/forum/viewtopic.php?t=41). Could not remove the space on stack because of alignment issues, instead turned it into GUEST_SCRATCH64 which is a temporary that sequences may use
Reorder OpcodeInfo so that name is at offset 0, remove name and add GetOpcodeName function (name is only used for debug code, we are seperating frequently accessed data and rarely accessed data)
Add VECTOR_DENORMFLUSH opcode for handling output to DOT_PRODUCT and other opcodes that implicitly force denormal inputs/outputs to zero, will eventually use for implementing NJM
Rewrite sequences for LOAD_VECTOR_SHL/SHR. The mask with 0xf in it was pointless as all InstrEmit_ functions that create the load shift instructions do that in HIR. The tables are only used for nonzero constant inputs now, which are probably pretty rare. Instead of doing a shift and lookup, a base value is used for both in the constant table and adding/subtracting of the input is done
Reuse result of LoadVectorShl/Shr in InstrEmit_stvlx_, InstrEmit_stvrx_. We were previously calculating it twice which was contributing to the final sequences' fatness. Use OPCODE_SELECT instead of the sequence of or, andnot, and that it was using for merging
Add the proper unconditional denormal input flushing behavior to vfmadd, add it also to vfmsub (making the assumption it has the same behavior)
Remove constant propagation for DOT_PRODUCT_3/4
DOT_PRODUCT_3/4 now returns a vector with all four elements set to the result. (what we were doing before, truncating to float32 and then splatting didnt make any sense)
Add much more correct versions of DOT_PRODUCT_3/4, matching the Xb360's  to 1 bit. Still needs work to be a perfect emulation.
Add constant folding for OPCODE_SELECT, OPCODE_INSERT, OPCODE_PERMUTE, OPCODE_SWIZZLE
Remove constant folding for DOT_PRODUCT
Removed the multibyte nop code I committed earlier, it doesnt help us much because nops are only used for debug stuff and its ugly and wouldnt survive in a pr to main
Check for AVX512BMI, use vpermb to shuffle if supported
2022-07-16 10:25:04 -07:00
Triang3l 500bbe9e0d [Base] Use to_path for Android path argument loading 2022-07-16 13:42:04 +03:00
Triang3l 373b143049 [Base] Cvars from Android Bundle/Intent 2022-07-16 13:13:08 +03:00
Radosław Gliński 5f11c5d3b4
Merge pull request #49 from chrisps/canary_experimental
Optimized GetScalarNZM, add limit to how far it can recurse.
2022-07-15 00:03:17 +02:00
chss95cs@gmail.com 71c5f8f0fa Optimized GetScalarNZM, add limit to how far it can recurse. Add rlwinm elimination rule 2022-07-14 14:32:14 -07:00
Triang3l 415750252b [Base] PosixMappedMemory: Close, Flush 2022-07-14 22:51:07 +03:00
Triang3l 65137e58bd [Base] PosixMappedMemory: fd instead of stdio
Android ContentResolver, which is needed for content:// URIs, provides file descriptors rather than stdio files
2022-07-14 22:11:46 +03:00
Triang3l 9fd63519bf [Base] Make MappedMemory non-copyable 2022-07-14 22:04:06 +03:00
Triang3l 2a69d1db4d [Vulkan] Fix a typo in a comment about BC textures [ci skip] 2022-07-14 21:16:23 +03:00
Triang3l 7b8281aee0 [UI] Android ImGui touch and mouse input 2022-07-14 21:13:40 +03:00
Triang3l 3a065c35f0 [Android] -j, not ndk.jobs, in Gradle 2022-07-11 21:46:53 +03:00
Triang3l 037310f8dc [Android] Unified xenia-app with windowed apps and build prerequisites 2022-07-11 21:45:57 +03:00
Gliniak 1d00372e6b Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-10 10:50:39 +02:00
Triang3l b41bb35a20 [SPIR-V] Make interpolators an array to fix Adreno linkage 2022-07-09 17:52:26 +03:00
Triang3l b3edc56576 [Vulkan] Merge texture and sampler descriptors into a single descriptor set
Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value
2022-07-09 17:10:28 +03:00
Triang3l ff35a4b3a1 [Third-party] Revert premake-core downgrade caused by a merge 2022-07-09 13:43:53 +03:00
Gliniak d33be73f3d Fixed crash caused by hash calculation in specific cases 2022-07-08 08:49:43 +02:00
Triang3l e4de8663c4 [Vulkan] All guest draw uniform buffer bindings in a single descriptor set
Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer
2022-07-07 21:05:56 +03:00
Triang3l 88c055eb30 [CPU] Null backend enough for GPU trace viewing 2022-07-06 23:28:06 +03:00
Triang3l 3ee68d79ea Revert "[GPU] Make Processor optional for GraphicsSystem setup"
The Processor is still required in many places, including the GPU command processor worker thread

This reverts commit fd03d886e9.
2022-07-06 22:43:40 +03:00
Triang3l 2507837e8d [Drone] Build cpu and gpu-trace-dump for Arm64 2022-07-06 21:34:51 +03:00
Triang3l 6852e54937 [CPU] Remove intrinsics from dot product constant propagation 2022-07-06 21:32:56 +03:00
Triang3l 326e718035 [CPU] MMIO: Arm64, load register writes + exception cleanup 2022-07-06 21:05:05 +03:00
Triang3l fd03d886e9 [GPU] Make Processor optional for GraphicsSystem setup 2022-07-05 21:21:22 +03:00
Triang3l bdfd410b13 [CPU] Cleanup x64 backend usage conditionals 2022-07-05 21:07:10 +03:00
Triang3l c1efd560fb [Drone] Enable newly buildable Android targets 2022-07-05 20:48:41 +03:00
Triang3l d263d508cd [GPU] Make operator< const 2022-07-05 20:47:53 +03:00
Triang3l 536f14d94c [GPU] Fix a typo in a Neon intrinsic name 2022-07-05 20:47:34 +03:00
Triang3l d51fafd07c [Base] Linux Arm64 exception handler 2022-07-05 20:46:49 +03:00
Triang3l 2d5602447e [Drone] Windowed apps are not yet ready for Android building (kind WindowedApp unsupported) 2022-07-04 21:28:39 +03:00
Triang3l 10819a4174 [Drone] Update Android targets 2022-07-04 23:53:55 +03:00
Triang3l 40aa73f7d7 [Linux] Swap read/write in x64 page fault handler + exception code cleanup 2022-07-04 23:51:26 +03:00