xenia-canary

Commit Graph

Author	SHA1	Message	Date
Gliniak	6730ffb7d3	Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental	2022-07-24 17:58:48 +02:00
Gliniak	6e501fbd61	[XAM] Set license mask for DLCs (Thanks Beeanyew)	2022-07-24 17:58:00 +02:00
Gliniak	98c2cb636f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-24 17:38:08 +02:00
Triang3l	37579d3bf0	[GPU] Treat non-adaptive-tessellated patches as 1-control-point	2022-07-24 17:38:26 +03:00
chss95cs@gmail.com	33a6cfc0a7	Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared. Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan. Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch. Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones. Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code. Add backend instr flag to x64 that tells it to not generate code for an instruction. This allows sequences to consume subsequent instructions. Generate actual x64 code for VSL instruction instead of using callnativesafe. Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. This eliminates a ton of garbage compare/set instructions. If a COMPARE instruction's destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address]. Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels. Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldn't spawn? (https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874) Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better. Refactor some of ppc_emit_fpu, many of the InstrEmit functions are identical except for whether they round to single or not. Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions. guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov. Add X64BackendContext structure, represents data before the start of the PPCContext. Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases. Do qnan move for dot product in a tail emitter. Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1. Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores. Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg. Tail emit error handling for TRAP opcodes. Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused). Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva Pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)	2022-07-23 12:10:07 -07:00
Gliniak	1fcac00924	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-23 13:26:31 +02:00
Triang3l	3c12814276	[GPU] EDRAM looped addressing (resolves #2031)	2022-07-22 23:51:50 +03:00
Gliniak	0c782ade8e	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-21 18:52:33 +02:00
Triang3l	6ff312afb1	[DXBC] Update PsParamGen comment [ci skip]	2022-07-21 12:42:06 +03:00
Triang3l	1a95bef8b3	[GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan. For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.	2022-07-21 12:32:28 +03:00
Gliniak	0f60e23208	[Kernel] Removed input change notifications from initial notify list	2022-07-19 10:46:36 +02:00
Gliniak	bc315d21e0	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-19 10:45:14 +02:00
Triang3l	0a94b86cb8	[GPU] Remove orphaned GetPresentArea declaration [ci skip]	2022-07-18 21:02:34 +03:00
Gliniak	57b514ea6a	Removed (again) unnecessary include	2022-07-18 09:40:45 +02:00
Radosław Gliński	3757580f45	Merge pull request #52 from chrisps/canary_experimental: Fix previous batch of CPU changes	2022-07-18 09:20:35 +02:00
Gliniak	fd78ab4dfc	[Patcher] Allow loading patches from non-utf8 paths	2022-07-18 08:46:04 +02:00
chss95cs@gmail.com	11817f0a3b	vshufps accident broke things, this fixes	2022-07-17 14:44:09 -07:00
Gliniak	6e1e62378f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-17 21:27:52 +02:00
Triang3l	14fdf4b270	[GPU] Up to 7x7 resolution scaling	2022-07-17 20:41:50 +03:00
chss95cs@gmail.com	3717167bbe	Preload ThreeFloatMask in DOT_PRODUCT_3. Use shuffle_ps instead of broadcastss, broadcastss is slower on many Intel and AMD processors and encodes to the same number of bytes as shuffle_ps. Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence. For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code. Simplified code for PERMUTE. Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == (x == 0). Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch. Added Value::VisitValueOperands for iterating all Value operands an instruction has. Add BackpropTruncations code to simplification_pass. Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field. Moved the thread_state field to the tail of PPCContext. Moved membase to the tail of PPCContext, since now it is reloaded very infrequently. Rearranged PPCContext so that the condition registers come first (most accesses to them can't get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn). Use alignas(64) to ensure PPCContext's padding. Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there. Changed many wrong AVX instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time. Added SimdDomain checking/deduction to X64Emitter. Used SimdDomain code to fix a lot of float/int domain stalls. Use the low 32 bits of the context register instead of constant 0xE0000000 in ComputeAddress. Special path for SELECT_V128 with result of comparison that will use a blend instruction instead of and/or. Many HIR optimizations added in simp pass. A bunch of other stuff, running out of time to write this msg	2022-07-17 09:52:40 -07:00
Triang3l	e8652e544a	[GPU] Translucent trace viewer controls	2022-07-17 17:29:41 +03:00
Triang3l	25663827ba	[GPU] Trace viewer Android content URI loading	2022-07-17 16:37:49 +03:00
Triang3l	624f2b2d9e	[Base] Android content URI file memory mapping	2022-07-17 16:34:17 +03:00
Triang3l	93a7918025	[Base] Android content URI file descriptor opening	2022-07-17 16:25:58 +03:00
Triang3l	34a952d789	[Base] Wrap strdup and strcasecmp in xe:: functions	2022-07-17 16:14:29 +03:00
chss95cs@gmail.com	6a612b4d34	Remove useless tag field from hir::Value. Pack local_slot and constant in hir::Value. Instead of loading membase at the start of every function, just load it in HostToGuestThunk. vzeroupper in GuestToHostThunk before calling host function, and in HostToGuestThunk after calling function to prevent AVX dirty state slowdowns. In the future, check if CPU implements AVX as 128x2 and skip if so (https://john-h-k.github.io/VexTransitionPenalties.html). Remove useless save/restore of ctx pointer, nothing modifies it and it prevents cpus from doing cross-function memory renaming (https://www.agner.org/forum/viewtopic.php?t=41). Could not remove the space on stack because of alignment issues, instead turned it into GUEST_SCRATCH64 which is a temporary that sequences may use. Reorder OpcodeInfo so that name is at offset 0, remove name and add GetOpcodeName function (name is only used for debug code, we are separating frequently accessed data and rarely accessed data). Add VECTOR_DENORMFLUSH opcode for handling output to DOT_PRODUCT and other opcodes that implicitly force denormal inputs/outputs to zero, will eventually use for implementing NJM. Rewrite sequences for LOAD_VECTOR_SHL/SHR. The mask with 0xf in it was pointless as all InstrEmit_ functions that create the load shift instructions do that in HIR. The tables are only used for nonzero constant inputs now, which are probably pretty rare. Instead of doing a shift and lookup, a base value is used for both in the constant table and adding/subtracting of the input is done. Reuse result of LoadVectorShl/Shr in InstrEmit_stvlx_, InstrEmit_stvrx_. We were previously calculating it twice which was contributing to the final sequences' fatness. Use OPCODE_SELECT instead of the sequence of or, andnot, and that it was using for merging. Add the proper unconditional denormal input flushing behavior to vfmadd, add it also to vfmsub (making the assumption it has the same behavior). Remove constant propagation for DOT_PRODUCT_3/4. DOT_PRODUCT_3/4 now returns a vector with all four elements set to the result (what we were doing before, truncating to float32 and then splatting, didn't make any sense). Add much more correct versions of DOT_PRODUCT_3/4, matching the Xb360's to 1 bit. Still needs work to be a perfect emulation. Add constant folding for OPCODE_SELECT, OPCODE_INSERT, OPCODE_PERMUTE, OPCODE_SWIZZLE. Remove constant folding for DOT_PRODUCT. Removed the multibyte nop code I committed earlier, it doesn't help us much because nops are only used for debug stuff and it's ugly and wouldn't survive in a pr to main. Check for AVX512BMI, use vpermb to shuffle if supported	2022-07-16 10:25:04 -07:00
Triang3l	500bbe9e0d	[Base] Use to_path for Android path argument loading	2022-07-16 13:42:04 +03:00
Triang3l	373b143049	[Base] Cvars from Android Bundle/Intent	2022-07-16 13:13:08 +03:00
chss95cs@gmail.com	71c5f8f0fa	Optimized GetScalarNZM, add limit to how far it can recurse. Add rlwinm elimination rule	2022-07-14 14:32:14 -07:00
Triang3l	415750252b	[Base] PosixMappedMemory: Close, Flush	2022-07-14 22:51:07 +03:00
Triang3l	65137e58bd	[Base] PosixMappedMemory: fd instead of stdio. Android ContentResolver, which is needed for content:// URIs, provides file descriptors rather than stdio files	2022-07-14 22:11:46 +03:00
Triang3l	9fd63519bf	[Base] Make MappedMemory non-copyable	2022-07-14 22:04:06 +03:00
Triang3l	2a69d1db4d	[Vulkan] Fix a typo in a comment about BC textures [ci skip]	2022-07-14 21:16:23 +03:00
Triang3l	7b8281aee0	[UI] Android ImGui touch and mouse input	2022-07-14 21:13:40 +03:00
Triang3l	037310f8dc	[Android] Unified xenia-app with windowed apps and build prerequisites	2022-07-11 21:45:57 +03:00
Gliniak	1d00372e6b	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-10 10:50:39 +02:00
Triang3l	b41bb35a20	[SPIR-V] Make interpolators an array to fix Adreno linkage	2022-07-09 17:52:26 +03:00
Triang3l	b3edc56576	[Vulkan] Merge texture and sampler descriptors into a single descriptor set. Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value	2022-07-09 17:10:28 +03:00
Gliniak	d33be73f3d	Fixed crash caused by hash calculation in specific cases	2022-07-08 08:49:43 +02:00
Triang3l	e4de8663c4	[Vulkan] All guest draw uniform buffer bindings in a single descriptor set. Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer	2022-07-07 21:05:56 +03:00
Triang3l	88c055eb30	[CPU] Null backend enough for GPU trace viewing	2022-07-06 23:28:06 +03:00
Triang3l	3ee68d79ea	Revert "[GPU] Make Processor optional for GraphicsSystem setup". The Processor is still required in many places, including the GPU command processor worker thread. This reverts commit `fd03d886e9`.	2022-07-06 22:43:40 +03:00
Triang3l	6852e54937	[CPU] Remove intrinsics from dot product constant propagation	2022-07-06 21:32:56 +03:00
Triang3l	326e718035	[CPU] MMIO: Arm64, load register writes + exception cleanup	2022-07-06 21:05:05 +03:00
Triang3l	fd03d886e9	[GPU] Make Processor optional for GraphicsSystem setup	2022-07-05 21:21:22 +03:00
Triang3l	bdfd410b13	[CPU] Cleanup x64 backend usage conditionals	2022-07-05 21:07:10 +03:00
Triang3l	d263d508cd	[GPU] Make operator< const	2022-07-05 20:47:53 +03:00
Triang3l	536f14d94c	[GPU] Fix a typo in a Neon intrinsic name	2022-07-05 20:47:34 +03:00
Triang3l	d51fafd07c	[Base] Linux Arm64 exception handler	2022-07-05 20:46:49 +03:00
Triang3l	40aa73f7d7	[Linux] Swap read/write in x64 page fault handler + exception code cleanup	2022-07-04 23:51:26 +03:00
Triang3l	a9cbd9cc5f	[Linux] Update RIP after handling an exception	2022-07-04 23:24:26 +03:00
uytvbn	54aac81268	[Linux] Implement exception handler	2022-07-04 23:04:27 +03:00
Triang3l	35d4ea59c6	[Base] Remove exception_handler_linux.cc	2022-07-04 23:02:11 +03:00
Triang3l	feaad639fb	[Vulkan] Destroy all RTs before VulkanRenderTargetCache is destroyed	2022-07-04 11:27:51 +03:00
Gliniak	6e753c6399	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-04 08:11:04 +02:00
Triang3l	2621dabf0f	[Vulkan] Native 24-bit unorm depth where available	2022-07-03 21:21:17 +03:00
Triang3l	83e9984539	[Vulkan] Remove required feature checks. Fallbacks for those will be added more or less soon, the stable version won't hard-require anything beyond 1.0 and the portability subset	2022-07-03 20:54:34 +03:00
Triang3l	bbae909fd7	[GPU] Reasons to keep non-Vulkan backends [ci skip]	2022-07-03 20:39:44 +03:00
Triang3l	ed61e15fc3	[App] Make D3D12 the default GPU backend on Windows again	2022-07-03 19:49:11 +03:00
Triang3l	ee84f4e267	[Vulkan] Update title bar warning	2022-07-03 19:45:48 +03:00
Triang3l	f7ef051025	[Vulkan] Disable validation by default	2022-07-03 19:42:22 +03:00
Triang3l	001f64852c	[Vulkan] VMA for textures	2022-07-03 19:40:48 +03:00
Gliniak	a8df744ea6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-03 13:07:15 +02:00
Triang3l	636585e0aa	[Vulkan] Trace viewer	2022-07-01 19:53:41 +03:00
Triang3l	ad1ef84145	Merge branch 'master' into vulkan	2022-07-01 19:53:08 +03:00
Triang3l	e37e3ef382	[GPU] Display swap output in the trace viewer. Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format	2022-07-01 19:50:19 +03:00
Triang3l	c8a4a9504f	[Vulkan] Remove an unneeded scale from RefreshGuestOutput aspect ratio	2022-07-01 12:52:12 +03:00
Triang3l	d174762a40	Merge branch 'master' into vulkan	2022-07-01 12:51:34 +03:00
Triang3l	28670d8ec2	[UI] Presenter: Rename display size to aspect ratio	2022-07-01 12:50:45 +03:00
Triang3l	f8b351138e	[Vulkan] Alpha test	2022-06-30 22:20:51 +03:00
Triang3l	6772c88141	Merge branch 'master' into vulkan	2022-06-30 22:15:29 +03:00
Triang3l	7e691d5ef1	[DXBC] Handle NaN in not equal alpha test as passed	2022-06-30 22:15:01 +03:00
Triang3l	c0c3666e12	[Vulkan] Align texture extents in loading to vector size accessed by the shader. Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background	2022-06-29 23:41:32 +03:00
Triang3l	9392fff369	Merge branch 'master' into vulkan	2022-06-29 23:39:54 +03:00
Triang3l	a11b070fee	[GPU] Align texture extents in loading to host buffer texel size accessed by the shader	2022-06-29 23:38:06 +03:00
Triang3l	7c2df55209	[Vulkan] Cache clear: shared memory, scratch buffer	2022-06-29 13:24:45 +03:00
Triang3l	d5815d9e6a	[Vulkan] Float24 depth range remapping fixes	2022-06-29 13:14:00 +03:00
Gliniak	efe3cd96d6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-29 09:21:09 +02:00
Triang3l	05ef7a273a	[Vulkan] Samplers (only 1.0 core features for now)	2022-06-28 22:42:18 +03:00
Triang3l	5d9061cf99	Merge branch 'master' into vulkan	2022-06-28 22:05:45 +03:00
Triang3l	243683d2e9	[GPU] Cleanup Texture::MarkAsUsed conditionals	2022-06-28 22:04:26 +03:00
Triang3l	382710bab7	[GPU] Normalize sampler clamp modes	2022-06-28 21:58:58 +03:00
Triang3l	cedc94679b	[GPU] Don't drop the rest of the command list if IssueDraw fails	2022-06-28 21:40:06 +03:00
chss95cs@gmail.com	3c06921cd4	Added optimizations for combining conditions together when their results are OR'ed. Added recognition of impossible comparisons via NZM and optimize them away. Recognize (x + -y) and transform to (x - y) for constants. Recognize (~x) + 1 and transform to -x. Check and transform comparisons if they're semantically equal to others. Detect comparisons of single-bit values with their only possible non-zero value and transform to true/false tests. Transform ==0 to IS_FALSE, !=0 to IS_TRUE. Truncate to int8 if operand for IS_TRUE/IS_FALSE has a nzm of 1. Reduced code generated for SubDidCarry slightly. Add special case for InstrEmit_srawix if mask == 1. Cut down the code generated for trap instructions, instead of naive or'ing of compare results do a switch and select the best condition. Rerun simplification pass until no changes, as some optimizations will enable others to be done. Enable rel32 call optimization by default	2022-06-26 12:49:04 -07:00
Gliniak	e6898fda66	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-26 20:11:33 +02:00
chrisps	08232de8cc	patch a mistake in NZM calculation for OPCODE_NOT	2022-06-26 09:30:56 -07:00
Triang3l	9672230d9f	Merge branch 'master' into vulkan	2022-06-26 18:59:49 +03:00
Triang3l	ec008463b6	[GPU] CrYCb/YCrCb border colors	2022-06-26 18:56:50 +03:00
Triang3l	2606fa5709	[GPU] Apply BaseMap MipFilter via samplers as it may be overridden. Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.	2022-06-26 18:41:38 +03:00
Triang3l	e191430091	Merge branch 'master' into vulkan	2022-06-26 16:58:27 +03:00
Triang3l	086a070fa9	[GPU] Explicitly cast bit field values in std::min/max. According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).	2022-06-26 16:54:11 +03:00
Triang3l	e0b890fe5c	[DXBC] Remove alphatest/A2C with [earlydepthstencil]	2022-06-26 15:31:08 +03:00
Triang3l	6688b13773	[Vulkan] PsParamGen	2022-06-26 15:01:27 +03:00
Triang3l	a99a1be880	Merge branch 'master' into vulkan	2022-06-26 15:00:21 +03:00
Triang3l	b787f2dec1	[GPU] GPR count limit is 128, not 64	2022-06-26 14:45:49 +03:00
Triang3l	a5c8df7a37	[Vulkan] Remove UB-based independent blend logic. On Vulkan, unlike Direct3D, not writing to a color target in the fragment shader produces an undefined result.	2022-06-25 20:57:44 +03:00
Triang3l	d8b2944caa	[Vulkan] Handle unsupported fillModeNonSolid + fix portability subset feature checks	2022-06-25 20:46:52 +03:00
Triang3l	d30d59883a	[Vulkan] Color exponent bias and gamma conversion	2022-06-25 20:35:13 +03:00
Triang3l	b1be33004a	Merge branch 'master' into vulkan	2022-06-25 20:31:26 +03:00
Triang3l	4812b4ba8b	[D3D12] Fix outdated color system constants comment [ci skip]	2022-06-25 20:31:05 +03:00


6171 Commits