Commit Graph

5759 Commits

Author SHA1 Message Date
chss95cs@gmail.com 457296850e Add OPCODE_NEGATED_MUL_ADD/OPCODE_NEGATED_MUL_SUB
Proper handling of nans for VMX max/min on x64 (minps/maxps has special behavior depending on the operand order that vmx does not have for vminfp/vmaxfp)
Add extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than windows, and on windows is disabled by default by a cvar
Repurpose GUEST_SCRATCH64 stack offset to instead be for storing guest function profile times, define GUEST_SCRATCH as 0 instead, since thats already meant to be a scratch area
Fix xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false
Add alternative code path for guest clock that uses kusershared systemtime instead of QueryPerformanceCounter. This is way faster and I have tested it and found it to be working, but i have disabled it because i do not know how well it works on wine or on processors other than mine
Significantly reduce log spam by setting XELOGAPU and XELOGGPU to be LogLevel::Debug
Changed some LOGI to LOGD in places to reduce log spam
Mark VdSwap as kHighFrequency, it was spamming up logs
Make logging calls less intrusive for the caller by forcing the test of log level inline and moving the format/AppendLogLine stuff to an outlined cold function
Add swcache namespace for software cache operations like prefetches, streaming stores and streaming loads.
Add XE_MSVC_REORDER_BARRIER for preventing msvc from propagating a value too close to its store or from its load
Add xe_unlikely_mutex for locks we know have very little contention
add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h
Microoptimization: Changed most uses of size_t to ring_size_t in RingBuffer, this reduces the size of the inlined ringbuffer operations slightly by eliminating rex prefixes, depending on register allocation
Add BeginPrefetchedRead to ringbuffer, which prefetches the second range if there is one according to the provided PrefetchTag
added inline_loadclock cvar, which will directly use the value of the guest clock from clock.cc in jitted guest code. off by default
change uses of GUEST_SCRATCH64 to GUEST_SCRATCH
Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently resides in x64_seq_vector, move to gpu code maybe at some point)
Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. Same code can be used for Float16_2 in future commit. This should speed up some games that use these functions heavily
Remove cvar for toggling old float16 behavior
Add VRSAVE register, support mfspr/mtspr vrsave
Add cvar for toggling off codegen for trap instructions and set it to true by default.
Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister
Use a fixed size vmem vector for upload ranges, realloc/memsetting on resize  in the inner loop of requestranges was showing up on the profiler (the search in requestranges itself needs work)
Rename fixed_vmem_vector to better fit xenia's naming convention
Only log unknown register writes in WriteRegister if DEBUG :/. We're stuck on MSVC with c++17 so we have no way of influencing the branch ordering for that function without profile guided optimization
Remove binding stride assert in shader_translator.cc, triangle told me its leftover ogl stuff
Mark xe::FatalError as noreturn
If a controller is not connected, delay by 1.1 seconds before checking if it has been reconnected. Asking Xinput about a controller slot that is unused is extremely slow, and XinputGetState/SetState were taking up
an enormous amount of time in profiles. this may have caused a bit of input lag
Protect accesses to input_system with a lock
Add proper handling for user_index>= 4 in XamInputGetState/SetState, properly return zeroed state in GetState
Add missing argument to NtQueryVirtualMemory_entry
Fixed RtlCompareMemoryUlong_entry, it actually does not care if the source is misaligned, and for length it aligns down
Fixed RtlUpperChar and RtlLowerChar, added a table that has their correct return values precomputed
2022-08-20 11:40:19 -07:00
Gliniak e06978e5be [Premake] Cleanup & Fixed references in cpu-tests 2022-08-17 09:43:55 +02:00
Gliniak 0df92130e6 [Memory] Changed amount of kernel reserved pages.
This fixes flickering in games with resoultion scaling enabled
2022-08-15 17:51:29 +02:00
chss95cs@gmail.com 7cc364dcb8 squash reallocs in command buffers by using large prealloced buffer, directly use virtual memory with it so os allocs on demand
mark raw clock functions as noinline, the way msvc was inlining them and ordering the branches meant that rdtsc would often be speculatively executed
add alternative clock impl for win, instead of using queryperformancecounter we grab systemtime from kusershared. it does not have the same precision as queryperformancecounter, we only have 100 nanosecond precision, but we round to milliseconds so it never made sense to use the performance counter in the first place
stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite)
added some helpers for minf/maxf without the nan handling behavior
2022-08-14 13:42:08 -07:00
chss95cs@gmail.com c9b2d10e17 alternative mutex impl on windows works but i really can't tell if it helps much. use larger size in deferred_command_list to cut down on resizes in big scenes on m:dur 2022-08-14 10:26:50 -07:00
chss95cs@gmail.com 08f7a28920 Alternative mutex 2022-08-14 08:59:11 -07:00
chss95cs@gmail.com 495b1f8bc8 once again return to spinloop 2022-08-13 14:05:35 -07:00
chss95cs@gmail.com c9e4119428 Add branch of ffmpeg with non-recursive split_radix_permutation
Add branch of disruptorplus with working blocking_wait_stategy
Switch back to blocking wait for timer queue
2022-08-13 13:43:45 -07:00
chss95cs@gmail.com 020d64a1a1 revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile 2022-08-13 13:20:35 -07:00
chss95cs@gmail.com cb85fe401c Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90%
But for normal msvc builds i would put it at around 30-50%
Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up
Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger
fixed a number of errors where temporaries were being passed by reference/pointer
Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes.
Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold.
Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me.
Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it
Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total
Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time
Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op
Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x
For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases
^this can be toggled off in the platform_win header
handle indirect call true with constant function pointer, was occurring in h3
lookup host format swizzle in denser array
by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar
^looking up whether its known or not took approx 0.3% cpu time
Changed some things in /cpu to make the project UNITYBUILD friendly
The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead
tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds)
Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed
added support for docdecaduple precision floating point so that we can represent our performance gains numerically
tons of other stuff im probably forgetting
2022-08-13 12:59:00 -07:00
Radosław Gliński 2f59487bf3
Merge pull request #59 from Uraniumm/canary_experimental
Add nullptr check in CheckScalarConstCmp
2022-08-08 19:47:35 +02:00
Uraniumm a16acbaf59
add nullptr check to mitigate crashes
wip for reach untracked tags build fixes
2022-08-08 02:02:25 -04:00
chss95cs@gmail.com 324a8eb818 A bunch of fixes for division logic:
"turns out theres a lot of quirks with the div instructions we havent been covering
if the denom is 0, we jump to the end and mov eax/rax to dst, which is correct because ppc raises no exceptions for divide by 0 unlike x86
except we don't initialize eax before that jump, so whatever garbage from the previous sequence that has been left in eax/rax is what the result of the instruction will be
and then in our constant folding, we don't do the same zero check in Value::Div, so if we constant folded the denom to 0 we will host crash
the ppc manual says the result for a division by 0 is undefined, but in reality it seems it is always 0
there are a few posts i saw from googling about it, and tests on my rgh gave me 0, but then another issue came up
and that is that we dont check for signed overflow in our division, so we raise an exception if guest code ever does (1<<signbit_pos) / -1
signed overflow in division also produces 0 on ppc
the last thing is that if src2 is constant we skip the 0 check for division
without checking if its nonzero
all weird, likely very rare edge cases, except for maybe the signed overflow division
chrispy — Today at 9:51 AM
oh yeah, and because the int members of constantvalue are all signed ints, we were actually doing signed division always with constant folding"

fixed an earlier mistake by me with the precision of fresx
made some optimization disableable

implemented vkpkx
fixed possible bugs with vsr/vsl constant folding
disabled the nice imul code for now, there was a bug with int64 version and i dont have time to check
started on multiplication/addition/subtraction/division identities
Removed optimized VSL implementation, it's going to have to be rewritten anyway
Added ppc_ctx_t to xboxkrnl shim for direct context access
started working on KeSaveFloatingPointState, re'ed most of it
Exposed some more state/functionality to the kernel for implementing lower level routines like the save/restore ones
Add cvar to re-enable incorrect mxcsr behavior if a user doesnt care and wants better cpu performance
Stubbed out more impossible sequences, replace mul_hi_i32 with a 64 bit multiply
2022-08-07 10:41:26 -07:00
Gliniak f45e9e5e9a [Kernel] Improved handling of internal display resolution 2022-08-02 12:09:25 +02:00
Gliniak 0e1353aa71 Implemented Opcode: mcrf 2022-08-01 14:54:05 +02:00
chss95cs@gmail.com 968f656d96 Add separate VMX/fpu mxcsr
Add support for constant operands for most fpu instructions
Remove constant folding for most fpu cpde
half float
2022-07-31 08:56:36 -07:00
Gliniak 5d1b641197 [Emulator] Added possiblity to install multiple packages at once 2022-07-30 15:52:41 +02:00
Gliniak 79ffbe3971 Merge branch 'importContent' of https://github.com/Gliniak/xenia.git into canary_experimental 2022-07-30 12:44:24 +02:00
Gliniak 0e3403d6da Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-30 12:42:51 +02:00
Gliniak 433a8a8a5e [Emulator] Added option for content installation 2022-07-30 12:41:26 +02:00
Triang3l 7595cdb52b [Vulkan] Non-GS point sprites + minor SPIR-V fixes 2022-07-27 17:14:28 +03:00
Triang3l ff7ef05063 [SPIR-V] Clamp cube face using NClamp, not NMax/FMin 2022-07-26 17:08:12 +03:00
Triang3l 66c995f3aa [SPIR-V] Saturate point sprite coordinates 2022-07-26 17:04:22 +03:00
Triang3l 8fb5da18ea [Vulkan] Add forgotten fullDrawIndexUint32 check 2022-07-26 16:24:14 +03:00
Triang3l 9fa41c27bc [Vulkan] Point sprite geometry shader 2022-07-26 16:01:20 +03:00
Gliniak 0c3019981c [Video] Added option to set internal output resolution 2022-07-26 11:25:03 +02:00
Gliniak 76806e08c5 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-26 10:22:38 +02:00
Triang3l f248e23079 [DXBC] Skip backface check in point PsParamGen 2022-07-25 21:48:25 +03:00
Triang3l 77e85ecaa4 [Vulkan] 32-bit index fetch without fullDrawIndexUint32 2022-07-25 16:53:12 +03:00
Gliniak 061000af01 [Base] Changed size of bitstream accessed data (Risky)
This prevents crashing in situation when buffer_ + offset_bytes is
at the end of allocated memory range and can go into unallocated space
2022-07-25 10:52:21 +02:00
Gliniak 364137ef5f [XAM] Send UI On notification on start of XamShowSigninUI 2022-07-25 10:50:32 +02:00
Gliniak 6730ffb7d3 Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental 2022-07-24 17:58:48 +02:00
Gliniak 6e501fbd61 [XAM] Set license mask for DLCs (Thanks Beeanyew) 2022-07-24 17:58:00 +02:00
Gliniak 98c2cb636f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-24 17:38:08 +02:00
Triang3l 37579d3bf0 [GPU] Treat non-adaptive-tessellated patches as 1-control-point 2022-07-24 17:38:26 +03:00
chss95cs@gmail.com 33a6cfc0a7 Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared
Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan
Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch
Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones
Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code.
Add backend instr flag to x64 that tells it to not generate code for an instruction. this allows sequences to consume subsequent instructions
Generate actual x64 code for VSL instruction instead of using callnativesafe
Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. this eliminates a ton of garbage compare/set instructions.
If a COMPARE instructions destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address]
Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels
Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldnt spawn? (https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874)
Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better
Refactor some of ppc_emit_fpu, much of the InstrEmit function are identical except for whether they round to single or not
Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions
guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov
Add X64BackendContext structure, represents data before the start of the PPCContext
Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases.
Do qnan move for dot product in a tail emitter
Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1
Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores
Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg
Tail emit error handling for TRAP opcodes
Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused)
Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)
2022-07-23 12:10:07 -07:00
Gliniak 1fcac00924 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-23 13:26:31 +02:00
Triang3l 3c12814276 [GPU] EDRAM looped addressing (resolves #2031) 2022-07-22 23:51:50 +03:00
Gliniak 0c782ade8e Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-21 18:52:33 +02:00
Triang3l 6ff312afb1 [DXBC] Update PsParamGen comment [ci skip] 2022-07-21 12:42:06 +03:00
Triang3l 1a95bef8b3 [GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan
For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.
2022-07-21 12:32:28 +03:00
Gliniak 0f60e23208 [Kernel] Removed input change notifications from initial notify list 2022-07-19 10:46:36 +02:00
Gliniak bc315d21e0 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-19 10:45:14 +02:00
Triang3l 0a94b86cb8 [GPU] Remove orphaned GetPresentArea declaration [ci skip] 2022-07-18 21:02:34 +03:00
Gliniak 57b514ea6a Removed (again) unnecessary include 2022-07-18 09:40:45 +02:00
Radosław Gliński 3757580f45
Merge pull request #52 from chrisps/canary_experimental
Fix previous batch of CPU changes
2022-07-18 09:20:35 +02:00
Gliniak fd78ab4dfc [Patcher] Allow loading patches from non-utf8 paths 2022-07-18 08:46:04 +02:00
chss95cs@gmail.com 11817f0a3b vshufps accident broke things, this fixes 2022-07-17 14:44:09 -07:00
Gliniak 6e1e62378f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-17 21:27:52 +02:00
Triang3l 14fdf4b270 [GPU] Up to 7x7 resolution scaling 2022-07-17 20:41:50 +03:00
chss95cs@gmail.com 3717167bbe Preload ThreeFloatMask in DOT_PRODUCT_3
Use shuffle_ps instead of broadcastss, broadcastss is slower on many intel and amd processors and encodes to the same number of bytes as shuffle_ps
Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence
For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code
Simplified code for PERMUTE
Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == ( x == 0)
Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch
Added Value::VisitValueOperands for iterating all Value operands an instruction has.
Add BackpropTruncations code to simplification_pass
Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field. Moved the thread_state field to the tail of PPCContext.
Moved membase to the tail of PPCContext, since now it is reloaded very infrequently.
Rearranged PPCContext so that the condition registers come first (most accesses to them cant get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently
accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn)
Use alignas(64) to ensure PPCContext's padding
Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and
reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there
Changed many wrong avx instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time
Added SimdDomain checking/deduction to X64Emitter.
Used SimdDomain code to fix a lot of float/int domain stalls

Use the low 32 bits of the context register instead of constant 0xE0000000 in ComputeAddress
Special path for SELECT_V128 with result of comparison that will use a blend instruction instead of and/or
Many HIR optimizations added in simp pass
A bunch of other stuff running out of time to write this msg
2022-07-17 09:52:40 -07:00
Triang3l e8652e544a [GPU] Translucent trace viewer controls 2022-07-17 17:29:41 +03:00
Triang3l 25663827ba [GPU] Trace viewer Android content URI loading 2022-07-17 16:37:49 +03:00
Triang3l 624f2b2d9e [Base] Android content URI file memory mapping 2022-07-17 16:34:17 +03:00
Triang3l 93a7918025 [Base] Android content URI file descriptor opening 2022-07-17 16:25:58 +03:00
Triang3l 34a952d789 [Base] Wrap strdup and strcasecmp in xe:: functions 2022-07-17 16:14:29 +03:00
chss95cs@gmail.com 6a612b4d34 remove useless tag field from hir::Value
pack local_slot and constant in hir::Value
Instead of loading membase at the start of every function, just load it in HostToGuestThunk
vzeroupper in GuestToHostThunk before calling host function, and in HostToGuestThunk after calling function to prevent AVX dirty state slowdowns. In the future, check if CPU implements AVX as 128x2 and skip if so (https://john-h-k.github.io/VexTransitionPenalties.html)
Remove useless save/restore of ctx pointer, nothing modifies it and it prevents cpus from doing cross-function memory renaming (https://www.agner.org/forum/viewtopic.php?t=41). Could not remove the space on stack because of alignment issues, instead turned it into GUEST_SCRATCH64 which is a temporary that sequences may use
Reorder OpcodeInfo so that name is at offset 0, remove name and add GetOpcodeName function (name is only used for debug code, we are seperating frequently accessed data and rarely accessed data)
Add VECTOR_DENORMFLUSH opcode for handling output to DOT_PRODUCT and other opcodes that implicitly force denormal inputs/outputs to zero, will eventually use for implementing NJM
Rewrite sequences for LOAD_VECTOR_SHL/SHR. The mask with 0xf in it was pointless as all InstrEmit_ functions that create the load shift instructions do that in HIR. The tables are only used for nonzero constant inputs now, which are probably pretty rare. Instead of doing a shift and lookup, a base value is used for both in the constant table and adding/subtracting of the input is done
Reuse result of LoadVectorShl/Shr in InstrEmit_stvlx_, InstrEmit_stvrx_. We were previously calculating it twice which was contributing to the final sequences' fatness. Use OPCODE_SELECT instead of the sequence of or, andnot, and that it was using for merging
Add the proper unconditional denormal input flushing behavior to vfmadd, add it also to vfmsub (making the assumption it has the same behavior)
Remove constant propagation for DOT_PRODUCT_3/4
DOT_PRODUCT_3/4 now returns a vector with all four elements set to the result. (what we were doing before, truncating to float32 and then splatting didnt make any sense)
Add much more correct versions of DOT_PRODUCT_3/4, matching the Xb360's  to 1 bit. Still needs work to be a perfect emulation.
Add constant folding for OPCODE_SELECT, OPCODE_INSERT, OPCODE_PERMUTE, OPCODE_SWIZZLE
Remove constant folding for DOT_PRODUCT
Removed the multibyte nop code I committed earlier, it doesnt help us much because nops are only used for debug stuff and its ugly and wouldnt survive in a pr to main
Check for AVX512BMI, use vpermb to shuffle if supported
2022-07-16 10:25:04 -07:00
Triang3l 500bbe9e0d [Base] Use to_path for Android path argument loading 2022-07-16 13:42:04 +03:00
Triang3l 373b143049 [Base] Cvars from Android Bundle/Intent 2022-07-16 13:13:08 +03:00
chss95cs@gmail.com 71c5f8f0fa Optimized GetScalarNZM, add limit to how far it can recurse. Add rlwinm elimination rule 2022-07-14 14:32:14 -07:00
Triang3l 415750252b [Base] PosixMappedMemory: Close, Flush 2022-07-14 22:51:07 +03:00
Triang3l 65137e58bd [Base] PosixMappedMemory: fd instead of stdio
Android ContentResolver, which is needed for content:// URIs, provides file descriptors rather than stdio files
2022-07-14 22:11:46 +03:00
Triang3l 9fd63519bf [Base] Make MappedMemory non-copyable 2022-07-14 22:04:06 +03:00
Triang3l 2a69d1db4d [Vulkan] Fix a typo in a comment about BC textures [ci skip] 2022-07-14 21:16:23 +03:00
Triang3l 7b8281aee0 [UI] Android ImGui touch and mouse input 2022-07-14 21:13:40 +03:00
Triang3l 037310f8dc [Android] Unified xenia-app with windowed apps and build prerequisites 2022-07-11 21:45:57 +03:00
Gliniak 1d00372e6b Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-10 10:50:39 +02:00
Triang3l b41bb35a20 [SPIR-V] Make interpolators an array to fix Adreno linkage 2022-07-09 17:52:26 +03:00
Triang3l b3edc56576 [Vulkan] Merge texture and sampler descriptors into a single descriptor set
Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value
2022-07-09 17:10:28 +03:00
Gliniak d33be73f3d Fixed crash caused by hash calculation in specific cases 2022-07-08 08:49:43 +02:00
Triang3l e4de8663c4 [Vulkan] All guest draw uniform buffer bindings in a single descriptor set
Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer
2022-07-07 21:05:56 +03:00
Triang3l 88c055eb30 [CPU] Null backend enough for GPU trace viewing 2022-07-06 23:28:06 +03:00
Triang3l 3ee68d79ea Revert "[GPU] Make Processor optional for GraphicsSystem setup"
The Processor is still required in many places, including the GPU command processor worker thread

This reverts commit fd03d886e9.
2022-07-06 22:43:40 +03:00
Triang3l 6852e54937 [CPU] Remove intrinsics from dot product constant propagation 2022-07-06 21:32:56 +03:00
Triang3l 326e718035 [CPU] MMIO: Arm64, load register writes + exception cleanup 2022-07-06 21:05:05 +03:00
Triang3l fd03d886e9 [GPU] Make Processor optional for GraphicsSystem setup 2022-07-05 21:21:22 +03:00
Triang3l bdfd410b13 [CPU] Cleanup x64 backend usage conditionals 2022-07-05 21:07:10 +03:00
Triang3l d263d508cd [GPU] Make operator< const 2022-07-05 20:47:53 +03:00
Triang3l 536f14d94c [GPU] Fix a typo in a Neon intrinsic name 2022-07-05 20:47:34 +03:00
Triang3l d51fafd07c [Base] Linux Arm64 exception handler 2022-07-05 20:46:49 +03:00
Triang3l 40aa73f7d7 [Linux] Swap read/write in x64 page fault handler + exception code cleanup 2022-07-04 23:51:26 +03:00
Triang3l a9cbd9cc5f [Linux] Update RIP after handling an exception 2022-07-04 23:24:26 +03:00
uytvbn 54aac81268 [Linux] Implement exception handler 2022-07-04 23:04:27 +03:00
Triang3l 35d4ea59c6 [Base] Remove exception_handler_linux.cc 2022-07-04 23:02:11 +03:00
Triang3l feaad639fb [Vulkan] Destroy all RTs before VulkanRenderTargetCache is destroyed 2022-07-04 11:27:51 +03:00
Gliniak 6e753c6399 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-04 08:11:04 +02:00
Triang3l 2621dabf0f [Vulkan] Native 24-bit unorm depth where available 2022-07-03 21:21:17 +03:00
Triang3l 83e9984539 [Vulkan] Remove required feature checks
Fallbacks for those will be added more or less soon, the stable version won't hard-require anything beyond 1.0 and the portability subset
2022-07-03 20:54:34 +03:00
Triang3l bbae909fd7 [GPU] Reasons to keep non-Vulkan backends [ci skip] 2022-07-03 20:39:44 +03:00
Triang3l ed61e15fc3 [App] Make D3D12 the default GPU backend on Windows again 2022-07-03 19:49:11 +03:00
Triang3l ee84f4e267 [Vulkan] Update title bar warning 2022-07-03 19:45:48 +03:00
Triang3l f7ef051025 [Vulkan] Disable validation by default 2022-07-03 19:42:22 +03:00
Triang3l 001f64852c [Vulkan] VMA for textures 2022-07-03 19:40:48 +03:00
Gliniak a8df744ea6 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-03 13:07:15 +02:00
Triang3l 636585e0aa [Vulkan] Trace viewer 2022-07-01 19:53:41 +03:00
Triang3l ad1ef84145 Merge branch 'master' into vulkan 2022-07-01 19:53:08 +03:00
Triang3l e37e3ef382 [GPU] Display swap output in the trace viewer
Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format
2022-07-01 19:50:19 +03:00
Triang3l c8a4a9504f [Vulkan] Remove an unneeded scale from RefreshGuestOutput aspect ratio 2022-07-01 12:52:12 +03:00
Triang3l d174762a40 Merge branch 'master' into vulkan 2022-07-01 12:51:34 +03:00
Triang3l 28670d8ec2 [UI] Presenter: Rename display size to aspect ratio 2022-07-01 12:50:45 +03:00
Triang3l f8b351138e [Vulkan] Alpha test 2022-06-30 22:20:51 +03:00
Triang3l 6772c88141 Merge branch 'master' into vulkan 2022-06-30 22:15:29 +03:00
Triang3l 7e691d5ef1 [DXBC] Handle NaN in not equal alpha test as passed 2022-06-30 22:15:01 +03:00
Triang3l c0c3666e12 [Vulkan] Align texture extents in loading to vector size accessed by the shader
Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background
2022-06-29 23:41:32 +03:00
Triang3l 9392fff369 Merge branch 'master' into vulkan 2022-06-29 23:39:54 +03:00
Triang3l a11b070fee [GPU] Align texture extents in loading to host buffer texel size accessed by the shader 2022-06-29 23:38:06 +03:00
Triang3l 7c2df55209 [Vulkan] Cache clear: shared memory, scratch buffer 2022-06-29 13:24:45 +03:00
Triang3l d5815d9e6a [Vulkan] Float24 depth range remapping fixes 2022-06-29 13:14:00 +03:00
Gliniak efe3cd96d6 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-29 09:21:09 +02:00
Triang3l 05ef7a273a [Vulkan] Samplers (only 1.0 core features for now) 2022-06-28 22:42:18 +03:00
Triang3l 5d9061cf99 Merge branch 'master' into vulkan 2022-06-28 22:05:45 +03:00
Triang3l 243683d2e9 [GPU] Cleanup Texture::MarkAsUsed conditionals 2022-06-28 22:04:26 +03:00
Triang3l 382710bab7 [GPU] Normalize sampler clamp modes 2022-06-28 21:58:58 +03:00
Triang3l cedc94679b [GPU] Don't drop the rest of the command list if IssueDraw fails 2022-06-28 21:40:06 +03:00
chss95cs@gmail.com 3c06921cd4 Added optimizations for combining conditions together when their results are OR'ed
Added recognition of impossible comparisons via NZM and optimize them away
Recognize (x + -y) and transform to (x - y) for constants
Recognize (~x ) + 1 and transform to -x
Check and transform comparisons if theyre semantically equal to others
Detect comparisons of single-bit values with their only possible non-zero value and transform to true/false tests
Transform ==0 to IS_FALSE, !=0 to IS_TRUE
Truncate to int8 if operand for IS_TRUE/IS_FALSE has a nzm of 1
Reduced code generated for SubDidCarry slightly
Add special case for InstrEmit_srawix if mask == 1
Cut down the code generated for trap instructions, instead of naive or'ing or compare results do a switch and select the best condition
Rerun simplification pass until no changes, as some optimizations will enable others to be done
Enable rel32 call optimization by default
2022-06-26 12:49:04 -07:00
Gliniak e6898fda66 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-26 20:11:33 +02:00
chrisps 08232de8cc
patch a mistake in NZM calculation for OPCODE_NOT 2022-06-26 09:30:56 -07:00
Triang3l 9672230d9f Merge branch 'master' into vulkan 2022-06-26 18:59:49 +03:00
Triang3l ec008463b6 [GPU] CrYCb/YCrCb border colors 2022-06-26 18:56:50 +03:00
Triang3l 2606fa5709 [GPU] Apply BaseMap MipFilter via samplers as it may be overridden
Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.
2022-06-26 18:41:38 +03:00
Triang3l e191430091 Merge branch 'master' into vulkan 2022-06-26 16:58:27 +03:00
Triang3l 086a070fa9 [GPU] Explicitly cast bit field values in std::min/max
According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).
2022-06-26 16:54:11 +03:00
Triang3l e0b890fe5c [DXBC] Remove alphatest/A2C with [earlydepthstencil] 2022-06-26 15:31:08 +03:00
Triang3l 6688b13773 [Vulkan] PsParamGen 2022-06-26 15:01:27 +03:00
Triang3l a99a1be880 Merge branch 'master' into vulkan 2022-06-26 15:00:21 +03:00
Triang3l b787f2dec1 [GPU] GPR count limit is 128, not 64 2022-06-26 14:45:49 +03:00
Triang3l a5c8df7a37 [Vulkan] Remove UB-based independent blend logic
On Vulkan, unlike Direct3D, not writing to a color target in the fragment shader produces an undefined result.
2022-06-25 20:57:44 +03:00
Triang3l d8b2944caa [Vulkan] Handle unsupported fillModeNonSolid + fix portability subset feature checks 2022-06-25 20:46:52 +03:00
Triang3l d30d59883a [Vulkan] Color exponent bias and gamma conversion 2022-06-25 20:35:13 +03:00
Triang3l b1be33004a Merge branch 'master' into vulkan 2022-06-25 20:31:26 +03:00
Triang3l 4812b4ba8b [D3D12] Fix outdated color system constants comment [ci skip] 2022-06-25 20:31:05 +03:00
chss95cs@gmail.com 327cc9eff5 drastically reduce size of final generated code for rlwinm by adding special paths for rotations of 0, masks that discard the rotated bits and using And w/ UINT_MAX instead of truncate/zero extend
Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do mov32 to 32 if detected to take advantage of implicit zero xt/reg renaming

Add helper function for skipping assignment defs in instr.
Add helper function for checking if an opcode is binary value type
Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later) .
 List of optimizations:
  If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context)
  If masking a value that was or'ed against another check whether our mask only considers bits from one value or another. if so, change the operand to the OR input that actually matters
  If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in change the opcode to SHIFT_LEFT
  If masking against all ones, become an assign.
  If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup)
  If XOR against all ones, become a NOT
Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.
2022-06-25 09:58:13 -07:00
Triang3l 5dca11a892 [SPIR-V] Fix fetch constant LOD bias signedness 2022-06-25 16:33:35 +03:00
Triang3l d8b0227cbd [SPIR-V] Fix cubemap X axis 2022-06-25 16:25:29 +03:00
Triang3l fdcbf67623 [Vulkan] Enable VK_KHR_sampler_ycbcr_conversion 2022-06-25 15:46:02 +03:00
Triang3l 758db4ccb3 [Vulkan] Fix textures not loaded if using a shader for the first time 2022-06-25 15:15:06 +03:00
Triang3l 4db445c6f9 Merge branch 'master' into vulkan 2022-06-25 15:13:41 +03:00
Triang3l aa45d7b47d [D3D12] More descriptive pipeline creation call comment [ci skip] 2022-06-25 15:13:11 +03:00
Triang3l c37c05d189 [Vulkan] Remove an outdated fullscreen shader comment [ci skip] 2022-06-25 14:35:15 +03:00
Triang3l 4b4205ba00 [Vulkan] Frontbuffer presentation 2022-06-25 14:33:43 +03:00
Triang3l 3fc7d8753c Merge branch 'master' into vulkan 2022-06-24 23:38:04 +03:00
Triang3l f4a634c617 [XeSL] xesl_write*Store > xesl_*Store 2022-06-24 23:37:29 +03:00
Triang3l 7a4732e14f [GPU] XeSL swap shaders 2022-06-24 23:24:30 +03:00
Gliniak 2b3686f0e9 [XAM] Set profile setting 'from' entry accordingly to setting existence 2022-06-24 10:10:52 +02:00
Triang3l b7737d70ca [D3D12] Update RequestSwapTexture resource state comment [ci skip] 2022-06-23 22:59:53 +03:00
Gliniak ce3b159683 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 21:05:45 +02:00
Triang3l 227d495738 Merge branch 'master' into vulkan 2022-06-22 21:19:29 +03:00
Triang3l e9f129f67f [GPU] Safer and more correct depth bias conversion
Float24-as-float32 depth bias is now in the increments of 8, because conversion of the depth to float24 directly in the pixel shaders may destroy the bias qualitatively otherwise if it's too small.
2022-06-22 21:14:40 +03:00
Triang3l a7885ae1a4 [GPU] Fix CPU-side float24 conversion broken recently 2022-06-22 20:47:44 +03:00
Triang3l 4514050f55 [Vulkan] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 13:25:06 +03:00