Commit Graph

6219 Commits

Author SHA1 Message Date
chss95cs@gmail.com 1ffd7ecae8 Remove vpcmov print 2022-08-21 12:40:56 -07:00
chss95cs@gmail.com b5ef3453c7 Disable most XOP code by default, the manual must be wrong for the shifts or we must be assembling them incorrectly, will return to it later and fix
comparisons and select done by xop are fine though
2022-08-21 12:32:33 -07:00
chss95cs@gmail.com b26c6ee1b8 Fix some more constant folding
fabsx does NOT set fpscr
turns out that our vector unsigned compare instructions are a bit weird?
2022-08-21 10:27:54 -07:00
chss95cs@gmail.com 0ebc109d4d add initial xop codepaths, still need to finish the rest of the compares, and then do shifts, rotates, and PERMUTE
Add vector simplification pass, so far it only recognizes whether VECTOR_DENORMFLUSH is useless and optimizes them away
Tag restgplr/savegplr/restvmx/savevmx/restfpr/savefpr with useful information, i intend to inline them (they tend to be the most heavily called guest functions)
2022-08-21 08:55:42 -07:00
Gliniak da00ede181 [XAM/Settings] Check if provided size doesn't exceed maximal setting size 2022-08-21 17:46:00 +02:00
Radosław Gliński 0b013fdc6b
Merge pull request #61 from chrisps/canary_experimental
performance improvements, kernel fixes, cpu accuracy improvements
2022-08-21 09:31:09 +02:00
chss95cs@gmail.com d85bfc1894 Dont constant evaluate MAX with V128!
Fix signed zeroes behavior for vmaxfp emulation, was causing a block in sonic to move perpetually, very slowly
2022-08-20 14:22:05 -07:00
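The signed-zero problem fixed in d85bfc1894 can be illustrated with a scalar sketch. It assumes the documented x86 behavior that `maxps` returns its second operand when both inputs are zeros of either sign, and that `vmaxfp` treats +0.0 as greater than -0.0. `x86_style_max` and `guest_style_max` are hypothetical names, not xenia functions, and NaN handling is left out.

```cpp
#include <cassert>
#include <cmath>

// Mimics one lane of x86 maxps/maxss: when the comparison is false
// (equal values, zeros of either sign, or a NaN input), the second
// operand is returned unchanged.
inline float x86_style_max(float a, float b) { return a > b ? a : b; }

// Hypothetical guest-style max: +0.0 is treated as greater than -0.0,
// independent of operand order. NaN inputs are not handled here.
inline float guest_style_max(float a, float b) {
  if (a == 0.0f && b == 0.0f) {
    // Both inputs are zeros; the result is -0.0 only if both are -0.0.
    return (std::signbit(a) && std::signbit(b)) ? -0.0f : 0.0f;
  }
  return x86_style_max(a, b);
}
```

With the x86-style helper, `max(+0.0, -0.0)` depends on operand order, which is the kind of mismatch the commit message describes.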
Gliniak 010b59e81c [Emulator] Install Content: Create header for installed packages
This fixes support for certain DLCs
2022-08-20 20:44:30 +02:00
Gliniak 469d062a50 [Emulator] Updated "Install Content" function to match PR status 2022-08-20 20:44:30 +02:00
Gliniak f19cb704aa [Emulator] Added error checking while creating directories 2022-08-20 20:44:30 +02:00
chss95cs@gmail.com 457296850e Add OPCODE_NEGATED_MUL_ADD/OPCODE_NEGATED_MUL_SUB
Proper handling of nans for VMX max/min on x64 (minps/maxps has special behavior depending on the operand order that vmx does not have for vminfp/vmaxfp)
Add extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than windows, and on windows is disabled by default by a cvar
Repurpose GUEST_SCRATCH64 stack offset to instead be for storing guest function profile times, define GUEST_SCRATCH as 0 instead, since thats already meant to be a scratch area
Fix xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false
Add alternative code path for guest clock that uses kusershared systemtime instead of QueryPerformanceCounter. This is way faster and I have tested it and found it to be working, but i have disabled it because i do not know how well it works on wine or on processors other than mine
Significantly reduce log spam by setting XELOGAPU and XELOGGPU to be LogLevel::Debug
Changed some LOGI to LOGD in places to reduce log spam
Mark VdSwap as kHighFrequency, it was spamming up logs
Make logging calls less intrusive for the caller by forcing the test of log level inline and moving the format/AppendLogLine stuff to an outlined cold function
Add swcache namespace for software cache operations like prefetches, streaming stores and streaming loads.
Add XE_MSVC_REORDER_BARRIER for preventing msvc from propagating a value too close to its store or from its load
Add xe_unlikely_mutex for locks we know have very little contention
add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h
Microoptimization: Changed most uses of size_t to ring_size_t in RingBuffer, this reduces the size of the inlined ringbuffer operations slightly by eliminating rex prefixes, depending on register allocation
Add BeginPrefetchedRead to ringbuffer, which prefetches the second range if there is one according to the provided PrefetchTag
added inline_loadclock cvar, which will directly use the value of the guest clock from clock.cc in jitted guest code. off by default
change uses of GUEST_SCRATCH64 to GUEST_SCRATCH
Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently resides in x64_seq_vector, move to gpu code maybe at some point)
Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. Same code can be used for Float16_2 in future commit. This should speed up some games that use these functions heavily
Remove cvar for toggling old float16 behavior
Add VRSAVE register, support mfspr/mtspr vrsave
Add cvar for toggling off codegen for trap instructions and set it to true by default.
Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister
Use a fixed size vmem vector for upload ranges, realloc/memsetting on resize  in the inner loop of requestranges was showing up on the profiler (the search in requestranges itself needs work)
Rename fixed_vmem_vector to better fit xenia's naming convention
Only log unknown register writes in WriteRegister if DEBUG :/. We're stuck on MSVC with c++17 so we have no way of influencing the branch ordering for that function without profile guided optimization
Remove binding stride assert in shader_translator.cc, triangle told me its leftover ogl stuff
Mark xe::FatalError as noreturn
If a controller is not connected, delay by 1.1 seconds before checking if it has been reconnected. Asking Xinput about a controller slot that is unused is extremely slow, and XinputGetState/SetState were taking up
an enormous amount of time in profiles. this may have caused a bit of input lag
Protect accesses to input_system with a lock
Add proper handling for user_index>= 4 in XamInputGetState/SetState, properly return zeroed state in GetState
Add missing argument to NtQueryVirtualMemory_entry
Fixed RtlCompareMemoryUlong_entry, it actually does not care if the source is misaligned, and for length it aligns down
Fixed RtlUpperChar and RtlLowerChar, added a table that has their correct return values precomputed
2022-08-20 11:40:19 -07:00
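The logging change in 457296850e (test the log level inline, move formatting into an outlined cold function) might look roughly like the sketch below. All names (`LogLine`, `AppendLogLineSlow`, `SKETCH_NOINLINE`, the globals) are assumptions made for illustration and are not xenia's actual logging API.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

enum class LogLevel { Error = 0, Warning = 1, Info = 2, Debug = 3 };

// Hypothetical global state for this sketch only.
LogLevel g_log_level = LogLevel::Info;
std::string g_last_line;

#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#else
#define SKETCH_NOINLINE __attribute__((noinline, cold))
#endif

// Slow path: formatting and appending stay out of line so callers
// only pay for a compare and a rarely taken call.
SKETCH_NOINLINE void AppendLogLineSlow(const char* fmt, ...) {
  char buffer[256];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buffer, sizeof(buffer), fmt, args);
  va_end(args);
  g_last_line = buffer;
}

// Fast path: the level test is small enough to inline at every call site.
template <typename... Args>
inline void LogLine(LogLevel level, const char* fmt, Args... args) {
  if (static_cast<int>(level) > static_cast<int>(g_log_level)) {
    return;  // Filtered out without touching the formatting code.
  }
  AppendLogLineSlow(fmt, args...);
}
```

A call such as `LogLine(LogLevel::Debug, "x=%d", x)` then costs one comparison when debug logging is disabled.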
Gliniak e06978e5be [Premake] Cleanup & Fixed references in cpu-tests 2022-08-17 09:43:55 +02:00
Gliniak 0df92130e6 [Memory] Changed amount of kernel reserved pages.
This fixes flickering in games with resolution scaling enabled
2022-08-15 17:51:29 +02:00
chss95cs@gmail.com 7cc364dcb8 squash reallocs in command buffers by using large prealloced buffer, directly use virtual memory with it so os allocs on demand
mark raw clock functions as noinline, the way msvc was inlining them and ordering the branches meant that rdtsc would often be speculatively executed
add alternative clock impl for win, instead of using queryperformancecounter we grab systemtime from kusershared. it does not have the same precision as queryperformancecounter, we only have 100 nanosecond precision, but we round to milliseconds so it never made sense to use the performance counter in the first place
stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite)
added some helpers for minf/maxf without the nan handling behavior
2022-08-14 13:42:08 -07:00
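The precision argument in 7cc364dcb8 comes down to simple arithmetic: system time ticks are 100 nanoseconds each, and the guest clock only needs milliseconds. The sketch below shows the conversion only. Reading the shared system time is platform-specific and omitted, the names are hypothetical, and truncation (rather than round-to-nearest) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// One system time tick is 100 ns, so 10,000 ticks make one millisecond.
constexpr uint64_t kTicksPerMillisecond = 10000;

// Assumption: truncating conversion. The commit only says the value is
// rounded to milliseconds, not which rounding mode is used.
constexpr uint64_t SystemTimeTicksToMilliseconds(uint64_t ticks_100ns) {
  return ticks_100ns / kTicksPerMillisecond;
}
```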
chss95cs@gmail.com c9b2d10e17 alternative mutex impl on windows works but i really can't tell if it helps much. use larger size in deferred_command_list to cut down on resizes in big scenes on m:dur 2022-08-14 10:26:50 -07:00
chss95cs@gmail.com 08f7a28920 Alternative mutex 2022-08-14 08:59:11 -07:00
chss95cs@gmail.com 495b1f8bc8 once again return to spinloop 2022-08-13 14:05:35 -07:00
chss95cs@gmail.com c9e4119428 Add branch of ffmpeg with non-recursive split_radix_permutation
Add branch of disruptorplus with working blocking_wait_stategy
Switch back to blocking wait for timer queue
2022-08-13 13:43:45 -07:00
chss95cs@gmail.com 020d64a1a1 revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile 2022-08-13 13:20:35 -07:00
chss95cs@gmail.com cb85fe401c Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some games, in the range 50%-90%
But for normal msvc builds i would put it at around 30-50%
Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up
Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger
fixed a number of errors where temporaries were being passed by reference/pointer
Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes.
Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold.
Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me.
Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it
Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total
Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time
Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op
Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x
For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases
^this can be toggled off in the platform_win header
handle indirect call true with constant function pointer, was occurring in h3
lookup host format swizzle in denser array
by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar
^looking up whether its known or not took approx 0.3% cpu time
Changed some things in /cpu to make the project UNITYBUILD friendly
The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead
tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds)
Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed
added support for docdecaduple precision floating point so that we can represent our performance gains numerically
tons of other stuff im probably forgetting
2022-08-13 12:59:00 -07:00
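The multiplicative patterns mentioned in cb85fe401c, (x+x) and (x<<1)+x, are equivalent to multiplying by 2 and by 3 under wraparound arithmetic. The commit does not say what form the simplification pass rewrites them to, so the sketch below only checks the identities themselves; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x + x equals x * 2 for unsigned wraparound arithmetic.
constexpr uint32_t AddSelf(uint32_t x) { return x + x; }

// (x << 1) + x equals x * 3 for unsigned wraparound arithmetic.
constexpr uint32_t ShiftAddSelf(uint32_t x) { return (x << 1) + x; }

// The identities hold even when the result wraps around 2^32.
static_assert(AddSelf(21u) == 21u * 2u, "x + x == x * 2");
static_assert(ShiftAddSelf(0xFFFFFFFFu) == 0xFFFFFFFFu * 3u,
              "(x << 1) + x == x * 3");
```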
Radosław Gliński 2f59487bf3
Merge pull request #59 from Uraniumm/canary_experimental
Add nullptr check in CheckScalarConstCmp
2022-08-08 19:47:35 +02:00
Uraniumm a16acbaf59
add nullptr check to mitigate crashes
wip for reach untracked tags build fixes
2022-08-08 02:02:25 -04:00
chss95cs@gmail.com 324a8eb818 A bunch of fixes for division logic:
"turns out theres a lot of quirks with the div instructions we havent been covering
if the denom is 0, we jump to the end and mov eax/rax to dst, which is correct because ppc raises no exceptions for divide by 0 unlike x86
except we don't initialize eax before that jump, so whatever garbage from the previous sequence that has been left in eax/rax is what the result of the instruction will be
and then in our constant folding, we don't do the same zero check in Value::Div, so if we constant folded the denom to 0 we will host crash
the ppc manual says the result for a division by 0 is undefined, but in reality it seems it is always 0
there are a few posts i saw from googling about it, and tests on my rgh gave me 0, but then another issue came up
and that is that we dont check for signed overflow in our division, so we raise an exception if guest code ever does (1<<signbit_pos) / -1
signed overflow in division also produces 0 on ppc
the last thing is that if src2 is constant we skip the 0 check for division
without checking if its nonzero
all weird, likely very rare edge cases, except for maybe the signed overflow division
chrispy — Today at 9:51 AM
oh yeah, and because the int members of constantvalue are all signed ints, we were actually doing signed division always with constant folding"
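The rules quoted above can be sketched as a pair of host-safe helpers. This is a minimal illustration that assumes 32-bit operands and takes the "result is 0" behavior from the message itself; `GuestDivS32` and `GuestDivU32` are hypothetical names, not functions from the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Signed guest division that never traps on the host.
inline int32_t GuestDivS32(int32_t num, int32_t den) {
  if (den == 0) {
    return 0;  // Divide by zero: observed result is 0, no exception.
  }
  if (num == std::numeric_limits<int32_t>::min() && den == -1) {
    return 0;  // Signed overflow: would fault on x86, yields 0 here.
  }
  return num / den;
}

// Unsigned division needs its own path; reusing the signed helper on
// the same bits gives a different quotient for large values.
inline uint32_t GuestDivU32(uint32_t num, uint32_t den) {
  return den == 0 ? 0u : num / den;
}
```

The same checks would apply when folding constants, since a folded zero denominator otherwise divides by zero on the host at compile time of the guest code.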

fixed an earlier mistake by me with the precision of fresx
made some optimization disableable

implemented vkpkx
fixed possible bugs with vsr/vsl constant folding
disabled the nice imul code for now, there was a bug with int64 version and i dont have time to check
started on multiplication/addition/subtraction/division identities
Removed optimized VSL implementation, it's going to have to be rewritten anyway
Added ppc_ctx_t to xboxkrnl shim for direct context access
started working on KeSaveFloatingPointState, re'ed most of it
Exposed some more state/functionality to the kernel for implementing lower level routines like the save/restore ones
Add cvar to re-enable incorrect mxcsr behavior if a user doesnt care and wants better cpu performance
Stubbed out more impossible sequences, replace mul_hi_i32 with a 64 bit multiply
2022-08-07 10:41:26 -07:00
Gliniak f45e9e5e9a [Kernel] Improved handling of internal display resolution 2022-08-02 12:09:25 +02:00
Gliniak 0e1353aa71 Implemented Opcode: mcrf 2022-08-01 14:54:05 +02:00
chss95cs@gmail.com 968f656d96 Add separate VMX/fpu mxcsr
Add support for constant operands for most fpu instructions
Remove constant folding for most fpu code
half float
2022-07-31 08:56:36 -07:00
Gliniak 5d1b641197 [Emulator] Added possibility to install multiple packages at once 2022-07-30 15:52:41 +02:00
Gliniak 79ffbe3971 Merge branch 'importContent' of https://github.com/Gliniak/xenia.git into canary_experimental 2022-07-30 12:44:24 +02:00
Gliniak 0e3403d6da Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-30 12:42:51 +02:00
Gliniak 433a8a8a5e [Emulator] Added option for content installation 2022-07-30 12:41:26 +02:00
Triang3l 7595cdb52b [Vulkan] Non-GS point sprites + minor SPIR-V fixes 2022-07-27 17:14:28 +03:00
Triang3l ff7ef05063 [SPIR-V] Clamp cube face using NClamp, not NMax/FMin 2022-07-26 17:08:12 +03:00
Triang3l 66c995f3aa [SPIR-V] Saturate point sprite coordinates 2022-07-26 17:04:22 +03:00
Triang3l 8fb5da18ea [Vulkan] Add forgotten fullDrawIndexUint32 check 2022-07-26 16:24:14 +03:00
Triang3l 9fa41c27bc [Vulkan] Point sprite geometry shader 2022-07-26 16:01:20 +03:00
Gliniak 0c3019981c [Video] Added option to set internal output resolution 2022-07-26 11:25:03 +02:00
Gliniak 76806e08c5 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-26 10:22:38 +02:00
Triang3l f248e23079 [DXBC] Skip backface check in point PsParamGen 2022-07-25 21:48:25 +03:00
Triang3l 77e85ecaa4 [Vulkan] 32-bit index fetch without fullDrawIndexUint32 2022-07-25 16:53:12 +03:00
Gliniak 061000af01 [Base] Changed size of bitstream accessed data (Risky)
This prevents crashing in situation when buffer_ + offset_bytes is
at the end of allocated memory range and can go into unallocated space
2022-07-25 10:52:21 +02:00
Gliniak 364137ef5f [XAM] Send UI On notification on start of XamShowSigninUI 2022-07-25 10:50:32 +02:00
Gliniak 6730ffb7d3 Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental 2022-07-24 17:58:48 +02:00
Gliniak 6e501fbd61 [XAM] Set license mask for DLCs (Thanks Beeanyew) 2022-07-24 17:58:00 +02:00
Gliniak 98c2cb636f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-24 17:38:08 +02:00
Triang3l 37579d3bf0 [GPU] Treat non-adaptive-tessellated patches as 1-control-point 2022-07-24 17:38:26 +03:00
chss95cs@gmail.com 33a6cfc0a7 Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared
Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan
Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch
Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones
Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code.
Add backend instr flag to x64 that tells it to not generate code for an instruction. this allows sequences to consume subsequent instructions
Generate actual x64 code for VSL instruction instead of using callnativesafe
Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. this eliminates a ton of garbage compare/set instructions.
If a COMPARE instructions destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address]
Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels
Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldnt spawn? (https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874)
Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better
Refactor some of ppc_emit_fpu, many of the InstrEmit functions are identical except for whether they round to single or not
Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions
guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov
Add X64BackendContext structure, represents data before the start of the PPCContext
Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases.
Do qnan move for dot product in a tail emitter
Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1
Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores
Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg
Tail emit error handling for TRAP opcodes
Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused)
Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)
2022-07-23 12:10:07 -07:00
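The "tail emitters" idea from 33a6cfc0a7 (callbacks invoked with their label after the epilog, so cold code sits outside the hot path) can be modeled with a toy emitter that records text lines instead of machine code. `SketchEmitter` and everything in it is hypothetical and only illustrates the control flow, not the real X64Emitter interface.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class SketchEmitter {
 public:
  using TailEmitter =
      std::function<void(SketchEmitter&, const std::string& label)>;

  void Emit(const std::string& line) { code_.push_back(line); }

  // Queues cold code for emission after the epilog and returns the
  // label that hot code can jump to.
  std::string AddTailEmitter(TailEmitter fn) {
    std::string label = "tail_" + std::to_string(tails_.size());
    tails_.emplace_back(label, std::move(fn));
    return label;
  }

  // Emits the epilog, then every queued tail with its label.
  void Finish() {
    Emit("epilog");
    for (std::size_t i = 0; i < tails_.size(); ++i) {
      // Copy the entry so a tail may safely queue further tails.
      std::pair<std::string, TailEmitter> entry = tails_[i];
      Emit(entry.first + ":");
      entry.second(*this, entry.first);
    }
  }

  const std::vector<std::string>& code() const { return code_; }

 private:
  std::vector<std::string> code_;
  std::vector<std::pair<std::string, TailEmitter>> tails_;
};
```

Hot code emits only a compare and a jump to the returned label; the error handling body lands after the epilog.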
Gliniak 1fcac00924 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-23 13:26:31 +02:00
Triang3l 3c12814276 [GPU] EDRAM looped addressing (resolves #2031) 2022-07-22 23:51:50 +03:00
Gliniak 0c782ade8e Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-21 18:52:33 +02:00
Triang3l 6ff312afb1 [DXBC] Update PsParamGen comment [ci skip] 2022-07-21 12:42:06 +03:00
Triang3l 1a95bef8b3 [GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan
For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.
2022-07-21 12:32:28 +03:00
Gliniak 0f60e23208 [Kernel] Removed input change notifications from initial notify list 2022-07-19 10:46:36 +02:00
Gliniak bc315d21e0 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-19 10:45:14 +02:00
Triang3l 0a94b86cb8 [GPU] Remove orphaned GetPresentArea declaration [ci skip] 2022-07-18 21:02:34 +03:00
Gliniak 57b514ea6a Removed (again) unnecessary include 2022-07-18 09:40:45 +02:00
Radosław Gliński 3757580f45
Merge pull request #52 from chrisps/canary_experimental
Fix previous batch of CPU changes
2022-07-18 09:20:35 +02:00
Gliniak fd78ab4dfc [Patcher] Allow loading patches from non-utf8 paths 2022-07-18 08:46:04 +02:00
chss95cs@gmail.com 11817f0a3b vshufps accident broke things, this fixes 2022-07-17 14:44:09 -07:00
Gliniak 6e1e62378f Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-17 21:27:52 +02:00
Triang3l 14fdf4b270 [GPU] Up to 7x7 resolution scaling 2022-07-17 20:41:50 +03:00
chss95cs@gmail.com 3717167bbe Preload ThreeFloatMask in DOT_PRODUCT_3
Use shuffle_ps instead of broadcastss, broadcastss is slower on many intel and amd processors and encodes to the same number of bytes as shuffle_ps
Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence
For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code
Simplified code for PERMUTE
Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == ( x == 0)
Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch
Added Value::VisitValueOperands for iterating all Value operands an instruction has.
Add BackpropTruncations code to simplification_pass
Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field. Moved the thread_state field to the tail of PPCContext.
Moved membase to the tail of PPCContext, since now it is reloaded very infrequently.
Rearranged PPCContext so that the condition registers come first (most accesses to them cant get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently
accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn)
Use alignas(64) to ensure PPCContext's padding
Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and
reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there
Changed many wrong avx instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time
Added SimdDomain checking/deduction to X64Emitter.
Used SimdDomain code to fix a lot of float/int domain stalls

Use the low 32 bits of the context register instead of constant 0xE0000000 in ComputeAddress
Special path for SELECT_V128 with result of comparison that will use a blend instruction instead of and/or
Many HIR optimizations added in simp pass
A bunch of other stuff running out of time to write this msg
2022-07-17 09:52:40 -07:00
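The lzcnt rule from 3717167bbe, (lzcnt(x) >> log2(bitsizeof_x)) == (x == 0), works because lzcnt returns the full bit width only for a zero input. A minimal 32-bit check, using a portable loop in place of the hardware instruction (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Portable leading-zero count; returns 32 for x == 0, like x86 lzcnt.
constexpr uint32_t Lzcnt32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// For 32-bit values log2(32) == 5. Lzcnt32 is in [0, 32], so the shift
// yields 1 only when the count is exactly 32, i.e. when x is zero.
constexpr bool IsZeroViaLzcnt(uint32_t x) { return (Lzcnt32(x) >> 5) != 0; }

static_assert(IsZeroViaLzcnt(0u), "zero input has 32 leading zeros");
static_assert(!IsZeroViaLzcnt(1u), "nonzero input has at most 31");
```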
Triang3l e8652e544a [GPU] Translucent trace viewer controls 2022-07-17 17:29:41 +03:00
Triang3l 25663827ba [GPU] Trace viewer Android content URI loading 2022-07-17 16:37:49 +03:00
Triang3l 624f2b2d9e [Base] Android content URI file memory mapping 2022-07-17 16:34:17 +03:00
Triang3l 93a7918025 [Base] Android content URI file descriptor opening 2022-07-17 16:25:58 +03:00
Triang3l 34a952d789 [Base] Wrap strdup and strcasecmp in xe:: functions 2022-07-17 16:14:29 +03:00
chss95cs@gmail.com 6a612b4d34 remove useless tag field from hir::Value
pack local_slot and constant in hir::Value
Instead of loading membase at the start of every function, just load it in HostToGuestThunk
vzeroupper in GuestToHostThunk before calling host function, and in HostToGuestThunk after calling function to prevent AVX dirty state slowdowns. In the future, check if CPU implements AVX as 128x2 and skip if so (https://john-h-k.github.io/VexTransitionPenalties.html)
Remove useless save/restore of ctx pointer, nothing modifies it and it prevents cpus from doing cross-function memory renaming (https://www.agner.org/forum/viewtopic.php?t=41). Could not remove the space on stack because of alignment issues, instead turned it into GUEST_SCRATCH64 which is a temporary that sequences may use
Reorder OpcodeInfo so that name is at offset 0, remove name and add GetOpcodeName function (name is only used for debug code, we are separating frequently accessed data and rarely accessed data)
Add VECTOR_DENORMFLUSH opcode for handling output to DOT_PRODUCT and other opcodes that implicitly force denormal inputs/outputs to zero, will eventually use for implementing NJM
Rewrite sequences for LOAD_VECTOR_SHL/SHR. The mask with 0xf in it was pointless as all InstrEmit_ functions that create the load shift instructions do that in HIR. The tables are only used for nonzero constant inputs now, which are probably pretty rare. Instead of doing a shift and lookup, a base value is used for both in the constant table and adding/subtracting of the input is done
Reuse result of LoadVectorShl/Shr in InstrEmit_stvlx_, InstrEmit_stvrx_. We were previously calculating it twice which was contributing to the final sequences' fatness. Use OPCODE_SELECT instead of the sequence of or, andnot, and that it was using for merging
Add the proper unconditional denormal input flushing behavior to vfmadd, add it also to vfmsub (making the assumption it has the same behavior)
Remove constant propagation for DOT_PRODUCT_3/4
DOT_PRODUCT_3/4 now returns a vector with all four elements set to the result. (what we were doing before, truncating to float32 and then splatting didnt make any sense)
Add much more correct versions of DOT_PRODUCT_3/4, matching the Xb360's  to 1 bit. Still needs work to be a perfect emulation.
Add constant folding for OPCODE_SELECT, OPCODE_INSERT, OPCODE_PERMUTE, OPCODE_SWIZZLE
Remove constant folding for DOT_PRODUCT
Removed the multibyte nop code I committed earlier, it doesnt help us much because nops are only used for debug stuff and its ugly and wouldnt survive in a pr to main
Check for AVX512BMI, use vpermb to shuffle if supported
2022-07-16 10:25:04 -07:00
Triang3l 500bbe9e0d [Base] Use to_path for Android path argument loading 2022-07-16 13:42:04 +03:00
Triang3l 373b143049 [Base] Cvars from Android Bundle/Intent 2022-07-16 13:13:08 +03:00
chss95cs@gmail.com 71c5f8f0fa Optimized GetScalarNZM, add limit to how far it can recurse. Add rlwinm elimination rule 2022-07-14 14:32:14 -07:00
Triang3l 415750252b [Base] PosixMappedMemory: Close, Flush 2022-07-14 22:51:07 +03:00
Triang3l 65137e58bd [Base] PosixMappedMemory: fd instead of stdio
Android ContentResolver, which is needed for content:// URIs, provides file descriptors rather than stdio files
2022-07-14 22:11:46 +03:00
Triang3l 9fd63519bf [Base] Make MappedMemory non-copyable 2022-07-14 22:04:06 +03:00
Triang3l 2a69d1db4d [Vulkan] Fix a typo in a comment about BC textures [ci skip] 2022-07-14 21:16:23 +03:00
Triang3l 7b8281aee0 [UI] Android ImGui touch and mouse input 2022-07-14 21:13:40 +03:00
Triang3l 037310f8dc [Android] Unified xenia-app with windowed apps and build prerequisites 2022-07-11 21:45:57 +03:00
Gliniak 1d00372e6b Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-10 10:50:39 +02:00
Triang3l b41bb35a20 [SPIR-V] Make interpolators an array to fix Adreno linkage 2022-07-09 17:52:26 +03:00
Triang3l b3edc56576 [Vulkan] Merge texture and sampler descriptors into a single descriptor set
Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value
2022-07-09 17:10:28 +03:00
Gliniak d33be73f3d Fixed crash caused by hash calculation in specific cases 2022-07-08 08:49:43 +02:00
Triang3l e4de8663c4 [Vulkan] All guest draw uniform buffer bindings in a single descriptor set
Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer
2022-07-07 21:05:56 +03:00
Triang3l 88c055eb30 [CPU] Null backend enough for GPU trace viewing 2022-07-06 23:28:06 +03:00
Triang3l 3ee68d79ea Revert "[GPU] Make Processor optional for GraphicsSystem setup"
The Processor is still required in many places, including the GPU command processor worker thread

This reverts commit fd03d886e9.
2022-07-06 22:43:40 +03:00
Triang3l 6852e54937 [CPU] Remove intrinsics from dot product constant propagation 2022-07-06 21:32:56 +03:00
Triang3l 326e718035 [CPU] MMIO: Arm64, load register writes + exception cleanup 2022-07-06 21:05:05 +03:00
Triang3l fd03d886e9 [GPU] Make Processor optional for GraphicsSystem setup 2022-07-05 21:21:22 +03:00
Triang3l bdfd410b13 [CPU] Cleanup x64 backend usage conditionals 2022-07-05 21:07:10 +03:00
Triang3l d263d508cd [GPU] Make operator< const 2022-07-05 20:47:53 +03:00
Triang3l 536f14d94c [GPU] Fix a typo in a Neon intrinsic name 2022-07-05 20:47:34 +03:00
Triang3l d51fafd07c [Base] Linux Arm64 exception handler 2022-07-05 20:46:49 +03:00
Triang3l 40aa73f7d7 [Linux] Swap read/write in x64 page fault handler + exception code cleanup 2022-07-04 23:51:26 +03:00
Triang3l a9cbd9cc5f [Linux] Update RIP after handling an exception 2022-07-04 23:24:26 +03:00
uytvbn 54aac81268 [Linux] Implement exception handler 2022-07-04 23:04:27 +03:00
Triang3l 35d4ea59c6 [Base] Remove exception_handler_linux.cc 2022-07-04 23:02:11 +03:00
Triang3l feaad639fb [Vulkan] Destroy all RTs before VulkanRenderTargetCache is destroyed 2022-07-04 11:27:51 +03:00
Gliniak 6e753c6399 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-04 08:11:04 +02:00
Triang3l 2621dabf0f [Vulkan] Native 24-bit unorm depth where available 2022-07-03 21:21:17 +03:00
Triang3l 83e9984539 [Vulkan] Remove required feature checks
Fallbacks for those will be added more or less soon, the stable version won't hard-require anything beyond 1.0 and the portability subset
2022-07-03 20:54:34 +03:00
Triang3l bbae909fd7 [GPU] Reasons to keep non-Vulkan backends [ci skip] 2022-07-03 20:39:44 +03:00
Triang3l ed61e15fc3 [App] Make D3D12 the default GPU backend on Windows again 2022-07-03 19:49:11 +03:00
Triang3l ee84f4e267 [Vulkan] Update title bar warning 2022-07-03 19:45:48 +03:00
Triang3l f7ef051025 [Vulkan] Disable validation by default 2022-07-03 19:42:22 +03:00
Triang3l 001f64852c [Vulkan] VMA for textures 2022-07-03 19:40:48 +03:00
Gliniak a8df744ea6 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-07-03 13:07:15 +02:00
Triang3l 636585e0aa [Vulkan] Trace viewer 2022-07-01 19:53:41 +03:00
Triang3l ad1ef84145 Merge branch 'master' into vulkan 2022-07-01 19:53:08 +03:00
Triang3l e37e3ef382 [GPU] Display swap output in the trace viewer
Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format
2022-07-01 19:50:19 +03:00
Triang3l c8a4a9504f [Vulkan] Remove an unneeded scale from RefreshGuestOutput aspect ratio 2022-07-01 12:52:12 +03:00
Triang3l d174762a40 Merge branch 'master' into vulkan 2022-07-01 12:51:34 +03:00
Triang3l 28670d8ec2 [UI] Presenter: Rename display size to aspect ratio 2022-07-01 12:50:45 +03:00
Triang3l f8b351138e [Vulkan] Alpha test 2022-06-30 22:20:51 +03:00
Triang3l 6772c88141 Merge branch 'master' into vulkan 2022-06-30 22:15:29 +03:00
Triang3l 7e691d5ef1 [DXBC] Handle NaN in not equal alpha test as passed 2022-06-30 22:15:01 +03:00
Triang3l c0c3666e12 [Vulkan] Align texture extents in loading to vector size accessed by the shader
Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background
2022-06-29 23:41:32 +03:00
Triang3l 9392fff369 Merge branch 'master' into vulkan 2022-06-29 23:39:54 +03:00
Triang3l a11b070fee [GPU] Align texture extents in loading to host buffer texel size accessed by the shader 2022-06-29 23:38:06 +03:00
Triang3l 7c2df55209 [Vulkan] Cache clear: shared memory, scratch buffer 2022-06-29 13:24:45 +03:00
Triang3l d5815d9e6a [Vulkan] Float24 depth range remapping fixes 2022-06-29 13:14:00 +03:00
Gliniak efe3cd96d6 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-29 09:21:09 +02:00
Triang3l 05ef7a273a [Vulkan] Samplers (only 1.0 core features for now) 2022-06-28 22:42:18 +03:00
Triang3l 5d9061cf99 Merge branch 'master' into vulkan 2022-06-28 22:05:45 +03:00
Triang3l 243683d2e9 [GPU] Cleanup Texture::MarkAsUsed conditionals 2022-06-28 22:04:26 +03:00
Triang3l 382710bab7 [GPU] Normalize sampler clamp modes 2022-06-28 21:58:58 +03:00
Triang3l cedc94679b [GPU] Don't drop the rest of the command list if IssueDraw fails 2022-06-28 21:40:06 +03:00
chss95cs@gmail.com 3c06921cd4 Added optimizations for combining conditions together when their results are OR'ed
Added recognition of impossible comparisons via NZM and optimize them away
Recognize (x + -y) and transform to (x - y) for constants
Recognize (~x ) + 1 and transform to -x
Check and transform comparisons if they're semantically equal to others
Detect comparisons of single-bit values with their only possible non-zero value and transform to true/false tests
Transform ==0 to IS_FALSE, !=0 to IS_TRUE
Truncate to int8 if operand for IS_TRUE/IS_FALSE has a nzm of 1
Reduced code generated for SubDidCarry slightly
Add special case for InstrEmit_srawix if mask == 1
Cut down the code generated for trap instructions: instead of naively OR'ing compare results, do a switch and select the best condition
Rerun simplification pass until no changes, as some optimizations will enable others to be done
Enable rel32 call optimization by default
2022-06-26 12:49:04 -07:00
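The arithmetic identities behind a few of the rewrites listed in 3c06921cd4 can be checked in isolation. The following is a minimal sketch, not Xenia's HIR code: all function names are hypothetical, and it assumes NZM means a mask of the bits that may be non-zero in a value.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not part of Xenia.

// (~x) + 1 is the two's complement negation of x (modulo 2^32).
inline uint32_t NotPlusOne(uint32_t x) { return ~x + 1u; }
inline uint32_t Negate(uint32_t x) { return 0u - x; }

// x + (-y) is the same as x - y (modulo 2^32).
inline uint32_t AddNegated(uint32_t x, uint32_t y) { return x + (0u - y); }
inline uint32_t Subtract(uint32_t x, uint32_t y) { return x - y; }

// Assumption: nzm has a 1 in every bit position that may be non-zero.
// If the constant needs a bit that the value can never have set, the
// comparison "value == constant" can never be true and may be folded.
inline bool EqualityIsPossible(uint64_t nzm, uint64_t constant) {
  return (constant & ~nzm) == 0;
}
```

For example, a value whose NZM is 0xFF can never equal 0x100, so under this assumption such a comparison could be replaced with a constant false.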
Gliniak e6898fda66 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-26 20:11:33 +02:00
chrisps 08232de8cc patch a mistake in NZM calculation for OPCODE_NOT 2022-06-26 09:30:56 -07:00
Triang3l 9672230d9f Merge branch 'master' into vulkan 2022-06-26 18:59:49 +03:00
Triang3l ec008463b6 [GPU] CrYCb/YCrCb border colors 2022-06-26 18:56:50 +03:00
Triang3l 2606fa5709 [GPU] Apply BaseMap MipFilter via samplers as it may be overridden
Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.
2022-06-26 18:41:38 +03:00
Triang3l e191430091 Merge branch 'master' into vulkan 2022-06-26 16:58:27 +03:00
Triang3l 086a070fa9 [GPU] Explicitly cast bit field values in std::min/max
According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).
2022-06-26 16:54:11 +03:00
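A minimal sketch of the promotion issue described in 086a070fa9, assuming a hypothetical struct (only the `width_div_8` field name comes from the commit message). A bit-field narrower than `int` promotes to `int` in arithmetic, so an expression built from it can have a different type than a `uint32_t` argument, and `std::min` then fails template deduction. Casting first keeps both arguments `uint32_t`.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical register layout for illustration; not Xenia's actual struct.
struct TextureSize {
  uint32_t width_div_8 : 12;
  uint32_t height : 13;
};

inline uint32_t ClampedWidth(const TextureSize& size, uint32_t limit) {
  // `size.width_div_8 << 3` would have type int after integral promotion,
  // so std::min(size.width_div_8 << 3, limit) may not compile.
  // The explicit cast makes the shifted value a uint32_t.
  return std::min(uint32_t(size.width_div_8) << 3, limit);
}
```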
Triang3l e0b890fe5c [DXBC] Remove alphatest/A2C with [earlydepthstencil] 2022-06-26 15:31:08 +03:00
Triang3l 6688b13773 [Vulkan] PsParamGen 2022-06-26 15:01:27 +03:00
Triang3l a99a1be880 Merge branch 'master' into vulkan 2022-06-26 15:00:21 +03:00
Triang3l b787f2dec1 [GPU] GPR count limit is 128, not 64 2022-06-26 14:45:49 +03:00
Triang3l a5c8df7a37 [Vulkan] Remove UB-based independent blend logic
On Vulkan, unlike Direct3D, not writing to a color target in the fragment shader produces an undefined result.
2022-06-25 20:57:44 +03:00
Triang3l d8b2944caa [Vulkan] Handle unsupported fillModeNonSolid + fix portability subset feature checks 2022-06-25 20:46:52 +03:00
Triang3l d30d59883a [Vulkan] Color exponent bias and gamma conversion 2022-06-25 20:35:13 +03:00
Triang3l b1be33004a Merge branch 'master' into vulkan 2022-06-25 20:31:26 +03:00
Triang3l 4812b4ba8b [D3D12] Fix outdated color system constants comment [ci skip] 2022-06-25 20:31:05 +03:00
chss95cs@gmail.com 327cc9eff5 drastically reduce size of final generated code for rlwinm by adding special paths for rotations of 0, masks that discard the rotated bits and using And w/ UINT_MAX instead of truncate/zero extend
Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do a 32-bit to 32-bit mov if detected to take advantage of implicit zero extension/register renaming

Add helper function for skipping assignment defs in instr.
Add helper function for checking if an opcode is binary value type
Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later).
 List of optimizations:
  If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context)
  If masking a value that was or'ed against another, check whether our mask only considers bits from one value or another. If so, change the operand to the OR input that actually matters
  If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in, change the opcode to SHIFT_LEFT
  If masking against all ones, become an assign.
  If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup)
  If XOR against all ones, become a NOT
Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.
2022-06-25 09:58:13 -07:00
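The rotate-to-shift rewrite listed in 327cc9eff5 can be illustrated on plain 32-bit integers. This is a sketch with hypothetical helper names, not the simplification pass itself: a rotate left by n differs from a shift left by n only in the low n bits, so if the mask clears those bits, both give the same masked result.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.

inline uint32_t RotateLeft32(uint32_t v, unsigned n) {
  n &= 31;
  return n ? (v << n) | (v >> (32 - n)) : v;
}

// True when the mask keeps none of the low n bits, which are exactly the
// bits a rotate left by n wraps around from the top of the value.
inline bool MaskDiscardsRotatedBits(uint32_t mask, unsigned n) {
  n &= 31;
  return (mask & ((uint32_t(1) << n) - 1)) == 0;
}

// Masked rotate, using a plain shift whenever the rewrite is valid.
inline uint32_t RotateAndMask(uint32_t v, unsigned n, uint32_t mask) {
  if (MaskDiscardsRotatedBits(mask, n)) {
    return (v << (n & 31)) & mask;
  }
  return RotateLeft32(v, n) & mask;
}
```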
Triang3l 5dca11a892 [SPIR-V] Fix fetch constant LOD bias signedness 2022-06-25 16:33:35 +03:00
Triang3l d8b0227cbd [SPIR-V] Fix cubemap X axis 2022-06-25 16:25:29 +03:00
Triang3l fdcbf67623 [Vulkan] Enable VK_KHR_sampler_ycbcr_conversion 2022-06-25 15:46:02 +03:00
Triang3l 758db4ccb3 [Vulkan] Fix textures not loaded if using a shader for the first time 2022-06-25 15:15:06 +03:00
Triang3l 4db445c6f9 Merge branch 'master' into vulkan 2022-06-25 15:13:41 +03:00
Triang3l aa45d7b47d [D3D12] More descriptive pipeline creation call comment [ci skip] 2022-06-25 15:13:11 +03:00
Triang3l c37c05d189 [Vulkan] Remove an outdated fullscreen shader comment [ci skip] 2022-06-25 14:35:15 +03:00
Triang3l 4b4205ba00 [Vulkan] Frontbuffer presentation 2022-06-25 14:33:43 +03:00
Triang3l 3fc7d8753c Merge branch 'master' into vulkan 2022-06-24 23:38:04 +03:00
Triang3l f4a634c617 [XeSL] xesl_write*Store > xesl_*Store 2022-06-24 23:37:29 +03:00
Triang3l 7a4732e14f [GPU] XeSL swap shaders 2022-06-24 23:24:30 +03:00
Gliniak 2b3686f0e9 [XAM] Set profile setting 'from' entry accordingly to setting existence 2022-06-24 10:10:52 +02:00
Triang3l b7737d70ca [D3D12] Update RequestSwapTexture resource state comment [ci skip] 2022-06-23 22:59:53 +03:00
Gliniak ce3b159683 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 21:05:45 +02:00
Triang3l 227d495738 Merge branch 'master' into vulkan 2022-06-22 21:19:29 +03:00
Triang3l e9f129f67f [GPU] Safer and more correct depth bias conversion
Float24-as-float32 depth bias is now in the increments of 8, because conversion of the depth to float24 directly in the pixel shaders may destroy the bias qualitatively otherwise if it's too small.
2022-06-22 21:14:40 +03:00
Triang3l a7885ae1a4 [GPU] Fix CPU-side float24 conversion broken recently 2022-06-22 20:47:44 +03:00
Triang3l 4514050f55 [Vulkan] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 13:25:06 +03:00
Gliniak e7a122d943 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:18:13 +02:00
Triang3l 0d8bd0e0c6 Merge branch 'master' into vulkan 2022-06-22 13:15:50 +03:00
Triang3l cbf0476d42 [D3D12] Don't round float24 depth when it's known to be exact 2022-06-22 13:14:38 +03:00
Gliniak 83269315d8 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-22 12:06:42 +02:00
Triang3l 7869b080d3 [D3D12] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default
Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest
2022-06-22 12:53:09 +03:00
Gliniak 87fd772393 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-21 07:54:44 +02:00
chss95cs@gmail.com 549ee28a93 Some guest function calls can now be resolved and embedded directly in
the emitted asm as rel32 calls. Disabled by default, enabled via
resolve_rel32_guest_calls
detect whether cpu has fast jrcxz, fast loop/loope/loopne
much more thorough LoadConstantXMM
New cvar elide_e0_check that allows the backend to assume accesses via
the SP or TLS register will not cross into 0xe0 range
Add x64 codegen for Vector shift uint8
If has fast jrcxz use for some traptrue/breaktrue instructions
Use phat nops
Add cvar use_fast_dot_product, which uses a four instruction sequence
for both dot product instructions which ought to be equivalent. disabled
by default.
2022-06-20 15:08:18 -07:00
Triang3l c0703e64db Merge branch 'master' into vulkan 2022-06-20 22:40:19 +03:00
Triang3l e2f632f8fa [D3D12] Use udiv by constant tile size + minor transfer cleanup
Drivers compile that to a multiplication and a shift anyway.
2022-06-20 22:39:30 +03:00
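The note in e2f632f8fa that drivers turn a division by a constant into a multiplication and a shift can be sketched on the CPU side. The tile width of 80 below is an assumption chosen for illustration (the commit message does not state the value), and the function names are hypothetical.

```cpp
#include <cstdint>

// Assumed tile width for illustration; not taken from the commit message.
constexpr uint32_t kTileWidth = 80;

// What the shader source expresses: a plain unsigned division.
inline uint32_t DivByTileWidth(uint32_t x) { return x / kTileWidth; }

// What a compiler may emit instead. 80 = 16 * 5, so shift right by 4 and
// then divide by 5 using the multiplier ceil(2^34 / 5) = 0xCCCCCCCD,
// which gives the exact quotient for every 32-bit input.
inline uint32_t DivByTileWidthMulShift(uint32_t x) {
  return uint32_t((uint64_t(x >> 4) * 0xCCCCCCCDull) >> 34);
}
```

Both functions return the same quotient for any 32-bit input, which is why writing the division directly costs nothing over a hand-written reciprocal.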
Triang3l 0dc480721f [Vulkan] Render target resolving 2022-06-20 22:29:07 +03:00
Triang3l c6ec6d8239 [Vulkan] Use UDiv/UMod by constant tile size + minor transfer cleanup
Drivers compile that to a multiplication and a shift anyway.
2022-06-20 22:24:07 +03:00
Gliniak a4ff64c465 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-20 21:07:32 +02:00
Triang3l 61c4c49d76 Merge branch 'master' into vulkan 2022-06-20 12:34:41 +03:00
Triang3l 207e11c8d2 [GPU] Separate range arguments for fixed16 RG and RGBA in GetResolveInfo
On Vulkan, when snorm16 is unsupported, these formats may be emulated as float16, which natively can represent a wide range of numbers including -32 to 32 with blending. However, R16G16_SNORM and R16G16B16A16_SNORM are two separate formats, which may have different support on the device.
2022-06-20 12:29:45 +03:00
Triang3l 3b4845511d [Vulkan] Don't require an explicit uint64_t cast for SetDeviceObjectName 2022-06-20 12:25:52 +03:00
Triang3l 67ff108f53 [Vulkan] Explain why CreateShaderModule takes uint32_t* [ci skip] 2022-06-20 12:22:41 +03:00
Triang3l b61953374e [GPU] Make resolve EDRAM binding DS 0 and rename it
Ordering the descriptor sets by the change frequency on Vulkan, in increasing order (the opposite of D3D12 root signatures). The EDRAM binding never changes there (always one storage buffer), while the destination buffer binding may become changeable in the future (to split dispatches if exceeding `maxStorageBufferRange`, for example).
2022-06-20 12:15:52 +03:00
Triang3l 1200b205cf Merge branch 'master' into vulkan 2022-06-19 17:52:28 +03:00
Triang3l 9b83d3d0f4 [GPU] XeSL resolve shaders + host depth store width fix 2022-06-19 17:50:21 +03:00
Gliniak 1e369afa3d [Memory] Allocate system heap memory from bottom of heap last quarter
Aka. From 0x30000000
2022-06-17 22:23:39 +02:00
Gliniak 0b183a3582 Merge branch 'chris_cpu_changes' of https://github.com/Gliniak/xenia.git into canary_experimental 2022-06-17 14:04:58 +02:00
chrisps e4fd015886 Juicy optimization goodness 2022-06-17 14:03:24 +02:00
chss95cs@gmail.com 8a8ff6ae46 Reuse flag results in OPCODE_BRANCH_TRUE codegen if the preceding instruction was a comparison that already set the cpu flags 2022-06-17 11:13:49 +02:00
chss95cs@gmail.com 3675b3860a Add constant folding for OPCODE_ROTATE_LEFT 2022-06-17 11:12:49 +02:00
chrisps 3ad80810b5 Optimized CONVERT_I64_TO_F64 with neat overflow trick
Reduced instruction count from 11 to 8, eliminated a movq stall.
2022-06-17 11:10:48 +02:00
chrisps 9dfbef8acf Smaller ComputeMemoryAddress/Offset sequence
Replace a movzx after setae in both ComputeMemoryAddressOffset and ComputeMemoryAddress with a xor_ of eax prior to the cmp. This reduces the length in bytes of both sequences by 1, and should be a moderate ICache usage reduction thanks to the frequency of these sequences.
2022-06-17 11:10:27 +02:00
Gliniak c0483f8bee Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-17 10:58:15 +02:00
Triang3l 166be463be [XeSL] Metal Shading Language definitions 2022-06-16 21:39:16 +03:00
Gliniak e8aaddf4d5 Merge remote-tracking branch 'GliniakRepo/patchingSystem' into canary_experimental 2022-06-14 17:50:25 +02:00
Triang3l 127bf34264 [Vulkan] Trace dump tool 2022-06-13 13:03:02 +03:00
Gliniak 91f43a374d Initial support for xex patching 2022-06-12 20:10:07 +02:00
Gliniak 945976a31d Added Premake Files For PatchingSystem 2022-06-12 19:58:12 +02:00
Triang3l ac268afbe9 [Vulkan] Fix 1<< uint32_t constants 2022-06-12 19:45:12 +03:00
Triang3l 140ed51e9a [GPU] Fix missing xenia-ui dependency in gpu > gpu-shader-compiler (needed for gmake2) 2022-06-12 19:44:24 +03:00
Triang3l 17c835b245 Merge branch 'master' into vulkan 2022-06-12 18:51:08 +03:00
Triang3l 820b7ba217 [GPU] Fix GetActiveTextureHostSwizzle return type 2022-06-12 18:50:38 +03:00
Gliniak 90d67ac11c [Kernel] Return X_STATUS_END_OF_FILE for async file read when offset > file_size 2022-06-09 21:36:09 +02:00
Triang3l 1a22216e44 [SPIR-V] Texture fetch instructions 2022-06-09 21:42:16 +03:00
Triang3l f875a8d887 Merge branch 'master' into vulkan 2022-06-09 21:35:12 +03:00
Triang3l 78d1eb8bf8 [GPU] TextureCache::GetActiveTextureHostSwizzle 2022-06-09 21:34:21 +03:00
Gliniak d0175ddf2f [XAM] Cut handle mask from socket handles, added support for: NetDll_getsockopt
Only positive values should be interpreted as valid sockets!
2022-06-08 19:59:15 +02:00
Gliniak 25f3e16baa [Patcher] Fixed issue with incorrect patches endianness 2022-06-08 19:42:18 +02:00
Gliniak 0de0f40fb5 [XAM] Added stubs for:
- NetDll_XNetCreateKey
 - NetDll_XNetRegisterKey

This will allow certain games to run local multiplayer
For example PDZ Deathmatch mode
2022-06-07 20:46:47 +02:00
Triang3l 56f72da137 [GPU] More exact PWL texture/RT gamma conversion 2022-06-07 21:26:34 +03:00
Gliniak 916eb1b9bd [XAM] Scan every controller slot if provided flags contains USER_ANY flag 2022-06-07 15:52:41 +02:00
Margen67 5701823ccf Log title_name 2022-06-07 09:43:04 +02:00
jgoyvaerts 5296d2e91e Fix xenia.log file not always being created in the executable folder. 2022-06-07 09:41:52 +02:00
Gliniak c7da7e1999 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-02 22:19:43 +02:00
Triang3l a8cfe9bebb [Vulkan] Unsubsample odd-sized 4:2:2 textures 2022-06-02 23:10:50 +03:00
Triang3l 1ce45ee150 Merge branch 'master' into vulkan 2022-06-02 22:50:14 +03:00
Triang3l 55a91afcc7 [D3D12] Don't decompress unaligned BC textures if supported 2022-06-02 22:48:03 +03:00
Triang3l 84fcd5defa [GPU] Fix resolve destination offset and extent calculation 2022-06-02 21:47:30 +03:00
Triang3l a9a072bf00 [GPU] Explain why a 32x32x4bpp linear texture takes 2 pages, not 1 [ci skip] 2022-06-01 13:00:23 +03:00
Triang3l 8bd244f277 [GPU] Better explanation for exact texture memory extent calculation [ci skip] 2022-06-01 12:55:16 +03:00
Gliniak 3169aa2ff3 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-06-01 08:45:21 +02:00
Triang3l d1ad10b98c [GPU] Primitive reset comment typo correction [ci skip] 2022-05-31 23:23:53 +03:00
Triang3l efd7ef212a [D3D12] 128 megatexel limit explanation based on the spec [ci skip] 2022-05-31 23:23:10 +03:00
Triang3l 25594c918c [GPU] Fix tiled texture memory extent calculation 2022-05-31 23:17:33 +03:00
Rick Gibbed a3e5ea8575 [Base] Fix missing include in utf8.cc. 2022-05-27 17:56:14 -05:00
Gliniak 5a71b55233 [Kernel] Added missing module hash calculation 2022-05-25 09:03:03 +02:00
Gliniak 542e075699 Fixed bug between reading header content and applying TUs 2022-05-25 08:23:19 +02:00
Gliniak d7d26dc1c4 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-05-25 07:54:16 +02:00
Gliniak 3d96dfa359 Always allocate system heap from top of heap 2022-05-25 07:53:50 +02:00
Triang3l 6c9a06b2da [Vulkan] Texture loading 2022-05-24 22:42:22 +03:00
Triang3l 9c445d397b [Vulkan] Fix single-type descriptor pool reuse 2022-05-24 22:37:49 +03:00
Triang3l aac28f19d1 Merge branch 'master' into vulkan 2022-05-24 22:34:40 +03:00
Triang3l a4840e1992 [GPU] FIXME comment for 1bpb/2bpb texture tiled extent 2022-05-24 22:33:27 +03:00
Triang3l 8701c9f24e [D3D12] Texture load code cleanup and resolution scaling fixes
The resolution scale is now taken into account when copying from the mip tail.
2022-05-24 22:28:42 +03:00
Triang3l 75c185e759 [GPU] Move texture load shader info to common 2022-05-24 22:24:33 +03:00
Triang3l f994d3ebb3 [Vulkan] Single block-compressed flag for host texture formats, not block sizes 2022-05-23 13:27:43 +03:00
Triang3l f7b0edee6b [Vulkan] GBGR/BGRG decompression 2022-05-23 13:18:47 +03:00
Triang3l 4c2f8764d6 Merge branch 'master' into vulkan 2022-05-23 12:36:35 +03:00
Triang3l c1f15c86a3 [GPU] Decompress GBGR/BGRG into RGBB, not RGB1
While the alpha of the texture data is not used at all (replaced with blue using the view swizzle), still make the shader code state the intention more explicitly if the format is decompressed for use as signed. Unsigned 1.0 is 0xFF, while signed 1.0 is 0x7F.
2022-05-23 12:31:45 +03:00
Triang3l cf3069eb13 [GPU] Signedness in Cr_Y1_Cb_Y0_REP/Y1_Cr_Y0_Cb_REP comment [ci skip] 2022-05-22 22:11:59 +03:00
Triang3l ef808e9def [GPU] _REP explanation in Cr_Y1_Cb_Y0_REP/Y1_Cr_Y0_Cb_REP comment [ci skip] 2022-05-22 21:46:11 +03:00
Triang3l 6735dbd941 [GPU] Calculate, not store, texture load host X blocks per thread 2022-05-22 21:21:54 +03:00
Triang3l 888d5044e0 [GPU] 2x1-subsampled texture RGBA8 conversion shader 2022-05-22 21:07:38 +03:00
Triang3l d3561d2f47 [D3D12] Pre-swizzle 2x1-subsampled formats 2022-05-22 20:31:48 +03:00
Triang3l 5de825e3a0 [GPU] Prevent multiple evaluation of XE_TEXTURE_LOAD_TRANSFORM arguments 2022-05-22 19:48:23 +03:00
Triang3l 2f0a884438 [GPU] Add k prefix to texture load group size constants 2022-05-22 19:35:25 +03:00
Triang3l 8f06ba6f7d [D3D12] Texture host BPB in LoadModeInfo 2022-05-22 19:28:05 +03:00
Triang3l 003c62ba73 [GPU] Correct rounding of texture load row size
The original multiplication was likely added early during the development of generic resolution scaling. Before generic resolution scaling, invocations were done for unscaled guest blocks, now they're done for scaled blocks, so with 3x1 scaling, an invocation for 8 blocks writes 8 host blocks, not 24.
2022-05-22 18:33:59 +03:00
Triang3l 6aa30ed074 [GPU] 128-thread groups in all texture load shaders
Vulkan's minimum requirement (maxComputeWorkGroupInvocations) is 128.
2022-05-22 18:03:09 +03:00
Triang3l 91c4e02e96 [Vulkan] Implement ClearCaches and don't do it for pipelines 2022-05-22 15:05:15 +03:00
Triang3l 35cfb07967 Merge branch 'master' into vulkan 2022-05-22 14:56:44 +03:00
Triang3l 88784101c8 [D3D12] Remove PipelineCache::ClearCache leftovers 2022-05-22 14:56:22 +03:00
Triang3l 68e7c56918 Merge branch 'master' into vulkan 2022-05-22 14:47:20 +03:00
Triang3l d31ddd9b23 [GPU] Remove PipelineCache::ClearCache 2022-05-22 14:46:03 +03:00
Gliniak dde8adc140 Allow XamUserReadProfileSettings to use xuid to define profile 2022-05-22 13:11:29 +02:00
Gliniak 84e5b159c3 Do not store obsolete info about deleted threads 2022-05-22 13:11:21 +02:00
Gliniak b759cb23a5 Better handling of title workspace 2022-05-22 13:11:08 +02:00
Gliniak 4bfd3a6506 Reset state of event before executing overlap code 2022-05-22 13:09:37 +02:00
Gliniak 5784e7bc8d Send signin changed notification for primary user 2022-05-22 13:09:25 +02:00
Gliniak 620aa3562e Set system page blocks to gpu-written every frame 2022-05-22 13:09:12 +02:00
Gliniak ba60b94c7d Round size to 64k for allocations without base address 2022-05-22 13:09:01 +02:00
Gliniak af806ee98f Allocate guest objects in last quarter of memory heap 2022-05-22 13:08:47 +02:00
Gliniak 7be4b7a138 Increase profiler max threads to 256 2022-05-22 13:06:50 +02:00
Gliniak a190bf9fd8 Changed max component length for host and svod types 2022-05-22 13:06:42 +02:00
Triang3l 08769de68b [Vulkan] Texture object and view creation 2022-05-19 21:56:24 +03:00
Triang3l b0e1916f75 Merge branch 'master' into vulkan 2022-05-19 21:46:21 +03:00
Triang3l 9aaf19a455 [Vulkan] Remove unused variable in VulkanPresenter::GuestOutputImage::Initialize 2022-05-19 21:45:48 +03:00
Triang3l c85c2f5b79 Merge branch 'master' into vulkan 2022-05-19 21:43:19 +03:00
Triang3l 1dcc919a33 [GPU] Move k_Y1_Cr_Y0_Cb_REP usage example to xenos.h 2022-05-19 21:41:52 +03:00
Triang3l 7d63d6e1d3 [D3D12] Fix 2:1-subsampled format swizzle 2022-05-19 21:40:03 +03:00
Triang3l 825a5b176c [D3D12] Fix frontbuffer resource state 2022-05-19 21:39:11 +03:00
Gliniak 5ce75a1479 Merge remote-tracking branch 'GliniakRepo/xam_swap_disc' into canary_experimental 2022-05-19 12:07:05 +02:00
Gliniak f21dbc66ba Implemented XamSwapDisc 2022-05-19 12:04:32 +02:00
Gliniak db50db3215 Merge remote-tracking branch 'GliniakRepo/TU_APPLY' into canary_experimental 2022-05-19 11:00:34 +02:00
Gliniak 7c2cd16548 Merge remote-tracking branch 'GliniakRepo/local_multiplayer' into canary_experimental 2022-05-19 10:56:21 +02:00
Gliniak 6c6c5ac14b Merge remote-tracking branch 'GliniakRepo/experimentals' into canary_experimental 2022-05-19 10:51:44 +02:00
Philpax e901567193 Fix crash from null sample channel
Certain games, such as Forza Motorsport 3, submit XMA data with the
stereo flag set with a null second channel. This falls back to mono
conversion when the second channel is null, preventing a crash.
2022-05-19 10:22:41 +02:00
Margen67 64b336805e Add vsync_interval option 2022-05-19 10:22:32 +02:00
Gliniak 0881725533 Merge remote-tracking branch 'GliniakRepo/const_prop_opcode_and_not' into canary_pr 2022-05-19 10:18:58 +02:00
Gliniak 75f0dfd6f3 Merge remote-tracking branch 'GliniakRepo/deleteFunctionsFromUnloadedModule' into canary_pr 2022-05-19 10:18:18 +02:00
Gliniak 320cbc43c8 Merge remote-tracking branch 'GliniakRepo/physicalProtectPageCombinations' into canary_pr 2022-05-19 10:17:58 +02:00
Gliniak ef281c69c3 Merge remote-tracking branch 'GliniakRepo/xamNetSockNameAndErrorHandling' into canary_pr 2022-05-19 10:17:29 +02:00
Gliniak de03165995 Merge remote-tracking branch 'GliniakRepo/audioSkipHeaderInputOffset' into canary_pr 2022-05-19 10:16:41 +02:00
Gliniak 5ef92faf6d Merge remote-tracking branch 'GliniakRepo/createEnumeratorHandle' into canary_pr 2022-05-19 10:16:10 +02:00
Gliniak 006f3adad3 Merge remote-tracking branch 'GliniakRepo/disablePositiveVibes' into canary_pr 2022-05-19 10:03:50 +02:00
Gliniak b237b71031 Merge remote-tracking branch 'GliniakRepo/memory_stats' into canary_pr 2022-05-19 10:03:29 +02:00
Gliniak 7ac2279d34 Merge remote-tracking branch 'GliniakRepo/customConHeaderImplementation' into canary_pr 2022-05-19 10:03:05 +02:00
Gliniak 5247220e73 Merge remote-tracking branch 'GliniakRepo/patchingSystem' into canary_pr 2022-05-19 10:01:33 +02:00
Margen67 99e3a1a4b1 Disable Vulkan 2022-05-19 09:39:58 +02:00
illusion0001 f9fd3e5fec AVPack cvar 2022-05-19 09:39:56 +02:00
illusion 357d9adfca automatic aspect ratio change
aspect ratio will now change if internal resolution is set to anything 4:3
(e.g. 640x480, 1024x768, 1600x1200, etc.)
2022-05-19 09:39:56 +02:00
Margen67 bdd431cd4a Rename exe to xenia_canary 2022-05-19 09:39:55 +02:00
illusion98 7242efdeef Change default config file name 2022-05-19 09:39:55 +02:00
illusion98 471041a9b5 Change window title
xenia -> xenia-canary
2022-05-19 09:39:55 +02:00
illusion98 6036c977e8 Change ID and new description 2022-05-19 09:39:55 +02:00
illusion98 c0333ea7c6 Add Time Elapsed and Description Text
Display Time Elapsed when idle or playing a game
Display description when hovering over the icon
2022-05-19 09:39:55 +02:00
Triang3l 46202dd27a [Vulkan] Basic texture descriptor set allocation/binding 2022-05-17 22:42:28 +03:00
Triang3l 3381d679b4 Merge branch 'master' into vulkan 2022-05-17 22:31:34 +03:00
Triang3l 7675b6b140 [DXBC] Cleanup texture/sampler name setting 2022-05-17 22:30:55 +03:00
Triang3l 533de3b477 [D3D12] Remove unnecessary binding count uint32_t casts 2022-05-17 21:33:17 +03:00
Triang3l 5f2b0a899a [Vulkan] Fix TransientDescriptorPool ignoring the descriptor type 2022-05-15 22:20:24 +03:00
Triang3l f9261811a9 [D3D12] Fix layouts_mutex_ lock naming 2022-05-15 18:52:28 +03:00
Triang3l 0db94a700f [Vulkan] Use pipeline layout key structures directly 2022-05-15 17:42:27 +03:00
Triang3l b80361ee3c [Vulkan] Texture cache: Maximum dimensions, null images 2022-05-15 16:59:27 +03:00
Triang3l 185c23dd50 [Vulkan] Gather shader stages that VS can be translated into 2022-05-15 16:31:24 +03:00
Triang3l 7d19a8c0e8 [Vulkan] Add missing <functional> include for std::hash 2022-05-15 16:20:12 +03:00
Triang3l 862c457761 [Vulkan] Use Shader::IsHostVertexShaderTypeDomain 2022-05-15 16:19:36 +03:00
Triang3l 05adfbc58d Merge branch 'master' into vulkan 2022-05-15 16:18:41 +03:00
Triang3l a65fd4f673 [GPU] Shader::IsHostVertexShaderTypeDomain 2022-05-15 16:13:05 +03:00
Triang3l f9b3b90a68 [D3D12] Subsystem management order cleanup 2022-05-14 22:30:06 +03:00
Triang3l 60052fb4fc [Vulkan] Don't require imageViewFormatSwizzle in the immediate drawer 2022-05-14 22:18:21 +03:00
Triang3l d6a9056952 [D3D12] D3D12Texture::SRVDescriptorKey structure 2022-05-14 18:41:15 +03:00
Triang3l 26cf717394 [GPU] Make TextureCache constructors explicit 2022-05-14 18:28:32 +03:00
Triang3l 775b4623dc Merge branch 'master' into vulkan 2022-05-14 17:05:39 +03:00
Triang3l d280b3953d [GPU] Texture object/binding management to common superclass 2022-05-14 16:18:10 +03:00
Triang3l af3158f1bf [Legacy Vulkan] Add Vulkan prefix to Pipeline/TextureCache to avoid future name collisions 2022-05-11 21:21:33 +03:00
Triang3l 73d574a046 [Vulkan] Rectangle and quad list geometry shader generation 2022-05-10 21:48:18 +03:00
Triang3l b9256fcdbd Merge branch 'master' into vulkan 2022-05-10 15:57:50 +03:00
Triang3l e6fb9883d2 [D3D12] Discard primitives with NaN position in GS 2022-05-09 22:34:17 +03:00
Triang3l 4cd4a91aa7 [D3D12] Rectangle GS comment typo fix [ci skip] 2022-05-09 19:17:55 +03:00
Triang3l 8f0e751909 [D3D12] Runtime geometry shader generation 2022-05-09 19:16:22 +03:00
Triang3l 44cda56d35 [GPU] Handle kRegisters and kGammaRamp in the trace viewer 2022-05-08 19:41:11 +03:00
Triang3l 2473496c7e [GPU] Make RegisterFile::kRegisterCount constexpr 2022-05-08 19:37:29 +03:00
Triang3l 72cf75f365 [DXBC] Geometry shader instructions 2022-05-07 22:11:31 +03:00
Caroline Joy Bell d36c3975d8 [UI] Implement Type::kDirectory in Win32FilePicker 2022-05-07 21:54:26 +03:00
Triang3l e3425b242e [DXBC] Both v[#] and v[#][#] operands for HS and GS 2022-05-07 16:17:17 +03:00
Gliniak c65f240c0b [Kernel] Improved TUs Support
- Changed name of config option to apply_title_update to better reflect what that option does
- Mount TU package to UPDATE: partition
- Simplified UserModule::title_id()
- Split module loading into two parts to allow applying TUs and custom patches
2022-05-06 08:04:47 +02:00
Triang3l 5875f6ab31 [UI] Windows: Disable rounded corners 2022-05-05 21:46:20 +03:00
Triang3l 9c8e0cc53e [GPU] DC_LUT_PWL_DATA comment fix [ci skip] 2022-05-05 13:13:30 +03:00
Triang3l c794d0d538 [GPU] DC_LUT_RW_INDEX/WRITE_EN_MASK + gamma ramp and registers in traces 2022-05-05 13:10:29 +03:00
Triang3l 2d90d5940f [DXBC] Jump to the loop skip address before pushing 2022-05-04 22:01:30 +03:00
Triang3l 0e0f04dc1d [D3D12] Fix point size calculation + point code cleanup
6fcf9d21fe made per-vertex diameter vs. constant radius consistent, and with that commit the shader works with direct pixel to NDC conversion, however, the NDC conversion factor was outdated in that commit (still included the 0.5 factor for diameter to radius conversion, resulting in all points being 50% narrower along each axis than needed). Now, the diameter to radius conversion factor is used there properly, and also the multiplication of the per-vertex diameter by 0.5 has been removed from the shader since the constant already includes it now (the constant diameter is passed via the system constants instead of the radius also).
2022-05-04 13:26:30 +03:00
Peter Wright 7ab5ccbbd9 Add #include <cfloat> to fix build error on Linux. 2022-05-03 19:45:10 +03:00
Triang3l 9e6f96a2fc Merge branch 'master' into vulkan 2022-05-03 16:21:30 +03:00
Triang3l 6fcf9d21fe [D3D12] Point sprite size fixes, point/line bits in PsParamGen 2022-05-03 16:15:16 +03:00
Triang3l fe50c5c2e5 [XeSL] Prefix all local names with `xesl_id/var_` 2022-05-03 13:48:32 +03:00
Triang3l 72a4d14056 Merge branch 'master' into vulkan 2022-05-03 00:13:31 +03:00
Triang3l b88f715140 Merge branch 'master' into vulkan 2022-05-03 00:13:17 +03:00
Triang3l 7a89ad16a6 [D3D12] Update D3D12RenderTargetCache::Update write mask argument name 2022-05-02 23:16:18 +03:00
Gliniak ccbb5a2ebf Cleanup 2022-04-30 11:45:22 +02:00
Gliniak d78fd19ab4 Fixed incorrect hash generation + lint fixes 2022-04-29 20:33:21 +02:00
Gliniak 585b208fc0 Added support for multiple game hashes 2022-04-29 09:41:45 +02:00
Triang3l 0fd578cafd [GPU] Get unclipped draw height by running VS on the CPU 2022-04-28 22:25:25 +03:00
Triang3l b2b1d7b518 [GPU] More accurate vertex kill + PsParamGen/point documentation 2022-04-27 23:10:56 +03:00
Triang3l 5ec0c92601 [GPU] Ignore z_enable for !z_write_enable && z_func == ALWAYS 2022-04-27 21:46:29 +03:00
Triang3l 5519dbb39f [GPU] Shader control flow documentation improvements 2022-04-27 21:34:08 +03:00
Triang3l b42680abf7 [GPU] Shader ALU refactoring + documentation
Mainly move instruction info from the ShaderTranslator to xe::gpu::ucode for future use in the CPU shader interpreter
2022-04-27 20:52:20 +03:00
Gliniak fc16e3dc40 Support for patch types:
- float
 - double
 - string
 - u16string
 - byte_array

Plus some smaller changes
2022-04-27 09:41:29 +02:00
Triang3l df9a37f798 [GPU] Ucode disasm: Fix exec formatting 2022-04-26 23:08:31 +03:00
Triang3l 69958cba9d [GPU] shader-compiler: Accept little-endian ucode 2022-04-26 22:59:02 +03:00
Triang3l 443d61c9e1 [D3D12] GetFormatCopyInfo: Remove unused divide_by_block_size variable 2022-04-26 22:42:17 +03:00
Triang3l fcf6a7ded1 [Android] Minor postInvalidateWindowSurface JNI cleanup 2022-04-26 22:41:11 +03:00
Triang3l 12ff951972 [Base] More flexible Xenos float16 conversion functions 2022-04-26 22:35:37 +03:00
Joel Linn e3dd873892 [Base] Fix wait for callback return
- If wait item has disarmed itself and is then disarmed by another
  thread, still wait for the callback to return to meet guarantees
2022-04-26 13:56:11 -05:00
Joel Linn 3b4dc7da3b [Base] Use disruptorplus spin wait
- Attempt to fix deadlocks when using valgrind on CI
2022-04-26 13:56:11 -05:00
Joel Linn e59a0e1206 [Base] Relax some timing constraints.
- Because setting the timer is scheduled by us but the wait on POSIX is
  currently scheduled by pthreads, this solves issues on overprovisioned
  CIs
2022-04-26 13:56:11 -05:00
Joel Linn 4a36a7962c [Base] Remove unneeded delay scheduler 2022-04-26 13:56:11 -05:00
Joel Linn 15950eec37 [Base] Use chrono APIs for Timers 2022-04-26 13:56:11 -05:00
Joel Linn 1478be14c7 [Base] Add chrono tests 2022-04-26 13:56:11 -05:00
Joel Linn 23eef94984 [Base] Add chrono support
- WinSystemClock is a FILETIME clock without scaling, can convert to
  system_time
- XSystemClock is a FILETIME clock with scaling applied, can only
  convert to WinSystemClock
2022-04-26 13:56:11 -05:00
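As a rough illustration of what "a FILETIME clock without scaling" that converts to system time could involve, here is a minimal sketch. The names `FileTimeDuration`, `kFileTimeEpochDelta`, and `FileTimeToSystemTime` are hypothetical and not taken from the commit. It assumes only the well-known FILETIME definition (100 ns ticks since 1601-01-01 UTC) and that `std::chrono::system_clock` counts from the Unix epoch.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Hypothetical helper types; not the actual WinSystemClock implementation.
// A FILETIME tick is 100 ns, i.e. 1/10,000,000 of a second.
using FileTimeDuration =
    std::chrono::duration<int64_t, std::ratio<1, 10000000>>;

// Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch).
constexpr std::chrono::seconds kFileTimeEpochDelta{11644473600LL};

// Converts raw FILETIME ticks to a system_clock time point.
// Assumes system_clock's epoch is 1970-01-01 UTC.
inline std::chrono::system_clock::time_point FileTimeToSystemTime(
    uint64_t filetime_ticks) {
  const FileTimeDuration since_1601{static_cast<int64_t>(filetime_ticks)};
  const auto since_1970 = since_1601 - kFileTimeEpochDelta;
  return std::chrono::system_clock::time_point(
      std::chrono::duration_cast<std::chrono::system_clock::duration>(
          since_1970));
}
```

A guest-scaled clock such as the XSystemClock mentioned above would presumably apply its time scalar before a conversion like this one, which is why the commit only lets it convert to WinSystemClock.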
Joel Linn 9b4168cce9 [Base] Make HighResolutionTimer platform agnostic 2022-04-26 13:56:11 -05:00
Joel Linn 75357caeaf [Base] Add TimerQueue
- Cross platform functionality similar to Windows' `CreateTimerQueue`
  with `WT_EXECUTEINTIMERTHREAD`
2022-04-26 13:56:11 -05:00
Joel Linn a85fc25040 [Base] Add more tests for HighResolutionTimer 2022-04-26 13:56:11 -05:00
Wunkolo be8b9c512f [x64] Add GFNI optimization for SPLAT(int8)
`pxor` is a zero-uop register-rename and `gf2p8affineqb dest, zero, int8`
is a very quick single-instruction way to use affine galois
transformations to fill a register with an immediate byte without
touching memory.
2022-04-26 13:46:46 -05:00
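To see why `gf2p8affineqb` with an all-zero source can splat an immediate byte, a scalar model of the per-byte affine transform may help. This is only a sketch following the per-byte pseudocode in Intel's instruction reference. `Parity8` and `AffineByte` are hypothetical helper names, not emitter code.

```cpp
#include <cstdint>

// Parity (XOR of all bits) of a byte.
constexpr uint8_t Parity8(uint8_t v) {
  v ^= v >> 4;
  v ^= v >> 2;
  v ^= v >> 1;
  return v & 1;
}

// Scalar model of one byte lane of gf2p8affineqb:
// result.bit[i] = parity(matrix.byte[7 - i] & x) ^ imm8.bit[i]
inline uint8_t AffineByte(uint64_t matrix, uint8_t x, uint8_t imm8) {
  uint8_t result = 0;
  for (int i = 0; i < 8; ++i) {
    const uint8_t row = static_cast<uint8_t>(matrix >> (8 * (7 - i)));
    const uint8_t bit = Parity8(row & x) ^ ((imm8 >> i) & 1);
    result |= static_cast<uint8_t>(bit << i);
  }
  return result;
}
```

With a zero matrix every parity term is 0, so each output byte is just `imm8`, regardless of the other operand. That matches the commit's description of filling a register with an immediate byte without touching memory.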
Gliniak c73cdb506a Initial support for xex patching 2022-04-26 13:26:49 +02:00
Gliniak 31eb639ade Added Premake Files For PatchingSystem 2022-04-26 13:26:49 +02:00
Gliniak 3a115ae6a0 [Kernel] Restored usage of: log_string_format_kernel_calls 2022-04-14 13:48:24 -05:00
Triang3l ef8a60e011 [GPU] Round tessellation patch vertex count up (by @deaklajos #2007)
Also move the clamping of the guest index count to the index buffer size to the place before it's read in calculations
2022-04-14 21:19:12 +03:00
Triang3l 38aca269e1 [GPU] Offset and clamp tessellation patch index (#2008, thanks @deaklajos) 2022-04-14 13:04:34 +03:00
Triang3l fea430f1f9 [GPU] Fix scalar c[#+aL], shader docs/refactoring 2022-04-13 23:08:19 +03:00
Triang3l 1f324bebcd [GPU] Norm16 > float16 texture load shaders 2022-04-09 23:34:50 +03:00
Triang3l 744767f549 [D3D12] Compile all built-in shaders with the same FXC version 2022-04-09 23:24:28 +03:00
Triang3l 72f3eead63 [GPU] Texture load shader style (alignment) cleanup 2022-04-09 23:23:54 +03:00
DESKTOP-F0UGBP9\deakl 8d02c5ab21 [GPU] Fixed size 0 point sprites enlarged to default 2022-04-05 02:25:24 +03:00
Triang3l 47799163bd Merge branch 'master' into vulkan 2022-04-04 22:02:46 +03:00
Triang3l 3d48fde5ca [GPU] XeSL texture load shaders + minor XeSL cleanup 2022-04-04 21:48:27 +03:00
Triang3l 0acb97d383 [Vulkan] EDRAM range ownership transfers, resolve clears, 2x-as-4x MSAA
Transfers are functional on a D3D12-like level, but need additional work so fallbacks are used when multisampled integer sampled images are not supported, and to eliminate transfers between render targets within Vulkan format compatibility classes by using different views directly.
2022-04-03 16:40:29 +03:00
Triang3l 85fc7036b8 Merge branch 'master' into vulkan 2022-04-02 22:45:23 +03:00
Triang3l c4eae232f1 [D3D12] Fixes/cleanup for render targets and barriers 2022-04-02 22:44:10 +03:00
Triang3l 1131dff705 Merge branch 'master' into vulkan 2022-03-28 21:58:34 +03:00
Triang3l 0f3207d019 [Vulkan] Fix basePipelineIndex signedness 2022-03-28 21:57:44 +03:00
Triang3l 52d61fc94c Merge branch 'master' into vulkan 2022-03-27 16:20:21 +03:00
Triang3l 3a07559df9 [GPU] XeSL host depth store and VS passthrough shaders 2022-03-27 16:15:53 +03:00
Triang3l 328aa11283 Merge branch 'master' into vulkan 2022-03-27 00:11:45 +03:00
Triang3l 2cd6c31998 [Vulkan] Samplerless texelFetch 2022-03-27 00:09:44 +03:00
Gliniak 67a0ccb7c0 [CPU] Unified assertions for unimplemented opcodes 2022-03-23 11:41:49 -05:00
Triang3l 7048baaf21 Merge branch 'master' into vulkan 2022-03-22 21:54:34 +03:00
Triang3l fa62d395fd [Vulkan] InitializeSubresourceRange: Use return, not reference 2022-03-22 21:51:02 +03:00
Triang3l 32ab1a2df1 [D3D12] Minor RT code style/comments cleanup 2022-03-22 21:48:26 +03:00
Triang3l ee8e71cea8 [D3D12] RT dump: Fix r# allocation 2022-03-22 21:41:44 +03:00
Triang3l 920704c71a [D3D12] RT transfer: Same front/back stencil ops 2022-03-22 21:39:06 +03:00
Triang3l 1259c9f7a2 [Vulkan] Pipeline barrier merging 2022-03-21 23:02:51 +03:00
Triang3l acc4fd6846 [Vulkan] Rectangle list geometry shader 2022-03-21 22:53:19 +03:00
Triang3l c47b874a4d Merge branch 'master' into vulkan 2022-03-21 20:57:02 +03:00
Triang3l 82c1fb87aa [App] Do all fullscreen entry logic for --fullscreen=true (fixes #1999) 2022-03-14 20:42:52 +03:00
Gliniak 0f2a7105b9 [CPU] Added constant propagation pass for: OPCODE_AND_NOT 2022-03-11 08:54:01 +01:00
Wunkolo c1de37f381 [x64] Remove usage of `xbyak_bin2hex.h`
C++ has had binary-literals since C++14. There is no need for these
binary enum values from xbyak.
2022-03-08 12:18:58 -06:00
Wunkolo f356cf5df8 [x64] Add `VECTOR_ROTATE_LEFT_I32` overflow-test
Edit one of the lanes in this unit-test to be larger than the width of
the element-size to ensure that this case is handled correctly.

It should only mask the lower `log2(32)=5` bits of the input, causing
`33`(`100001`) to be `1`(`000001`).
2022-03-08 12:18:58 -06:00
Wunkolo 337f0b2948 [x64] Add AVX512 optimization for `VECTOR_ROTATE_LEFT(Int32)`
`vprolvd` is an almost 1:1 analog with this opcode and can be
conditionally emitted when the host supports AVX512{F,VL}.

Altivec docs say that `vrl{bhw}` masks the lower log2(n) bits of the
element-size.

[vprold](https://www.felixcloutier.com/x86/vprold:vprolvd:vprolq:vprolvq)
modulos the shift-value by the element size in bits, which is the same
as masking the lower log2(n) bits. So `vrlw` maps exactly to `vprold`.
2022-03-08 12:18:58 -06:00
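The masking behavior described in the two rotate commits above can be sketched in scalar C++. `RotateLeft32` is a hypothetical helper for illustration, not the emitter or test code. It keeps only the low `log2(32) = 5` bits of the rotate amount, so an amount of 33 behaves like 1.

```cpp
#include <cstdint>

// Rotate-left of one 32-bit lane, using only the low 5 bits of the amount.
// Masking with 31 is the same as taking the amount modulo 32.
constexpr uint32_t RotateLeft32(uint32_t value, uint32_t amount) {
  const uint32_t shift = amount & 31u;
  if (shift == 0) {
    return value;  // Avoid an undefined 32-bit shift by 32.
  }
  return (value << shift) | (value >> (32u - shift));
}
```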
Joel Linn 7e894d10a7 [kernel] Correct status for looked up objects
- The guest will check for 0x40000000 and replace it with
  0xb7 (ERROR_ALREADY_EXISTS), which is the correct return value.
  For example, see:
  https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-createmutexa
2022-03-08 12:17:57 -06:00
Joel Linn 91f4954967 [kernel] Refactor uses of attribute names 2022-03-08 12:17:57 -06:00
Joel Linn 38d589d1e0 [kernel] Remove unnecessary string copy 2022-03-08 12:17:57 -06:00
Joel Linn b72ab7b4a4 [Base] Refactor POSIX timers, fix use-after-free
Since timer_delete does not clean up already queued signals, signal info
data needs to be retained after timer deletion and object destruction in
order to circumvent use-after-free bugs.
2022-03-08 12:17:57 -06:00
Joel Linn 257b904a5e [Base] Add DelayScheduler class
Schedule callbacks with the only guarantee that they will not be run for
the minimum duration specified. Useful for garbage collecting POSIX
timer_create() signal info data.
2022-03-08 12:17:57 -06:00
Joel Linn e0f34b97fb [Base] Check for correct thread in HResTimer tests 2022-03-08 12:17:57 -06:00
Joel Linn fb741db2fe [Base] Fix callback threads for POSIX timers 2022-03-08 12:17:57 -06:00
Joel Linn 986dcf4f65 [Base] Check success of sync primitive creation
- Mainly use `assert`s, since failure is very rare
- Forward failure of `CreateSemaphore` to guests because it is easier
  to trigger with invalid initial parameters.
2022-03-08 12:17:57 -06:00
Joel Linn 6bd1279fc0 [Base] Forward `handle=null` as nullptr for win 2022-03-08 12:17:57 -06:00
Joel Linn 4ea6e45e0c [Base] Remove `Sleep`s from more test cases
Timing dependencies in these tests were causing spurious test failures:
- Create and Run Thread
- Test Thread QueueUserCallback

They have been largely replaced by spin waits.
2022-03-08 12:17:57 -06:00
Joel Linn e75e0eb39c [Base] Fix `Semaphore::Create` invalid parameters 2022-03-08 12:17:57 -06:00
Joel Linn bb42829308 [Base] Fix WaitMultiple on POSIX
- Never use `cond_.notify_one()` because it may wake a thread that is
  unrelated to the signalled wait handle, resulting in a lost wake and
  possible deadlock. Wait conditions are to be checked by the threads
  themselves.
- Refactor and simplify `WaitMultiple`
2022-03-08 12:17:57 -06:00
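The lost-wake problem mentioned in the WaitMultiple commit can be illustrated with a small sketch in which several waiters share one condition variable but wait on different flags. `SharedWaitState`, `Signal`, `Wait`, and `RunDemo` are hypothetical names, and this is not the actual POSIX threading code. With `notify_one()`, the single wakeup could land on a waiter whose flag is still unset. With `notify_all()`, every waiter rechecks its own predicate.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state: one mutex/condvar pair, two independent flags.
struct SharedWaitState {
  std::mutex mutex;
  std::condition_variable cond;
  bool signaled[2] = {false, false};
};

// Sets one flag and wakes ALL waiters so each can recheck its own condition.
inline void Signal(SharedWaitState& state, int index) {
  {
    std::lock_guard<std::mutex> lock(state.mutex);
    state.signaled[index] = true;
  }
  state.cond.notify_all();
}

// Blocks until this waiter's own flag is set; other wakeups are ignored.
inline void Wait(SharedWaitState& state, int index) {
  std::unique_lock<std::mutex> lock(state.mutex);
  state.cond.wait(lock, [&] { return state.signaled[index]; });
}

// Signals the second waiter first, then the first; both must finish.
inline bool RunDemo() {
  SharedWaitState state;
  bool woke[2] = {false, false};
  std::thread a([&] { Wait(state, 0); woke[0] = true; });
  std::thread b([&] { Wait(state, 1); woke[1] = true; });
  Signal(state, 1);
  b.join();
  Signal(state, 0);
  a.join();
  return woke[0] && woke[1];
}
```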
Joel Linn ca6296089e [Base] Remove timing dependency from test
- Use atomics and spin waits to synchronize threads for tests
- Improves test stability on CI
2022-03-08 12:17:57 -06:00
Joel Linn 49efbeaca8 [Base] Add spin wait helper to threading test 2022-03-08 12:17:57 -06:00
Gliniak 283accd876 [Kernel/Memory] Check for required protect_bits combinations 2022-02-22 19:26:56 +01:00
Radosław Gliński 6b45cf8447
[Base] Match exactly when no pattern in wildcard 2022-02-17 17:38:04 -06:00
Triang3l ba28ef9717 [Win32] Declare Windows 7-11 support in the manifest 2022-02-17 20:38:52 +03:00
Joel Linn 00e7de9297 [CPU] Improve vrsqrtefp accuracy 2022-02-16 17:09:28 -06:00
Joel Linn d64848245d [CPU] Improve vrefp accuracy 2022-02-16 17:09:28 -06:00
Triang3l 294c76f7c4 [UI] Remove `virtual` from Window::IsFullscreen (tracked entirely by common code) 2022-02-16 20:37:53 +03:00
Triang3l b41fb851c6 [Vulkan] Unsupported pipeline features assertion typo fix 2022-02-15 23:05:47 +03:00
Triang3l e13c4ae90b Merge branch 'master' into vulkan 2022-02-15 23:02:43 +03:00
Triang3l 9e803ccf25 [D3D12] Pad kBlendOpMap with dummy values for all 3 bits 2022-02-15 23:02:26 +03:00
Triang3l c75e0dd19e [Vulkan] Blend and depth/stencil state, small pipeline cleanup 2022-02-15 23:00:21 +03:00
Triang3l a64264ed77 Merge branch 'master' into vulkan 2022-02-14 12:37:49 +03:00
Triang3l 74c109273c [UI] Add PerMonitor fallback to Windows dpiAwareness 2022-02-14 12:35:08 +03:00
Triang3l 09f6081b16 [Vulkan] Fix shader bytecode path in premake5.lua 2022-02-13 23:29:46 +03:00
Triang3l b8c9d5bb8c Merge branch 'master' into vulkan 2022-02-13 23:25:39 +03:00
Triang3l e57db52285 [UI] Enable Windows PMv2 DPI awareness accidentally kept disabled after testing 2022-02-13 23:10:19 +03:00
Triang3l 7652b321d0 [UI] Fix Windows 10 1607+ DPI function loading 2022-02-13 23:07:27 +03:00
Triang3l 7fc940422c [UI] Windows AdjustWindowRect and GetClientRect usage cleanup 2022-02-13 23:01:25 +03:00
Triang3l be5f7db3ef [D3D12] Fixed-function state cleanup 2022-02-13 21:50:00 +03:00
Triang3l 325ae443da [D3D12] Rename current_cached_pipeline_ to current_guest_pipeline_ 2022-02-13 21:21:49 +03:00
Triang3l 10ec47e1fe [GPU] Move common-face polygon offset to draw_util 2022-02-13 21:18:02 +03:00
Triang3l 8d07c79897 [GPU] Cleanup RB_COLOR_MASK and RB_DEPTHCONTROL normalization 2022-02-13 20:50:31 +03:00
Triang3l 8ca67b8aa7 [Vulkan] Expose relevant portability subset features 2022-02-13 20:19:01 +03:00
Triang3l 0590346084 [Vulkan] Add Vulkan-Headers and VMA submodules 2022-02-13 20:08:08 +03:00
Triang3l 8ccb00d03d [SPIR-V] Store vfetch_full address in a variable 2022-02-07 23:00:23 +03:00
Triang3l e447cf6ed8 Merge branch 'master' into vulkan 2022-02-07 22:22:43 +03:00
Triang3l 9b1fdac986 [UI] UI common shaders to XeSL 2022-02-06 22:48:38 +03:00
Triang3l 4480437a3d [SPIR-V] xb genspirv > buildshaders + opt + remap + .xesl 2022-02-05 17:07:07 +03:00
Wunkolo ea992eda1f [x64] Fix missing BMI2 emit-feature detection
We only tested for BMI1 but not for BMI2, so we've been missing out on
BMI2 performance gains for a little while. Oops.
2022-02-05 12:08:32 +03:00
Triang3l 922efb13ce Merge branch 'master' into vulkan 2022-02-03 21:12:10 +03:00
Gliniak 613f5ebe02 [HID] Added option to turn off vibration 2022-02-03 09:12:31 +01:00
Gliniak 17b30be56a Added support for local multiplayer 2022-02-02 13:44:28 +01:00
Gliniak 332a9a2ec6 [XAM] Implemented XamCreateEnumeratorHandle
- Thanks Rick for providing names for parameters
2022-02-02 10:10:07 +01:00
Gliniak 7977d7ab98 [Base] Changed entry point to wmain for Windows
This prevents subapps from crashing when executing wmain specific functions
2022-02-01 15:50:48 -06:00
Triang3l 52ec0acd0c [App] Add text saying that post-processing is vendor-independent 2022-02-01 22:29:14 +03:00
Triang3l 413d7ded49 [UI] Android surface [skip appveyor] 2022-02-01 22:18:04 +03:00
Triang3l c6fc8f706a [Base] GetAndroidThreadJniEnv capitals, move JNI usage tips there 2022-02-01 21:33:20 +03:00
Gliniak 6ad5c39fac [XAM/Content] Implemented Custom CON Header Handling 2022-01-31 22:14:02 +01:00
Gliniak e9b9302cd3 [XAM] Small XamUserReadProfileSettings improvements 2022-01-31 21:39:56 +01:00
Radosław Gliński e8374d98fe Skip 0xbadf00d gpu packets 2022-01-31 20:27:13 +01:00
Gliniak 080a65cd4f [XAM] XGetLanguage: Get user language based on config 2022-01-31 20:26:03 +01:00
Gliniak 9ed3881b3b Skip indirect ringbuffer errors - Thermonuclear war achieved 2022-01-31 20:15:47 +01:00
Gliniak 3a772e60b0 XamContentCreate: Return X_ERROR_FUNCTION_FAILED for overlapped failures 2022-01-31 20:15:41 +01:00
Gliniak dfb4cadcfe Return success from DmRegisterCommandProcessor to prevent debug games from crashing 2022-01-31 20:15:25 +01:00
Gliniak 07a1e77218 Allow users to change max amount of queued frames 2022-01-31 20:12:39 +01:00
Gliniak 498dde6e1a Limit unspecified virtual allocation only to 3/4 of heap 2022-01-31 20:12:34 +01:00
Gliniak c20c7c930c XamEnumerate: Return X_ERROR_FUNCTION_FAILED for overlapped failures 2022-01-31 20:12:29 +01:00
Gliniak c4d64a0501 QueryRegionInfo: Adjust allocation_base to contain heap address 2022-01-31 20:12:24 +01:00
Gliniak ec976cdd33 InitializeRingBuffer - Clear buffer space to prevent random data readout 2022-01-31 20:12:20 +01:00
Gliniak c483da91a4 Stop unnecessary spam of 0x601 opcode usage 2022-01-31 20:11:53 +01:00
Gliniak 8e35a3d649 Invalidate input buffers if decoding fails
Should output be invalidated too?
2022-01-31 20:11:44 +01:00
Gliniak c80ea14d9d Check if input_buffer exist
In some really specific cases there is a chance that
one of the buffers is valid, but its pointer is null
2022-01-31 20:10:14 +01:00
Gliniak 0eaf032b71 Remove applying offset when min & max address range is provided 2022-01-31 20:09:51 +01:00
Gliniak f43e400c91 Do not block XMA when there is no work buffer available 2022-01-31 20:07:39 +01:00
Triang3l 009f709ad4 [Base] Remove Android jfieldIDs used only once from the file scope 2022-01-31 13:00:28 +03:00
Triang3l d998c13ee8 [Base] Explain why no Android activity in xenia-base [ci skip] 2022-01-31 12:12:57 +03:00
Triang3l 3f817fb241 [Base] Android JNIEnv attachment and LaunchWebBrowser 2022-01-30 23:35:40 +03:00
Triang3l d2ef8d3300 [Base] Android error reporting via SIGABRT/RuntimeException 2022-01-30 18:36:11 +03:00
Triang3l 50cf96ff36 [D3D12] Don't drain PSO preload creation queue if not queueing at all 2022-01-30 12:37:14 +03:00
gibbed 306ee85514 [App] Add Compatibility help menu item. 2022-01-29 08:02:20 -06:00
gibbed c6b2b1e8eb [App] Replace Website help menu item with FAQ. 2022-01-29 08:02:20 -06:00
gibbed 7019205810 [App] Rename ShowCommitID to ShowBuildCommit. 2022-01-29 08:02:20 -06:00
Triang3l 22eb8747d3 [GPU/Kernel] Fix space-prefixed hexadecimal number printing 2022-01-29 14:02:55 +03:00
Triang3l fe3f0f26e4 [UI] Image post-processing and full presentation/window rework
[GPU] Add FXAA post-processing
[UI] Add FidelityFX FSR and CAS post-processing
[UI] Add blue noise dithering from 10bpc to 8bpc
[GPU] Apply the DC PWL gamma ramp closer to the spec, supporting fully white color
[UI] Allow the GPU CP thread to present on the host directly, bypassing the UI thread OS paint event
[UI] Allow variable refresh rate (or tearing)
[UI] Present the newest frame (restart) on DXGI
[UI] Replace GraphicsContext with a far more advanced Presenter with more coherent surface connection and UI overlay state management
[UI] Connect presentation to windows via the Surface class, not native window handles
[Vulkan] Switch to simpler Vulkan setup with no instance/device separation due to interdependencies and to pass fewer objects around
[Vulkan] Lower the minimum required Vulkan version to 1.0
[UI/GPU] Various cleanup, mainly ComPtr usage
[UI] Support per-monitor DPI awareness v2 on Windows
[UI] DPI-scale Dear ImGui
[UI] Replace the remaining non-detachable window delegates with unified window event and input listeners
[UI] Allow listeners to safely destroy or close the window, and to register/unregister listeners without use-after-free and the ABA problem
[UI] Explicit Z ordering of input listeners and UI overlays, top-down for input, bottom-up for drawing
[UI] Add explicit window lifecycle phases
[UI] Replace Window virtual functions with explicit desired state, its application, actual state, its feedback
[UI] GTK: Apply the initial size to the drawing area
[UI] Limit internal UI frame rate to that of the monitor
[UI] Hide the cursor using a timer instead of polling due to no repeated UI thread paints with GPU CP thread presentation, and only within the window
2022-01-29 13:22:03 +03:00
Pseudo-Kernel 372bdd3ec9
[APU] XMA: Fix audio loop handling.
Handles audio loop if loop_start < loop_end.
Need to handle additional cases like loop_start > loop_end.
2022-01-29 02:49:00 -06:00
TranzRail 1d51b574ec [Kernel] Add PVR opcode (includes cvars support) 2022-01-29 02:44:55 -06:00
Wunkolo 24205ee860 [x64] Fix `VECTOR_SH{L,R,A}_V128(Int8)` masking
[AltiVec](https://www.nxp.com/docs/en/reference-manual/ALTIVECPEM.pdf)
doc says that it just uses the lower `log2(n)` bits of the shift-amount
rather than the whole element-sized value. So there is no need to handle
an overflow. Also adjusts 64-bit literals to utilize the explicit
`UINT64_C` type.
2022-01-29 02:39:34 -06:00
Wunkolo f8350b5536 [x64] Add `VECTOR_SH{R,L}_I8_SAME_CONSTANT` unit test
This is to target the new GFNI-based optimization for the Int8 case.
2022-01-29 02:39:34 -06:00
Wunkolo bd9a290b30 [x64] Add `GFNI`-based optimization for `VECTOR_SH{R,L}_V128(Int8)`
In the `Int8` case of `VECTOR_SH{R,L}_V128`, when all the values are the
same, then a single-instruction `gf2p8affineqb` can be emitted that does
an int8-based arithmetic-shift, utilizing GF(8) arithmetic.

More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/

Also fixes the iteration-type for when detecting if all of the simd
lanes are the same value(was iterating `u16` and not `u8`)
2022-01-29 02:39:34 -06:00
Joel Linn dbbf401205 [Base] Align test memory 2022-01-25 12:55:10 -06:00
Rick Gibbed e49916ea0a [XAM] Improvements to profile r/w setting exports
[XAM] Improvements to XamUserReadProfileSettingsEx/
XamUserWriteProfileSettings.

- Unify X_USER_READ_PROFILE_SETTING and X_USER_WRITE_PROFILE_SETTING
  into X_USER_PROFILE_SETTING.
- Clean up Setting serialization to use X_USER_PROFILE_SETTING_DATA
  instead of manual buffer copying.
- Fix XamUserReadProfileSettingsEx case where user_index is non-zero
  and xuids are being used.
- Skip unset settings in XamUserWriteProfileSettings_entry.
2022-01-24 07:29:57 -06:00
Margen67 564a6d6238 [App] Disable stuff that crashes the emulator 2022-01-23 11:57:40 -06:00
Wunkolo f7c14a089d [x64] Add host-extension detection preprocessor
Rather than having a huge list of if-statements that all do the same
thing, this preprocessor allows a more concise pattern to detecting if
the emit-flag is enabled as well as the correlated Xbyak flag that it
needs to check for before allowing the feature-flag to be emitted.

Also moved the AVX-check to the beginning to early-out rather than do a
bunch of wasted work only to find out last that the host doesn't even
support AVX.
2022-01-23 05:04:56 -06:00
Joel Linn e4ae1d8b2f [Base] Fix `copy_and_swap_16_in_32_aligned` 2022-01-22 16:18:54 +03:00
Joel Linn 0316d1a054 [Base] Tests for `copy_and_swap_16_in_32_aligned` 2022-01-22 16:18:54 +03:00
Joel Linn 4a288dc6bd [Base, aarch64] Add `copy_and_swap` NEON impls 2022-01-22 16:18:54 +03:00
Joel Linn bfaad055a2 [Base] Add easier to debug `copy_and_swap` tests 2022-01-22 16:18:54 +03:00
Rick Gibbed 617b17e25b
[WinKey] Fix RThumbDown being mapped to RThumbLeft 2022-01-14 16:06:40 -06:00
Wunkolo a9a365aa32 [x64] Add `GFNI`-based optimization for `VECTOR_SHA_V128(Int8)`
In the `Int8` case of `VECTOR_SHA_V128`, when all the values are the same, then a single-instruction `gf2p8affineqb` can be emitted that does an int8-based arithmetic-shift, utilizing GF(8) arithmetic.

More info here:
https://wunkolo.github.io/post/2020/11/gf2p8affineqb-int8-shifting/

As of now (Dec 2021): Tremont (Lakefield), Jasper Lake, Ice Lake, Tiger Lake, and Rocket Lake support GFNI.
2022-01-13 15:32:55 -06:00
Wunkolo fba23e3e75 [x64] Add `kX64EmitGFNI` emitter feature-flag
This determines support for the `gf2p8affineqb` instruction. Even though `GFNI` is typically found with AVX512-enabled chips, it _is_ possible for there to be a chip with `GFNI` but does not support `AVX` or `AVX2` of any sort. An example of this is Tremont(Lakefield) chips as well as Jasper Lake.

13df339fe7/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt (L1297-L1299)

13df339fe7/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt (L1252-L1254)
2022-01-13 15:32:55 -06:00
Wunkolo 5d1b53cd6f [x64] Add `VECTOR_SHA_I8_SAME_CONSTANT` unit test
This is to target the new GFNI-based optimization for the Int8 case.
2022-01-13 15:32:55 -06:00
Stefan Schmidt 31c9f026c5 [UI] Force use of Xwayland when running on Wayland 2022-01-12 17:37:54 +03:00
Gliniak ad6aff001b [XAM/Net] Added note about sharing storage between Rtl and WSA errors 2022-01-11 21:50:19 +01:00
Gliniak fa332e13de [XAM/Net] Removed hardcoded WSA error codes 2022-01-11 21:48:36 +01:00
Gliniak 2d514ef222 [XAM/Net] Changed parameters type for NetDll_select 2022-01-11 21:48:33 +01:00
Gliniak d4e5ecb93b [XAM/Net] Added unified method of returning WSA error codes 2022-01-11 21:46:56 +01:00
Gliniak 0b90d5edf9 [XAM/Net] Implemented NetDll_getsockname 2022-01-11 21:46:53 +01:00
Enrico Pozzobon 5e31429128 [WinKey] Rebindable keyboard controls. 2022-01-11 12:38:13 -06:00
gibbed 5384e0e174 [Base] Fix MICROPROFILE_PRINTF. 2022-01-11 06:09:26 -06:00
gibbed f4d60f3fc4 [XAM] Fix xeXMsgStartIORequestEx result check. 2022-01-11 06:09:06 -06:00
Wunkolo 233ed107fe [CPU] Remove `use_haswell_instructions` in favor of `x64_extension_mask`
Rather than having a single bool to conditionally detect haswell-level
instruction features. The granularity is increased with a new
`x64_extension_mask` where individual features within the x64 backend
can be turned on or off in a bit-mask manner. Since we have an ARM
backend on the horizon, I've added this to the new `x64`
configuration-group rather than `CPU`. This new pattern will hopefully
allow for testing to be more targeted to certain processor features and
allows the user to determine if they want certain features to be enabled
or disabled(such as avoiding BMI2 on certain AMD processors due to
pdep/pext being incredibly slow). The default configuration is to detect
and utilize all available features.
2022-01-11 03:57:32 -06:00
Wunkolo 37aa3d129c [x64] Explicitly handle AND_NOT `dest == src1`
This addresses a JIT-issue in the case that the `src1` and `dest`
register are both the same. This issue only happens in the "generic"
x86 path but not in the BMI1-accelerated path.

Thanks Rick for the extensive debugging help.

When `src1` and `dest` were the same, then the `andc` instruction at
`82099A08` in title `584108FF` might emit the following assembly:
```
.text:82099A08                 andc      r11, r10, r11
  |
  | Jitted
  |
  V
00000000A0011B15  mov         rbx,r10
00000000A0011B18  not         rbx
00000000A0011B1B  and         rbx,rbx
```

This was due to the src1 operand and the destination register being the
same, which used to call the "else" case in the x64 emitter when it
needs to be handled explicitly due to register aliasing/allocation.

Addresses issue #1945
2022-01-10 15:48:49 -06:00
gibbed 975eadf17e [Kernel] Assert export function return/arg types. 2022-01-09 14:16:37 -06:00
gibbed 12ec728989 [Kernel] Use tables for export groups. 2022-01-09 14:16:37 -06:00
gibbed 3ad0a7dab2 [Kernel] Suffix export functions with _entry. 2022-01-09 12:17:03 -06:00
Triang3l 14b69fdb00 [GPU] vfetch_full fetching nothing still must calculate the address 2022-01-09 16:26:05 +03:00
Triang3l d6188c5d7e [GPU] Reuse base+index*stride in vfetch_mini instead of reloading the index GPR
The wheel shader in 4D530910 does vfetch_full to r0 with the index from r0.x, and then vfetch_mini.
Thanks @Gliniak for the finding :3
Also small formatting cleanup in commented-out code.
2022-01-09 14:58:38 +03:00
gibbed 600c14b3f0 [xboxkrnl] Implement ExTryToAcquireRWLShared.
[xboxkrnl] Implement ExTryToAcquireReadWriteLockShared.
2022-01-07 10:22:48 -06:00
gibbed 1f9c434b5e [xboxkrnl] Implement ExAcquireRWLShared.
[xboxkrnl] Implement ExAcquireReadWriteLockShared.
2022-01-07 10:22:48 -06:00
gibbed 3162a6435c [xboxkrnl] Implement ExTryToAcquireRWLExclusive.
[xboxkrnl] Implement ExTryToAcquireReadWriteLockExclusive.
2022-01-07 10:22:48 -06:00
gibbed e795337071 [xboxkrnl] ExReleaseReadWriteLock fixes.
[xboxkrnl] ExReleaseReadWriteLock fixes:
- Don't unnecessarily double-load lock members.
- Reset readers entry count when lock count becomes negative.
- Properly decrease writers waiting count when writer event fired.
2022-01-07 10:22:48 -06:00
gibbed b4f35635c5 [xboxkrnl] ExAcquireReadWriteLockExclusive fixes.
[xboxkrnl] ExAcquireReadWriteLockExclusive fixes:
- Don't unnecessarily double-load lock count.
- Don't release spin lock before we're done with the lock.
2022-01-07 10:22:48 -06:00
gibbed fa774f1d86 [xboxkrnl] Fix up XexGetProcedureAddress logging.
[xboxkrnl] Fix up XexGetProcedureAddress failure logging.
2022-01-07 09:35:43 -06:00
Wunkolo 4303f6b200 [x64] Fix OPCODE_AND_NOT src1-constant case
Fix the case where src1 is constant and src2 is non-constant causing
an assert due to trying to call `.constant()` on the src2 operand.
Interfaces with an issue Gliniak was encountering where title `4D53082D`
encounters an assert. Also includes a BMI1-acceleration in the 64-bit
case where a temporary register is needed(the `and` x86 instruction only
supports immediate constants up to 32-bits).
2022-01-06 13:00:58 -06:00
Gliniak 20fe7bc4b7 [Kernel/XMP] Send correct notification when playback controller is changed
- Changed locked into playback_client enumerator
- Changed vague notification name to something more descriptive
2022-01-04 16:22:57 -06:00
Gliniak 1ba4fbec17 [Kernel/XMP] Remove responsibility of stopping audio when controller is changed 2022-01-04 16:22:57 -06:00
Wunkolo 24d4e1e0e5 [x64] Add `BMI1`-based acceleration for `AndNot`
In the case of having two register operands for `AndNot`, the `andn` instruction can be used when the host supports `BMI1`. `andn` only supports 32-bit and 64-bit operands, so some register up-casting is needed.
2022-01-04 16:16:49 -06:00
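The register up-casting mentioned for `andn` can be sketched in scalar form. This assumes the opcode computes `src1 & ~src2`, an assumption based on the PowerPC `andc` it backs, and relies on the x86 `andn` instruction computing `~a & b` on 32-bit or 64-bit operands. `Andn32` and `AndNot8` are hypothetical helper names.

```cpp
#include <cstdint>

// Scalar model of the 32-bit x86 `andn`: result = ~a & b.
constexpr uint32_t Andn32(uint32_t a, uint32_t b) { return ~a & b; }

// 8-bit AndNot (assumed to mean src1 & ~src2) done through the 32-bit form:
// zero-extend both operands, apply andn, then truncate back to 8 bits.
constexpr uint8_t AndNot8(uint8_t src1, uint8_t src2) {
  return static_cast<uint8_t>(Andn32(src2, src1));
}
```

Zero-extension keeps the upper bits of `src1` clear, so the set upper bits of `~src2` cannot leak into the truncated result.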
Wunkolo 3ab43d480d [x64] Add `kX64EmitBMI1` feature-flag and detection
The `BMI1 feature` fits into the current pattern of `use_haswell_instructions` as BMI1 was only introduced in haswell.

Also moved the aliases to the end of the enum rather than interleave it with the bit definitions.
2022-01-04 16:16:49 -06:00
Wunkolo 0fdb855a11 [JIT, x64] Add and implement `OPCODE_AND_NOT`
Verified the x64 implementation using `xenia-cpu-ppc-tests`.
2022-01-04 16:16:49 -06:00
Joel Linn 4f258b2ee9 [GPU, Vulkan] Fix typo in non AMD64 code
* `copy_and_swap_16_unaligned` -> `copy_cmp_swap_16_unaligned`.
2022-01-02 16:47:05 -06:00
Wunkolo 13a48e13bd [Base] Add `operator<<` string conversion for `vec128_t`
This allows `catch` to print out the contents of a particular vector when diagnosing how a `REQUIRE` expression has failed.
2022-01-02 15:14:58 -06:00
Wunkolo f645c3ba31 [Base] Fix `to_hex_string` out-of-indexing for `vec128_t` type
Trying to print five `{:08X}` when vec128_t only has four values. 🥴
2022-01-02 15:14:58 -06:00
Wunkolo 5317907523 [x64] Add `kX64EmitAVX512*` feature-flags
Implements the detection of some baseline `AVX512` subsets and some common aliases into `X64EmitterFeatureFlags`.

So far, `AVX512{F,VL,BW,DQ}` are the only subsets of `AVX512` that are detected with this PR since I anticipate these are the ones that will actually be used a lot in the x64 backend. Some aliases are also implemented such as `kX64EmitAVX512Ortho` which is `AVX512F` and `AVX512VL` combined which are the two subsets of AVX512 required to allow for `AVX512` operations upon `ymm` and `xmm` registers.

These aliases can possibly be collapsed since we could just always require `AVX512VL` to be supported to allow for _any_ kind of `AVX512` to be used since we will practically always want to use `AVX512` on `xmm` registers at the very least as there is no use-case where we want to use the 512-bit `zmm` registers exclusively.
2022-01-02 11:52:31 -06:00
Wunkolo 1a8068b151 [Base] Add user-literals for several memory sizes
Rather than using `n * 1024 * 1024`, this adds a convenient `_MiB`/`_KiB` user-literal to the new `literals.h` header to concisely describe units of memory in a much more readable way. Any other useful literals can be added to this header. These literals exist in the `xe::literals` namespace so they are opt-in, similar to `std::chrono` literals, and require a `using namespace xe::literals` statement to utilize it within the current scope.

I've done a pass through the codebase to replace trivial instances of `1024 * 1024 * ...` expressions being used but avoided anything that added additional casting complexity from `size_t` to `uint32_t` and such to keep this commit concise.
2022-01-02 11:51:31 -06:00
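A minimal sketch of what such user-defined literals could look like follows. The namespace `sketch::literals` and the exact return type are assumptions for illustration. Only the `_KiB`/`_MiB` suffix names come from the commit message.

```cpp
#include <cstddef>

namespace sketch {
namespace literals {

// 1 KiB = 1024 bytes.
constexpr std::size_t operator""_KiB(unsigned long long n) {
  return static_cast<std::size_t>(n) * 1024;
}

// 1 MiB = 1024 * 1024 bytes.
constexpr std::size_t operator""_MiB(unsigned long long n) {
  return static_cast<std::size_t>(n) * 1024 * 1024;
}

}  // namespace literals
}  // namespace sketch
```

As with `std::chrono` literals, a scope opts in with `using namespace sketch::literals;`, after which `16_MiB` reads more clearly than `16 * 1024 * 1024`.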
Wunkolo b64b4c6761 [x64] IsFeatureEnabled: Allow parallel feature checks
Just checking if the resulting mask is non-zero means we cannot allow this function to check for multiple features in parallel. A hypothetical computer that supports FMA but not AVX2 will return `true` if you try to call `IsFeatureEnabled(kX64EmitFMA | kX64EmitAVX2)`. We should make sure all the masked flags return `true` rather than check for non-zero.

This is ramping up to allow for particular subsets of AVX512 to be checked for in parallel with a single function call.
2021-12-28 20:57:32 -06:00
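The difference between the two checks described in the `IsFeatureEnabled` commit can be sketched as follows. The enum and its bit assignments are hypothetical and are not the real `X64EmitterFeatureFlags` values. Only the "any bit" versus "all bits" comparison is the point.

```cpp
#include <cstdint>

// Hypothetical bit assignments; the real values in the x64 backend may differ.
enum X64FeatureSketch : uint32_t {
  kSketchAVX2 = 1u << 0,
  kSketchFMA = 1u << 1,
};

// Old behavior described above: true if ANY requested bit is available.
constexpr bool IsAnyFeatureEnabled(uint32_t available, uint32_t requested) {
  return (available & requested) != 0;
}

// Behavior the commit asks for: true only if ALL requested bits are available.
constexpr bool AreAllFeaturesEnabled(uint32_t available, uint32_t requested) {
  return (available & requested) == requested;
}
```

For the hypothetical FMA-without-AVX2 host from the commit message, the first check wrongly reports success for `kSketchFMA | kSketchAVX2`, while the second rejects it.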
Gliniak 371441ec3a [XModule] Remove module and its functions while unloading 2021-12-27 09:18:44 +01:00
Gliniak f2c0ae46c1 [Kernel] Added missing month to RtlTimeFieldsToTime
Additionally added check for highest possible month day
2021-12-22 15:02:25 +03:00
Triang3l 701300e8e9 [Linux] Use sched_yield instead of the deprecated pthread_yield 2021-12-18 19:43:17 +03:00
Triang3l 39890bab6f Merge branch 'master' into vulkan 2021-12-13 22:06:09 +03:00
Dr. Chat 509a1fa386 [GPU] Fix a crash when GetWindowTitleText is called before the texture cache is initialized 2021-12-12 22:51:24 -06:00
Triang3l 95c2101ca9 Merge branch 'master' into vulkan 2021-12-12 21:32:43 +03:00
Triang3l e25167d2bc [GPU] Fix quads>triangles cvar, primitive type test cases 2021-12-12 18:28:02 +03:00
Triang3l 0846cc026d [APU] Manage XAudio 2.8 lifecycle in MTA thread + error handling cleanup 2021-12-12 17:05:01 +03:00
Triang3l 9606ff2a31 [D3D12] Fix 8192 texture size storage 2021-12-12 16:27:49 +03:00
Triang3l d813f7435b [GPU] Revert 64bpp resolve addressing regression caused by a misunderstanding 2021-12-12 14:32:03 +03:00
Triang3l 793cebd6a7 [GPU] Explain 1.5x scaling issues in a comment 2021-12-12 14:31:05 +03:00
Triang3l 38b4741c8f [GPU] Mostly generic, not square-only resolution scaling 2021-12-11 21:55:33 +03:00
Triang3l e2da8597e1
[UI] Delete the now-unused loop_gtk.h 2021-12-04 16:36:45 +03:00
Jack Harper 211cc99f42 Rename control_flow_analysis_pass.cpp to control_flow_analysis_pass.cc
All of the (non-third-party) cpp impl files use the .cc extension, but this one doesn't. I was digging through the code and found this one, so I thought I might as well rename it whilst I'm here!
2021-11-13 02:18:22 +03:00
Triang3l fdec0ab332 [Code] Make union usage more consistent 2021-11-03 20:45:09 +03:00
Triang3l ce68a09b0c Merge branch 'master' into vulkan 2021-10-31 16:49:14 +03:00
Triang3l ddc3885795 [UI] Remove dtor lock as thread join will be done anyway 2021-10-31 16:04:46 +03:00
Triang3l 7e6cf349e2 [Build] Use first-party premake-androidndk (#1878) 2021-10-30 00:01:27 +03:00
Conrad Kramer 2962a266b5 Fix xenia-core build on macOS 2021-10-25 00:48:53 +03:00
Triang3l 28fec845d5 [GPU] Document memexport/resolve formats with more details 2021-10-22 20:00:41 +03:00
Gliniak f40607041b [APU] Skip audio header when there is no valid input
Thanks Cancerous1/Randprint for initial research on this topic
2021-10-18 08:50:51 +02:00
Gliniak d6660ac391 [Kernel] Added %L to formatter 2021-10-14 15:05:12 -05:00
soopercool101 5161bd7ab2 Fix "404 not found" on "Build commit on Github..." 2021-09-28 16:29:22 -05:00
Joel Linn cfd18b89f8 [GPU] GCC build fix for render target cache 2021-09-27 13:43:57 +03:00
Joel Linn 247cb91ac5 [Base] Replace GCC workaround (loop opt bug)
Previous workaround was dangerous, this one is more sane.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100801#c3
2021-09-27 13:43:57 +03:00
Triang3l 26a2d814da [UI] android.app.NativeActivity > WindowedAppActivity + code style 2021-09-18 20:32:24 +03:00
Joel Linn 360e2f5414 [Kernel] Fix glibc exception on empty content_root 2021-09-15 15:24:21 -05:00
Triang3l 0335571354 [UI] Android CallInUIThread and activity onDestroy 2021-09-15 22:58:11 +03:00
Triang3l cf85bf2efd Merge branch 'master' into vulkan 2021-09-14 22:29:51 +03:00
Triang3l f91b895c9a [Base] Don't use raw clock where unsupported 2021-09-13 23:13:02 +03:00
Triang3l 7aeac37eb6 [Base/UI] Android globals initialization + WindowedAppContext parts 2021-09-13 23:09:28 +03:00
Triang3l acbd22840d [Base] Android log sink + sink cleanup 2021-09-13 22:53:19 +03:00
Triang3l b77e6eb8e6 [D3D12] Fix syntax warnings reported by Clang 2021-09-12 17:12:33 +03:00
Triang3l 4f95e094e4 [GPU] Remove outdated forward declarations from trace_dump.h 2021-09-12 14:32:41 +03:00
Triang3l ecccd02f8a Merge branch 'master' into vulkan 2021-09-12 14:10:36 +03:00
Triang3l 6241b4f907 [Kernel] stringstream<< > string.push_back as LLVM libc++ doesn't support char16_t stream 2021-09-12 13:04:03 +03:00
Triang3l 9d992e3d06 [Kernel] Rename sin_zero due to #define on Android 2021-09-11 23:31:52 +03:00
Triang3l 44847abb98 [Kernel] Remove a TODO for a verified reference 2021-09-07 21:12:06 +03:00
Triang3l e720e0a540 [Code] Remove game names from code comments (most of them, at least) 2021-09-05 21:27:40 +03:00
Triang3l 6986d6c7e8 [Config] Use locale-neutral fmt instead of to_string 2021-08-28 18:26:18 -05:00
Triang3l 64366979c7 [UI] Make Xenia title start from a capital letter 2021-08-28 19:44:23 +03:00
Triang3l 6ce5330f5f [UI] Loop thread to main thread WindowedAppContext 2021-08-28 19:38:24 +03:00
Triang3l f540c188bf [Lint] Revert incorrect clang-format changes 2021-08-26 21:18:18 +03:00
Triang3l 7edfdc2672 Merge branch 'master' into linux_windowing 2021-08-26 22:58:14 +03:00
Gliniak f6f524b814 Implemented ExLoadedImageName 2021-08-18 17:37:44 -05:00
emoose bf8138a886 [VFS] Add NullDevice (returns success for all calls), handle \Device\Harddisk0\ with it
XMountUtilityDrive code tries reading/writing from \Device\Harddisk0\Cache0 / Cache1 / Partition0, NullDevice handling \Device\Harddisk0 will make that code think that the reads/writes were successful, so the utility-drive mount can proceed without failing.
2021-08-18 17:34:59 -05:00
emoose e5725b5877 [Kernel] Support XFileAlignmentInformation, stub NtDeviceIoControlFile & IoCreateDevice
XMountUtilityDrive-related code checks some values returned from NtDeviceIoControlFile; the stub just returns values that it seems to accept.
IoCreateDevice is also used by utility-drive code, which writes some values into a pointer returned by it, so the stub allocates space so that those writes succeed without errors.
2021-08-18 17:34:59 -05:00
emoose eaab7998f7 [Kernel/XAM] Run XAM-tasks in separate thread, stub XamTaskShouldExit 2021-08-18 17:34:59 -05:00
emoose f2c706f943 [App] Add cache:\ mount for older games that use it 2021-08-18 17:34:59 -05:00
Gliniak 05bfdb02e5 [XAM] Return correct error code from GetServiceInfo 2021-08-18 17:25:44 -05:00
gibbed ed0a15dcc8 Use AppVeyor vars for extended version info. 2021-08-18 16:44:41 -05:00
sephiroth99 4861022158 [Base] Fix fpfs with GCC/Clang
The fpfs function is using strtof to convert a string to floating point
value, but the type may be a double. Using strtof in that case won't
provide enough precision, so switch to using strtod. When the type is a
float, the double will be down-converted to the correct value.
2021-08-08 10:23:52 -05:00
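The approach described in this commit can be sketched as below. `ParseFloating` is a hypothetical stand-in for the fpfs function, whose real signature is not shown in the log; the sketch only assumes that the target type may be either `float` or `double`.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for fpfs: parse text into a floating-point type T.
// std::strtod always parses at double precision, so no digits are lost when
// T is double; when T is float, the cast narrows the result afterwards.
template <typename T>
T ParseFloating(const std::string& text) {
  return static_cast<T>(std::strtod(text.c_str(), nullptr));
}
```

Parsing `"0.1"` with `std::strtof` and widening the result to `double` does not give the same value as parsing it with `std::strtod`, because the float result has already been rounded to single precision. That lost precision is the problem the commit addresses.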
Gliniak f933d9c409 [XAM] XamEnumerate: Set initial item_count value to 0 2021-08-08 10:23:11 -05:00
Gliniak c9073e101f [XAM] Fix ContentCreate to pass copy of root_name.
[XAM] Fix xeXamContentCreate to pass copy of root_name for deferred
operation, as the pointer may no longer be valid when the callback
is executed.
2021-08-01 13:55:56 -05:00
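The lifetime issue described in this commit can be illustrated with a small sketch. The names `MakeDeferredCreate` and `RunAfterSourceIsGone` are hypothetical, and the returned string is a placeholder; the sketch only assumes that a callback runs after the caller's buffer may have been released.

```cpp
#include <functional>
#include <string>

// Hypothetical deferred operation: builds a callback that will run later.
// The name is copied into a std::string right away, while the pointer is
// still valid, and the lambda captures that copy by value.
inline std::function<std::string()> MakeDeferredCreate(const char* root_name) {
  std::string root_name_copy(root_name);
  return [root_name_copy]() { return "created:" + root_name_copy; };
}

// Runs the callback only after the original buffer has been destroyed,
// which is safe because the callback owns its own copy of the name.
inline std::string RunAfterSourceIsGone() {
  std::function<std::string()> task;
  {
    std::string caller_buffer = "save";
    task = MakeDeferredCreate(caller_buffer.c_str());
  }  // caller_buffer is destroyed here; the raw pointer would now dangle.
  return task();
}
```

Capturing the raw `const char*` in the lambda instead would leave the callback reading freed memory, which is the failure mode the commit message describes.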
Triang3l 90c4950503 [HID] Fix SDL GetKeystroke copy-paste regression 2021-07-26 10:12:17 +03:00
Gliniak 35321a10c3 [Kernel] Improvements to MmQueryStatistics
- Fixed incorrect calculation of available pages
- Changed amount of total virtual bytes
- Added real amount of reserved virtual bytes
- Removed unused methods
2021-07-15 09:45:35 +02:00
Triang3l 1e0237d404 [Vulkan] Fix XCB #ifdef 2021-07-12 12:15:47 +03:00
Triang3l 6412bb8910 [Vulkan] Remove a remaining Volk reference 2021-07-12 00:00:06 +03:00
Triang3l 692e329e9c [Vulkan] Load Vulkan manually for more lifetime and extension control 2021-07-11 22:56:01 +03:00
Triang3l 9bb104b354 Merge branch 'master' into vulkan 2021-07-03 20:59:25 +03:00
Triang3l 458e4e1a31 [GPU] Official RB name from RDNA/GCN/TeraScale/Xenos docs/news 2021-07-01 23:43:01 +03:00
Triang3l 1cf12ec70b [UI/HID] ui::VirtualKey enum 2021-07-01 23:32:26 +03:00
gibbed ddee85f0be [Kernel] Fix XStaticUntypedEnumerator item count.
[Kernel] Fix XStaticUntypedEnumerator not tracking item count.

Somehow this didn't make it into PR #1862.
2021-06-30 13:26:05 -05:00
gibbed 4498a28568 [XAM] Deferred xeXamContentCreate. 2021-06-30 03:39:22 -05:00
gibbed e8fda5878c [XAM] Enumerator improvements.
- [Kernel] XEnumerator::WriteItems no longer cares about provided
  buffer size, since we know the size when the XEnumerator was created.
- [Kernel] Added XStaticEnumerator template. Previous
  XStaticEnumerator renamed to XStaticUntypedEnumerator.
- [XAM] Deferred xeXamEnumerate.
2021-06-30 03:39:22 -05:00
gibbed b18f73b949 [Kernel] Add make_object template. 2021-06-30 03:39:22 -05:00
Joel Linn 480791a056 [Base] Implement message boxes on Linux 2021-06-29 20:41:20 -05:00
emoose e23a9b7608 [Kernel] Add APC support to NtWriteFile 2021-06-29 03:13:43 -05:00
gibbed a3535be416 [CPU] Suppress C4065 warning in SyscallHandler. 2021-06-29 02:41:29 -05:00
gibbed fb0c354b2f [xboxkrnl] Trim DbgPrint messages. 2021-06-28 20:32:52 -05:00
gibbed a0ed4ec711 [xboxkrnl] Fix xeRtlNtStatusToDosError logging. 2021-06-28 20:32:52 -05:00
gibbed 997d0555db Lint/format .inc files. 2021-06-28 20:32:52 -05:00
gibbed 8daef93207 [APU] XMA register table cleanup, documentation.
- [APU] Clean up XMA register table.
- [APU] Document observed register ranges in the XMA register table.
2021-06-28 20:32:52 -05:00
gibbed ead4818e25 [xboxkrnl] Optional string formatter logging. 2021-06-28 20:32:52 -05:00
gibbed 0cf4cab59b [CPU] Add syscall handler. 2021-06-28 20:32:52 -05:00
gibbed c6259241a2 [GPU] Complain when command packet is 0xCDCDCDCD. 2021-06-28 20:32:52 -05:00
gibbed f2a68e4b85 [Base] ByteStream assert cleanup. 2021-06-28 20:32:52 -05:00
gibbed fa8e2ee788 [VFS] Suppress error msg for ShaderDumpxe:\CB.
[VFS] Suppress error message for ShaderDumpxe:\CompareBackends.
2021-06-28 20:32:52 -05:00
gibbed a12f775c23 [Base] LaunchWebBrowser now takes a string view. 2021-06-28 20:32:52 -05:00