xenia-canary

Commit Graph

Author	SHA1	Message	Date
Triang3l	e97eb75b94	[Vulkan] Update variableMultisampleRate comments (actually supported) [ci skip]	2022-12-04 14:55:56 +03:00
Triang3l	0b4f5ef286	[SPIR-V] Decorate whole gl_PerVertex with Invariant Block members can be decorated with Invariant only since SPIR-V 1.5 Revision 2. In earlier versions, Invariant can be used only for variables. Mesa warns about this.	2022-12-03 14:27:43 +03:00
Gliniak	1eb61aa9ab	Added recently opened titles list	2022-11-29 10:47:30 +01:00
chrisps	0674b68143	Merge pull request #96 from chrisps/host_guest_stack_synchronization Host/Guest stack sync, exception messagebox, kernel improvements, minor opt	2022-11-27 10:30:16 -08:00
Gliniak	12005acc98	[APU] Check if split frame length is valid	2022-11-27 18:40:27 +01:00
chss95cs@gmail.com	90c771526d	"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false Remove dead #if 0'd code in math.h On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe and gives a modest code size reduction Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds. In some games seemingly random crashes were happening that were hard to trace because the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread. Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted. Changes were also made to allow the guest to call into a piece of an existing x64 function. This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false. It is possible it may have introduced regressions, but i dont know of any yet So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame. 
Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function add Processor::LookupModule Add Backend::DeinitializeBackendContext Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0) add notes about flags that trap in XamInputGetCapabilities 0 == 3 in XamInputGetCapabilities Name arg 2 of XamInputSetState PrefetchW in critical section kernel funcs if available & doing cmpxchg Add terminated field to X_KTHREAD, set it on termination Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type) and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread. Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?) Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen Use page_size_shift in more places Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues	2022-11-27 09:39:33 -08:00
Gliniak	1451ca4266	[APU] Clear host data while resetting context	2022-11-27 17:00:31 +01:00
Gliniak	9fdfd2ada9	[APU] Removed old hack that invalidates input on decoder error Added returning parsing error while decoder fails	2022-11-26 17:25:39 +01:00
chss95cs@gmail.com	7a17fad88a	fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization)	2022-11-07 05:40:18 -08:00
chss95cs@gmail.com	e21fd22d09	add x_kthread priority/fpu_exceptions_on fields, set fpu_exceptions_on in KeEnableFpuExceptions, set priority in SetPriority add msr field on context write to msr for mtmsr/mfmsr, do not have correct default value for msr yet, nor has mtmsrd been reimplemented do not evaluate assert expressions in release at all, while still avoiding unused variable warnings	2022-11-06 11:03:10 -08:00
chss95cs@gmail.com	c1d922eebf	Minor decoder optimizations, kernel fixes, cpu backend fixes	2022-11-05 10:50:33 -07:00
Gliniak	ba66373d8c	[APU][Janky] Fixed issues with incorrect frames on streamed data This requires a lot more research and test data!	2022-11-03 20:56:36 +01:00
Gliniak	dae508500a	[APU] Clear remaining packets skip when we're done with current stream Plus some additional logging	2022-11-03 12:59:47 +01:00
Margen67	4ba14bc35e	[APU+HID] Optimizations	2022-11-03 03:56:13 -07:00
Gliniak	b23566b823	[APU] Fix incorrect packet frame count when frame ends exactly where packet ends This resolves looping background sound in GoW	2022-11-03 11:14:37 +01:00
Gliniak	259679d53c	[APU] Handle exceeding input offset by switching buffer This should resolve crashes in FH	2022-11-02 08:47:36 +01:00
chrisps	8186792113	Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes"	2022-11-01 14:45:36 -07:00
chrisps	781871e2d5	Merge pull request #87 from chrisps/canary_experimental Minor decoder optimizations, kernel fixes, cpu backend fixes	2022-11-01 11:49:10 -07:00
Gliniak	c080e2e17c	[APU] Resolved crashes related to out of bound readouts	2022-11-01 11:24:01 +01:00
Triang3l	778333b1b5	[UI] Fix ClearInput not called in ImGuiDrawer after deferred dialog removal Also cleanup the code involved in dialog registration, and update the explanation of why dialog removal is delayed until the end of drawing (the original was written back when window listener and UI drawer callback registration during the execution of the callbacks was deferred, but that was wrong as that might result in execution of callbacks belonging to now-deleted objects).	2022-10-31 18:57:54 +03:00
chss95cs@gmail.com	06bfd624de	fix failed debug build from loops variable assert	2022-10-30 12:33:08 -07:00
chss95cs@gmail.com	bff264b5fd	Fixed RtlCompareString and RtlCompareStringN, they were very wrong, for CompareString the params are struct ptrs not char ptrs Fixed a ton of clang-cl compiler warnings about unused variables, still many left. Fixed a lot of inconsistent override ones too	2022-10-30 10:47:09 -07:00
chrisps	65b9d93551	Merge branch 'xenia-canary:canary_experimental' into canary_experimental	2022-10-30 09:05:40 -07:00
chss95cs@gmail.com	550d1d0a7c	use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread properly byteswap r13 for spinlock Add PPCOpcodeBits stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now. add note about 0 opcode bytes being executed to ppc_frontend Add assert to check that function end is greater than function start, can happen with malformed functions Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code Generate label string ids on the fly Fix unused function warnings for prefetch on clang, fix many other clang warnings Eliminated majority of CallNativeSafes by replacing them with naive generic code paths. ^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included These naive paths are implemented small loops that stash the two inputs to the stack and load them in gprs from there, they are not particularly fast but should be an order of magnitude faster than callnativesafe to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc Added the fast SHR_V128 path back in Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. 
signed vector average byte appears in many games Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially	2022-10-30 08:48:58 -07:00
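The signed vector average fix mentioned in the commit above can be illustrated per lane. This is a minimal scalar sketch, not the emulator's code: the function names are hypothetical, and the "buggy" variant only shows one way a logical (unsigned) shift could turn a negative average into a large positive value, as the message describes.

```cpp
#include <cassert>
#include <cstdint>

// Rounded signed average of one 32-bit lane: (a + b + 1) >> 1.
// The sum is widened to 64 bits so it cannot overflow. The right shift of a
// negative value is arithmetic on mainstream compilers (guaranteed in C++20).
constexpr int32_t AvgSigned32(int32_t a, int32_t b) {
  int64_t sum = int64_t(a) + int64_t(b) + 1;
  return static_cast<int32_t>(sum >> 1);
}

// Hypothetical buggy variant: the same sum shifted logically in 32 bits.
// The sign bit is shifted in as zero, so negative results become large
// positive ones.
constexpr int32_t AvgLogicalShift32(int32_t a, int32_t b) {
  uint32_t sum = uint32_t(a) + uint32_t(b) + 1u;
  return static_cast<int32_t>(sum >> 1);
}

static_assert(AvgSigned32(3, 4) == 4, "rounds up on ties");
```

With inputs -4 and -2, the signed version yields -3, while the logical-shift version yields 2147483645.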
Gliniak	55877f4c61	[APU] Force buffer swap at the end of stream Plus some debugging messages and lint fixes	2022-10-25 17:20:45 +02:00
Gliniak	6b11787c93	[APU] Fixed typo that prevented last packet in stream from being processed	2022-10-24 21:33:25 +02:00
Gliniak	fac2a89d0f	Disallow offset to be set before header, header size fix, audio channels crashfix	2022-10-24 19:43:43 +02:00
Triang3l	a37b57ca8d	[GPU] Fix tiled mip tail extent calculation Previously, for mips, the dimensions of the texture weren't rounded to powers of two before calculating the mip tail extent, resulting in the mip tail for a 260 blocks tall texture, that contains mips ending at Y of up to 36, having the Y extent calculated as 32. With rounding to powers of two, it would have been 64. However, with the GetTiledAddressUpperBound functions, none of this is necessary at all (and neither is rounding the extents in TextureGuestLayout::Level to 32x32x4 blocks) - using the same code for calculating the XYZ extents of tiled textures as for linear textures now, which, for the mip tail, calculates the actual maximum coordinates of the mips stored in it - and rounding to tiles is done internally by GetTiledAddressUpperBound.	2022-10-23 21:26:47 +03:00
Triang3l	74f1f6bb6d	[Vulkan] Check depthClamp feature	2022-10-23 19:01:17 +03:00
Triang3l	4add1f67b1	[D3D12] Replace unused shared memory view with a null view Fixes the PIX validation warning about missing resource states on every guest draw. Also potentially prevents drivers from making assumptions about the shared memory buffer based on the bindings, though no such cases are currently known.	2022-10-23 18:09:47 +03:00
Wunkolo	5fde7c6aa5	[x64] Add AVX512 optimizations for `PERMUTE_V128` Uses the single-instruction AVX512 `vperm*` instructions to accelerate the `INT8_TYPE` and `INT16_TYPE` permutation opcodes. The `INT8_TYPE` is accelerated using `AVX512VBMI` subset of AVX512. Available since Icelake(Intel) and Zen4(AMD).	2022-10-21 08:47:31 -05:00
Wunkolo	f207239349	[x64] Add `kX64EmitAVX512VBMI` feature-flag and detection Allows access to byte-element 2-register permutations(32-byte look up tables) and for 64-bit multi-shifts. Particularly adding this to accelerate the assembly of our `PERMUTE` opcode.	2022-10-21 08:47:31 -05:00
Wunkolo	d73088e5ca	[x64] Add AVX512 optimization for `OPCODE_VECTOR_SUB`(saturated) Passes the `vsubuws` and `vsubsws` unit-tests from https://github.com/xenia-project/xenia/pull/1348	2022-10-21 08:45:43 -05:00
Radosław Gliński	7c375879bc	Merge pull request #85 from chrisps/canary_experimental Kernel improvements, "fix" crash on sandy bridge/ivy bridge	2022-10-21 14:18:03 +02:00
chrisps	4493d17acc	Update hir_builder.cc	2022-10-20 14:58:27 -07:00
chrisps	adc3405537	change else{if} to else if in AndNot	2022-10-20 14:56:55 -07:00
Gliniak	48fea6d9aa	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-10-18 12:19:52 +02:00
Triang3l	cdb40ddb28	[DXBC] Fix interpolator copying from v# to r# in PS The bit count was of `(1<<i)-1` itself (thus couldn't handle interpolators with a smaller index skipped), not of `bits&((1<<i)-1)`.	2022-10-18 13:12:37 +03:00
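The difference between the two bit counts in the commit above can be sketched as follows. This is an illustration under assumptions, not the DXBC translator's code: `used_mask` stands for the `bits` value in the message, the helper names are hypothetical, and `i` is assumed to be below 32.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
constexpr uint32_t PopCount(uint32_t v) {
  uint32_t n = 0;
  while (v) {
    v &= v - 1;  // clear the lowest set bit
    ++n;
  }
  return n;
}

// Packed slot of interpolator i: how many *used* interpolators have a
// smaller index. This is the bit count of bits & ((1 << i) - 1).
constexpr uint32_t PackedIndex(uint32_t used_mask, uint32_t i) {
  return PopCount(used_mask & ((uint32_t(1) << i) - 1));
}

// Bit count of (1 << i) - 1 alone. It always equals i, so it is only
// correct when no interpolator with a smaller index is skipped.
constexpr uint32_t PackedIndexIgnoringMask(uint32_t i) {
  return PopCount((uint32_t(1) << i) - 1);
}
```

For a mask with only interpolators 1 and 3 in use (`0b1010`), interpolator 3 lands in packed slot 1, whereas the mask-free count gives 3.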
chss95cs@gmail.com	d8b7b3ecec	Fix bindless path in d3d12 that i broke in earlier commit (did not affect any users, thats a debug thing) Fix guest code profiler, it previously only worked with function precomp + all code you were about to execute already discovered Allow AndNot if type is V128	2022-10-16 07:48:43 -07:00
chss95cs@gmail.com	22e52cbecd	Canary can now run on sandy bridge/e and ivy bridge/e Stubbed out OPCODE_AND_NOT, its fallback implementation if bmi1 was not supported was broken. it's difficult to tell where the actual issue is there.	2022-10-15 05:14:53 -07:00
chss95cs@gmail.com	7204532b1c	Implement RtlUpcaseUnicodeChar	2022-10-15 04:29:13 -07:00
chrisps	a495709344	Merge branch 'xenia-canary:canary_experimental' into canary_experimental	2022-10-15 03:07:35 -07:00
chss95cs@gmail.com	efbeae660c	Drastically reduce cpu time wasted by XMADecoderThread spinning, went from 13% of all cpu time to about 0.6% in my tests Commented out lock in WatchMemoryRange, lock is always held by caller properly set the value/check the irql for spinlocks in xboxkrnl_threading	2022-10-15 03:07:07 -07:00
Gliniak	d262214c1b	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-10-14 20:13:03 +02:00
chss95cs@gmail.com	ecf6bfbbdf	Stub event query for linux, fix missing semicolon in linux SetEventBoostPriority	2022-10-09 12:30:18 -07:00
Triang3l	45050b2380	[GPU] Vulkan fragment shader interlock RB and related fixes/cleanup Also fixes addressing of MSAA samples 2 and 3 for 64bpp color render targets in the ROV RB implementation on Direct3D 12. Additionally, with FSI/ROV, alpha test and alpha to coverage are done only if the render target 0 was dynamically written to (according to the Direct3D 9 rules for writing to color render targets, though not sure if they actually apply to the alpha tests on Direct3D 9, but for safety). There is also some code cleanup for things spotted during the development of the feature.	2022-10-09 22:06:41 +03:00
Gliniak	c923ab78a9	[Config] Added note about internal_display_resolution	2022-10-09 12:33:04 +02:00
Gliniak	7975ea78d4	[Base] BitStream: Prevent readout beyond buffer	2022-10-09 12:24:46 +02:00
Gliniak	17b3939bbf	Revert "[Base] Changed size of bitstream accessed data (Risky)" This reverts commit `061000af01`.	2022-10-09 12:18:43 +02:00
chss95cs@gmail.com	2dd6f33f4b	Fix debug/ui premake too	2022-10-08 10:34:50 -07:00
chss95cs@gmail.com	d8c94b1aee	Fix premake filter mistake that broke debug builds (and likely any build other than release)	2022-10-08 10:10:36 -07:00
chss95cs@gmail.com	8f7f7dc6ad	fixed wine crash from use of NtSetEventPriorityBoost add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code make is_allocated_ and is_enabled_ volatile in xma_context preallocate avpacket buffer in XMAContext::Setup, the reallocations of the buffer in ffmpeg were showing up on profiles check is_enabled and is_allocated BEFORE locking an xmacontext. XMA worker was spending most of its time locking and unlocking contexts Removed XeDMAC, dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead Made the rest of global_critical_region's members static. They never needed an instance. Removed ifdef'ed out code from ring_buffer.h Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent. Removed vector from WaitMultiple, instead use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects. Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences, x64 backend is now always size optimized because of premake Make global_critical_region_ static constexpr in shared_memory.h to get rid of wasteage of 8 bytes (empty class=1byte, +alignment for next member=8) Move trace-related data to the tail of SharedMemory to keep more important data together In IssueDraw build an array of fetch constant addresses/sizes, then pre-lock the global lock before doing requestrange for each instead of individually locking within requestrange for each of them Consistent access specifier protected for pm4_command_processor_declare Devirtualize WriteOneRegisterFromRing. 
Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x Remove many redundant header inclusions access xenia-gpu Minor microoptimization of ExecutePacketType0 Add TextureCache::RequestTextures for batch invocation of LoadTexturesData Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock. Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it you are actually more likely to have contention Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs Add missing checks to XAudioRegisterRenderDriverClient_entry. this is unlikely to fix anything, it was just an easy thing to do Add NtQueryEvent system call implementation. I don't actually know of any games that need it. Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx use a fixed size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but were masked until now by waitformultipleobjects natural overhead Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects. Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx add missing checks to NtWaitForMultipleObjectsEx Support pre-locking in XObject::GetNativeObject	2022-10-08 09:55:17 -07:00
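The commit above mentions `xe::clear_lowest_bit` as a replacement for shift-andnot in bit iteration. A minimal sketch of that iteration pattern follows; the implementation shown (`v & (v - 1)`) is the common idiom and is an assumption about what the helper does, and all names below are hypothetical stand-ins.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Clears the lowest set bit of v (returns 0 for v == 0).
constexpr uint32_t ClearLowestBit(uint32_t v) { return v & (v - 1); }

// Index of the lowest set bit; v must be nonzero. A real implementation
// would likely use a hardware bit-scan instruction instead of a loop.
inline uint32_t LowestBitIndex(uint32_t v) {
  uint32_t index = 0;
  while (!(v & 1u)) {
    v >>= 1;
    ++index;
  }
  return index;
}

// Visits every set bit of mask from lowest to highest.
inline std::vector<uint32_t> SetBitIndices(uint32_t mask) {
  std::vector<uint32_t> indices;
  for (; mask != 0; mask = ClearLowestBit(mask)) {
    indices.push_back(LowestBitIndex(mask));
  }
  return indices;
}
```

The loop runs once per set bit rather than once per bit position, which is why the idiom suits sparse masks.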
chss95cs@gmail.com	bae63b95c5	Update to latest version of cxxopts	2022-09-30 06:51:25 -07:00
chss95cs@gmail.com	b4c175d8a3	Enable SDL_LEAN_AND_MEAN, SDL_RENDER_DISABLED, saves about 500kb in final exe Build several projects that arent performance critical with /Os and /O1 under msvc windows	2022-09-29 07:26:38 -07:00
chss95cs@gmail.com	7e58a3b320	Fix compiler errors i introduced under clang-cl remove xe_kernel_export_shim_fn field of Export function_data, trampoline is now the only way exports get invoked Remove kernelstate argument from string functions in order to conform to the trampoline signature (the argument was unused anyway) Constant-evaluated initialization of ppc_opcode_disasm_table, removal of unused std::vector fields Constant-evaluated initialization of export tables name field on export is just a const char* now, only immutable static strings are ever passed to it Remove unused callcount field of export. PM4 compare op function extracted Globally apply /Oy, /GS-, /Gw on msvc windows Remove imgui testwindow code call, it took up like 300 kb	2022-09-29 07:04:17 -07:00
Gliniak	203267b106	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-09-23 12:23:53 +02:00
Rick Gibbed	3bfa3b05e1	Lint fix.	2022-09-22 06:34:21 -05:00
Gliniak	7d970967c4	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-09-20 21:15:12 +02:00
beeanyew	cd17f1846f	[Input System] xe_mutex revert On request from chrisps, revert xe_mutex back to xe_unlikely_mutex to avoid mutex deadlocks while initializing hid-winkey.	2022-09-18 15:18:29 +02:00
chss95cs@gmail.com	eb8154908c	atomic cas use prefetchw if available remove useless memorybarrier remove double membarrier in wait pm4 cmd add int64 cvar use int64 cvar for x64 feature mask Rework some functions that were frontend bound according to vtune placing some of their code in different noinline functions, profiling after indicating l1 cache misses decreased and perf of func increased remove long vpinsrd dep chain code for conversion.h, instead do normal load+bswap or movbe if avail Much faster entry table via split_map, code size could be improved though GetResolveInfo was very large and had impact on icache, mark callees as noinline + msvc pragma optimize small use log2 shifts instead of integer divides in memory minor optimizations in PhysicalHeap::EnableAccessCallbacks, the majority of time in the function is spent looping, NOT calling Protect! Someone should optimize this function and rework the algo completely remove wonky scheduling log message, it was spammy and unhelpful lock count was unnecessary for criticalsection mutex, criticalsection is already a recursive mutex brief notes i gotta run	2022-09-17 04:04:53 -07:00
Wunkolo	addd8c94e5	[x64] Add AVX512 optimization for `OPCODE_VECTOR_ADD`(saturated) Uses a single `vpternlogd` to test for signed/unsigned overflow/underflow. Then utilizes AVX512 mask operations to create either `0x7FFFFFFF` or `0x80000000` arithmetically.	2022-09-14 11:39:03 -05:00
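A scalar analogue of the saturated signed add described above may help when reading the commit. This is a sketch under assumptions and not the AVX512 sequence itself: the vector code uses `vpternlogd` and mask registers, while the hypothetical function below only shows the same two ideas per lane, detecting overflow from sign bits and forming `0x7FFFFFFF` or `0x80000000` arithmetically.

```cpp
#include <cassert>
#include <cstdint>

// Saturating signed 32-bit add for one lane.
constexpr int32_t SaturatingAdd32(int32_t a, int32_t b) {
  uint32_t ua = static_cast<uint32_t>(a);
  uint32_t ub = static_cast<uint32_t>(b);
  uint32_t sum = ua + ub;  // wraps modulo 2^32
  // Overflow happened if both inputs have the same sign and the sum's sign
  // differs from it: the sign bit of (a ^ sum) & (b ^ sum) is then set.
  uint32_t overflow = (ua ^ sum) & (ub ^ sum) & 0x80000000u;
  // 0x7FFFFFFF for non-negative a, 0x80000000 for negative a.
  uint32_t saturated = 0x7FFFFFFFu + (ua >> 31);
  // The conversion back to int32_t is modular (guaranteed in C++20).
  return static_cast<int32_t>(overflow ? saturated : sum);
}
```

Adding the sign bit of `a` to `0x7FFFFFFF` avoids a second branch when choosing between the positive and negative limits.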
Wunkolo	9fd684594b	[x64] Add AVX512 optimization for `OPCODE_VECTOR_CONVERT_F2I`(unsigned) `vcvttps2udq` already saturates overflowing and unordered values to `0xFFFFFFFF`. Using mask registers, zeroes are written to negative values within the same instruction.	2022-09-12 13:52:57 -05:00
chss95cs@gmail.com	0fd4a2533b	Prevent clang-format from moving d3d12_nvapi above the require d3d12 headers	2022-09-11 14:35:33 -07:00
chss95cs@gmail.com	20638c2e61	use Sleep(0) instead of SwitchToThread, should waste less power and help the os with scheduling. PM4 buffer handling made a virtual member of commandprocessor, place the implementation/declaration into reusable macro files. this is probably the biggest boost here. Optimized SET_CONSTANT/ LOAD_CONSTANT pm4 ops based on the register range they start writing at, this was also a nice boost Expose X64 extension flags to code outside of x64 backend, so we can detect and use things like avx512, xop, avx2, etc in normal code Add freelists for HIR structures to try to reduce the number of last level cache misses during optimization (currently disabled... fixme later) Analyzed PGO feedback and reordered branches, uninlined functions, moved code out into different functions based on info from it in the PM4 functions, this gave like a 2% boost at best. Added support for the db16cyc opcode, which is used often in xb360 spinlocks. before it was just being translated to nop, now on x64 we translate it to _mm_pause but may change that in the future to reduce cpu time wasted texture util - all our divisors were powers of 2, instead we look up a shift. this made texture scaling slightly faster, more so on intel processors which seem to be worse at int divs. GetGuestTextureLayout is now a little faster, although it is still one of the heaviest functions in the emulator when scaling is on. xe_unlikely_mutex was not a good choice for the guest clock lock, (running theory) on intel processors another thread may take a significant time to update the clock? maybe because of the uint64 division? really not sure, but switched it to xe_mutex. This fixed audio stutter that i had introduced to 1 or 2 games, fixed performance on that n64 rare game with the monkeys. Took another crack at DMA implementation, another failure. 
Instead of passing as a parameter, keep the ringbuffer reader as the first member of commandprocessor so it can be accessed through this Added macro for noalias Applied noalias to Memory::LookupHeap. This reduced the size of the executable by 7 kb. Reworked kernel shim template, this shaved like 100kb off the exe and eliminated the indirect calls from the shim to the actual implementation. We still unconditionally generate string representations of kernel calls though :(, unless it is kHighFrequency Add nvapi extensions support, currently unused. Will use CPUVISIBLE memory at some point Inserted prefetches in a few places based on feedback from vtune. Add native implementation of SHA int8 if all elements are the same Vectorized comparisons for SetViewport, SetScissorRect Vectorized ranged comparisons for WriteRegister Add XE_MSVC_ASSUME Move FormatInfo::name out of the structure, instead look up the name in a different table. Debug related data and critical runtime data are best kept apart Templated UpdateSystemConstantValues based on ROV/RTV and primitive_polygonal Add ArchFloatMask functions, these are for storing the results of floating point comparisons without doing costly float->int pipeline transfers (vucomiss/setb) Use floatmasks in UpdateSystemConstantValues for checking if dirty, only transfer to int at end of function. Instead of dirty |= (x == y) in UpdateSystemConstantValues, now we do dirty_u32 |= (x^y). if any of them are not equal, dirty_u32 will be nz, else if theyre all equal it will be zero. 
This is more friendly to register renaming and the lack of dependencies on EFLAGS lets the compiler reorder better Add PrefetchSamplerParameters to D3D12TextureCache use PrefetchSamplerParameters in UpdateBindings to eliminate cache misses that vtune detected Add PrefetchTextureBinding to D3D12TextureCache Prefetch texture bindings to get rid of more misses vtune detected (more accesses out of order with random strides) Rewrote DMAC, still terrible though and have disabled it for now. Replace tiny memcmp of 6 U64 in render_target_cache with inline loop, msvc fails to make it a loop and instead does a thunk to their memcmp function, which is optimized for larger sizes PrefetchTextureBinding in AreActiveTextureSRVKeysUpToDate Replace memcmp calls for pipelinedescription with handwritten cmp Directly write some registers that dont have special handling in PM4 functions Changed EstimateMaxY to try to eliminate mispredictions that vtune was reporting, msvc ended up turning the changed code into a series of blends in ExecutePacketType3_EVENT_WRITE_EXT, instead of writing extents to an array on the stack and then doing xe_copy_and_swap_16 of the data to its dest, pre-swap each constant and then store those. msvc manages to unroll that into wider stores stop logging XE_SWAP every time we receive XE_SWAP, stop logging the start and end of each viz query Prefetch watch nodes in FireWatches based on feedback from vtune Removed dead code from texture_info.cc NOINLINE on GpuSwap, PGO builds did it so we should too.	2022-09-11 14:14:48 -07:00
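The XOR-accumulation dirty check described in the commit above can be sketched in isolation. This is a minimal illustration, assuming plain 32-bit values; the function name and signature are hypothetical and do not reflect UpdateSystemConstantValues itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies incoming values over the stored ones and reports whether anything
// changed. Differences are accumulated with XOR and OR, so the loop body has
// no compare-and-set on flags; the single test against zero happens at the
// end.
inline bool UpdateAndCheckDirty(uint32_t* stored, const uint32_t* incoming,
                                size_t count) {
  uint32_t dirty_u32 = 0;
  for (size_t i = 0; i < count; ++i) {
    dirty_u32 |= stored[i] ^ incoming[i];  // nonzero iff the values differ
    stored[i] = incoming[i];
  }
  return dirty_u32 != 0;
}
```

Because `x ^ y` is zero exactly when `x == y`, the accumulated value is nonzero if and only if at least one pair differed.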
Wunkolo	90fffe1de7	[PPC] Fix memory assert formatting This was still using printf-style format specifiers. Causing memory asserts to show up like this while testing. ``` !> 0000438C Memory 10001040 assert failed: !> 0000438C Expected: %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X !> 0000438C Actual: %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X %02X !> 0000438C TEST FAILED ``` Updated them so they format correctly: ``` !> 00002CCC Memory 10001040 assert failed: !> 00002CCC Expected: FC FD FE FF 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F !> 00002CCC Actual: FC FD FE FF 00 00 00 00 00 00 00 00 00 00 00 00 !> 00002CCC TEST FAILED ```	2022-09-05 13:47:48 -05:00
Wunkolo	b0cc3db4d8	[x64] Add AVX512 optimization for `NOT_V128`	2022-09-05 13:47:30 -05:00
chss95cs@gmail.com	0c576877c8	Add constant folding for LVR when 16 aligned, clean up prior commit by removing dead test code for LVR/LVL/STVL/STVR opcodes and legacy hir sequence Delay using mm_pause in KeAcquireSpinLockAtRaisedIrql_entry, a huge amount of time is spent spinning in halo3	2022-09-04 22:42:51 -05:00
chss95cs@gmail.com	d372d8d5e3	nasty commit with a bunch of test code left in, will clean up and pr Remove the logger_ != nullptr check from shouldlog, it will nearly always be true except on initialization and gets checked later anyway, this shrinks the size of the generated code for some Select specialized vastcpy for current cpu, for now only have paths for MOVDIR64B and generic avx1 Add XE_UNLIKELY/LIKELY if, they map better to the c++ unlikely/likely attributes which we will need to use soon Finished reimplementing STVL/STVR/LVL/LVR as their own opcodes. we now generate far less code for these instructions. this also means optimization passes can be written to simplify/remove/replace these instructions in some cases. Found that a good deal of the X86 we were emitting for these instructions was dead code or redundant. the reduction in generated HIR/x86 should help a lot with compilation times and make function precompilation more feasible as a default Don't static assert in default prefetch impl, in c++20 the assertion will be triggered even without an instantiation Reorder some if/else to prod msvc into ordering the branches optimally. it somewhat worked... Added some notes about which opcodes should be removed/refactored Dispatch in WriteRegister via vector compares for the bounds. still not very optimal, we ought to be checking whether any register in a range may be special A lot of work on trying to optimize writeregister, moved wraparound path into a noinline function based on profiling info Hoist the IsUcodeAnalyzed check out of AnalyzeShader, instead check it before each call. 
Profiler recorded many hits in the stack frame setup of the function, but none in the actual body of it, so the check is often true but the stack frame setup is run unconditionally Pre-check whether we're about to write a single register from a ring Replace more jump tables from draw_util/texture_info with popcnt based sparse indexing/bit tables/shuffle lookups Place the GPU register file on its own VAD/virtual allocation, it is no longer a member of graphics system	2022-09-04 22:42:51 -05:00
illusion0001	f62ac9868a	Make portable default for new install	2022-09-04 22:42:40 -05:00
chss95cs@gmail.com	78c9a48bc2	also use vastcpy for shared memory page stuff	2022-08-28 14:52:12 -07:00
chss95cs@gmail.com	f31869092c	Fixed a bug with readback_resolve and readback_memexport that was responsible for a large portion of their overhead. readback_memexport and resolve are now usable for games, depending on your hardware. in my case games that were slideshows now run at like 20-30 fps, and my hardware isnt the best for xenia. add split_map class for mapping keys to values in a way that optimizes for frequent searches and infrequent insertions/removals remove jump table implementation of GetColorRenderTargetFormatComponentCount, it was appearing relatively high in profiles. instead pack the component counts into a single 32 bit word, which is indexed by shifting Add cvar to align all basic blocks to a boundary Add mmio aware load paths liberally apply XE_RESTRICT in ringbuffer related code Removed the IS_TRUE and IS_FALSE opcodes, they were pointless duplicates of COMPARE_EQ/COMPARE_NE and i want to simplify our set of opcodes for future backends More work on LVSR/LVSL/STVR/STVL opcodes Optimized X64 translated code emission, now only compute instrkey once Add code for pre-computing integer division magic numbers Optimized GetHostViewportInfo a little Move args for GetHostViewportInfo into a class, cache the result and compare for future queries. moved GetHostViewportInfo far lower on the profile Add (currently not functional, and very racy) asynchronous memcpy code. will improve it and actually use it in future commits. Add non-temporal memcpy function for huge page-aligned allocations. Used for copying to shared memory/readback hoist are_accumulated_render_targets_valid_ check out of loop in render_target_cache already bound check. Add stosb/movsb code for small constant memcpys/memsets that arent worth the overhead of memcpy/memset	2022-08-28 14:24:25 -07:00
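The packed lookup described in f31869092c (component counts stored in one 32-bit word and selected by shifting) can be sketched roughly as below. This is an illustration only: `DemoFormat`, its values, `kPackedCounts` and `ComponentCount` are hypothetical names and do not reproduce xenia's actual `GetColorRenderTargetFormatComponentCount` or its real format table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical format enum; the real xenia formats and counts differ.
enum class DemoFormat : uint32_t { kR = 0, kRG = 1, kRGB = 2, kRGBA = 3 };

// Each format owns a 4-bit field holding its component count.
// Field i occupies bits [4*i, 4*i + 3] of the packed word.
constexpr uint32_t kPackedCounts =
    (1u << 0) | (2u << 4) | (3u << 8) | (4u << 12);

// Shift the wanted field down to bit 0, then mask it out.
// No jump table and no memory load: the table is an immediate constant.
constexpr uint32_t ComponentCount(DemoFormat format) {
  return (kPackedCounts >> (static_cast<uint32_t>(format) * 4)) & 0xFu;
}
```

With 4-bit fields a 32-bit word holds at most eight entries, so a larger enum would need narrower fields or a 64-bit word.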
beeanyew	3569e97e0e	[CPU] Add rldicx implementation NOTE: May or may not be correct, but works for 535507D4.	2022-08-28 20:02:39 +02:00
beeanyew	75ed343e72	[CPU] Add stub OE handling implementation for addex and negx	2022-08-28 20:01:26 +02:00
illusion0001	04c9c02270	Guest crash message more useful	2022-08-24 09:42:56 -05:00
Radosław Gliński	9006b309af	Merge pull request #62 from chrisps/canary_experimental Minor correctness/constant folding fixes, guest code optimizations for pre-ryzen amd processors	2022-08-23 00:01:24 +02:00
chss95cs@gmail.com	1ffd7ecae8	Remove vpcmov print	2022-08-21 12:40:56 -07:00
chss95cs@gmail.com	b5ef3453c7	Disable most XOP code by default, the manual must be wrong for the shifts or we must be assembling them incorrectly, will return to it later and fix comparisons and select done by xop are fine though	2022-08-21 12:32:33 -07:00
chss95cs@gmail.com	b26c6ee1b8	Fix some more constant folding. fabsx does NOT set fpscr. Turns out that our vector unsigned compare instructions are a bit weird?	2022-08-21 10:27:54 -07:00
chss95cs@gmail.com	0ebc109d4d	add initial xop codepaths, still need to finish the rest of the compares, and then do shifts, rotates, and PERMUTE Add vector simplification pass, so far it only recognizes whether VECTOR_DENORMFLUSH is useless and optimizes them away Tag restgplr/savegplr/restvmx/savevmx/restfpr/savefpr with useful information, i intend to inline them (they tend to be the most heavily called guest functions)	2022-08-21 08:55:42 -07:00
Gliniak	da00ede181	[XAM/Settings] Check if provided size doesn't exceed maximal setting size	2022-08-21 17:46:00 +02:00
Radosław Gliński	0b013fdc6b	Merge pull request #61 from chrisps/canary_experimental performance improvements, kernel fixes, cpu accuracy improvements	2022-08-21 09:31:09 +02:00
chss95cs@gmail.com	d85bfc1894	Dont constant evaluate MAX with V128! Fix signed zeroes behavior for vmaxfp emulation, was causing a block in sonic to move perpetually, very slowly	2022-08-20 14:22:05 -07:00
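The signed-zero issue mentioned in d85bfc1894 comes from x86 `maxps` returning its second operand when both inputs are zeros of either sign. A scalar sketch of that operand-order dependence follows. `X86MaxLane` models one lane of `maxps`; `MaxSignedZeroAware` is a hypothetical helper showing one possible fix (AND-ing both operand orders), not necessarily what the commit does, and it deliberately ignores NaN handling.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// One lane of x86 MAXPS: the second operand is returned unless a > b,
// so max(+0, -0) and max(-0, +0) give different signs.
inline float X86MaxLane(float a, float b) { return a > b ? a : b; }

inline uint32_t Bits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return u;
}

inline float FromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Hypothetical fix-up: evaluate both operand orders and AND the bit
// patterns. For (+0, -0) the sign bits are 1 and 0, so the result is +0
// regardless of order. For other non-NaN inputs both orders return the
// same value, so the AND is a no-op. NaN inputs are NOT handled here.
inline float MaxSignedZeroAware(float a, float b) {
  return FromBits(Bits(X86MaxLane(a, b)) & Bits(X86MaxLane(b, a)));
}
```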
Gliniak	010b59e81c	[Emulator] Install Content: Create header for installed packages This fixes support for certain DLCs	2022-08-20 20:44:30 +02:00
Gliniak	469d062a50	[Emulator] Updated "Install Content" function to match PR status	2022-08-20 20:44:30 +02:00
Gliniak	f19cb704aa	[Emulator] Added error checking while creating directories	2022-08-20 20:44:30 +02:00
chss95cs@gmail.com	457296850e	Add OPCODE_NEGATED_MUL_ADD/OPCODE_NEGATED_MUL_SUB Proper handling of nans for VMX max/min on x64 (minps/maxps has special behavior depending on the operand order that vmx does not have for vminfp/vmaxfp) Add extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than windows, and on windows is disabled by default by a cvar Repurpose GUEST_SCRATCH64 stack offset to instead be for storing guest function profile times, define GUEST_SCRATCH as 0 instead, since thats already meant to be a scratch area Fix xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false Add alternative code path for guest clock that uses kusershared systemtime instead of QueryPerformanceCounter. This is way faster and I have tested it and found it to be working, but i have disabled it because i do not know how well it works on wine or on processors other than mine Significantly reduce log spam by setting XELOGAPU and XELOGGPU to be LogLevel::Debug Changed some LOGI to LOGD in places to reduce log spam Mark VdSwap as kHighFrequency, it was spamming up logs Make logging calls less intrusive for the caller by forcing the test of log level inline and moving the format/AppendLogLine stuff to an outlined cold function Add swcache namespace for software cache operations like prefetches, streaming stores and streaming loads. 
Add XE_MSVC_REORDER_BARRIER for preventing msvc from propagating a value too close to its store or from its load Add xe_unlikely_mutex for locks we know have very little contention add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h Microoptimization: Changed most uses of size_t to ring_size_t in RingBuffer, this reduces the size of the inlined ringbuffer operations slightly by eliminating rex prefixes, depending on register allocation Add BeginPrefetchedRead to ringbuffer, which prefetches the second range if there is one according to the provided PrefetchTag added inline_loadclock cvar, which will directly use the value of the guest clock from clock.cc in jitted guest code. off by default change uses of GUEST_SCRATCH64 to GUEST_SCRATCH Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently resides in x64_seq_vector, move to gpu code maybe at some point) Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. Same code can be used for Float16_2 in future commit. This should speed up some games that use these functions heavily Remove cvar for toggling old float16 behavior Add VRSAVE register, support mfspr/mtspr vrsave Add cvar for toggling off codegen for trap instructions and set it to true by default. Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister Use a fixed size vmem vector for upload ranges, realloc/memsetting on resize in the inner loop of requestranges was showing up on the profiler (the search in requestranges itself needs work) Rename fixed_vmem_vector to better fit xenia's naming convention Only log unknown register writes in WriteRegister if DEBUG :/. 
We're stuck on MSVC with c++17 so we have no way of influencing the branch ordering for that function without profile guided optimization Remove binding stride assert in shader_translator.cc, triangle told me its leftover ogl stuff Mark xe::FatalError as noreturn If a controller is not connected, delay by 1.1 seconds before checking if it has been reconnected. Asking Xinput about a controller slot that is unused is extremely slow, and XinputGetState/SetState were taking up an enormous amount of time in profiles. this may have caused a bit of input lag Protect accesses to input_system with a lock Add proper handling for user_index>= 4 in XamInputGetState/SetState, properly return zeroed state in GetState Add missing argument to NtQueryVirtualMemory_entry Fixed RtlCompareMemoryUlong_entry, it actually does not care if the source is misaligned, and for length it aligns down Fixed RtlUpperChar and RtlLowerChar, added a table that has their correct return values precomputed	2022-08-20 11:40:19 -07:00
Gliniak	e06978e5be	[Premake] Cleanup & Fixed references in cpu-tests	2022-08-17 09:43:55 +02:00
Gliniak	0df92130e6	[Memory] Changed amount of kernel reserved pages. This fixes flickering in games with resolution scaling enabled	2022-08-15 17:51:29 +02:00
chss95cs@gmail.com	7cc364dcb8	squash reallocs in command buffers by using large prealloced buffer, directly use virtual memory with it so os allocs on demand mark raw clock functions as noinline, the way msvc was inlining them and ordering the branches meant that rdtsc would often be speculatively executed add alternative clock impl for win, instead of using queryperformancecounter we grab systemtime from kusershared. it does not have the same precision as queryperformancecounter, we only have 100 nanosecond precision, but we round to milliseconds so it never made sense to use the performance counter in the first place stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite) added some helpers for minf/maxf without the nan handling behavior	2022-08-14 13:42:08 -07:00
chss95cs@gmail.com	c9b2d10e17	alternative mutex impl on windows works but i really can't tell if it helps much. use larger size in deferred_command_list to cut down on resizes in big scenes on m:dur	2022-08-14 10:26:50 -07:00
chss95cs@gmail.com	08f7a28920	Alternative mutex	2022-08-14 08:59:11 -07:00
chss95cs@gmail.com	495b1f8bc8	once again return to spinloop	2022-08-13 14:05:35 -07:00
chss95cs@gmail.com	c9e4119428	Add branch of ffmpeg with non-recursive split_radix_permutation Add branch of disruptorplus with working blocking_wait_stategy Switch back to blocking wait for timer queue	2022-08-13 13:43:45 -07:00
chss95cs@gmail.com	020d64a1a1	revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile	2022-08-13 13:20:35 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some games, in the range 50%-90% But for normal msvc builds i would put it at around 30-50% Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger fixed a number of errors where temporaries were being passed by reference/pointer Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. 
The memcpies were unnecessary and one of them was always a no-op Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases ^this can be toggled off in the platform_win header handle indirect call true with constant function pointer, was occurring in h3 lookup host format swizzle in denser array by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar ^looking up whether its known or not took approx 0.3% cpu time Changed some things in /cpu to make the project UNITYBUILD friendly The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds) Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed added support for docdecaduple precision floating point so that we can represent our performance gains numerically tons of other stuff im probably forgetting	2022-08-13 12:59:00 -07:00
Radosław Gliński	2f59487bf3	Merge pull request #59 from Uraniumm/canary_experimental Add nullptr check in CheckScalarConstCmp	2022-08-08 19:47:35 +02:00
Uraniumm	a16acbaf59	add nullptr check to mitigate crashes wip for reach untracked tags build fixes	2022-08-08 02:02:25 -04:00
chss95cs@gmail.com	324a8eb818	A bunch of fixes for division logic: "turns out theres a lot of quirks with the div instructions we havent been covering if the denom is 0, we jump to the end and mov eax/rax to dst, which is correct because ppc raises no exceptions for divide by 0 unlike x86 except we don't initialize eax before that jump, so whatever garbage from the previous sequence that has been left in eax/rax is what the result of the instruction will be and then in our constant folding, we don't do the same zero check in Value::Div, so if we constant folded the denom to 0 we will host crash the ppc manual says the result for a division by 0 is undefined, but in reality it seems it is always 0 there are a few posts i saw from googling about it, and tests on my rgh gave me 0, but then another issue came up and that is that we dont check for signed overflow in our division, so we raise an exception if guest code ever does (1<<signbit_pos) / -1 signed overflow in division also produces 0 on ppc the last thing is that if src2 is constant we skip the 0 check for division without checking if its nonzero all weird, likely very rare edge cases, except for maybe the signed overflow division chrispy — Today at 9:51 AM oh yeah, and because the int members of constantvalue are all signed ints, we were actually doing signed division always with constant folding" fixed an earlier mistake by me with the precision of fresx made some optimization disableable implemented vkpkx fixed possible bugs with vsr/vsl constant folding disabled the nice imul code for now, there was a bug with int64 version and i dont have time to check started on multiplication/addition/subtraction/division identities Removed optimized VSL implementation, it's going to have to be rewritten anyway Added ppc_ctx_t to xboxkrnl shim for direct context access started working on KeSaveFloatingPointState, re'ed most of it Exposed some more state/functionality to the kernel for implementing lower level routines 
like the save/restore ones Add cvar to re-enable incorrect mxcsr behavior if a user doesnt care and wants better cpu performance Stubbed out more impossible sequences, replace mul_hi_i32 with a 64 bit multiply	2022-08-07 10:41:26 -07:00
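The division behaviour described in 324a8eb818 (a zero divisor and signed overflow both producing 0 rather than a host exception) can be sketched as follows. The function names are hypothetical, and the "result is 0" rule is taken from the commit message's own hardware observations, not from the PowerPC manual, which the message says leaves the result undefined.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: signed 32-bit guest division that never traps.
inline int32_t GuestDivI32(int32_t num, int32_t den) {
  // x86 idiv would fault on a zero divisor; the guest result is 0.
  if (den == 0) return 0;
  // INT32_MIN / -1 overflows (and faults on x86); the guest result is 0.
  if (num == std::numeric_limits<int32_t>::min() && den == -1) return 0;
  return num / den;
}

// Hypothetical helper: unsigned division must use unsigned operands.
// The commit notes that constant folding was always dividing as signed.
inline uint32_t GuestDivU32(uint32_t num, uint32_t den) {
  return den == 0 ? 0u : num / den;
}
```

The same checks would have to be mirrored in any constant-folding path, since folding a zero divisor on the host would otherwise crash the emulator itself.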
Gliniak	f45e9e5e9a	[Kernel] Improved handling of internal display resolution	2022-08-02 12:09:25 +02:00
Gliniak	0e1353aa71	Implemented Opcode: mcrf	2022-08-01 14:54:05 +02:00
chss95cs@gmail.com	968f656d96	Add separate VMX/fpu mxcsr. Add support for constant operands for most fpu instructions. Remove constant folding for most fpu code half float	2022-07-31 08:56:36 -07:00
Gliniak	5d1b641197	[Emulator] Added possibility to install multiple packages at once	2022-07-30 15:52:41 +02:00
Gliniak	79ffbe3971	Merge branch 'importContent' of https://github.com/Gliniak/xenia.git into canary_experimental	2022-07-30 12:44:24 +02:00
Gliniak	0e3403d6da	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-30 12:42:51 +02:00
Gliniak	433a8a8a5e	[Emulator] Added option for content installation	2022-07-30 12:41:26 +02:00
Triang3l	7595cdb52b	[Vulkan] Non-GS point sprites + minor SPIR-V fixes	2022-07-27 17:14:28 +03:00
Triang3l	ff7ef05063	[SPIR-V] Clamp cube face using NClamp, not NMax/FMin	2022-07-26 17:08:12 +03:00
Triang3l	66c995f3aa	[SPIR-V] Saturate point sprite coordinates	2022-07-26 17:04:22 +03:00
Triang3l	8fb5da18ea	[Vulkan] Add forgotten fullDrawIndexUint32 check	2022-07-26 16:24:14 +03:00
Triang3l	9fa41c27bc	[Vulkan] Point sprite geometry shader	2022-07-26 16:01:20 +03:00
Gliniak	0c3019981c	[Video] Added option to set internal output resolution	2022-07-26 11:25:03 +02:00
Gliniak	76806e08c5	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-26 10:22:38 +02:00
Triang3l	f248e23079	[DXBC] Skip backface check in point PsParamGen	2022-07-25 21:48:25 +03:00
Triang3l	77e85ecaa4	[Vulkan] 32-bit index fetch without fullDrawIndexUint32	2022-07-25 16:53:12 +03:00
Gliniak	061000af01	[Base] Changed size of bitstream accessed data (Risky). This prevents a crash when buffer_ + offset_bytes is at the end of the allocated memory range and the access would run into unallocated space	2022-07-25 10:52:21 +02:00
Gliniak	364137ef5f	[XAM] Send UI On notification on start of XamShowSigninUI	2022-07-25 10:50:32 +02:00
Gliniak	6730ffb7d3	Merge branch 'canary_experimental' of https://github.com/xenia-canary/xenia-canary into canary_experimental	2022-07-24 17:58:48 +02:00
Gliniak	6e501fbd61	[XAM] Set license mask for DLCs (Thanks Beeanyew)	2022-07-24 17:58:00 +02:00
Gliniak	98c2cb636f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-24 17:38:08 +02:00
Triang3l	37579d3bf0	[GPU] Treat non-adaptive-tessellated patches as 1-control-point	2022-07-24 17:38:26 +03:00
chss95cs@gmail.com	33a6cfc0a7	Add special cases to DOT_PRODUCT_3/4 that detect whether they're calculating lengthsquared Add alternate path to DOT_PRODUCT_3/4 for use_fast_dot_product that skips all the status register stuff and just remaps inf to qnan Add OPCODE_TO_SINGLE to replace the CONVERT_F32_F64 - CONVERT_F64_F32 sequence we used to emit with the idea that a backend could implement a more correct rounding behavior if possible on its arch Remove some impossible sequences like MUL_HI_I8/I16, MUL_ADD_F32, DIV_V128. These instructions have no equivalent in PPC. Many other instructions are unused/dead code and should be removed to make the x64 backend a better reference for future ones Add backend_flags to Instr. Basically, flags field that a backend can use for whatever it wants when generating code. Add backend instr flag to x64 that tells it to not generate code for an instruction. this allows sequences to consume subsequent instructions Generate actual x64 code for VSL instruction instead of using callnativesafe Detect repeated COMPARE instructions w/ identical operands and reuse the results in FLAGS if so. this eliminates a ton of garbage compare/set instructions. If a COMPARE instructions destination is stored to context with no intervening instruction and no additional uses besides the store, do setx [ctx address] Detect prefetchw and use it in CACHE_CONTROL if prefetch for write is requested instead of doing prefetch to all cache levels Fixed an accident in an earlier commit by me, VECTOR_DENORMFLUSH was not being emitted at all, so denormal inputs to MUL_ADD_V128 were not becoming zero and outputs from DOT_PRODUCT_X were not either. I believe this introduced a bug into RDR where a wagon wouldnt spawn? 
(https://discord.com/channels/308194948048486401/308207592482668545/1000443975817252874) Compute fresx in double precision using RECIP_F64 and then round to single instead of doing (double)(1.0f / (float)value), matching original behavior better Refactor some of ppc_emit_fpu, much of the InstrEmit function are identical except for whether they round to single or not Added "tail emitters" to X64Emitter. These are callbacks that get invoked with their label and the X64Emitter after the epilog code. This allows us to move cold code out of the critical path and in the future place constant pools near functions guest_to_host_thunk/host_to_guest_thunk now gets directly rel32 called, instead of doing a mov Add X64BackendContext structure, represents data before the start of the PPCContext Instead of doing branchless sequence, do a compare and jump to tail emitted code for address translation. This makes converting addresses a 3 uop affair in most cases. Do qnan move for dot product in a tail emitter Detect whether EFLAGS bits are independent variables for the current cpu (not really detecting it ehe, just checking if zen) and if so generate inc/dec for add/sub 1 Detect whether low 32 bits of membase are 0. If they are then we can use membasereg.cvt32() in place of immediate 0 in many places, particularly in stores Detect LOAD MODIFY STORE pattern for context variables (currently only done for 64 bit ones) and turn them into modify [context ptr]. This is done for add, sub, and, or, xor, not, neg Tail emit error handling for TRAP opcodes Stub out unused trap opcodes like TRAP_TRUE_I32, TRAP_TRUE_I64, TRAP_TRUE_I16 (the call_true/return_true opcodes for these types are also probably unused) Remove BackpropTruncations. It was poorly written and causes crashes on the game Viva pinata (https://discord.com/channels/308194948048486401/701111856600711208/1000249460451983420)	2022-07-23 12:10:07 -07:00
Gliniak	1fcac00924	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-23 13:26:31 +02:00
Triang3l	3c12814276	[GPU] EDRAM looped addressing (resolves #2031 )	2022-07-22 23:51:50 +03:00
Gliniak	0c782ade8e	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-21 18:52:33 +02:00
Triang3l	6ff312afb1	[DXBC] Update PsParamGen comment [ci skip]	2022-07-21 12:42:06 +03:00
Triang3l	1a95bef8b3	[GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.	2022-07-21 12:32:28 +03:00
Gliniak	0f60e23208	[Kernel] Removed input change notifications from initial notify list	2022-07-19 10:46:36 +02:00
Gliniak	bc315d21e0	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-19 10:45:14 +02:00
Triang3l	0a94b86cb8	[GPU] Remove orphaned GetPresentArea declaration [ci skip]	2022-07-18 21:02:34 +03:00
Gliniak	57b514ea6a	Removed (again) unnecessary include	2022-07-18 09:40:45 +02:00
Radosław Gliński	3757580f45	Merge pull request #52 from chrisps/canary_experimental Fix previous batch of CPU changes	2022-07-18 09:20:35 +02:00
Gliniak	fd78ab4dfc	[Patcher] Allow loading patches from non-utf8 paths	2022-07-18 08:46:04 +02:00
chss95cs@gmail.com	11817f0a3b	vshufps accident broke things, this fixes	2022-07-17 14:44:09 -07:00
Gliniak	6e1e62378f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-17 21:27:52 +02:00
Triang3l	14fdf4b270	[GPU] Up to 7x7 resolution scaling	2022-07-17 20:41:50 +03:00
chss95cs@gmail.com	3717167bbe	Preload ThreeFloatMask in DOT_PRODUCT_3 Use shuffle_ps instead of broadcastss, broadcastss is slower on many intel and amd processors and encodes to the same number of bytes as shuffle_ps Detect and optimize away PERMUTE with a zero src2 and src3 in constant_propagation_pass instead of in the x64 sequence For constant PERMUTE, do the Xor/And prior to LoadConstantXmm instead of in the generated code Simplified code for PERMUTE Added simplification rule that detects (lzcnt(x) >> log2(bitsizeof_x)) == ( x == 0) Added set_srcN(value, idx) which can be used to set the nth source of an instruction, which makes more sense than having three different functions that only differ by the field they touch Added Value::VisitValueOperands for iterating all Value operands an instruction has. Add BackpropTruncations code to simplification_pass Changed the (void**) dereferences of raw_context that are done to grab thread_state to instead reference PPCContext and the thread_state field. Moved the thread_state field to the tail of PPCContext. Moved membase to the tail of PPCContext, since now it is reloaded very infrequently. Rearranged PPCContext so that the condition registers come first (most accesses to them cant get SSA'd), moved lr and ctr to after gp regs since they are not accessed as much as the main gpregs. This way the most frequently accessed registers will be accessible via a rel8 displacement instead of rel32 (ideally, we would have only certain CRs at the start, but xenia does pointer arithmetic on CR0's offset to get CRn) Use alignas(64) to ensure PPCContext's padding Map PPCContext specially so that the low 32 bits of the context register is 0xE0000000, for the 4k page offset check. Also allocate the page before, so that backends can store their own information that is not relevant to the PPCContext on that page and reference that data in the generated asm via 8-bit signed displ or 32-bit signed displ. 
Currently this page is not being utilized, but I plan on stashing some data critical to the x86 backend there Changed many wrong avx instructions, they worked but they were not intended for the data they operated on, meaning they transferred domains and caused 1-2 cycle stall each time Added SimdDomain checking/deduction to X64Emitter. Used SimdDomain code to fix a lot of float/int domain stalls Use the low 32 bits of the context register instead of constant 0xE0000000 in ComputeAddress Special path for SELECT_V128 with result of comparison that will use a blend instruction instead of and/or Many HIR optimizations added in simp pass A bunch of other stuff running out of time to write this msg	2022-07-17 09:52:40 -07:00
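The simplification rule from 3717167bbe, `(lzcnt(x) >> log2(bitsizeof_x)) == (x == 0)`, relies on `lzcnt` returning the full bit width only for a zero input. A small sketch for the 32-bit case is below. `Lzcnt32` is a portable stand-in written for this example, not a xenia function, and `IsZeroViaLzcnt` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Portable leading-zero count. Returns 32 for x == 0, matching the
// behaviour of the lzcnt instruction for a 32-bit operand.
constexpr uint32_t Lzcnt32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1) {
    ++n;
  }
  return n;
}

// lzcnt yields 0..31 for nonzero x and exactly 32 for zero, so bit 5 of
// the count (value >> log2(32)) is set only when x == 0.
constexpr uint32_t IsZeroViaLzcnt(uint32_t x) { return Lzcnt32(x) >> 5; }
```

A pass that recognizes this pattern can replace the count-and-shift pair with a plain compare against zero.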
Triang3l	e8652e544a	[GPU] Translucent trace viewer controls	2022-07-17 17:29:41 +03:00
Triang3l	25663827ba	[GPU] Trace viewer Android content URI loading	2022-07-17 16:37:49 +03:00
Triang3l	624f2b2d9e	[Base] Android content URI file memory mapping	2022-07-17 16:34:17 +03:00
Triang3l	93a7918025	[Base] Android content URI file descriptor opening	2022-07-17 16:25:58 +03:00
Triang3l	34a952d789	[Base] Wrap strdup and strcasecmp in xe:: functions	2022-07-17 16:14:29 +03:00
chss95cs@gmail.com	6a612b4d34	remove useless tag field from hir::Value pack local_slot and constant in hir::Value Instead of loading membase at the start of every function, just load it in HostToGuestThunk vzeroupper in GuestToHostThunk before calling host function, and in HostToGuestThunk after calling function to prevent AVX dirty state slowdowns. In the future, check if CPU implements AVX as 128x2 and skip if so (https://john-h-k.github.io/VexTransitionPenalties.html) Remove useless save/restore of ctx pointer, nothing modifies it and it prevents cpus from doing cross-function memory renaming (https://www.agner.org/forum/viewtopic.php?t=41). Could not remove the space on stack because of alignment issues, instead turned it into GUEST_SCRATCH64 which is a temporary that sequences may use Reorder OpcodeInfo so that name is at offset 0, remove name and add GetOpcodeName function (name is only used for debug code, we are separating frequently accessed data and rarely accessed data) Add VECTOR_DENORMFLUSH opcode for handling output to DOT_PRODUCT and other opcodes that implicitly force denormal inputs/outputs to zero, will eventually use for implementing NJM Rewrite sequences for LOAD_VECTOR_SHL/SHR. The mask with 0xf in it was pointless as all InstrEmit_ functions that create the load shift instructions do that in HIR. The tables are only used for nonzero constant inputs now, which are probably pretty rare. Instead of doing a shift and lookup, a base value is used for both in the constant table and adding/subtracting of the input is done Reuse result of LoadVectorShl/Shr in InstrEmit_stvlx_, InstrEmit_stvrx_. We were previously calculating it twice which was contributing to the final sequences' fatness. 
Use OPCODE_SELECT instead of the sequence of or, andnot, and that it was using for merging Add the proper unconditional denormal input flushing behavior to vfmadd, add it also to vfmsub (making the assumption it has the same behavior) Remove constant propagation for DOT_PRODUCT_3/4 DOT_PRODUCT_3/4 now returns a vector with all four elements set to the result. (what we were doing before, truncating to float32 and then splatting didnt make any sense) Add much more correct versions of DOT_PRODUCT_3/4, matching the Xb360's to 1 bit. Still needs work to be a perfect emulation. Add constant folding for OPCODE_SELECT, OPCODE_INSERT, OPCODE_PERMUTE, OPCODE_SWIZZLE Remove constant folding for DOT_PRODUCT Removed the multibyte nop code I committed earlier, it doesnt help us much because nops are only used for debug stuff and its ugly and wouldnt survive in a pr to main Check for AVX512BMI, use vpermb to shuffle if supported	2022-07-16 10:25:04 -07:00
Triang3l	500bbe9e0d	[Base] Use to_path for Android path argument loading	2022-07-16 13:42:04 +03:00
Triang3l	373b143049	[Base] Cvars from Android Bundle/Intent	2022-07-16 13:13:08 +03:00
chss95cs@gmail.com	71c5f8f0fa	Optimized GetScalarNZM, add limit to how far it can recurse. Add rlwinm elimination rule	2022-07-14 14:32:14 -07:00
Triang3l	415750252b	[Base] PosixMappedMemory: Close, Flush	2022-07-14 22:51:07 +03:00
Triang3l	65137e58bd	[Base] PosixMappedMemory: fd instead of stdio Android ContentResolver, which is needed for content:// URIs, provides file descriptors rather than stdio files	2022-07-14 22:11:46 +03:00
Triang3l	9fd63519bf	[Base] Make MappedMemory non-copyable	2022-07-14 22:04:06 +03:00
Triang3l	2a69d1db4d	[Vulkan] Fix a typo in a comment about BC textures [ci skip]	2022-07-14 21:16:23 +03:00
Triang3l	7b8281aee0	[UI] Android ImGui touch and mouse input	2022-07-14 21:13:40 +03:00
Triang3l	037310f8dc	[Android] Unified xenia-app with windowed apps and build prerequisites	2022-07-11 21:45:57 +03:00
Gliniak	1d00372e6b	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-10 10:50:39 +02:00
Triang3l	b41bb35a20	[SPIR-V] Make interpolators an array to fix Adreno linkage	2022-07-09 17:52:26 +03:00
Triang3l	b3edc56576	[Vulkan] Merge texture and sampler descriptors into a single descriptor set Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value	2022-07-09 17:10:28 +03:00
Gliniak	d33be73f3d	Fixed crash caused by hash calculation in specific cases	2022-07-08 08:49:43 +02:00
Triang3l	e4de8663c4	[Vulkan] All guest draw uniform buffer bindings in a single descriptor set Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer	2022-07-07 21:05:56 +03:00
Triang3l	88c055eb30	[CPU] Null backend enough for GPU trace viewing	2022-07-06 23:28:06 +03:00
Triang3l	3ee68d79ea	Revert "[GPU] Make Processor optional for GraphicsSystem setup" The Processor is still required in many places, including the GPU command processor worker thread This reverts commit `fd03d886e9`.	2022-07-06 22:43:40 +03:00
Triang3l	6852e54937	[CPU] Remove intrinsics from dot product constant propagation	2022-07-06 21:32:56 +03:00
Triang3l	326e718035	[CPU] MMIO: Arm64, load register writes + exception cleanup	2022-07-06 21:05:05 +03:00
Triang3l	fd03d886e9	[GPU] Make Processor optional for GraphicsSystem setup	2022-07-05 21:21:22 +03:00
Triang3l	bdfd410b13	[CPU] Cleanup x64 backend usage conditionals	2022-07-05 21:07:10 +03:00
Triang3l	d263d508cd	[GPU] Make operator< const	2022-07-05 20:47:53 +03:00
Triang3l	536f14d94c	[GPU] Fix a typo in a Neon intrinsic name	2022-07-05 20:47:34 +03:00
Triang3l	d51fafd07c	[Base] Linux Arm64 exception handler	2022-07-05 20:46:49 +03:00
Triang3l	40aa73f7d7	[Linux] Swap read/write in x64 page fault handler + exception code cleanup	2022-07-04 23:51:26 +03:00
Triang3l	a9cbd9cc5f	[Linux] Update RIP after handling an exception	2022-07-04 23:24:26 +03:00
uytvbn	54aac81268	[Linux] Implement exception handler	2022-07-04 23:04:27 +03:00
Triang3l	35d4ea59c6	[Base] Remove exception_handler_linux.cc	2022-07-04 23:02:11 +03:00
Triang3l	feaad639fb	[Vulkan] Destroy all RTs before VulkanRenderTargetCache is destroyed	2022-07-04 11:27:51 +03:00
Gliniak	6e753c6399	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-04 08:11:04 +02:00
Triang3l	2621dabf0f	[Vulkan] Native 24-bit unorm depth where available	2022-07-03 21:21:17 +03:00
Triang3l	83e9984539	[Vulkan] Remove required feature checks Fallbacks for those will be added more or less soon, the stable version won't hard-require anything beyond 1.0 and the portability subset	2022-07-03 20:54:34 +03:00
Triang3l	bbae909fd7	[GPU] Reasons to keep non-Vulkan backends [ci skip]	2022-07-03 20:39:44 +03:00
Triang3l	ed61e15fc3	[App] Make D3D12 the default GPU backend on Windows again	2022-07-03 19:49:11 +03:00
Triang3l	ee84f4e267	[Vulkan] Update title bar warning	2022-07-03 19:45:48 +03:00
Triang3l	f7ef051025	[Vulkan] Disable validation by default	2022-07-03 19:42:22 +03:00
Triang3l	001f64852c	[Vulkan] VMA for textures	2022-07-03 19:40:48 +03:00
Gliniak	a8df744ea6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-03 13:07:15 +02:00
Triang3l	636585e0aa	[Vulkan] Trace viewer	2022-07-01 19:53:41 +03:00
Triang3l	ad1ef84145	Merge branch 'master' into vulkan	2022-07-01 19:53:08 +03:00
Triang3l	e37e3ef382	[GPU] Display swap output in the trace viewer Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format	2022-07-01 19:50:19 +03:00
Triang3l	c8a4a9504f	[Vulkan] Remove an unneeded scale from RefreshGuestOutput aspect ratio	2022-07-01 12:52:12 +03:00
Triang3l	d174762a40	Merge branch 'master' into vulkan	2022-07-01 12:51:34 +03:00
Triang3l	28670d8ec2	[UI] Presenter: Rename display size to aspect ratio	2022-07-01 12:50:45 +03:00
Triang3l	f8b351138e	[Vulkan] Alpha test	2022-06-30 22:20:51 +03:00
Triang3l	6772c88141	Merge branch 'master' into vulkan	2022-06-30 22:15:29 +03:00
Triang3l	7e691d5ef1	[DXBC] Handle NaN in not equal alpha test as passed	2022-06-30 22:15:01 +03:00
Triang3l	c0c3666e12	[Vulkan] Align texture extents in loading to vector size accessed by the shader Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background	2022-06-29 23:41:32 +03:00
Triang3l	9392fff369	Merge branch 'master' into vulkan	2022-06-29 23:39:54 +03:00
Triang3l	a11b070fee	[GPU] Align texture extents in loading to host buffer texel size accessed by the shader	2022-06-29 23:38:06 +03:00
Triang3l	7c2df55209	[Vulkan] Cache clear: shared memory, scratch buffer	2022-06-29 13:24:45 +03:00
Triang3l	d5815d9e6a	[Vulkan] Float24 depth range remapping fixes	2022-06-29 13:14:00 +03:00
Gliniak	efe3cd96d6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-29 09:21:09 +02:00
Triang3l	05ef7a273a	[Vulkan] Samplers (only 1.0 core features for now)	2022-06-28 22:42:18 +03:00
Triang3l	5d9061cf99	Merge branch 'master' into vulkan	2022-06-28 22:05:45 +03:00
Triang3l	243683d2e9	[GPU] Cleanup Texture::MarkAsUsed conditionals	2022-06-28 22:04:26 +03:00
Triang3l	382710bab7	[GPU] Normalize sampler clamp modes	2022-06-28 21:58:58 +03:00
Triang3l	cedc94679b	[GPU] Don't drop the rest of the command list if IssueDraw fails	2022-06-28 21:40:06 +03:00
chss95cs@gmail.com	3c06921cd4	Added optimizations for combining conditions together when their results are OR'ed Added recognition of impossible comparisons via NZM and optimize them away Recognize (x + -y) and transform to (x - y) for constants Recognize (~x) + 1 and transform to -x Check and transform comparisons if they're semantically equal to others Detect comparisons of single-bit values with their only possible non-zero value and transform to true/false tests Transform ==0 to IS_FALSE, !=0 to IS_TRUE Truncate to int8 if operand for IS_TRUE/IS_FALSE has a nzm of 1 Reduced code generated for SubDidCarry slightly Add special case for InstrEmit_srawix if mask == 1 Cut down the code generated for trap instructions, instead of naive or'ing or compare results do a switch and select the best condition Rerun simplification pass until no changes, as some optimizations will enable others to be done Enable rel32 call optimization by default	2022-06-26 12:49:04 -07:00
Gliniak	e6898fda66	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-26 20:11:33 +02:00
chrisps	08232de8cc	patch a mistake in NZM calculation for OPCODE_NOT	2022-06-26 09:30:56 -07:00
Triang3l	9672230d9f	Merge branch 'master' into vulkan	2022-06-26 18:59:49 +03:00
Triang3l	ec008463b6	[GPU] CrYCb/YCrCb border colors	2022-06-26 18:56:50 +03:00
Triang3l	2606fa5709	[GPU] Apply BaseMap MipFilter via samplers as it may be overridden Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.	2022-06-26 18:41:38 +03:00
Triang3l	e191430091	Merge branch 'master' into vulkan	2022-06-26 16:58:27 +03:00
Triang3l	086a070fa9	[GPU] Explicitly cast bit field values in std::min/max According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).	2022-06-26 16:54:11 +03:00
Triang3l	e0b890fe5c	[DXBC] Remove alphatest/A2C with [earlydepthstencil]	2022-06-26 15:31:08 +03:00
Triang3l	6688b13773	[Vulkan] PsParamGen	2022-06-26 15:01:27 +03:00
Triang3l	a99a1be880	Merge branch 'master' into vulkan	2022-06-26 15:00:21 +03:00
Triang3l	b787f2dec1	[GPU] GPR count limit is 128, not 64	2022-06-26 14:45:49 +03:00
Triang3l	a5c8df7a37	[Vulkan] Remove UB-based independent blend logic On Vulkan, unlike Direct3D, not writing to a color target in the fragment shader produces an undefined result.	2022-06-25 20:57:44 +03:00
Triang3l	d8b2944caa	[Vulkan] Handle unsupported fillModeNonSolid + fix portability subset feature checks	2022-06-25 20:46:52 +03:00
Triang3l	d30d59883a	[Vulkan] Color exponent bias and gamma conversion	2022-06-25 20:35:13 +03:00
Triang3l	b1be33004a	Merge branch 'master' into vulkan	2022-06-25 20:31:26 +03:00
Triang3l	4812b4ba8b	[D3D12] Fix outdated color system constants comment [ci skip]	2022-06-25 20:31:05 +03:00
chss95cs@gmail.com	327cc9eff5	drastically reduce size of final generated code for rlwinm by adding special paths for rotations of 0, masks that discard the rotated bits and using And w/ UINT_MAX instead of truncate/zero extend Add special case to TYPE_INT64's EmitAnd for UINT_MAX mask. Do mov32 to 32 if detected to take advantage of implicit zero xt/reg renaming Add helper function for skipping assignment defs in instr. Add helper function for checking if an opcode is binary value type Add several new optimizations to simplificationpass, plus weak NZM calculation code (better full evaluation of Z/NZ will be done later) . List of optimizations: If a value is anded with a bitmask that it was already masked against, reuse the old value (this cuts out most FPSCR update garbage, although it does cause a local variable to be allocated for the masked FPSCR and it still repeatedly stores the masked value to the context) If masking a value that was or'ed against another check whether our mask only considers bits from one value or another. if so, change the operand to the OR input that actually matters If the only usage of a rotate left's output is an AND against a mask that discards the bits that were rotated in change the opcode to SHIFT_LEFT If masking against all ones, become an assign. If XOR or OR against 0, become an assign (additional FPSCR codegen cleanup) If XOR against all ones, become a NOT Adding a direct CPUID check to x64_emitter for lzcnt, the version of xbyak we are using is skipping checking for lzcnt on all non-intel cpus, meaning we are generating the much slower bitscan path for AMD cpus.	2022-06-25 09:58:13 -07:00
Triang3l	5dca11a892	[SPIR-V] Fix fetch constant LOD bias signedness	2022-06-25 16:33:35 +03:00
Triang3l	d8b0227cbd	[SPIR-V] Fix cubemap X axis	2022-06-25 16:25:29 +03:00
Triang3l	fdcbf67623	[Vulkan] Enable VK_KHR_sampler_ycbcr_conversion	2022-06-25 15:46:02 +03:00
Triang3l	758db4ccb3	[Vulkan] Fix textures not loaded if using a shader for the first time	2022-06-25 15:15:06 +03:00
Triang3l	4db445c6f9	Merge branch 'master' into vulkan	2022-06-25 15:13:41 +03:00
Triang3l	aa45d7b47d	[D3D12] More descriptive pipeline creation call comment [ci skip]	2022-06-25 15:13:11 +03:00
Triang3l	c37c05d189	[Vulkan] Remove an outdated fullscreen shader comment [ci skip]	2022-06-25 14:35:15 +03:00
Triang3l	4b4205ba00	[Vulkan] Frontbuffer presentation	2022-06-25 14:33:43 +03:00
Triang3l	3fc7d8753c	Merge branch 'master' into vulkan	2022-06-24 23:38:04 +03:00
Triang3l	f4a634c617	[XeSL] xesl_writeStore > xesl_Store	2022-06-24 23:37:29 +03:00
Triang3l	7a4732e14f	[GPU] XeSL swap shaders	2022-06-24 23:24:30 +03:00
Gliniak	2b3686f0e9	[XAM] Set profile setting 'from' entry accordingly to setting existence	2022-06-24 10:10:52 +02:00
Triang3l	b7737d70ca	[D3D12] Update RequestSwapTexture resource state comment [ci skip]	2022-06-23 22:59:53 +03:00
Gliniak	ce3b159683	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 21:05:45 +02:00
Triang3l	227d495738	Merge branch 'master' into vulkan	2022-06-22 21:19:29 +03:00
Triang3l	e9f129f67f	[GPU] Safer and more correct depth bias conversion Float24-as-float32 depth bias is now in the increments of 8, because conversion of the depth to float24 directly in the pixel shaders may destroy the bias qualitatively otherwise if it's too small.	2022-06-22 21:14:40 +03:00
Triang3l	a7885ae1a4	[GPU] Fix CPU-side float24 conversion broken recently	2022-06-22 20:47:44 +03:00
Triang3l	4514050f55	[Vulkan] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest	2022-06-22 13:25:06 +03:00
Gliniak	e7a122d943	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 12:18:13 +02:00
Triang3l	0d8bd0e0c6	Merge branch 'master' into vulkan	2022-06-22 13:15:50 +03:00
Triang3l	cbf0476d42	[D3D12] Don't round float24 depth when it's known to be exact	2022-06-22 13:14:38 +03:00
Gliniak	83269315d8	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 12:06:42 +02:00
Triang3l	7869b080d3	[D3D12] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest	2022-06-22 12:53:09 +03:00
Gliniak	87fd772393	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-21 07:54:44 +02:00
chss95cs@gmail.com	549ee28a93	Some guest function calls can now be resolved and embedded directly in the emitted asm as rel32 calls. Disabled by default, enabled via resolve_rel32_guest_calls detect whether cpu has fast jrcxz, fast loop/loope/loopne much more thorough LoadConstantXMM New cvar elide_e0_check that allows the backend to assume accesses via the SP or TLS register will not cross into 0xe0 range Add x64 codegen for Vector shift uint8 If has fast jrcxz use for some traptrue/breaktrue instructions Use phat nops Add cvar use_fast_dot_product, which uses a four instruction sequence for both dot product instructions which ought to be equivalent. disabled by default.	2022-06-20 15:08:18 -07:00
Triang3l	c0703e64db	Merge branch 'master' into vulkan	2022-06-20 22:40:19 +03:00
Triang3l	e2f632f8fa	[D3D12] Use udiv by constant tile size + minor transfer cleanup Drivers compile that to a multiplication and a shift anyway.	2022-06-20 22:39:30 +03:00
Triang3l	0dc480721f	[Vulkan] Render target resolving	2022-06-20 22:29:07 +03:00
Triang3l	c6ec6d8239	[Vulkan] Use UDiv/UMod by constant tile size + minor transfer cleanup Drivers compile that to a multiplication and a shift anyway.	2022-06-20 22:24:07 +03:00
Gliniak	a4ff64c465	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-20 21:07:32 +02:00
Triang3l	61c4c49d76	Merge branch 'master' into vulkan	2022-06-20 12:34:41 +03:00
Triang3l	207e11c8d2	[GPU] Separate range arguments for fixed16 RG and RGBA in GetResolveInfo On Vulkan, when snorm16 is unsupported, these formats may be emulated as float16, which natively can represent a wide range of numbers including -32 to 32 with blending. However, R16G16_SNORM and R16G16B16A16_SNORM are two separate formats, which may have different support on the device.	2022-06-20 12:29:45 +03:00
Triang3l	3b4845511d	[Vulkan] Don't require an explicit uint64_t cast for SetDeviceObjectName	2022-06-20 12:25:52 +03:00

6437 Commits