xenia-canary

Commit Graph

Author	SHA1	Message	Date
chss95cs@gmail.com	90c771526d	"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false Remove dead #if 0'd code in math.h On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe and gives a modest code size reduction Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds. In some games seemingly random crashes were happening that were hard to trace because the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread. Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted. Changes were also made to allow the guest to call into a piece of an existing x64 function. This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false. It is possible it may have introduced regressions, but i dont know of any yet So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame. 
Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function add Processor::LookupModule Add Backend::DeinitializeBackendContext Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0) add notes about flags that trap in XamInputGetCapabilities 0 == 3 in XamInputGetCapabilities Name arg 2 of XamInputSetState PrefetchW in critical section kernel funcs if available & doing cmpxchg Add terminated field to X_KTHREAD, set it on termination Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type) and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread. Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?) Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen Use page_size_shift in more places Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues	2022-11-27 09:39:33 -08:00
chss95cs@gmail.com	c1d922eebf	Minor decoder optimizations, kernel fixes, cpu backend fixes	2022-11-05 10:50:33 -07:00
chss95cs@gmail.com	eb8154908c	atomic cas use prefetchw if available remove useless memorybarrier remove double membarrier in wait pm4 cmd add int64 cvar use int64 cvar for x64 feature mask Rework some functions that were frontend bound according to vtune placing some of their code in different noinline functions, profiling after indicating l1 cache misses decreased and perf of func increased remove long vpinsrd dep chain code for conversion.h, instead do normal load+bswap or movbe if avail Much faster entry table via split_map, code size could be improved though GetResolveInfo was very large and had impact on icache, mark callees as noinline + msvc pragma optimize small use log2 shifts instead of integer divides in memory minor optimizations in PhysicalHeap::EnableAccessCallbacks, the majority of time in the function is spent looping, NOT calling Protect! Someone should optimize this function and rework the algo completely remove wonky scheduling log message, it was spammy and unhelpful lock count was unnecessary for criticalsection mutex, criticalsection is already a recursive mutex brief notes i gotta run	2022-09-17 04:04:53 -07:00
chss95cs@gmail.com	20638c2e61	use Sleep(0) instead of SwitchToThread, should waste less power and help the os with scheduling. PM4 buffer handling made a virtual member of commandprocessor, place the implementation/declaration into reusable macro files. this is probably the biggest boost here. Optimized SET_CONSTANT/ LOAD_CONSTANT pm4 ops based on the register range they start writing at, this was also a nice boost Expose X64 extension flags to code outside of x64 backend, so we can detect and use things like avx512, xop, avx2, etc in normal code Add freelists for HIR structures to try to reduce the number of last level cache misses during optimization (currently disabled... fixme later) Analyzed PGO feedback and reordered branches, uninlined functions, moved code out into different functions based on info from it in the PM4 functions, this gave like a 2% boost at best. Added support for the db16cyc opcode, which is used often in xb360 spinlocks. before it was just being translated to nop, now on x64 we translate it to _mm_pause but may change that in the future to reduce cpu time wasted texture util - all our divisors were powers of 2, instead we look up a shift. this made texture scaling slightly faster, more so on intel processors which seem to be worse at int divs. GetGuestTextureLayout is now a little faster, although it is still one of the heaviest functions in the emulator when scaling is on. xe_unlikely_mutex was not a good choice for the guest clock lock, (running theory) on intel processors another thread may take a significant time to update the clock? maybe because of the uint64 division? really not sure, but switched it to xe_mutex. This fixed audio stutter that i had introduced to 1 or 2 games, fixed performance on that n64 rare game with the monkeys. Took another crack at DMA implementation, another failure. 
Instead of passing as a parameter, keep the ringbuffer reader as the first member of commandprocessor so it can be accessed through this Added macro for noalias Applied noalias to Memory::LookupHeap. This reduced the size of the executable by 7 kb. Reworked kernel shim template, this shaved like 100kb off the exe and eliminated the indirect calls from the shim to the actual implementation. We still unconditionally generate string representations of kernel calls though :(, unless it is kHighFrequency Add nvapi extensions support, currently unused. Will use CPUVISIBLE memory at some point Inserted prefetches in a few places based on feedback from vtune. Add native implementation of SHA int8 if all elements are the same Vectorized comparisons for SetViewport, SetScissorRect Vectorized ranged comparisons for WriteRegister Add XE_MSVC_ASSUME Move FormatInfo::name out of the structure, instead look up the name in a different table. Debug related data and critical runtime data are best kept apart Templated UpdateSystemConstantValues based on ROV/RTV and primitive_polygonal Add ArchFloatMask functions, these are for storing the results of floating point comparisons without doing costly float->int pipeline transfers (vucomiss/setb) Use floatmasks in UpdateSystemConstantValues for checking if dirty, only transfer to int at end of function. Instead of dirty |= (x == y) in UpdateSystemConstantValues, now we do dirty_u32 |= (x^y). if any of them are not equal, dirty_u32 will be nz, else if theyre all equal it will be zero. 
This is more friendly to register renaming and the lack of dependencies on EFLAGS lets the compiler reorder better Add PrefetchSamplerParameters to D3D12TextureCache use PrefetchSamplerParameters in UpdateBindings to eliminate cache misses that vtune detected Add PrefetchTextureBinding to D3D12TextureCache Prefetch texture bindings to get rid of more misses vtune detected (more accesses out of order with random strides) Rewrote DMAC, still terrible though and have disabled it for now. Replace tiny memcmp of 6 U64 in render_target_cache with inline loop, msvc fails to make it a loop and instead does a thunk to their memcmp function, which is optimized for larger sizes PrefetchTextureBinding in AreActiveTextureSRVKeysUpToDate Replace memcmp calls for pipelinedescription with handwritten cmp Directly write some registers that dont have special handling in PM4 functions Changed EstimateMaxY to try to eliminate mispredictions that vtune was reporting, msvc ended up turning the changed code into a series of blends in ExecutePacketType3_EVENT_WRITE_EXT, instead of writing extents to an array on the stack and then doing xe_copy_and_swap_16 of the data to its dest, pre-swap each constant and then store those. msvc manages to unroll that into wider stores stop logging XE_SWAP every time we receive XE_SWAP, stop logging the start and end of each viz query Prefetch watch nodes in FireWatches based on feedback from vtune Removed dead code from texture_info.cc NOINLINE on GpuSwap, PGO builds did it so we should too.	2022-09-11 14:14:48 -07:00
chss95cs@gmail.com	08f7a28920	Alternative mutex	2022-08-14 08:59:11 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90% But for normal msvc builds i would put it at around 30-50% Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger fixed a number of errors where temporaries were being passed by reference/pointer Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. 
The memcpies were unnecessary and one of them was always a no-op Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases ^this can be toggled off in the platform_win header handle indirect call true with constant function pointer, was occurring in h3 lookup host format swizzle in denser array by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar ^looking up whether its known or not took approx 0.3% cpu time Changed some things in /cpu to make the project UNITYBUILD friendly The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds) Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed added support for docdecaduple precision floating point so that we can represent our performance gains numerically tons of other stuff im probably forgetting	2022-08-13 12:59:00 -07:00
Gliniak	1e369afa3d	[Memory] Allocate system heap memory from bottom of heap last quarter Aka. From 0x30000000	2022-06-17 22:23:39 +02:00
Gliniak	3d96dfa359	Always allocate system heap from top of heap	2022-05-25 07:53:50 +02:00
Gliniak	af806ee98f	Allocate guest objects in last quarter of memory heap	2022-05-22 13:08:47 +02:00
Gliniak	6c6c5ac14b	Merge remote-tracking branch 'GliniakRepo/experimentals' into canary_experimental	2022-05-19 10:51:44 +02:00
Gliniak	b237b71031	Merge remote-tracking branch 'GliniakRepo/memory_stats' into canary_pr	2022-05-19 10:03:29 +02:00
Gliniak	498dde6e1a	Limit unspecified virtual allocation only to 3/4 of heap	2022-01-31 20:12:34 +01:00
Gliniak	c4d64a0501	QueryRegionInfo: Adjust allocation_base to contain heap address	2022-01-31 20:12:24 +01:00
Triang3l	e720e0a540	[Code] Remove game names from code comments (most of at least)	2021-09-05 21:27:40 +03:00
Gliniak	35321a10c3	[Kernel] Improvements to MmQueryStatistics - Fixed incorrect calculation of available pages - Changed amount of total virtual bytes - Added real amount of reserved virtual bytes - Removed unused methods	2021-07-15 09:45:35 +02:00
Triang3l	a73592c2ef	[Memory/CPU] UWP: Support separate code execution and write memory, FromApp functions + other Windows memory fixes	2020-11-24 22:18:50 +03:00
Gliniak	a6868d1f8a	[Memory] Removed redundant BaseHeap::IsGuestPhysicalHeap	2020-11-22 15:43:53 -06:00
Gliniak	c071500ff4	[Base] Specify heap type on initialization	2020-11-22 15:43:53 -06:00
Triang3l	86ae42919d	[Memory] Close shared memory FD and properly handle its invalid value	2020-11-22 14:17:37 +03:00
Sandy Carter	49e194009b	[memory linux] Properly unlink shared memory shm_unlink(name) is the proper way to close a shared memory in linux. Prior to this, xenia was creating and not cleaning up shared memory handle which would accumulate in /dev/shm. shm_unlink is the proper way of doing this. Add filename to CloseFileMappingHandle signature. Add simple test to open and close.	2020-11-22 13:54:00 +03:00
Sandy Carter	2c7009ca80	[memory] Move "Local\\" prefix to win impl CreateFileMappingHandle now takes shared memory name without a prefix. The doc of shm_open recommends not using slashes and prefixing with "/". The prefixing has been moved to the os implementation layer. Invocations of CreateFileMappingHandle were all using "Local\\" so these prefixes were removed.	2020-11-22 13:54:00 +03:00
Triang3l	52efbcf741	[Memory] Fix Protect range calculation	2020-09-01 12:44:37 +03:00
gibbed	a48bb71c2f	Overhaul logging.	2020-04-07 16:09:41 -05:00
gibbed	5bf0b34445	C++17ification. C++17ification! - Filesystem interaction now uses std::filesystem::path. - Usage of const char*, std::string have been changed to std::string_view where appropriate. - Usage of printf-style functions changed to use fmt.	2020-04-07 16:09:41 -05:00
Triang3l	c156616103	[Memory] Invalidate physical memory in Release/Decommit (#1559)	2020-02-24 01:04:30 +03:00
Triang3l	d156c3275d	[Memory] Fix incorrect comparison in QueryRangeAccess	2020-02-22 18:12:46 +03:00
Triang3l	f858631245	[Kernel] Trigger memory callbacks after file read	2020-02-22 18:06:56 +03:00
Triang3l	028c784c5c	[Memory] Make heap_size actually mean size rather than high address	2020-02-22 14:55:28 +03:00
Triang3l	8ec813de82	[Memory, D3D12] Various refactoring from data provider development	2020-02-15 21:35:24 +03:00
Triang3l	8ba6f3fc37	[Memory] Trigger watches when making pages writable, not the other way around	2019-11-10 14:21:36 +03:00
Triang3l	7e6bf8022f	[Memory] Refactor GetPhysicalAddress and use it for XMA, resolve #1448	2019-08-24 17:42:06 +03:00
Triang3l	e35c609224	Revert "[APU] Temp XMA context allocation region workaround" This reverts commit `968c337d22`.	2019-08-16 21:11:55 +03:00
Triang3l	968c337d22	[APU] Temp XMA context allocation region workaround	2019-08-16 09:47:28 +03:00
Triang3l	126978d960	[Memory] Fix memory watch addresses	2019-08-16 08:49:48 +03:00
Triang3l	834ced0d63	[Memory] 0xE0000000: Fix a typo, re-enable and cleanup	2019-08-15 23:55:33 +03:00
Triang3l	e862169156	[Memory] BaseHeap::TranslateRelative including host address offset	2019-08-15 00:31:21 +03:00
Triang3l	0451153760	[Memory] Temporarily disable allocation in 0xE0000000	2019-08-15 00:06:27 +03:00
Triang3l	003c02c640	[CPU, Memory] 0xE0000000 adjustment by @elad335 and mapping	2019-08-14 21:37:52 +03:00
Triang3l	0067f5561d	[Kernel] More TranslateVirtual/HostToGuestVirtual usage	2019-08-14 08:28:30 +03:00
Triang3l	2152c79965	[Memory] 0xE… adjustment in TranslateVirtual	2019-08-14 00:07:27 +03:00
Triang3l	741b5ae2ec	[Memory] Add HostToGuestVirtual and use it in a couple of places	2019-08-13 23:49:49 +03:00
Triang3l	cb0e18c7dc	[Memory] BaseHeap::host_address_offset	2019-08-04 23:55:54 +03:00
Triang3l	25675cb8b8	[Memory] E0000000 adjustment in watches only for Windows	2019-08-04 23:10:59 +03:00
Triang3l	d20c2fa9da	[Memory/Vulkan] Move old memory watches to the Vulkan backend	2019-08-03 21:06:59 +03:00
Triang3l	0370f8bbd9	[Memory] Pass exact_range to watch callbacks	2019-08-03 19:16:04 +03:00
Triang3l	352f12f92e	[D3D12] Switch from gflags to cvars	2019-08-03 16:53:23 +03:00
Jonathan Goyvaerts	c1af632562	Replace all gflag implementations with cvar implementations	2019-08-03 02:34:07 +02:00
Triang3l	9d0986030f	[Memory] Don't mark non-writable pages as watched	2019-07-31 08:40:26 +03:00
Triang3l	24383b9137	[Memory/D3D12] Unwatch up to 256 KB ranges	2019-07-31 00:18:12 +03:00
Triang3l	b5fb84473d	[Memory] Replace forgotten InvalidateRange in NtReadFile	2019-07-30 09:06:23 +03:00

143 Commits