xenia-canary

Commit Graph

Author	SHA1	Message	Date
Gliniak	c3301d9281	[Base] Fixed issue with initialization deadlock on Proton. For whatever reason, Proton doesn't like it when Xenia is compiled with MSVC 2022.	2024-12-30 16:41:47 +01:00
Gliniak	b9061e6292	[LINT] Linted files + Added lint job to CI	2024-03-12 19:19:30 +01:00
chss95cs@gmail.com	7cc364dcb8	Squash reallocs in command buffers by using a large preallocated buffer; use virtual memory directly with it so the OS allocates on demand. Mark raw clock functions as noinline: the way MSVC was inlining them and ordering the branches meant that rdtsc would often be speculatively executed. Add an alternative clock implementation for Windows: instead of using QueryPerformanceCounter, we grab the system time from kusershared. It does not have the same precision as QueryPerformanceCounter (only 100-nanosecond precision), but we round to milliseconds, so it never made sense to use the performance counter in the first place. Stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite). Added some helpers for minf/maxf without the NaN-handling behavior.	2022-08-14 13:42:08 -07:00
chss95cs@gmail.com	08f7a28920	Alternative mutex	2022-08-14 08:59:11 -07:00
chss95cs@gmail.com	495b1f8bc8	once again return to spinloop	2022-08-13 14:05:35 -07:00
chss95cs@gmail.com	c9e4119428	Add branch of ffmpeg with non-recursive split_radix_permutation. Add branch of disruptorplus with working blocking_wait_strategy. Switch back to blocking wait for timer queue.	2022-08-13 13:43:45 -07:00
chss95cs@gmail.com	020d64a1a1	revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile	2022-08-13 13:20:35 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements. Combined with an architecture-specific build and clang-cl, users have reported absurd gains over master for some games, in the range of 50%-90%, but for normal MSVC builds I would put it at around 30-50%. Added per-xexmodule caching of information per instruction, which can be used to remember what code needs compiling at startup. Record which guest addresses wrote MMIO and backpropagate that to future runs, eliminating dependence on exception trapping; this makes many games like h3 actually tolerable to run under a debugger. Fixed a number of errors where temporaries were being passed by reference/pointer. Can now be compiled with clang-cl 14.0.1; requires -Werror off, though, and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely: it does not properly emulate the behavior of mfmsr/mtmsrd and it seriously cripples AMD CPUs. Removing this yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now: it has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it. Disabled the trace writer in release builds: despite just returning after checking if the file was open, the trace functions were consuming about 0.60% CPU time in total. Added IsValidReg: GetRegisterInfo is a huge (about 45k) branching function, and using it to check if a register was valid consumed a significant chunk of time. Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count; this gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op. Added simplification rules for multiplicative patterns like (x+x) and (x<<1)+x. For the most frequently called Win32 functions, I added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about or that isn't relevant to our use cases (this can be toggled off in the platform_win header). Handle indirect call true with constant function pointer, which was occurring in h3. Look up host format swizzle in a denser array. By default, don't check if a GPU register is unknown; instead just check if it's out of range. This is controlled by a cvar (looking up whether it's known or not took approx. 0.3% CPU time). Changed some things in /cpu to make the project UNITYBUILD-friendly. The timer thread was spinning way too much and consuming a ton of CPU; changed it to use a blocking wait instead. Tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds). Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed. Added support for docdecaduple precision floating point so that we can represent our performance gains numerically. Tons of other stuff I'm probably forgetting.	2022-08-13 12:59:00 -07:00
Joel Linn	e3dd873892	[Base] Fix wait for callback return - If wait item has disarmed itself and is then disarmed by another thread, still wait for the callback to return to meet guarantees	2022-04-26 13:56:11 -05:00
Joel Linn	75357caeaf	[Base] Add TimerQueue - Cross platform functionality similar to Windows' `CreateTimerQueue` with `WT_EXECUTEINTIMERTHREAD`	2022-04-26 13:56:11 -05:00

10 Commits