Commit Graph

10 Commits

Author SHA1 Message Date
Gliniak c3301d9281 [Base] Fixed issue with initialization deadlock on Proton
For whatever reason Proton doesn't like it when Xenia is compiled with 2022 MSVC
2024-12-30 16:41:47 +01:00
Gliniak b9061e6292 [LINT] Linted files + Added lint job to CI 2024-03-12 19:19:30 +01:00
chss95cs@gmail.com 7cc364dcb8 squash reallocs in command buffers by using large prealloced buffer, directly use virtual memory with it so os allocs on demand
mark raw clock functions as noinline, the way msvc was inlining them and ordering the branches meant that rdtsc would often be speculatively executed
add alternative clock impl for win, instead of using queryperformancecounter we grab systemtime from kusershared. it does not have the same precision as queryperformancecounter, we only have 100 nanosecond precision, but we round to milliseconds so it never made sense to use the performance counter in the first place
stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite)
added some helpers for minf/maxf without the nan handling behavior
2022-08-14 13:42:08 -07:00
chss95cs@gmail.com 08f7a28920 Alternative mutex 2022-08-14 08:59:11 -07:00
chss95cs@gmail.com 495b1f8bc8 once again return to spinloop 2022-08-13 14:05:35 -07:00
chss95cs@gmail.com c9e4119428 Add branch of ffmpeg with non-recursive split_radix_permutation
Add branch of disruptorplus with working blocking_wait_stategy
Switch back to blocking wait for timer queue
2022-08-13 13:43:45 -07:00
chss95cs@gmail.com 020d64a1a1 revert to using old bad spinwait, disruptorplus' blocking_wait code does not compile 2022-08-13 13:20:35 -07:00
chss95cs@gmail.com cb85fe401c Huge set of performance improvements, combined with an architecture specific build and clang-cl users have reported absurd gains over master for some gains, in the range 50%-90%
But for normal msvc builds i would put it at around 30-50%
Added per-xexmodule caching of information per instruction, can be used to remember what code needs compiling at start up
Record what guest addresses wrote mmio and backpropagate that to future runs, eliminating dependence on exception trapping. this makes many games like h3 actually tolerable to run under a debugger
fixed a number of errors where temporaries were being passed by reference/pointer
Can now be compiled with clang-cl 14.0.1, requires -Werror off though and some other solution/project changes.
Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold.
Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr and it seriously cripples amd cpus. Removing this yielded around a 3x speedup in Halo Reach for me.
Disabled the microprofiler for now. The microprofiler has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it
Disable the trace writer in release builds. despite just returning after checking if the file was open the trace functions were consuming about 0.60% cpu time total
Add IsValidReg, GetRegisterInfo is a huge (about 45k) branching function and using that to check if a register was valid consumed a significant chunk of time
Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance. The memcpies were unnecessary and one of them was always a no-op
Added simplification rules for multiplicative patterns like (x+x), (x<<1)+x
For the most frequently called win32 functions i added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about/isnt relevant to our usecases
^this can be toggled off in the platform_win header
handle indirect call true with constant function pointer, was occurring in h3
lookup host format swizzle in denser array
by default, don't check if a gpu register is unknown, instead just check if its out of range. controlled by a cvar
^looking up whether its known or not took approx 0.3% cpu time
Changed some things in /cpu to make the project UNITYBUILD friendly
The timer thread was spinning way too much and consuming a ton of cpu, changed it to use a blocking wait instead
tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect clang builds)
Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed
added support for docdecaduple precision floating point so that we can represent our performance gains numerically
tons of other stuff im probably forgetting
2022-08-13 12:59:00 -07:00
Joel Linn e3dd873892 [Base] Fix wait for callback return
- If wait item has disarmed itself and is then disarmed by another
  thread, still wait for the callback to return to meet guaratees
2022-04-26 13:56:11 -05:00
Joel Linn 75357caeaf [Base] Add TimerQueue
- Cross platform functionality similar to Windows' `CreateTimerQueue`
  with `WT_EXECUTEINTIMERTHREAD`
2022-04-26 13:56:11 -05:00