Commit Graph

7180 Commits

Author SHA1 Message Date
chrisps 8c6cefcce0
Merge branch 'xenia-canary:canary_experimental' into command_processor_optimizations 2022-12-14 11:35:13 -08:00
chss95cs@gmail.com f931c34ecb Cleaned up for commit, moved WriteRegistersFromMemCommonSense code into WriteRegistersFromMem
optimized copy_and_swap_32_unaligned further
2022-12-14 11:34:33 -08:00
chss95cs@gmail.com 754293ffc3 Fix mistake with fetch constant dirty mask 2022-12-14 10:13:13 -08:00
chss95cs@gmail.com 7a0fd0f32a Remove MaybeYields when vsync is off 2022-12-14 09:56:33 -08:00
chss95cs@gmail.com 080b6f4cbd Partially vectorized GetScissor (loading and unpacking the bitfields from the registers is still scalar) 2022-12-14 09:33:14 -08:00
chss95cs@gmail.com ab6d9dade0 add avx2 codepath for copy_and_swap_32_unaligned
use the new writerange approach in WriteRegisterRangeFromMem_WithKnownBound
2022-12-14 07:53:21 -08:00
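For readers unfamiliar with the routine named in commit ab6d9dade0, the following is a minimal portable sketch of what a 32-bit copy-and-swap does: it copies 32-bit words from a possibly unaligned source and reverses the byte order of each word. The names `sketch_byte_swap_32` and `sketch_copy_and_swap_32` are hypothetical. This is a scalar reference only, not the AVX2 code path the commit adds, which presumably produces the same result with vector shuffles.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reverse the byte order of one 32-bit value.
inline uint32_t sketch_byte_swap_32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Hypothetical scalar reference: copy `count` 32-bit words from src to dest,
// byte-swapping each one. memcpy is used for loads and stores so that neither
// pointer has to be 4-byte aligned.
void sketch_copy_and_swap_32(void* dest, const void* src, size_t count) {
  assert(count == 0 || (dest != nullptr && src != nullptr));
  auto* d = static_cast<unsigned char*>(dest);
  const auto* s = static_cast<const unsigned char*>(src);
  for (size_t i = 0; i < count; ++i) {
    uint32_t v;
    std::memcpy(&v, s + i * 4, sizeof(v));
    v = sketch_byte_swap_32(v);
    std::memcpy(d + i * 4, &v, sizeof(v));
  }
}
```

A vectorized path typically handles the bulk of the buffer in wide chunks and falls back to a loop like this one for the remaining tail.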
Gliniak b2dd489151 [APU] Set first frame offset for next buffer + Note about edgecase 2022-12-13 22:57:36 +01:00
Gliniak e00feb7b0f [APU] Fixed incorrect frame count + removed hopefully useless check for now 2022-12-13 21:47:35 +01:00
chss95cs@gmail.com 82dcf3f951 faster/more compact MatchValueAndRef
Made commandprocessor GetcurrentRingReadcount inline. It was made noinline to match PGO decisions, but I think PGO can make extra register allocation decisions that make this inlining choice yield gains, whereas if we do it manually we lose a tiny bit of performance

Working on a more compact vectorized version of GetScissor to save icache on cmd processor thread
Add WriteRegisterForceinline, will probably end up amending to remove it

add ERMS path for vastcpy

Adding WriteRegistersFromMemCommonSense (name will be changed later). The name is because I realized my approach to optimizing writeregisters has been backwards: instead of handling all checks more quickly within the loop that writes the registers, we need a different loop for each range with unique handling. We also manage to hoist a lot of the logic out of the loops

Use 100ns delay for MaybeYield. Noticed we often return almost immediately from the syscall, so we end up wasting some CPU; instead we give up the CPU for the minimum waitable time (on my system, this is 0.5 ms)

Added a note about affinity mask/dynamic process affinity updates to threading_win

Add TextureFetchConstantsWritten
2022-12-13 11:25:33 -08:00
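The idea described in commit 82dcf3f951, moving range checks out of the register-writing loop, can be illustrated with a small sketch. Everything here is hypothetical (the register file layout, the "special" range, and the function names); it is not the Xenia code, only an instance of the general technique of testing the range once per call instead of once per register.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical "special" register range (inclusive) that needs extra
// bookkeeping whenever it is written.
constexpr uint32_t kSketchSpecialFirst = 0x10;
constexpr uint32_t kSketchSpecialLast = 0x1F;

struct SketchRegisterFile {
  std::vector<uint32_t> values = std::vector<uint32_t>(0x40, 0);
  uint32_t special_writes = 0;  // stand-in for a dirty mask or similar
};

// Baseline: every iteration re-checks which range the register falls in.
void write_registers_checked(SketchRegisterFile& rf, uint32_t base,
                             const uint32_t* src, uint32_t count) {
  assert(base + count <= rf.values.size());
  for (uint32_t i = 0; i < count; ++i) {
    const uint32_t index = base + i;
    rf.values[index] = src[i];
    if (index >= kSketchSpecialFirst && index <= kSketchSpecialLast) {
      ++rf.special_writes;
    }
  }
}

// Hoisted: the overlap with the special range is computed once, the copy
// itself is a plain loop with no branches, and the bookkeeping is applied
// once for the whole overlapping sub-range.
void write_registers_split(SketchRegisterFile& rf, uint32_t base,
                           const uint32_t* src, uint32_t count) {
  assert(base + count <= rf.values.size());
  const uint32_t end = base + count;  // exclusive
  const uint32_t special_begin =
      std::min(std::max(kSketchSpecialFirst, base), end);
  const uint32_t special_end =
      std::min(std::max(kSketchSpecialLast + 1, base), end);
  std::copy(src, src + count, rf.values.begin() + base);
  rf.special_writes += special_end - special_begin;
}
```

Both functions leave the register file in the same state; the second simply does the range test a constant number of times per call.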
Gliniak 16580f5fae [APU] Fixed crash in error message caused by invalid arguments number 2022-12-13 11:02:59 +01:00
Gliniak 97fdf9c6dd [APU] Resolved crash related to negative amount of bits to copy
This is likely due to hitting a seemingly valid frame in invalid data
2022-12-13 08:46:48 +01:00
Gliniak 43d7fc5158 [APU] Shuffle checks to hopefully prevent crashing from logger 2022-12-11 21:06:47 +01:00
Gliniak c9cd6f15fc [APU] Fixed logged frame count
Until now, the frame info written to the log was always off by 1
2022-12-11 13:11:23 +01:00
Gliniak 82ccdd3db5 [APU] Misc Changes:
- Unified APU error messages
- Removed magic number from SetOffset call
- Commented out that annoying assertion from XmaContext::GetNextFrame
- Removed checks for current_input_packet_count and replaced with bool check

Not sure what to call it. I know that a call with packet count == 1 is a special case
and is probably handled differently. Is it streaming, or what should it be called?
2022-12-11 12:22:23 +01:00
Gliniak 2c6bbf9a4a [GIT] Added recent.toml to ignored files 2022-12-11 11:11:02 +01:00
chrisps 0f94eb21c2
Merge pull request #102 from chrisps/error_modules_and_threads_plus_xmacontext_workaround
Host exception improvements, bandaid over div by 0 crash
2022-12-10 09:09:52 -08:00
chss95cs@gmail.com 7d49b97e4c Print any module name+ offset in host exception reports
print thread name in host exception reports
trying to force win32 error descriptions to english
Return if output buffer block count is 0 in XmaContext, this is an attempt to fix a divide by zero crash many users have reported
2022-12-09 12:24:06 -08:00
Gliniak 7c5da821d4 [Kernel] Fixed invalid thread pointer in KeEnableFpuExceptions 2022-12-08 21:48:13 +01:00
Radosław Gliński 747fb42bdf
Merge pull request #98 from AdrianCassar/canary_experimental
Added a hotkey to open the previously played title
2022-12-05 18:09:52 +01:00
MoistyMarley 6d2724a861 Added a hotkey to open the previously played title 2022-12-05 17:01:16 +00:00
chrisps 85723f117d
Merge pull request #99 from chrisps/stack_sync2_fence_krnl_hostexcept
Improve stack sync, kernel fixes, better host exception reporting
2022-12-04 13:50:06 -08:00
chss95cs@gmail.com a63f424c0a Directly check PEB for IsDebuggerAttached
Add constexpr getters to magicdiv class so it can be used from jitted x64/dxbc
Track the guest return address as well for guest/host sync, if multiple entries have the same guest stack find the first one with a matching guest retaddr. this fixes epic mickey 2 (which the previous guest-stack change had allowed to go ingame for a bit) and potentially also a crash in fable3.
Break if under debugger when stackpoints are overflowed

Add much more useful output for host exceptions, print out xenia_canary.exe relative offsets if exception is in module, formatmessage for ntstatus/win32err, strerror

Minor d3d12 microoptimization, instead of doing SetEventOnCompletion + WaitForSingleObject do SetEventOnCompletion w/ nullptr so that the wait happens in kernel mode, avoiding two extra context switches

add unimplemented kernel functions:
ExAllocatePoolWithTag
ObReferenceObject

ObDereferenceObject has no return value.
Log a message when ObDereferenceObject/Reference receive unregistered guest kernel objects
gave ObLookupThreadByThreadId its correct error status
hoist object_types initialization out of ObReferenceObjectByHandle
Fix out parameter values on error for a few kernel funcs
add note about msr to KeSetCurrentStackPointers
add X_STATUS_OBJECT_TYPE_MISMATCH check for xeNtSetEvent
add msr_mask field to X_KPCR
2022-12-04 12:38:19 -08:00
Gliniak 1eb61aa9ab Added recently opened titles list 2022-11-29 10:47:30 +01:00
chrisps 0674b68143
Merge pull request #96 from chrisps/host_guest_stack_synchronization
Host/Guest stack sync, exception messagebox, kernel improvements, minor opt
2022-11-27 10:30:16 -08:00
Gliniak 12005acc98 [APU] Check if split frame length is valid 2022-11-27 18:40:27 +01:00
chss95cs@gmail.com 90c771526d "Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false

Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows haven't changed over the last 20 years, so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving: another thread's stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus. To disable it, set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but I don't know of any yet

So far, I know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesn't seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
2022-11-27 09:39:33 -08:00
Gliniak 1451ca4266 [APU] Clear host data while resetting context 2022-11-27 17:00:31 +01:00
Gliniak 9fdfd2ada9 [APU] Removed old hack that invalidates input on decoder error
Added returning a parsing error when the decoder fails
2022-11-26 17:25:39 +01:00
chrisps 6e541536dd
Merge pull request #93 from chrisps/canary_experimental
add some missing kthread fields, fix assert eval in release
2022-11-07 14:49:31 -08:00
chss95cs@gmail.com 7a17fad88a fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization) 2022-11-07 05:40:18 -08:00
chss95cs@gmail.com e21fd22d09 add x_kthread priority/fpu_exceptions_on fields, set fpu_exceptions_on in KeEnableFpuExceptions, set priority in SetPriority
add msr field on context
write to msr for mtmsr/mfmsr, do not have correct default value for msr yet, nor has mtmsrd been reimplemented
do not evaluate assert expressions in release at all, while still avoiding unused variable warnings
2022-11-06 11:03:10 -08:00
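Commit e21fd22d09 says assert expressions are no longer evaluated in release while unused variable warnings are still avoided. One common way to get that behavior, shown here as a sketch and not necessarily what the commit does, is to place the expression inside `sizeof`, whose operand is never evaluated. The macro names `SKETCH_ENABLE_ASSERTS` and `SKETCH_ASSERT` and the helper functions are hypothetical.

```cpp
#include <cassert>

#ifdef SKETCH_ENABLE_ASSERTS
// Checked build: forward to the standard assert.
#define SKETCH_ASSERT(expr) assert(expr)
#else
// Release build: the operand of sizeof is an unevaluated context, so no code
// runs for expr, yet every variable it names still counts as referenced.
#define SKETCH_ASSERT(expr) ((void)sizeof(!(expr)))
#endif

static int g_sketch_check_calls = 0;

// Stand-in for an expensive validation with an observable side effect.
inline bool sketch_expensive_check(int value) {
  ++g_sketch_check_calls;
  return value > 0;
}

int sketch_double(int value) {
  const bool in_range = value < 1000;  // only referenced by the assertion
  SKETCH_ASSERT(in_range && sketch_expensive_check(value));
  return value * 2;
}
```

With `SKETCH_ENABLE_ASSERTS` undefined, calling `sketch_double` never runs `sketch_expensive_check`, and `in_range` does not trigger an unused variable warning on mainstream compilers.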
chrisps 3dcbd25e7f
Merge pull request #92 from chrisps/canary_experimental
ffmpeg decoder optimizations, kernel fixes, cpu backend fixes, clang warnings, implement some missing kernel functions
2022-11-05 11:59:35 -07:00
chss95cs@gmail.com c70ae76a69 hopefully switched cxxopts to the main master branch now that the selectany changes are accepted 2022-11-05 11:08:04 -07:00
chss95cs@gmail.com c1d922eebf Minor decoder optimizations, kernel fixes, cpu backend fixes 2022-11-05 10:50:33 -07:00
Gliniak ba66373d8c [APU][Janky] Fixed issues with incorrect frames on streamed data
This requires a lot more research and test data!
2022-11-03 20:56:36 +01:00
Gliniak dae508500a [APU] Clear remaining packets skip when we're done with current stream
Plus some additional logging
2022-11-03 12:59:47 +01:00
Margen67 4ba14bc35e [APU+HID] Optimizations 2022-11-03 03:56:13 -07:00
Gliniak b23566b823 [APU] Fix incorrect packet frame count when frame ends exactly where packet ends
This resolves looping background sound in GoW
2022-11-03 11:14:37 +01:00
Gliniak 259679d53c [APU] Handle exceeding input offset by switching buffer
This should resolve crashes in FH
2022-11-02 08:47:36 +01:00
chrisps ff0f3fcc9d
Merge pull request #89 from xenia-canary/revert-87-canary_experimental
Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes"
2022-11-01 14:46:55 -07:00
chrisps 8186792113
Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes" 2022-11-01 14:45:36 -07:00
chrisps 781871e2d5
Merge pull request #87 from chrisps/canary_experimental
Minor decoder optimizations, kernel fixes, cpu backend fixes
2022-11-01 11:49:10 -07:00
Gliniak c080e2e17c [APU] Resolved crashes related to out of bound readouts 2022-11-01 11:24:01 +01:00
chss95cs@gmail.com 06bfd624de fix failed debug build from loops variable assert 2022-10-30 12:33:08 -07:00
chss95cs@gmail.com 941237027d fix ffmpeg submodule ptr 2022-10-30 11:16:05 -07:00
chss95cs@gmail.com bff264b5fd Fixed RtlCompareString and RtlCompareStringN, they were very wrong, for CompareString the params are struct ptrs not char ptrs
Fixed a ton of clang-cl compiler warnings about unused variables, still many left. Fixed a lot of inconsistent override ones too
2022-10-30 10:47:09 -07:00
chrisps 65b9d93551
Merge branch 'xenia-canary:canary_experimental' into canary_experimental 2022-10-30 09:05:40 -07:00
chss95cs@gmail.com f5cc54bdae Fix building on clang-cl, it did not like the cxxopts selectany changes 2022-10-30 09:05:10 -07:00
chss95cs@gmail.com 4fc18949a2 Merge branch 'canary_experimental' of https://github.com/chrisps/xenia-canary into canary_experimental 2022-10-30 08:55:53 -07:00
chss95cs@gmail.com 550d1d0a7c use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread
properly byteswap r13 for spinlock
Add PPCOpcodeBits
stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now.
add note about 0 opcode bytes being executed to ppc_frontend
Add assert to check that function end is greater than function start, can happen with malformed functions
Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part
minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us
Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction
Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code
Generate label string ids on the fly
Fix unused function warnings for prefetch on clang, fix many other clang warnings
Eliminated majority of CallNativeSafes by replacing them with naive generic code paths.
^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included
These naive paths are implemented as small loops that stash the two inputs to the stack and load them into GPRs from there. They are not particularly fast, but should be an order of magnitude faster than a CallNativeSafe to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc
Added the fast SHR_V128 path back in
Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. signed vector average byte appears in many games
Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially
2022-10-30 08:48:58 -07:00
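The signed vector average bug mentioned in commit 550d1d0a7c (an unsigned shift turning negative values into large positive ones) can be reproduced in scalar form. The sketch below assumes the rounding average is computed with the common identity `(a | b) - ((a ^ b) >> 1)`; whether the emulator uses this exact formulation is an assumption, and both function names are hypothetical. The only difference between the two functions is whether the shift is arithmetic or logical.

```cpp
#include <cstdint>

// Rounded-up signed average, (a + b + 1) >> 1 without overflow.
// Relies on arithmetic right shift of negative values and modular
// unsigned-to-signed conversion, which C++20 guarantees and mainstream
// compilers provided before that.
int32_t sketch_avg_signed(int32_t a, int32_t b) {
  const uint32_t ua = static_cast<uint32_t>(a);
  const uint32_t ub = static_cast<uint32_t>(b);
  // Arithmetic shift: the sign bit of (a ^ b) is preserved.
  const int32_t half_diff = static_cast<int32_t>(ua ^ ub) >> 1;
  return static_cast<int32_t>((ua | ub) - static_cast<uint32_t>(half_diff));
}

// Same formula with a logical (unsigned) shift. This is only wrong when the
// inputs have different signs, because only then is the top bit of (a ^ b)
// set and shifted in as zero.
int32_t sketch_avg_signed_buggy(int32_t a, int32_t b) {
  const uint32_t ua = static_cast<uint32_t>(a);
  const uint32_t ub = static_cast<uint32_t>(b);
  const uint32_t half_diff = (ua ^ ub) >> 1;  // logical shift
  return static_cast<int32_t>((ua | ub) - half_diff);
}
```

For inputs -4 and 2 the correct average is -1, while the logical-shift variant yields 2147483647.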