Commit Graph

5735 Commits

Author SHA1 Message Date
Gliniak 1eb61aa9ab Added reccently opened titles list 2022-11-29 10:47:30 +01:00
chrisps 0674b68143
Merge pull request #96 from chrisps/host_guest_stack_synchronization
Host/Guest stack sync, exception messagebox, kernel improvements, minor opt
2022-11-27 10:30:16 -08:00
Gliniak 12005acc98 [APU] Check if splitted frame length is valid 2022-11-27 18:40:27 +01:00
chss95cs@gmail.com 90c771526d "Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false
Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
"Fix" debug console, we were checking the cvar before any cvars were loaded, and the condition it checks in AttachConsole is somehow always false

Remove dead #if 0'd code in math.h

On amd64, page_size == 4096 constant, on amd64 w/ win32, allocation_granularity == 65536. These values for x86 windows havent changed over the last 20 years so this is probably safe
and gives a modest code size reduction

Enable XE_USE_KUSER_SHARED. This sources host time from KUSER_SHARED instead of from QueryPerformanceCounter, which is far faster, but only has a granularity of 100 nanoseconds.

In some games seemingly random crashes were happening that were hard to trace because
the faulting thread was actually not the one that was misbehaving, another threads stack was underflowing into the faulting thread.

Added a bunch of code to synchronize the guest stack and host stack so that if a guest longjmps the host's stack will be adjusted.
Changes were also made to allow the guest to call into a piece of an existing x64 function.

This synchronization might have a slight performance impact on lower end cpus, to disable it set enable_host_guest_stack_synchronization to false.
It is possible it may have introduced regressions, but i dont know of any yet

So far, i know the synchronization change fixes the "hub crash" in super sonic and allows the game "london 2012" to go ingame.

Removed emit_useless_fpscr_updates, not emitting these updates breaks the raiden game

MapGuestAddressToMachineCode now returns nullptr if no address was found, instead of the start of the function

add Processor::LookupModule

Add Backend::DeinitializeBackendContext
Use WriteRegisterRangeFromRing_WithKnownBound<0, 0xFFFF> in WriteRegisterRangeFromRing for inlining (previously regressed on performance of ExecutePacketType0)

add notes about flags that trap in XamInputGetCapabilities

0 == 3 in XamInputGetCapabilities

Name arg 2 of XamInputSetState

PrefetchW in critical section kernel funcs if available & doing cmpxchg

Add terminated field to X_KTHREAD, set it on termination

Expanded the logic of NtResumeThread/NtSuspendThread to include checking the type of the handle (in release, LookupObject doesnt seem to do anything with the type)
and returning X_STATUS_OBJECT_TYPE_MISMATCH if invalid. Do termination check in NtSuspendThread.

Add basic host exception messagebox, need to flesh it out more (maybe use the new stack tracking stuff if on guest thrd?)

Add rdrand patching hack, mostly affects users with nvidia cards who have many threads on zen

Use page_size_shift in more places

Once again disable precompilation! Raiden is mostly weird ppc asm which probably breaks the precompilation. The code is still useful for running the compiler over the whole of an xex in debug to test for issues
2022-11-27 09:39:33 -08:00
Gliniak 1451ca4266 [APU] Clear host data while reseting context 2022-11-27 17:00:31 +01:00
Gliniak 9fdfd2ada9 [APU] Removed old hack that invalidates input on decoder error
Added returning parsing error while decoder fails
2022-11-26 17:25:39 +01:00
chss95cs@gmail.com 7a17fad88a fix crash from precompiling out of range funcs, add xexcache version, increment xexcache version (all priors are version 0 thanks to 0 initialization) 2022-11-07 05:40:18 -08:00
chss95cs@gmail.com e21fd22d09 add x_kthread priority/fpu_exceptions_on fields, set fpu_exceptions_on in KeEnableFpuExceptions, set priority in SetPriority
add msr field on context
write to msr for mtmsr/mfmsr, do not have correct default value for msr yet, nor has mtmsrd been reimplemented
do not evaluate assert expressions in release at all, while still avoiding unused variable warnings
2022-11-06 11:03:10 -08:00
chss95cs@gmail.com c1d922eebf Minor decoder optimizations, kernel fixes, cpu backend fixes 2022-11-05 10:50:33 -07:00
Gliniak ba66373d8c [APU][Janky] Fixed issues with incorrect frames on streamed data
This requires a lot more research and test data!
2022-11-03 20:56:36 +01:00
Gliniak dae508500a [APU] Clear remaining packets skip when we're done with current stream
Plus some additional logging
2022-11-03 12:59:47 +01:00
Margen67 4ba14bc35e [APU+HID] Optimizations 2022-11-03 03:56:13 -07:00
Gliniak b23566b823 [APU] Fix incorrect packet frame count when frame ends exactly where packet ends
This resolves looping background sound in GoW
2022-11-03 11:14:37 +01:00
Gliniak 259679d53c [APU] Handle exceeding input offset by switching buffer
This should resolve crashes in FH
2022-11-02 08:47:36 +01:00
chrisps 8186792113
Revert "Minor decoder optimizations, kernel fixes, cpu backend fixes" 2022-11-01 14:45:36 -07:00
chrisps 781871e2d5
Merge pull request #87 from chrisps/canary_experimental
Minor decoder optimizations, kernel fixes, cpu backend fixes
2022-11-01 11:49:10 -07:00
Gliniak c080e2e17c [APU] Resolved crashes related to out of bound readouts 2022-11-01 11:24:01 +01:00
chss95cs@gmail.com 06bfd624de fix failed debug build from loops variable assert 2022-10-30 12:33:08 -07:00
chss95cs@gmail.com bff264b5fd Fixed RtlCompareString and RtlCompareStringN, they were very wrong, for CompareString the params are struct ptrs not char ptrs
Fixed a ton of clang-cl compiler warnings about unused variables, still many left. Fixed a lot of inconsistent override ones too
2022-10-30 10:47:09 -07:00
chrisps 65b9d93551
Merge branch 'xenia-canary:canary_experimental' into canary_experimental 2022-10-30 09:05:40 -07:00
chss95cs@gmail.com 550d1d0a7c use much faster exp2/cos approximations in ffmpeg, large decrease in cpu usage on my machine on decoder thread
properly byteswap r13 for spinlock
Add PPCOpcodeBits
stub out broken fpscr updating in ppc_hir_builder. it's just code that repeatedly does nothing right now.
add note about 0 opcode bytes being executed to ppc_frontend
Add assert to check that function end is greater than function start, can happen with malformed functions
Disable prefetch and cachecontrol by default, automatic hardware prefetchers already do the job for the most part
minor cleanup in simplification_pass, dont loop optimizations, let the pass manager do it for us
Add experimental "delay_via_maybeyield" cvar, which uses MaybeYield to "emulate" the db16cyc instruction
Add much faster/simpler way of directly calling guest functions, no longer have to do a byte by byte search through the generated code
Generate label string ids on the fly
Fix unused function warnings for prefetch on clang, fix many other clang warnings
Eliminated majority of CallNativeSafes by replacing them with naive generic code paths.
^ Vector rotate left, vector shift left, vector shift right, vector shift arithmetic right, and vector average are included
These naive paths are implemented small loops that stash the two inputs to the stack and load them in gprs from there, they are not particularly fast but should be an order of magnitude faster than callnativesafe
to a host function, which would involve a call, stashing all volatile registers, an indirect call, potentially setting up a stack frame for the arrays that the inputs get stashed to, the actual operations, a return, loading all volatile registers, a return, etc
Added the fast SHR_V128 path back in
Implement signed vector average byte, signed vector average word. previously we were emitting no code for them. signed vector average byte appears in many games
Fix bug with signed vector average 32, we were doing unsigned shift, turning negative values into big positive ones potentially
2022-10-30 08:48:58 -07:00
Gliniak 55877f4c61 [APU] Force buffer swap at the end of stream
Plus some debugging messages and lint fixes
2022-10-25 17:20:45 +02:00
Gliniak 6b11787c93 [APU] Fixed typo that prevented last packet in stream to be processed 2022-10-24 21:33:25 +02:00
Gliniak fac2a89d0f Disallow offset to be set before header, header size fix, audio channels crashfix 2022-10-24 19:43:43 +02:00
Radosław Gliński 7c375879bc
Merge pull request #85 from chrisps/canary_experimental
Kernel improvements, "fix" crash on sandy bridge/ivy bridge
2022-10-21 14:18:03 +02:00
chrisps 4493d17acc
Update hir_builder.cc 2022-10-20 14:58:27 -07:00
chrisps adc3405537
change else{if} to else if in AndNot 2022-10-20 14:56:55 -07:00
Gliniak 48fea6d9aa Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-10-18 12:19:52 +02:00
Triang3l cdb40ddb28 [DXBC] Fix interpolator copying from v# to r# in PS
The bit count was of `(1<<i)-1` itself (thus couldn't handle interpolators with a smaller index skipped), not of `bits&((1<<i)-1)`.
2022-10-18 13:12:37 +03:00
chss95cs@gmail.com d8b7b3ecec Fix bindless path in d3d12 that i broke in earlier commit (did not affect any users, thats a debug thing)
Fix guest code profiler, it previously only worked with function precomp + all code you were about to execute already discovered
Allow AndNot if type is V128
2022-10-16 07:48:43 -07:00
chss95cs@gmail.com 22e52cbecd Canary can now run on sandy bridge/e and ivy bridge/e
Stubbed out OPCODE_AND_NOT, its fallback implementation if bmi1 was not supported was broken. it's difficult to tell where the actual issue is there.
2022-10-15 05:14:53 -07:00
chss95cs@gmail.com 7204532b1c Implement RtlUpcaseUnicodeChar 2022-10-15 04:29:13 -07:00
chrisps a495709344
Merge branch 'xenia-canary:canary_experimental' into canary_experimental 2022-10-15 03:07:35 -07:00
chss95cs@gmail.com efbeae660c Drastically reduce cpu time wasted by XMADecoderThread spinning, went from 13% of all cpu time to about 0.6% in my tests
Commented out lock in WatchMemoryRange, lock is always held by caller
properly set the value/check the irql for spinlocks in xboxkrnl_threading
2022-10-15 03:07:07 -07:00
Gliniak d262214c1b Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-10-14 20:13:03 +02:00
chss95cs@gmail.com ecf6bfbbdf Stub event query for linux, fix missing semicolon in linux SetEventBoostPriority 2022-10-09 12:30:18 -07:00
Triang3l 45050b2380 [GPU] Vulkan fragment shader interlock RB and related fixes/cleanup
Also fixes addressing of MSAA samples 2 and 3 for 64bpp color render targets in the ROV RB implementation on Direct3D 12.
Additionally, with FSI/ROV, alpha test and alpha to coverage are done only if the render target 0 was dynamically written to (according to the Direct3D 9 rules for writing to color render targets, though not sure if they actually apply to the alpha tests on Direct3D 9, but for safety).
There is also some code cleanup for things spotted during the development of the feature.
2022-10-09 22:06:41 +03:00
Gliniak c923ab78a9 [Config] Added note about internal_display_resolution 2022-10-09 12:33:04 +02:00
Gliniak 7975ea78d4 [Base] BitStream: Prevent readout beyond buffer 2022-10-09 12:24:46 +02:00
Gliniak 17b3939bbf Revert "[Base] Changed size of bitstream accessed data (Risky)"
This reverts commit 061000af01.
2022-10-09 12:18:43 +02:00
chss95cs@gmail.com 2dd6f33f4b Fix debug/ui premake too 2022-10-08 10:34:50 -07:00
chss95cs@gmail.com d8c94b1aee Fix premake filter mistake that broke debug builds (and likely any build other than release) 2022-10-08 10:10:36 -07:00
chss95cs@gmail.com 8f7f7dc6ad fixed wine crash from use of NtSetEventPriorityBoost
add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code
make is_allocated_ and is_enabled_ volatile in xma_context
preallocate avpacket buffer in XMAContext::Setup, the reallocations of the buffer in ffmpeg were showing up on profiles
check is_enabled and is_allocated BEFORE locking an xmacontext. XMA worker was spending most of its time locking and unlocking contexts
Removed XeDMAC, dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead
Made the rest of global_critical_region's members static. They never needed an instance.
Removed ifdef'ed out code from ring_buffer.h
Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent.
Removed vector from WaitMultiple, instead use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects.
Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences, x64 backend is now always size optimized because of premake
Make global_critical_region_ static constexpr in shared_memory.h to get rid of wasteage of 8 bytes (empty class=1byte, +alignment for next member=8)
Move trace-related data to the tail of SharedMemory to keep more important data together
In IssueDraw build an array of fetch constant addresses/sizes, then pre-lock the global lock before doing requestrange for each instead of individually locking within requestrange for each of them
Consistent access specifier protected for pm4_command_processor_declare
Devirtualize WriteOneRegisterFromRing.
Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x
Remove many redundant header inclusions access xenia-gpu
Minor microoptimization of ExecutePacketType0

Add TextureCache::RequestTextures for batch invocation of LoadTexturesData

Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock.
Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it you are actually more likely to have contention
Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs
Add missing checks to XAudioRegisterRenderDriverClient_entry. this is unlikely to fix anything, it was just an easy thing to do
Add NtQueryEvent system call implementation. I don't actually know of any games that need it.
Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx use a fixed size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but were masked until now by waitformultipleobjects natural overhead
Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects.
Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx
add missing checks to NtWaitForMultipleObjectsEx
Support pre-locking in XObject::GetNativeObject
2022-10-08 09:55:17 -07:00
chss95cs@gmail.com bae63b95c5 Update to latest version of cxxopts 2022-09-30 06:51:25 -07:00
chss95cs@gmail.com b4c175d8a3 Enable SDL_LEAN_AND_MEAN, SDL_RENDER_DISABLED, saves about 500kb in final exe
Build several projects that arent performance critical with /Os and /O1 under msvc windows
2022-09-29 07:26:38 -07:00
chss95cs@gmail.com 7e58a3b320 Fix compiler errors i introduced under clang-cl
remove xe_kernel_export_shim_fn field of Export function_data, trampoline is now the only way exports get invoked
Remove kernelstate argument from string functions in order to conform to the trampoline signature (the argument was unused anyway)
Constant-evaluated initialization of ppc_opcode_disasm_table, removal of unused std::vector fields
Constant-evaluated initialization of export tables
name field on export is just a const char* now, only immutable static strings are ever passed to it
Remove unused callcount field of export.
PM4 compare op function extracted
Globally apply /Oy, /GS-, /Gw on msvc windows
Remove imgui testwindow code call, it took up like 300 kb
2022-09-29 07:04:17 -07:00
Gliniak 203267b106 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-09-23 12:23:53 +02:00
Rick Gibbed 3bfa3b05e1
Lint fix. 2022-09-22 06:34:21 -05:00
Gliniak 7d970967c4 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2022-09-20 21:15:12 +02:00
beeanyew cd17f1846f [Input System] xe_mutex revert
On request from chrisps, revert xe_mutex back to xe_unlikely_mutex to avoid mutex deadlocks while initializing hid-winkey.
2022-09-18 15:18:29 +02:00