Commit Graph

266 Commits

Author SHA1 Message Date
Margen67 28c67b9384 [APU] Add apu_ prefix to max_queued_frames cvar
Also add note about the minimum value.
2023-06-12 07:21:44 -07:00
chss95cs@gmail.com 27c4cef1b5 Added logger flags, for selectively disabling categories of logging (cpu, apu, kernel). Need to make more log messages make use of these flags.
The "close window" keyboard hotkey (Guide-B) now toggles between loglevel -1 and the loglevel set in your config.
Added LoggerBatch class, which accumulates strings into the threads scratch buffer. This is only intended to be used for very high frequency debug logging. if it exhausts the thread buffer, it just silently stops.
Cleaned nearly 8 years of dust off of the pm4 packet disassembler code, now supports all packets that the command processor supports.
Added extremely verbose logging for gpu register writes. This is not compiled in outside of debug builds, requires LogLevel::Debug and log_guest_driven_gpu_register_written_values = true.
Added full logging of all PM4 packets in the cp. This is not compiled in outside of debug builds, requires LogLevel::Debug and disassemble_pm4.
Piggybacked an implementation of guest callstack backtraces using the stackpoints from enable_host_guest_stack_synchronization. If enable_host_guest_stack_synchronization = false, no backtraces can be obtained.
Added log_ringbuffer_kickoff_initiator_bts. when a thread updates the cp's read pointer, it dumps the backtrace of that thread
Changed the names of the gpu registers CALLBACK_ADDRESS and CALLBACK_CONTEXT to the correct names.
Added a note about CP_PROG_COUNTER
Added CP_RB_WPTR to the gpu register table
Added notes about CP_RB_CNTL and CP_RB_RPTR_ADDR. Both aren't necessary for HLE
Changed name of UNKNOWN_0E00 gpu register to TC_CNTL_STATUS. Games only seem to write 1 to it (L2 invalidate)
2023-04-16 12:42:42 -04:00
Gliniak c74a047655 [Win] Revert XE_USE_KUSER_SHARED back to 0
Also limit queued audio frames
2023-02-11 18:20:21 +01:00
Gliniak d24d3295c6 [XMA] Clear host data on context clear + swap buffer if decoding fails 2023-01-06 19:27:41 +01:00
Gliniak 39c509b57f [APU] Resolved context stuck with is_stream_done_ flag and no space left 2023-01-03 19:49:21 +01:00
Gliniak b7bc0425ba Revert "[APU] Clear host data while reseting context"
This reverts commit 1451ca4266.
2023-01-01 11:23:47 +01:00
Gliniak b2dd489151 [APU] Set first frame offset for next buffer + Note about edgecase 2022-12-13 22:57:36 +01:00
Gliniak e00feb7b0f [APU] Fixed incorrect frame count + removed hopefully useless check from now 2022-12-13 21:47:35 +01:00
Gliniak 16580f5fae [APU] Fixed crash in error message caused by invalid arguments number 2022-12-13 11:02:59 +01:00
Gliniak 97fdf9c6dd [APU] Resolved crash related to negative amount of bits to copy
This is likely due to hitting somehow valid frame in invalid data
2022-12-13 08:46:48 +01:00
Gliniak 43d7fc5158 [APU] Shuffle checks to hopefully prevent crashing from logger 2022-12-11 21:06:47 +01:00
Gliniak c9cd6f15fc [APU] Fixed logged frame count
Until now info about frames that were provided in log was always incorrect by 1
2022-12-11 13:11:23 +01:00
Gliniak 82ccdd3db5 [APU] Misc Changes:
- Unified APU error messages
- Removed magic number from SetOffset call
- Commented out that annoying assertion from XmaContext::GetNextFrame
- Removed checks for current_input_packet_count and replaced with bool check

Not sure how to call it correctly. I know that calls with packet count == 1 is specific one
and probably handled differently. Is it streaming or how should it be called?
2022-12-11 12:22:23 +01:00
chss95cs@gmail.com 7d49b97e4c Print any module name+ offset in host exception reports
print thread name in host exception reports
trying to force win32 error descriptions to english
Return if output buffer block count is 0 in XmaContext, this is an attempt to fix a divide by zero crash many users have reported
2022-12-09 12:24:06 -08:00
Gliniak 12005acc98 [APU] Check if splitted frame length is valid 2022-11-27 18:40:27 +01:00
Gliniak 1451ca4266 [APU] Clear host data while reseting context 2022-11-27 17:00:31 +01:00
Gliniak 9fdfd2ada9 [APU] Removed old hack that invalidates input on decoder error
Added returning parsing error while decoder fails
2022-11-26 17:25:39 +01:00
chss95cs@gmail.com c1d922eebf Minor decoder optimizations, kernel fixes, cpu backend fixes 2022-11-05 10:50:33 -07:00
Gliniak ba66373d8c [APU][Janky] Fixed issues with incorrect frames on streamed data
This requires a lot more research and test data!
2022-11-03 20:56:36 +01:00
Gliniak dae508500a [APU] Clear remaining packets skip when we're done with current stream
Plus some additional logging
2022-11-03 12:59:47 +01:00
Margen67 4ba14bc35e [APU+HID] Optimizations 2022-11-03 03:56:13 -07:00
Gliniak b23566b823 [APU] Fix incorrect packet frame count when frame ends exactly where packet ends
This resolves looping background sound in GoW
2022-11-03 11:14:37 +01:00
Gliniak 259679d53c [APU] Handle exceeding input offset by switching buffer
This should resolve crashes in FH
2022-11-02 08:47:36 +01:00
Gliniak c080e2e17c [APU] Resolved crashes related to out of bound readouts 2022-11-01 11:24:01 +01:00
Gliniak 55877f4c61 [APU] Force buffer swap at the end of stream
Plus some debugging messages and lint fixes
2022-10-25 17:20:45 +02:00
Gliniak 6b11787c93 [APU] Fixed typo that prevented last packet in stream to be processed 2022-10-24 21:33:25 +02:00
Gliniak fac2a89d0f Disallow offset to be set before header, header size fix, audio channels crashfix 2022-10-24 19:43:43 +02:00
chss95cs@gmail.com efbeae660c Drastically reduce cpu time wasted by XMADecoderThread spinning, went from 13% of all cpu time to about 0.6% in my tests
Commented out lock in WatchMemoryRange, lock is always held by caller
properly set the value/check the irql for spinlocks in xboxkrnl_threading
2022-10-15 03:07:07 -07:00
chss95cs@gmail.com 8f7f7dc6ad fixed wine crash from use of NtSetEventPriorityBoost
add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code
make is_allocated_ and is_enabled_ volatile in xma_context
preallocate avpacket buffer in XMAContext::Setup, the reallocations of the buffer in ffmpeg were showing up on profiles
check is_enabled and is_allocated BEFORE locking an xmacontext. XMA worker was spending most of its time locking and unlocking contexts
Removed XeDMAC, dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead
Made the rest of global_critical_region's members static. They never needed an instance.
Removed ifdef'ed out code from ring_buffer.h
Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent.
Removed vector from WaitMultiple, instead use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects.
Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences, x64 backend is now always size optimized because of premake
Make global_critical_region_ static constexpr in shared_memory.h to get rid of wasteage of 8 bytes (empty class=1byte, +alignment for next member=8)
Move trace-related data to the tail of SharedMemory to keep more important data together
In IssueDraw build an array of fetch constant addresses/sizes, then pre-lock the global lock before doing requestrange for each instead of individually locking within requestrange for each of them
Consistent access specifier protected for pm4_command_processor_declare
Devirtualize WriteOneRegisterFromRing.
Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x
Remove many redundant header inclusions access xenia-gpu
Minor microoptimization of ExecutePacketType0

Add TextureCache::RequestTextures for batch invocation of LoadTexturesData

Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock.
Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it you are actually more likely to have contention
Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs
Add missing checks to XAudioRegisterRenderDriverClient_entry. this is unlikely to fix anything, it was just an easy thing to do
Add NtQueryEvent system call implementation. I don't actually know of any games that need it.
Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx use a fixed size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but were masked until now by waitformultipleobjects natural overhead
Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects.
Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx
add missing checks to NtWaitForMultipleObjectsEx
Support pre-locking in XObject::GetNativeObject
2022-10-08 09:55:17 -07:00
chss95cs@gmail.com 7e58a3b320 Fix compiler errors i introduced under clang-cl
remove xe_kernel_export_shim_fn field of Export function_data, trampoline is now the only way exports get invoked
Remove kernelstate argument from string functions in order to conform to the trampoline signature (the argument was unused anyway)
Constant-evaluated initialization of ppc_opcode_disasm_table, removal of unused std::vector fields
Constant-evaluated initialization of export tables
name field on export is just a const char* now, only immutable static strings are ever passed to it
Remove unused callcount field of export.
PM4 compare op function extracted
Globally apply /Oy, /GS-, /Gw on msvc windows
Remove imgui testwindow code call, it took up like 300 kb
2022-09-29 07:04:17 -07:00
chss95cs@gmail.com eb8154908c atomic cas use prefetchw if available
remove useless memorybarrier
remove double membarrier in wait pm4 cmd
add int64 cvar
use int64 cvar for x64 feature mask
Rework some functions that were frontend bound according to vtune placing some of their code in different noinline functions, profiling after indicating l1 cache misses decreased and perf of func increased
remove long vpinsrd dep chain code for conversion.h, instead do normal load+bswap or movbe if avail
Much faster entry table via split_map, code size could be improved though
GetResolveInfo was very large and had impact on icache, mark callees as noinline + msvc pragma optimize small
use log2 shifts instead of integer divides in memory
minor optimizations in PhysicalHeap::EnableAccessCallbacks, the majority of time in the function is spent looping, NOT calling Protect! Someone should optimize this function and rework the algo completely
remove wonky scheduling log message, it was spammy and unhelpful
lock count was unnecessary for criticalsection mutex, criticalsection is already a recursive mutex
brief notes i gotta run
2022-09-17 04:04:53 -07:00
chss95cs@gmail.com 08f7a28920 Alternative mutex 2022-08-14 08:59:11 -07:00
Gliniak 6c6c5ac14b Merge remote-tracking branch 'GliniakRepo/experimentals' into canary_experimental 2022-05-19 10:51:44 +02:00
Philpax e901567193 Fix crash from null sample channel
Certain games, such as Forza Motorsport 3, submit XMA data with the
stereo flag set with a null second channel. This falls back to mono
conversion when the second channel is null, preventing a crash.
2022-05-19 10:22:41 +02:00
Gliniak de03165995 Merge remote-tracking branch 'GliniakRepo/audioSkipHeaderInputOffset' into canary_pr 2022-05-19 10:16:41 +02:00
Joel Linn 986dcf4f65 [Base] Check success of sync primitive creation
- Mainly use `assert`s, since failure is very rare
- Forward failure of `CreateSemaphore` to guests because it is more easy
  to trigger with invalid initial parameters.
2022-03-08 12:17:57 -06:00
Gliniak 07a1e77218 Allow users to change max amount of queued frames 2022-01-31 20:12:39 +01:00
Gliniak c483da91a4 Stop unnecessary spam of 0x601 opcode usage 2022-01-31 20:11:53 +01:00
Gliniak 8e35a3d649 Invalidate input buffers if decoding fails
Should output be invalidated too?
2022-01-31 20:11:44 +01:00
Gliniak c80ea14d9d Check if input_buffer exist
In some really specific cases there is a chance that
one of the buffers is valid, but its pointer is null
2022-01-31 20:10:14 +01:00
Pseudo-Kernel 372bdd3ec9
[APU] XMA: Fix audio loop handling.
Handles audio loop if loop_start < loop_end.
Need to handle additional cases like loop_start > loop_end.
2022-01-29 02:49:00 -06:00
Triang3l 0846cc026d [APU] Manage XAudio 2.8 lifecycle in MTA thread + error handling cleanup 2021-12-12 17:05:01 +03:00
Gliniak f40607041b [APU] Skip audio header when there is no valid input
Thanks Cancerous1/Randprint for initial reseach in this topic
2021-10-18 08:50:51 +02:00
Triang3l e720e0a540 [Code] Remove game names from code comments (most of at least) 2021-09-05 21:27:40 +03:00
Triang3l 6ce5330f5f [UI] Loop thread to main thread WindowedAppContext 2021-08-28 19:38:24 +03:00
gibbed 8daef93207 [APU] XMA register table cleanup, documentation.
- [APU] Clean up XMA register table.
- [APU] Document observed register ranges in the XMA register table.
2021-06-28 20:32:52 -05:00
Joel Linn f15e3d07e7 [APU] Use vectorized converter in xaudio2 backend 2021-06-12 18:57:39 +03:00
Joel Linn 0ad939b2f1 [APU] Add AVX intrinsic variants for conversion 2021-06-12 18:57:39 +03:00
Joel Linn cd631fc447 [APU, SDL] Refactor sample submission
- Move sample conversion to SDL callback thread
- Add early channel down-conversion
2021-06-12 18:57:39 +03:00
Gliniak 313fb3e5a3 [XMA] GetFrameNumber: Return correct frame_idx at stream end 2021-06-05 08:54:22 -05:00