add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code
make is_allocated_ and is_enabled_ volatile in xma_context
preallocate avpacket buffer in XMAContext::Setup, the reallocations of the buffer in ffmpeg were showing up on profiles
check is_enabled and is_allocated BEFORE locking an xmacontext. XMA worker was spending most of its time locking and unlocking contexts
Removed XeDMAC, dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead
Made the rest of global_critical_region's members static. They never needed an instance.
Removed ifdef'ed out code from ring_buffer.h
Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent.
Removed vector from WaitMultiple, instead use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects.
Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences, x64 backend is now always size optimized because of premake
Make global_critical_region_ static constexpr in shared_memory.h to get rid of wasteage of 8 bytes (empty class=1byte, +alignment for next member=8)
Move trace-related data to the tail of SharedMemory to keep more important data together
In IssueDraw build an array of fetch constant addresses/sizes, then pre-lock the global lock before doing requestrange for each instead of individually locking within requestrange for each of them
Consistent access specifier protected for pm4_command_processor_declare
Devirtualize WriteOneRegisterFromRing.
Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x
Remove many redundant header inclusions access xenia-gpu
Minor microoptimization of ExecutePacketType0
Add TextureCache::RequestTextures for batch invocation of LoadTexturesData
Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock.
Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it you are actually more likely to have contention
Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs
Add missing checks to XAudioRegisterRenderDriverClient_entry. this is unlikely to fix anything, it was just an easy thing to do
Add NtQueryEvent system call implementation. I don't actually know of any games that need it.
Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx use a fixed size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but were masked until now by waitformultipleobjects natural overhead
Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects.
Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx
add missing checks to NtWaitForMultipleObjectsEx
Support pre-locking in XObject::GetNativeObject