
44 Commits

Author SHA1 Message Date
Gliniak ce9a82ccf8 Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental 2023-09-01 18:20:29 +02:00
Triang3l 53f98d1fe6 [GPU/D3D12] Memexport from anywhere in control flow + 8/16bpp memexport
There's no limit on the number of memory exports in a shader on the real
Xenos, and exports can be done anywhere, including in loops. Now, instead
of deferring the exports to the end of the shader and assuming that export
allocs are executed only once, Xenia flushes exports when it reaches an
alloc, when it reaches the end of the shader, or when a pixel with
outstanding exports is killed. On Xenos, memory exports are terminated by
allocs as well as by individual ALU instructions with `serialize`; the
latter case is not handled for simplicity, since it's only truly mandatory
to flush memory exports before starting a new one.

To know which eM# registers need to be flushed to memory, Xenia traverses
the successors of each exec that may write any eM#, marking before each
reached control flow instruction which eM# registers might have been
written, until a flush point or the end of the shader is reached.
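
The flush analysis above can be sketched as a forward dataflow pass over the control flow graph. This is a minimal illustration under assumed, simplified types (`CfNode` and `ComputePendingEm` are hypothetical names, not Xenia's shader translator code): each exec carries a bitmask of the eM# registers it may write, allocs and the end of the shader are modeled as flush points, and masks are propagated to successors until nothing changes, which also covers exports inside loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical, simplified control flow instruction (not Xenia's real types).
struct CfNode {
  uint8_t em_written = 0;       // Bit i set if this exec may write eM#i.
  bool is_flush_point = false;  // Alloc or end of shader: exports flushed here.
  std::vector<std::size_t> successors;
};

// Returns, for each CF node, a mask of the eM# registers that might have been
// written (and not yet flushed) before control reaches that node. At a flush
// point, this mask is the set of registers that must be flushed there.
std::vector<uint8_t> ComputePendingEm(const std::vector<CfNode>& nodes) {
  std::vector<uint8_t> pending_in(nodes.size(), 0);
  std::deque<std::size_t> worklist;
  // Start the traversal from every exec that may write any eM#.
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].em_written) {
      worklist.push_back(i);
    }
  }
  while (!worklist.empty()) {
    std::size_t i = worklist.front();
    worklist.pop_front();
    // A flush point consumes everything pending; propagation stops there.
    if (nodes[i].is_flush_point) {
      continue;
    }
    uint8_t pending_out =
        static_cast<uint8_t>(pending_in[i] | nodes[i].em_written);
    for (std::size_t s : nodes[i].successors) {
      assert(s < nodes.size());
      uint8_t merged = static_cast<uint8_t>(pending_in[s] | pending_out);
      if (merged != pending_in[s]) {
        pending_in[s] = merged;
        // Masks only grow, so revisiting loop headers always terminates.
        worklist.push_back(s);
      }
    }
  }
  return pending_in;
}
```

A worklist is used rather than a single pass because a backward edge (a loop) can make registers written late in the body pending at the loop's start.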

Also, some games export to sub-32bpp formats. These are now supported via
an atomic AND that clears the bits of the dword being replaced, followed by
an atomic OR that inserts the new byte/short.
2023-05-05 21:32:02 +03:00
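
A CPU-side analogue of the sub-32bpp export, assuming `std::atomic<uint32_t>` as a stand-in for the shader's atomic operations on the destination dword (the function name and parameters are illustrative, not taken from Xenia): the target bits are cleared with an atomic AND, then the new byte/short is inserted with an atomic OR.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Writes an 8- or 16-bit value into part of a 32-bit word with two atomic
// operations, so concurrent writes to the *other* bytes of the dword survive.
void AtomicStoreSubDword(std::atomic<uint32_t>& dword, uint32_t value,
                         uint32_t bit_offset, uint32_t bit_count) {
  assert(bit_count == 8 || bit_count == 16);
  assert(bit_offset % bit_count == 0 && bit_offset + bit_count <= 32);
  uint32_t mask = ((uint32_t(1) << bit_count) - 1) << bit_offset;
  // Clear the bits being replaced...
  dword.fetch_and(~mask, std::memory_order_relaxed);
  // ...then insert the new byte/short.
  dword.fetch_or((value << bit_offset) & mask, std::memory_order_relaxed);
}
```

Only writers of other bytes in the same dword are protected this way; two writers of the same byte can still interleave between the AND and the OR.
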
chss95cs@gmail.com 8f7f7dc6ad fixed Wine crash caused by use of NtSetEventPriorityBoost
add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code
make is_allocated_ and is_enabled_ volatile in xma_context
preallocate the avpacket buffer in XMAContext::Setup; the reallocations of the buffer in ffmpeg were showing up on profiles
check is_enabled and is_allocated BEFORE locking an XMAContext. The XMA worker was spending most of its time locking and unlocking contexts
Removed XeDMAC and the dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead
Made the rest of global_critical_region's members static. They never needed an instance.
Removed ifdef'ed out code from ring_buffer.h
Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent.
Removed vector from WaitMultiple; instead, use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects.
Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences, x64 backend is now always size optimized because of premake
Make global_critical_region_ static constexpr in shared_memory.h to avoid wasting 8 bytes (empty class = 1 byte, + alignment for the next member = 8)
Move trace-related data to the tail of SharedMemory to keep more important data together
In IssueDraw, build an array of fetch constant addresses/sizes, then pre-lock the global lock once before calling RequestRange for each of them, instead of locking individually within each RequestRange call
Consistent access specifier protected for pm4_command_processor_declare
Devirtualize WriteOneRegisterFromRing.
Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x
Remove many redundant header inclusions across xenia-gpu
Minor microoptimization of ExecutePacketType0

Add TextureCache::RequestTextures for batch invocation of LoadTexturesData

Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock.
Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it, you are actually more likely to have contention.
Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs
Add missing checks to XAudioRegisterRenderDriverClient_entry. This is unlikely to fix anything; it was just an easy thing to do
Add NtQueryEvent system call implementation. I don't actually know of any games that need it.
Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx, use a fixed-size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but until now were masked by the natural overhead of WaitForMultipleObjects
Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects.
Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx
add missing checks to NtWaitForMultipleObjectsEx
Support pre-locking in XObject::GetNativeObject
2022-10-08 09:55:17 -07:00
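
The `xe::clear_lowest_bit` change replaces the shift-and-not clear in bit iteration loops. A sketch of both forms, assuming `x & (x - 1)` as the implementation (the helper names below are hypothetical, and Xenia's actual signature may differ):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Clears the lowest set bit. Hypothetical stand-in for xe::clear_lowest_bit.
constexpr uint32_t ClearLowestBit(uint32_t x) { return x & (x - 1); }

// Portable trailing-zero count; real code would use an intrinsic.
inline uint32_t CountTrailingZeros(uint32_t x) {
  assert(x != 0);
  uint32_t n = 0;
  while (!(x & 1)) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Old style: find the index, then clear that bit with a shift and and-not.
std::vector<uint32_t> SetBitsShiftAndNot(uint32_t bits) {
  std::vector<uint32_t> out;
  while (bits) {
    uint32_t i = CountTrailingZeros(bits);
    out.push_back(i);
    bits &= ~(uint32_t(1) << i);
  }
  return out;
}

// New style: the clear no longer depends on the computed index.
std::vector<uint32_t> SetBitsClearLowest(uint32_t bits) {
  std::vector<uint32_t> out;
  while (bits) {
    out.push_back(CountTrailingZeros(bits));
    bits = ClearLowestBit(bits);
  }
  return out;
}
```

Because `x & (x - 1)` does not consume the trailing-zero count, the clear and the index computation are independent; on x86 with BMI1 it can compile to a single `BLSR`.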
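
The wait-path changes above (a fixed array of at most 64 objects, the global lock taken once, and lookups with `already_locked`) can be sketched as follows. `HandleTable`, `Object`, and `ResolveWaitObjects` are hypothetical stand-ins, and a plain `std::mutex` is assumed for the global lock; the point is one lock acquisition per wait instead of one per handle, and no heap allocation for the object list.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>

constexpr std::size_t kMaxWaitObjects = 64;  // Limit for multi-object waits.

struct Object {
  int id;
};

// Hypothetical handle table guarded by a single global lock.
class HandleTable {
 public:
  std::mutex& global_lock() { return lock_; }
  void Add(uint32_t handle, Object* obj) {
    assert(obj != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    table_[handle] = obj;
  }
  // With already_locked == true, the caller must already hold global_lock().
  Object* Lookup(uint32_t handle, bool already_locked = false) {
    std::unique_lock<std::mutex> guard(lock_, std::defer_lock);
    if (!already_locked) {
      guard.lock();
    }
    auto it = table_.find(handle);
    return it == table_.end() ? nullptr : it->second;
  }

 private:
  std::mutex lock_;
  std::unordered_map<uint32_t, Object*> table_;
};

// Resolves up to 64 handles into a fixed-size array, taking the lock once.
// Returns the number of objects resolved, or -1 on an invalid count/handle.
int ResolveWaitObjects(HandleTable& table, const uint32_t* handles,
                       std::size_t count,
                       std::array<Object*, kMaxWaitObjects>& out) {
  if (count == 0 || count > kMaxWaitObjects) {
    return -1;
  }
  std::lock_guard<std::mutex> guard(table.global_lock());
  for (std::size_t i = 0; i < count; ++i) {
    Object* obj = table.Lookup(handles[i], /*already_locked=*/true);
    if (!obj) {
      return -1;
    }
    out[i] = obj;
  }
  return static_cast<int>(count);
}
```

Since `std::mutex` is not recursive, calling `Lookup` without the flag while holding the lock would deadlock in this sketch; the flag is what makes the single pre-lock possible here.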
Triang3l b787f2dec1 [GPU] GPR count limit is 128, not 64 2022-06-26 14:45:49 +03:00
Triang3l 913e1e949c [GPU] Ownership-transfer-based RT cache, 3x3 resolution scaling
The ROV path is also disabled by default because of lower performance
2021-04-26 22:12:09 +03:00
Triang3l e6fa0ad139 [GPU] Dynamic r# count via shader modifications + refactoring 2020-12-19 16:14:54 +03:00
Triang3l 9a4643d0f2 [GPU] Non-ROV f24 trunc/round, host shader modifications, cache dir 2020-12-07 22:31:46 +03:00
Triang3l fd18a97f3a [GPU] Shaders: Make label_addresses accessible to translators 2020-10-14 21:19:33 +03:00
Triang3l 4bb0ca0e09 [GPU] Move all xenos.h to gpu::xenos, disambiguate Dimension/TextureDimension 2020-07-11 15:54:22 +03:00
Triang3l 8a64861ec0 [DXBC] New tfetch: pre-swizzle signs, additive LOD + refactoring 2020-06-06 19:12:34 +03:00
Triang3l 3aa0ce3096 [GPU] Shader translator refactoring (mostly ALU), fixes for disassembly round trip and write masks 2020-05-08 23:57:51 +03:00
Triang3l 4b9f63cdf1 [GPU] Shader::HostVertexShaderType enum for domain shader types 2020-04-06 00:03:23 +03:00
Triang3l f83269cf8c [GPU] Refactor: Register structs in D3D12CommandProcessor and some other places 2019-10-19 23:32:38 +03:00
Triang3l 4825e69fda [D3D12] Cleanup primitive types and front/back facing 2019-07-13 22:25:03 +03:00
Triang3l 6672997108 [D3D12] ROV early Z and full rewrite, shader scalar optimizations 2019-07-11 09:31:58 +03:00
Triang3l 66a9c9d812 [GPU] Store ALU result after both vector and scalar instructions 2019-04-20 20:30:01 +03:00
Triang3l ef523823d5 [D3D12] Force early Z with DSV, fix ignored blend disabled flag in rb_colorcontrol 2019-01-11 17:07:33 +03:00
Triang3l 73baaa8e89 [GPU] Gather eA/eM# writes in shader translator 2018-12-21 10:06:41 +03:00
Triang3l adc0eb87f6 [GPU] Gather used memexport constants in shader translator 2018-12-20 10:14:18 +03:00
Triang3l 9a58841219 [D3D12] ROV: Use MSAA instead of SSAA 2018-11-25 16:37:38 +03:00
Triang3l f48ea20880 [D3D12] ROV: Track which RTs and components have been actually written 2018-10-18 14:54:33 +03:00
Triang3l c4599ff211 [D3D12] Compact float constants and don't split them into pages 2018-09-30 20:17:26 +03:00
Triang3l 1413a7d206 [GPU] Rename uses_register_relative_addressing to uses_register_dynamic_addressing 2018-09-02 16:31:21 +03:00
Triang3l da9f153a29 [D3D12] DXBC: Don't use indexable temps unless needed 2018-08-31 20:28:29 +03:00
Triang3l dad10c30e9 [GPU] Detect dynamic temp indexing before translating shaders 2018-08-31 20:06:53 +03:00
DrChat 77da785d70 [Vulkan] Use locally generated texture binding indices instead of GPU indices 2018-02-22 21:00:54 -06:00
Dr. Chat 300d1c57ba SPIR-V: Rewrite basic control-flow to use a while loop paired with a switch statement 2016-09-05 16:57:02 -05:00
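
The loop-plus-switch structure can be illustrated in C++. This sketch, including the block numbering and `RunLoopSwitch`, is hypothetical and is not the translator's output: a program counter variable selects the next block, and any jump, including a backward one, becomes an assignment to it followed by another trip around the loop.

```cpp
#include <cassert>
#include <vector>

// Runs a tiny block-structured program whose blocks may jump to any other
// block, using one loop around a switch instead of arbitrary gotos.
// Returns the sequence of visited block labels.
std::vector<int> RunLoopSwitch(int iterations) {
  assert(iterations >= 0);
  std::vector<int> trace;
  int counter = iterations;
  int pc = 0;  // Label of the next block to execute.
  bool running = true;
  while (running) {
    trace.push_back(pc);
    switch (pc) {
      case 0:  // Entry block.
        pc = 1;
        break;
      case 1:  // Loop header: conditional jump to the body or the exit.
        pc = counter > 0 ? 2 : 3;
        break;
      case 2:  // Loop body, then a backward jump to the header.
        --counter;
        pc = 1;
        break;
      case 3:  // Exit block.
      default:
        running = false;
        break;
    }
  }
  return trace;
}
```

Structured targets such as SPIR-V don't allow arbitrary jumps, so funneling all control flow through one loop and one switch keeps every jump expressible without reconstructing the original structure.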
Dr. Chat d94ff6eb25 Shaders: Track the register count from the program control register (if available) 2016-05-22 19:58:50 -05:00
Dr. Chat 9b2e2a7275 SPIR-V: Hack in OpSelectionMerge as hints to NVIDIA's shader compiler (TODO: Make a Shader Compiler) 2016-04-13 23:17:03 -05:00
Dr. Chat b7f2c93d73 SPIR-V: Batch predicated instructions together into a single block.
Add Post-Translation validation.
Fix a couple of type-related typos.
2016-04-09 21:03:44 -05:00
Dr. Chat 4aff1c19a7 (WIP) SPIR-V Shader Translator 2016-02-20 18:44:37 -06:00
Ben Vanik 35e08d9428 Switching from fork to main glslang spirv builder. 2016-02-18 16:43:41 -08:00
Ben Vanik 5ab0af9e6d Implementing shader constant register map construction. 2016-02-18 16:43:41 -08:00
Dr. Chat 1066362ada ShaderTranslator::GatherAllBindingInformation 2016-02-05 16:00:50 -06:00
Ben Vanik 280c0b35f6 Basic control flow skeleton and jumps implemented.
A-train's trees draw right now!
Helps a bit with #473 but still need to implement loops.
2015-12-06 11:44:22 -08:00
Ben Vanik c86e479214 Replacing old Shader with TranslatedShader. 2015-12-06 10:36:07 -08:00
Ben Vanik 51a8002629 Moving GL backend to new shader translator.
This seems to make a lot of things better, but may also break things.
Cleanup to follow.
2015-12-06 00:48:41 -08:00
Ben Vanik 2b3b423776 Mostly complete new GLSL translator (modulo flow control). 2015-12-05 17:44:06 -08:00
Ben Vanik 0058cae901 Adding pseudo code for all ucode ops from AMD docs. 2015-12-05 03:10:45 -08:00
Ben Vanik cd50aac6d2 Skeleton SPIRV translator. 2015-11-29 19:45:55 -08:00
Ben Vanik d2f7cc1602 Reworking translator code to be pretty sexy. 2015-11-29 16:55:42 -08:00
Ben Vanik 65130edaa1 First pass ShaderTranslator base type, able to disasm in msft style. 2015-11-28 16:19:04 -08:00
Ben Vanik 71b9995448 Skeleton SPIRV shader translator. 2015-11-28 16:10:26 -08:00
Ben Vanik 6a546ebe4d Shuffling spirv code so it's not tied to xe::gpu.
Will make it easier to use in standalone apps.
2015-11-24 19:49:05 -08:00