xenia-canary

Commit Graph

Author	SHA1	Message	Date
Gliniak	ce9a82ccf8	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2023-09-01 18:20:29 +02:00
Triang3l	53f98d1fe6	[GPU/D3D12] Memexport from anywhere in control flow + 8/16bpp memexport. There's no limit on the number of memory exports in a shader on the real Xenos, and exports can be done anywhere, including in loops. Now, instead of deferring the exports to the end of the shader and assuming that export allocs are executed only once, Xenia flushes exports when it reaches an alloc or the end of the shader, or when a pixel with outstanding exports is killed. (On Xenos, memory exports are terminated by allocs and also by individual ALU instructions with `serialize`; the latter case is not handled for simplicity, since it's only truly mandatory to flush memory exports before starting a new one.) To know which eM# registers need to be flushed to memory, Xenia traverses the successors of each exec potentially writing any eM#, recording at each reached control flow instruction which eM# registers might have been written before it, until a flush point or the end of the shader is reached. Also, some games export to sub-32bpp formats. These are now supported via an atomic AND clearing the bits of the dword to replace, followed by an atomic OR inserting the new byte/short.	2023-05-05 21:32:02 +03:00
chss95cs@gmail.com	8f7f7dc6ad	Fixed Wine crash from use of NtSetEventPriorityBoost. Add xe::clear_lowest_bit, use it in place of shift-andnot in some bit iteration code. Make is_allocated_ and is_enabled_ volatile in xma_context. Preallocate avpacket buffer in XMAContext::Setup; the reallocations of the buffer in ffmpeg were showing up on profiles. Check is_enabled and is_allocated BEFORE locking an xmacontext; the XMA worker was spending most of its time locking and unlocking contexts. Removed XeDMAC, dma:: namespace. It was a bad idea and I couldn't make it work in the end. Kept vastcpy and moved it to the memory namespace instead. Made the rest of global_critical_region's members static. They never needed an instance. Removed ifdef'ed out code from ring_buffer.h. Added EventInfo struct to threading, added Event::Query to aid with implementing NtQueryEvent. Removed vector from WaitMultiple; instead use a fixed array of 64 handles that we populate. WaitForMultipleObjects cannot handle more than 64 objects. Remove XE_MSVC_OPTIMIZE_SMALL() use in x64_sequences; the x64 backend is now always size optimized because of premake. Make global_critical_region_ static constexpr in shared_memory.h to get rid of a wastage of 8 bytes (empty class = 1 byte, plus alignment for the next member = 8). Move trace-related data to the tail of SharedMemory to keep more important data together. In IssueDraw, build an array of fetch constant addresses/sizes, then pre-lock the global lock before doing requestrange for each, instead of individually locking within requestrange for each of them. Consistent access specifier protected for pm4_command_processor_declare. Devirtualize WriteOneRegisterFromRing. Move ExecutePacket and ExecutePrimaryBuffer to pm4_command_buffer_x. Remove many redundant header inclusions across xenia-gpu. Minor microoptimization of ExecutePacketType0. Add TextureCache::RequestTextures for batch invocation of LoadTexturesData. Add TextureCache::LoadTexturesData for reducing the number of times we release and reacquire the global lock. Ideally you should hold the global lock for as little time as possible, but if you are constantly acquiring and releasing it you are actually more likely to have contention. Add already_locked param to ObjectTable::LookupObject to help with reducing lock acquire/release pairs. Add missing checks to XAudioRegisterRenderDriverClient_entry. This is unlikely to fix anything; it was just an easy thing to do. Add NtQueryEvent system call implementation. I don't actually know of any games that need it. Instead of using std::vector + push_back in KeWaitForMultipleObjects and xeNtWaitForMultipleObjectsEx, use a fixed size array of 64 and track the count. More than 64 objects is not permitted by the kernel. The repeated reallocations from push_back were appearing unusually high on the profiler, but were masked until now by the natural overhead of waitformultipleobjects. Pre-lock the global lock before looking up each handle for xeNtWaitForMultipleObjectsEx and KeWaitForMultipleObjects. Pre-lock before looking up the signal and waiter in NtSignalAndWaitForSingleObjectEx. Add missing checks to NtWaitForMultipleObjectsEx. Support pre-locking in XObject::GetNativeObject.	2022-10-08 09:55:17 -07:00
Triang3l	b787f2dec1	[GPU] GPR count limit is 128, not 64	2022-06-26 14:45:49 +03:00
Triang3l	913e1e949c	[GPU] Ownership-transfer-based RT cache, 3x3 resolution scaling. The ROV path is also disabled by default because of lower performance.	2021-04-26 22:12:09 +03:00
Triang3l	e6fa0ad139	[GPU] Dynamic r# count via shader modifications + refactoring	2020-12-19 16:14:54 +03:00
Triang3l	9a4643d0f2	[GPU] Non-ROV f24 trunc/round, host shader modifications, cache dir	2020-12-07 22:31:46 +03:00
Triang3l	fd18a97f3a	[GPU] Shaders: Make label_addresses accessible to translators	2020-10-14 21:19:33 +03:00
Triang3l	4bb0ca0e09	[GPU] Move all xenos.h to gpu::xenos, disambiguate Dimension/TextureDimension	2020-07-11 15:54:22 +03:00
Triang3l	8a64861ec0	[DXBC] New tfetch: pre-swizzle signs, additive LOD + refactoring	2020-06-06 19:12:34 +03:00
Triang3l	3aa0ce3096	[GPU] Shader translator refactoring (mostly ALU), fixes for disassembly round trip and write masks	2020-05-08 23:57:51 +03:00
Triang3l	4b9f63cdf1	[GPU] Shader::HostVertexShaderType enum for domain shader types	2020-04-06 00:03:23 +03:00
Triang3l	f83269cf8c	[GPU] Refactor: Register structs in D3D12CommandProcessor and some other places	2019-10-19 23:32:38 +03:00
Triang3l	4825e69fda	[D3D12] Cleanup primitive types and front/back facing	2019-07-13 22:25:03 +03:00
Triang3l	6672997108	[D3D12] ROV early Z and full rewrite, shader scalar optimizations	2019-07-11 09:31:58 +03:00
Triang3l	66a9c9d812	[GPU] Store ALU result after both vector and scalar instructions	2019-04-20 20:30:01 +03:00
Triang3l	ef523823d5	[D3D12] Force early Z with DSV, fix blend-disabled flag in rb_colorcontrol being ignored	2019-01-11 17:07:33 +03:00
Triang3l	73baaa8e89	[GPU] Gather eA/eM# writes in shader translator	2018-12-21 10:06:41 +03:00
Triang3l	adc0eb87f6	[GPU] Gather used memexport constants in shader translator	2018-12-20 10:14:18 +03:00
Triang3l	9a58841219	[D3D12] ROV: Use MSAA instead of SSAA	2018-11-25 16:37:38 +03:00
Triang3l	f48ea20880	[D3D12] ROV: Track which RTs and components have been actually written	2018-10-18 14:54:33 +03:00
Triang3l	c4599ff211	[D3D12] Compact float constants and don't split them into pages	2018-09-30 20:17:26 +03:00
Triang3l	1413a7d206	[GPU] Rename uses_register_relative_addressing to uses_register_dynamic_addressing	2018-09-02 16:31:21 +03:00
Triang3l	da9f153a29	[D3D12] DXBC: Don't use indexable temps unless needed	2018-08-31 20:28:29 +03:00
Triang3l	dad10c30e9	[GPU] Detect dynamic temp indexing before translating shaders	2018-08-31 20:06:53 +03:00
DrChat	77da785d70	[Vulkan] Use locally generated texture binding indices instead of GPU indices	2018-02-22 21:00:54 -06:00
Dr. Chat	300d1c57ba	SPIR-V: Rewrite basic control-flow to use a while loop paired with a switch statement	2016-09-05 16:57:02 -05:00
Dr. Chat	d94ff6eb25	Shaders: Track the register count from the program control register (if available)	2016-05-22 19:58:50 -05:00
Dr. Chat	9b2e2a7275	SPIR-V: Hack in OpSelectionMerge as hints to NVidia's shader compiler (TODO: Make a Shader Compiler)	2016-04-13 23:17:03 -05:00
Dr. Chat	b7f2c93d73	SPIR-V: Batch predicated instructions together into a single block. Add Post-Translation validation. Fix a couple of type-related typos.	2016-04-09 21:03:44 -05:00
Dr. Chat	4aff1c19a7	(WIP) SPIR-V Shader Translator	2016-02-20 18:44:37 -06:00
Ben Vanik	35e08d9428	Switching from fork to main glslang spirv builder.	2016-02-18 16:43:41 -08:00
Ben Vanik	5ab0af9e6d	Implementing shader constant register map construction.	2016-02-18 16:43:41 -08:00
Dr. Chat	1066362ada	ShaderTranslator::GatherAllBindingInformation	2016-02-05 16:00:50 -06:00
Ben Vanik	280c0b35f6	Basic control flow skeleton and jumps implemented. A-train's trees draw right now! Helps a bit with #473 but still need to implement loops.	2015-12-06 11:44:22 -08:00
Ben Vanik	c86e479214	Replacing old Shader with TranslatedShader.	2015-12-06 10:36:07 -08:00
Ben Vanik	51a8002629	Moving GL backend to new shader translator. This seems to make a lot of things better, but may also break things. Cleanup to follow.	2015-12-06 00:48:41 -08:00
Ben Vanik	2b3b423776	Mostly complete new GLSL translator (modulo flow control).	2015-12-05 17:44:06 -08:00
Ben Vanik	0058cae901	Adding pseudo code for all ucode ops from AMD docs.	2015-12-05 03:10:45 -08:00
Ben Vanik	cd50aac6d2	Skeleton SPIRV translator.	2015-11-29 19:45:55 -08:00
Ben Vanik	d2f7cc1602	Reworking translator code to be pretty sexy.	2015-11-29 16:55:42 -08:00
Ben Vanik	65130edaa1	First pass ShaderTranslator base type, able to disasm in msft style.	2015-11-28 16:19:04 -08:00
Ben Vanik	71b9995448	Skeleton SPIRV shader translator.	2015-11-28 16:10:26 -08:00
Ben Vanik	6a546ebe4d	Shuffling spirv code so it's not tied to xe::gpu. Will make it easier to use in standalone apps.	2015-11-24 19:49:05 -08:00

44 Commits
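Commit 53f98d1fe6 describes writing 8/16bpp memexport data with an atomic AND that clears the target bits of a dword, followed by an atomic OR that inserts the new byte/short. The block below is a minimal CPU-side sketch of that idea, assuming a little-endian byte layout and using `std::atomic<uint32_t>` in place of a GPU buffer; `WriteSubDword` is a hypothetical name, not Xenia's actual shader-translator code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Xenia code): replaces one byte (bytes == 1) or
// one 16-bit short (bytes == 2) inside a 32-bit word that other threads may
// be updating concurrently. byte_offset counts from the least significant
// byte, i.e. a little-endian layout is assumed.
void WriteSubDword(std::atomic<uint32_t>& dword, uint32_t value,
                   uint32_t byte_offset, uint32_t bytes) {
  assert((bytes == 1 || bytes == 2) && byte_offset + bytes <= 4);
  const uint32_t shift = byte_offset * 8;
  const uint32_t field_mask = ((1u << (bytes * 8)) - 1u) << shift;
  // Step 1: atomic AND clears only the bits of the field being replaced,
  // leaving neighbouring bytes written by other threads untouched.
  dword.fetch_and(~field_mask, std::memory_order_relaxed);
  // Step 2: atomic OR inserts the new byte/short into the cleared bits.
  dword.fetch_or((value << shift) & field_mask, std::memory_order_relaxed);
}
```

For example, starting from `0xAABBCCDD`, writing byte `0x11` at offset 1 gives `0xAABB11DD`. Because the two steps are separate atomic operations, this only guarantees that writes to different fields of the same dword don't clobber each other; two concurrent writes to the same field could leave the bitwise OR of both values. On the GPU side, the corresponding primitives are atomic AND/OR operations such as HLSL's `InterlockedAnd` and `InterlockedOr`.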
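Commit 8f7f7dc6ad adds `xe::clear_lowest_bit` and uses it "in place of shift-andnot in some bit iteration code". A common way to clear the lowest set bit is `v & (v - 1)`, which avoids rebuilding a `1 << index` mask on every iteration. The sketch below assumes that formulation; `ClearLowestBit`, `LowestBitIndex` and `SetBitIndices` are hypothetical stand-ins, and the real `xe::clear_lowest_bit` signature may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for xe::clear_lowest_bit: clears the lowest set
// bit of v (v & (v - 1)); returns 0 for v == 0.
constexpr uint64_t ClearLowestBit(uint64_t v) { return v & (v - 1); }

// Returns the index of the lowest set bit; v must be non-zero.
inline unsigned LowestBitIndex(uint64_t v) {
  assert(v != 0);
  unsigned index = 0;
  while (!(v & 1)) {
    v >>= 1;
    ++index;
  }
  return index;
}

// Visits every set bit of mask from lowest to highest.
std::vector<unsigned> SetBitIndices(uint64_t mask) {
  std::vector<unsigned> indices;
  while (mask) {
    indices.push_back(LowestBitIndex(mask));
    // Instead of the shift-andnot form: mask &= ~(uint64_t(1) << index);
    mask = ClearLowestBit(mask);
  }
  return indices;
}
```

With this sketch, `SetBitIndices(0x8000000000000005)` yields the indices 0, 2 and 63.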
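Commit 8f7f7dc6ad also replaces `std::vector` + `push_back` in the multi-object wait paths with a fixed-size array of 64 handles plus an explicit count, since a single wait cannot reference more than 64 objects (on Windows this is the `MAXIMUM_WAIT_OBJECTS` limit). A minimal sketch of that pattern follows; `Handle`, `WaitList` and `BuildWaitList` are hypothetical names used only for illustration, not Xenia's types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical handle type for illustration only.
using Handle = uint32_t;

// Upper bound on the objects a single wait may reference.
constexpr std::size_t kMaxWaitObjects = 64;

// Fixed-capacity list: no heap allocation, unlike std::vector::push_back.
struct WaitList {
  std::array<Handle, kMaxWaitObjects> handles{};
  std::size_t count = 0;

  // Returns false instead of growing when the limit is reached.
  bool Add(Handle h) {
    if (count >= kMaxWaitObjects) return false;
    handles[count++] = h;
    return true;
  }
};

// Fills out from n input handles; rejects n > 64 up front, mirroring the
// kernel refusing waits on more than 64 objects.
bool BuildWaitList(const Handle* input, std::size_t n, WaitList& out) {
  if (n > kMaxWaitObjects) return false;
  out.count = 0;
  for (std::size_t i = 0; i < n; ++i) out.Add(input[i]);
  assert(out.count == n);
  return true;
}
```

Since the limit is known up front, the storage can live on the stack and the per-call reallocations that the commit message reports seeing in the profiler never occur.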