xenia-canary

Commit Graph

Author	SHA1	Message	Date
chss95cs@gmail.com	20638c2e61	Use Sleep(0) instead of SwitchToThread; it should waste less power and help the OS with scheduling. PM4 buffer handling is now a virtual member of CommandProcessor, with the implementation/declaration placed in reusable macro files. This is probably the biggest boost here. Optimized the SET_CONSTANT/LOAD_CONSTANT PM4 ops based on the register range they start writing at; this was also a nice boost. Expose X64 extension flags to code outside of the x64 backend, so we can detect and use things like AVX-512, XOP, AVX2, etc. in normal code. Add freelists for HIR structures to try to reduce the number of last-level cache misses during optimization (currently disabled... fixme later). Analyzed PGO feedback and, based on it, reordered branches, uninlined functions and moved code out into different functions in the PM4 functions; this gave about a 2% boost at best. Added support for the db16cyc opcode, which is used often in Xbox 360 spinlocks. Before, it was just translated to a nop; now on x64 we translate it to _mm_pause, but may change that in the future to reduce wasted CPU time. Texture util: all our divisors were powers of 2, so instead we look up a shift. This made texture scaling slightly faster, more so on Intel processors, which seem to be worse at integer divisions. GetGuestTextureLayout is now a little faster, although it is still one of the heaviest functions in the emulator when scaling is on. xe_unlikely_mutex was not a good choice for the guest clock lock. Running theory: on Intel processors another thread may take a significant time to update the clock? Maybe because of the uint64 division? Really not sure, but switched it to xe_mutex. This fixed audio stutter that I had introduced in 1 or 2 games, and fixed performance in that N64 Rare game with the monkeys. Took another crack at the DMA implementation; another failure.
Instead of passing it as a parameter, keep the ringbuffer reader as the first member of CommandProcessor so it can be accessed through `this`. Added a macro for noalias. Applied noalias to Memory::LookupHeap; this reduced the size of the executable by 7 KB. Reworked the kernel shim template; this shaved about 100 KB off the exe and eliminated the indirect calls from the shim to the actual implementation. We still unconditionally generate string representations of kernel calls though :( unless it is kHighFrequency. Add NVAPI extensions support, currently unused; will use CPUVISIBLE memory at some point. Inserted prefetches in a few places based on feedback from VTune. Add a native implementation of SHA int8 if all elements are the same. Vectorized comparisons for SetViewport and SetScissorRect. Vectorized ranged comparisons for WriteRegister. Add XE_MSVC_ASSUME. Move FormatInfo::name out of the structure and instead look up the name in a different table; debug-related data and critical runtime data are best kept apart. Templated UpdateSystemConstantValues based on ROV/RTV and primitive_polygonal. Add ArchFloatMask functions, which store the results of floating-point comparisons without costly float->int pipeline transfers (vucomiss/setb). Use float masks in UpdateSystemConstantValues for checking whether anything is dirty, and only transfer to int at the end of the function. Instead of dirty |= (x == y) in UpdateSystemConstantValues, we now do dirty_u32 |= (x ^ y). If any of them are not equal, dirty_u32 will be nonzero; if they're all equal, it will be zero.
This is friendlier to register renaming, and the lack of dependencies on EFLAGS lets the compiler reorder better. Add PrefetchSamplerParameters to D3D12TextureCache. Use PrefetchSamplerParameters in UpdateBindings to eliminate cache misses that VTune detected. Add PrefetchTextureBinding to D3D12TextureCache. Prefetch texture bindings to get rid of more misses VTune detected (more out-of-order accesses with random strides). Rewrote DMAC; still terrible though, and I have disabled it for now. Replace a tiny memcmp of 6 U64s in render_target_cache with an inline loop: MSVC fails to make it a loop and instead does a thunk to its memcmp function, which is optimized for larger sizes. PrefetchTextureBinding in AreActiveTextureSRVKeysUpToDate. Replace memcmp calls for the pipeline description with a handwritten comparison. Directly write some registers that don't have special handling in PM4 functions. Changed EstimateMaxY to try to eliminate mispredictions that VTune was reporting; MSVC ended up turning the changed code into a series of blends. In ExecutePacketType3_EVENT_WRITE_EXT, instead of writing extents to an array on the stack and then doing xe_copy_and_swap_16 of the data to its destination, pre-swap each constant and then store those; MSVC manages to unroll that into wider stores. Stop logging XE_SWAP every time we receive XE_SWAP, and stop logging the start and end of each viz query. Prefetch watch nodes in FireWatches based on feedback from VTune. Removed dead code from texture_info.cc. NOINLINE on GpuSwap; PGO builds did it, so we should too.	2022-09-11 14:14:48 -07:00
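The `dirty_u32 |= (x ^ y)` change described in commit 20638c2e61 can be illustrated with a minimal C++ sketch. The struct, its fields, and the helper names are hypothetical rather than xenia's real system constants, and floats are compared by their bit patterns here instead of through the ArchFloatMask functions the commit mentions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical subset of per-draw constants; not xenia's real layout.
struct SystemConstantsSketch {
  uint32_t flags;
  uint32_t vertex_index_endian;
  float point_size;
};

// Reads a float's bit pattern; std::memcpy avoids strict-aliasing issues.
inline uint32_t FloatBits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}

// Copies `src` into `dst` and reports whether any field changed.
// x ^ y is zero only when x and y are bit-identical, so OR-ing all XORs
// gives a nonzero value iff at least one field differs. No per-field
// compare result (setcc on EFLAGS) has to be produced.
inline bool UpdateAndCheckDirty(SystemConstantsSketch& dst,
                                const SystemConstantsSketch& src) {
  uint32_t dirty_u32 = 0;
  dirty_u32 |= dst.flags ^ src.flags;
  dirty_u32 |= dst.vertex_index_endian ^ src.vertex_index_endian;
  dirty_u32 |= FloatBits(dst.point_size) ^ FloatBits(src.point_size);
  dst = src;
  return dirty_u32 != 0;
}
```

Comparing float bits is slightly conservative: -0.0f and +0.0f count as a change, which at worst costs one extra constant upload.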
chss95cs@gmail.com	c6010bd4b1	Nasty commit with a bunch of test code left in; will clean up and PR. Remove the logger_ != nullptr check from ShouldLog: it will nearly always be true except during initialization, and it gets checked later anyway; this shrinks the size of the generated code for some. Select a specialized vastcpy for the current CPU; for now there are only paths for MOVDIR64B and generic AVX1. Add XE_UNLIKELY/LIKELY if; they map better to the C++ unlikely/likely attributes, which we will need to use soon. Finished reimplementing STVL/STVR/LVL/LVR as their own opcodes. We now generate far less code for these instructions. This also means optimization passes can be written to simplify/remove/replace these instructions in some cases. Found that a good deal of the x86 we were emitting for these instructions was dead code or redundant. The reduction in generated HIR/x86 should help a lot with compilation times and make function precompilation more feasible as a default. Don't static assert in the default prefetch impl; in C++20 the assertion will be triggered even without an instantiation. Reorder some if/else to prod MSVC into ordering the branches optimally; it somewhat worked... Added some notes about which opcodes should be removed/refactored. Dispatch in WriteRegister via vector compares for the bounds; still not very optimal, as we ought to be checking whether any register in a range may be special. A lot of work on trying to optimize WriteRegister; moved the wraparound path into a noinline function based on profiling info. Hoist the IsUcodeAnalyzed check out of AnalyzeShader and instead check it before each call.
The profiler recorded many hits in the stack frame setup of the function but none in its actual body: the check is often true, but the stack frame setup is run unconditionally. Pre-check whether we're about to write a single register from a ring. Replace more jump tables in draw_util/texture_info with popcnt-based sparse indexing/bit tables/shuffle lookups. Place the GPU register file on its own VAD/virtual allocation; it is no longer a member of the graphics system.	2022-09-04 11:04:41 -07:00
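Commit c6010bd4b1 replaces jump tables with popcnt-based sparse indexing. A minimal sketch of the idea, assuming at most 32 keys; the table layout and names are hypothetical. `std::bitset::count` is used as a portable popcount, which compilers may lower to a `popcnt` instruction when targeting a CPU that has it.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Sparse table: one presence bit per key, plus a packed array holding only
// the entries for present keys, in ascending key order. The packed index of
// a present key is the number of present keys below it.
struct SparseTable32 {
  uint32_t present_mask;  // bit k set => key k has an entry
  const uint8_t* values;  // entries for present keys, ascending key order
  uint8_t fallback;       // returned for absent keys

  uint8_t Lookup(uint32_t key) const {
    assert(key < 32);
    uint32_t bit = uint32_t(1) << key;
    if (!(present_mask & bit)) {
      return fallback;
    }
    // Keep only the presence bits strictly below `key` and count them.
    uint32_t below = present_mask & (bit - 1);
    return values[std::bitset<32>(below).count()];
  }
};
```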
chss95cs@gmail.com	78c9a48bc2	also use vastcpy for shared memory page stuff	2022-08-28 14:52:12 -07:00
chss95cs@gmail.com	f31869092c	Fixed a bug with readback_resolve and readback_memexport that was responsible for a large portion of their overhead. readback_memexport and readback_resolve are now usable for games, depending on your hardware; in my case, games that were slideshows now run at about 20-30 fps, and my hardware isn't the best for Xenia. Add a split_map class for mapping keys to values in a way that optimizes for frequent searches and infrequent insertions/removals. Remove the jump table implementation of GetColorRenderTargetFormatComponentCount, which was appearing relatively high in profiles; instead, pack the component counts into a single 32-bit word, which is indexed by shifting. Add a cvar to align all basic blocks to a boundary. Add MMIO-aware load paths. Liberally apply XE_RESTRICT in ringbuffer-related code. Removed the IS_TRUE and IS_FALSE opcodes: they were pointless duplicates of COMPARE_EQ/COMPARE_NE, and I want to simplify our set of opcodes for future backends. More work on the LVSR/LVSL/STVR/STVL opcodes. Optimized x64 translated code emission; instrkey is now only computed once. Add code for precomputing integer division magic numbers. Optimized GetHostViewportInfo a little. Move the args for GetHostViewportInfo into a class, cache the result and compare for future queries; this moved GetHostViewportInfo far lower on the profile. Add (currently not functional, and very racy) asynchronous memcpy code; will improve it and actually use it in future commits. Add a non-temporal memcpy function for huge page-aligned allocations, used for copying to shared memory/readback. Hoist the are_accumulated_render_targets_valid_ check out of the loop in render_target_cache's already-bound check. Add stosb/movsb code for small constant memcpys/memsets that aren't worth the overhead of memcpy/memset.	2022-08-28 14:24:25 -07:00
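The packed replacement for the GetColorRenderTargetFormatComponentCount jump table in commit f31869092c can be sketched as follows, assuming 16 formats whose component counts are 1 to 4, so that (count - 1) fits in 2 bits. The format-to-count mapping below is a placeholder, not the real Xenos table, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs 16 two-bit entries into one 32-bit word at compile time.
// Entry i holds (counts[i] - 1), stored at bit offset 2 * i.
constexpr uint32_t PackComponentCounts(const uint32_t (&counts)[16]) {
  uint32_t packed = 0;
  for (uint32_t i = 0; i < 16; ++i) {
    packed |= (counts[i] - 1) << (i * 2);
  }
  return packed;
}

// Placeholder counts per hypothetical format index 0..15.
constexpr uint32_t kCountsSketch[16] = {4, 4, 2, 4, 4, 1, 2, 4,
                                        4, 4, 4, 2, 1, 4, 4, 4};
constexpr uint32_t kPackedCounts = PackComponentCounts(kCountsSketch);

// A shift and a mask replace the switch/jump table.
inline uint32_t GetComponentCount(uint32_t format) {
  assert(format < 16);
  return ((kPackedCounts >> (format * 2)) & 3) + 1;
}
```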
Radosław Gliński	0b013fdc6b	Merge pull request #61 from chrisps/canary_experimental performance improvements, kernel fixes, cpu accuracy improvements	2022-08-21 09:31:09 +02:00
chss95cs@gmail.com	457296850e	Add OPCODE_NEGATED_MUL_ADD/OPCODE_NEGATED_MUL_SUB. Proper handling of NaNs for VMX max/min on x64 (minps/maxps have special behavior depending on the operand order that VMX does not have for vminfp/vmaxfp). Add an extremely unintrusive guest code profiler utilizing KUSER_SHARED systemtime. This profiler is disabled on platforms other than Windows, and on Windows it is disabled by default by a cvar. Repurpose the GUEST_SCRATCH64 stack offset to instead store guest function profile times, and define GUEST_SCRATCH as 0 instead, since that's already meant to be a scratch area. Fix Xenia silently closing on config errors/other fatal errors by setting has_console_attached_'s default to false. Add an alternative code path for the guest clock that uses KUSER_SHARED systemtime instead of QueryPerformanceCounter. This is way faster, and I have tested it and found it to be working, but I have disabled it because I do not know how well it works on Wine or on processors other than mine. Significantly reduce log spam by setting XELOGAPU and XELOGGPU to LogLevel::Debug. Changed some LOGI to LOGD in places to reduce log spam. Mark VdSwap as kHighFrequency; it was spamming up logs. Make logging calls less intrusive for the caller by forcing the log level test inline and moving the format/AppendLogLine work to an outlined cold function. Add the swcache namespace for software cache operations like prefetches, streaming stores and streaming loads.
Add XE_MSVC_REORDER_BARRIER to prevent MSVC from propagating a value too close to its store or from its load. Add xe_unlikely_mutex for locks we know have very little contention. Add XE_HOST_CACHE_LINE_SIZE and XE_RESTRICT to platform.h. Microoptimization: changed most uses of size_t to ring_size_t in RingBuffer; this slightly reduces the size of the inlined ringbuffer operations by eliminating REX prefixes, depending on register allocation. Add BeginPrefetchedRead to RingBuffer, which prefetches the second range, if there is one, according to the provided PrefetchTag. Added the inline_loadclock cvar, which directly uses the value of the guest clock from clock.cc in JITted guest code; off by default. Change uses of GUEST_SCRATCH64 to GUEST_SCRATCH. Add fast vectorized xenos_half_to_float/xenos_float_to_half (currently in x64_seq_vector; maybe move to GPU code at some point). Add fast x64 codegen for PackFloat16_4/UnpackFloat16_4. The same code can be used for Float16_2 in a future commit. This should speed up some games that use these functions heavily. Remove the cvar for toggling the old float16 behavior. Add the VRSAVE register and support mfspr/mtspr vrsave. Add a cvar for toggling off codegen for trap instructions and set it to true by default. Add specialized methods to CommandProcessor: WriteRegistersFromMem, WriteRegisterRangeFromRing, and WriteOneRegisterFromRing. These reduce the overall cost of WriteRegister. Use a fixed-size vmem vector for upload ranges; realloc/memsetting on resize in the inner loop of requestranges was showing up in the profiler (the search in requestranges itself needs work). Rename fixed_vmem_vector to better fit Xenia's naming convention. Only log unknown register writes in WriteRegister if DEBUG :/.
We're stuck on MSVC with C++17, so we have no way of influencing the branch ordering for that function without profile-guided optimization. Remove the binding stride assert in shader_translator.cc; Triang3l told me it's leftover OGL stuff. Mark xe::FatalError as noreturn. If a controller is not connected, delay by 1.1 seconds before checking whether it has been reconnected. Asking XInput about an unused controller slot is extremely slow, and XInputGetState/SetState were taking up an enormous amount of time in profiles; this may have caused a bit of input lag. Protect accesses to input_system with a lock. Add proper handling for user_index >= 4 in XamInputGetState/SetState, and properly return a zeroed state in GetState. Add a missing argument to NtQueryVirtualMemory_entry. Fixed RtlCompareMemoryUlong_entry: it actually does not care if the source is misaligned, and it aligns the length down. Fixed RtlUpperChar and RtlLowerChar by adding a table with their correct return values precomputed.	2022-08-20 11:40:19 -07:00
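The 1.1-second reconnect back-off from commit 457296850e could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, the injected `query` callback stands in for the slow XInput call, and the current time is passed in explicitly so the behavior can be tested.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch: remember that a controller slot was empty and skip
// the (slow) connection query until a back-off window has elapsed.
class ControllerPollThrottle {
 public:
  using Clock = std::chrono::steady_clock;

  // Returns whether `slot` is connected. `query(slot)` is only invoked if
  // the slot was connected last time or its back-off window has expired.
  template <typename QueryFn>
  bool IsConnected(uint32_t slot, Clock::time_point now, QueryFn query) {
    if (slot >= 4) {
      return false;  // Only user indices 0-3 exist.
    }
    if (!connected_[slot] && now < retry_at_[slot]) {
      return false;  // Recently found empty; don't ask again yet.
    }
    connected_[slot] = query(slot);
    if (!connected_[slot]) {
      // 1.1 s back-off, as in the commit message.
      retry_at_[slot] = now + std::chrono::milliseconds(1100);
    }
    return connected_[slot];
  }

 private:
  bool connected_[4] = {true, true, true, true};  // Query on first use.
  Clock::time_point retry_at_[4] = {};
};
```

Injecting the time point instead of calling `Clock::now()` inside keeps the class deterministic; a caller would simply pass `Clock::now()`.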
Gliniak	e06978e5be	[Premake] Cleanup & Fixed references in cpu-tests	2022-08-17 09:43:55 +02:00
chss95cs@gmail.com	7cc364dcb8	Squash reallocs in command buffers by using a large preallocated buffer, directly backed by virtual memory so the OS allocates on demand. Mark raw clock functions as noinline; the way MSVC was inlining them and ordering the branches meant that rdtsc would often be speculatively executed. Add an alternative clock impl for Windows: instead of using QueryPerformanceCounter, we grab systemtime from KUSER_SHARED. It does not have the same precision as QueryPerformanceCounter (we only have 100-nanosecond precision), but we round to milliseconds, so it never made sense to use the performance counter in the first place. Stubbed out the "guest clock mutex"... (the entirety of clock.cc needs a rewrite). Added some helpers for minf/maxf without the NaN handling behavior.	2022-08-14 13:42:08 -07:00
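Commit 7cc364dcb8 adds minf/maxf helpers "without the nan handling behavior". A plausible minimal form, with hypothetical names, is a plain compare-and-select, contrasted here with `std::fmin`, which returns the non-NaN operand. On x64, compilers can typically turn this pattern into a single `minss`/`maxss`, whose NaN behavior (return the second operand) matches it.

```cpp
#include <cassert>
#include <cmath>

// Returns `b` whenever the comparison is false, including when either
// input is NaN. No extra NaN checks are emitted.
inline float minf_fast(float a, float b) { return a < b ? a : b; }
inline float maxf_fast(float a, float b) { return a > b ? a : b; }
```

The trade-off is that the result depends on operand order when a NaN is involved, which is acceptable where inputs are known not to be NaN.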
chss95cs@gmail.com	c9b2d10e17	Alternative mutex impl on Windows works, but I really can't tell if it helps much. Use a larger size in deferred_command_list to cut down on resizes in big scenes in m:dur.	2022-08-14 10:26:50 -07:00
chss95cs@gmail.com	08f7a28920	Alternative mutex	2022-08-14 08:59:11 -07:00
chss95cs@gmail.com	cb85fe401c	Huge set of performance improvements. Combined with an architecture-specific build and clang-cl, users have reported absurd gains over master for some games, in the range of 50%-90%, but for normal MSVC builds I would put it at around 30-50%. Added per-XexModule caching of per-instruction information, which can be used to remember what code needs compiling at startup. Record which guest addresses wrote MMIO and backpropagate that to future runs, eliminating dependence on exception trapping. This makes many games like H3 actually tolerable to run under a debugger. Fixed a number of errors where temporaries were being passed by reference/pointer. Can now be compiled with clang-cl 14.0.1, though it requires -Werror off and some other solution/project changes. Added macros wrapping compiler extensions like noinline, forceinline, __expect, and cold. Removed the "global lock" in guest code completely. It does not properly emulate the behavior of mfmsrd/mtmsr, and it seriously cripples AMD CPUs. Removing it yielded around a 3x speedup in Halo Reach for me. Disabled the microprofiler for now; it has a huge performance cost associated with it. Developers can re-enable it in the base/profiling header if they really need it. Disable the trace writer in release builds: despite just returning after checking whether the file was open, the trace functions were consuming about 0.60% CPU time in total. Add IsValidReg: GetRegisterInfo is a huge (about 45k) branching function, and using it to check whether a register was valid consumed a significant chunk of time. Optimized RingBuffer::ReadAndSwap and RingBuffer::read_count. This gave us the largest overall boost in performance.
The memcpys were unnecessary, and one of them was always a no-op. Added simplification rules for multiplicative patterns like (x+x) and (x<<1)+x. For the most frequently called Win32 functions, I added code to call their underlying NT implementations, which lets us skip a lot of MS code we don't care about or that isn't relevant to our use cases; this can be toggled off in the platform_win header. Handle indirect call true with a constant function pointer, which was occurring in H3. Look up the host format swizzle in a denser array. By default, don't check whether a GPU register is unknown; instead, just check whether it's out of range (controlled by a cvar). Looking up whether it's known or not took approx. 0.3% CPU time. Changed some things in /cpu to make the project UNITYBUILD-friendly. The timer thread was spinning way too much and consuming a ton of CPU; changed it to use a blocking wait instead. Tagged some conditions as XE_UNLIKELY/LIKELY based on profiler feedback (will only affect Clang builds). Shifted around some code in CommandProcessor::WriteRegister based on how frequently it was executed. Added support for docdecaduple precision floating point so that we can represent our performance gains numerically. Tons of other stuff I'm probably forgetting.	2022-08-13 12:59:00 -07:00
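The multiplicative simplification rules from commit cb85fe401c, (x+x) and (x<<1)+x, can be sketched as a pattern matcher on a toy expression tree. The node layout and function name are hypothetical and unrelated to xenia's real HIR; a real pass would then replace the matched node with a multiply by the returned constant.

```cpp
#include <cassert>
#include <cstdint>

// Toy expression node (hypothetical). Values are compared by pointer
// identity, like SSA values.
enum class Op { kValue, kConst, kAdd, kShl };

struct Node {
  Op op;
  const Node* a;  // first operand (or nullptr)
  const Node* b;  // second operand (or nullptr)
  uint64_t imm;   // constant payload for kConst
};

// If `n` matches one of the rules below, stores the multiplied value in
// *base and returns the multiplier; otherwise returns 0.
//   x + x         ->  x * 2
//   (x << 1) + x  ->  x * 3   (either operand order)
// Both identities also hold in wrapping (modulo 2^N) integer arithmetic.
inline uint64_t MatchMultiply(const Node& n, const Node** base) {
  if (n.op != Op::kAdd || !n.a || !n.b) {
    return 0;
  }
  if (n.a == n.b) {
    *base = n.a;
    return 2;
  }
  const Node* shl = n.a->op == Op::kShl ? n.a : n.b;
  const Node* other = shl == n.a ? n.b : n.a;
  if (shl->op == Op::kShl && shl->a == other && shl->b &&
      shl->b->op == Op::kConst && shl->b->imm == 1) {
    *base = other;
    return 3;
  }
  return 0;
}
```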
Gliniak	0e3403d6da	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-30 12:42:51 +02:00
Triang3l	7595cdb52b	[Vulkan] Non-GS point sprites + minor SPIR-V fixes	2022-07-27 17:14:28 +03:00
Triang3l	ff7ef05063	[SPIR-V] Clamp cube face using NClamp, not NMax/FMin	2022-07-26 17:08:12 +03:00
Triang3l	66c995f3aa	[SPIR-V] Saturate point sprite coordinates	2022-07-26 17:04:22 +03:00
Triang3l	8fb5da18ea	[Vulkan] Add forgotten fullDrawIndexUint32 check	2022-07-26 16:24:14 +03:00
Triang3l	9fa41c27bc	[Vulkan] Point sprite geometry shader	2022-07-26 16:01:20 +03:00
Gliniak	76806e08c5	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-26 10:22:38 +02:00
Triang3l	f248e23079	[DXBC] Skip backface check in point PsParamGen	2022-07-25 21:48:25 +03:00
Triang3l	77e85ecaa4	[Vulkan] 32-bit index fetch without fullDrawIndexUint32	2022-07-25 16:53:12 +03:00
Gliniak	98c2cb636f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-24 17:38:08 +02:00
Triang3l	37579d3bf0	[GPU] Treat non-adaptive-tessellated patches as 1-control-point	2022-07-24 17:38:26 +03:00
Gliniak	1fcac00924	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-23 13:26:31 +02:00
Triang3l	3c12814276	[GPU] EDRAM looped addressing (resolves #2031 )	2022-07-22 23:51:50 +03:00
Gliniak	0c782ade8e	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-21 18:52:33 +02:00
Triang3l	6ff312afb1	[DXBC] Update PsParamGen comment [ci skip]	2022-07-21 12:42:06 +03:00
Triang3l	1a95bef8b3	[GPU] Eliminate unused shader I/O, UCP culling, centroid on Vulkan For more optimal usage of exports and the parameter cache on the host regardless of how effective the optimizations in the host GPU driver are. Also reserve space for Vulkan/Metal/D3D11-specific HostVertexShaderTypes to use one more bit for the host vertex shader type in the shader modification bits, so that won't have to be done in the future as that would require invalidating shader storages (which are invalidated by this commit) again.	2022-07-21 12:32:28 +03:00
Gliniak	bc315d21e0	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-19 10:45:14 +02:00
Triang3l	0a94b86cb8	[GPU] Remove orphaned GetPresentArea declaration [ci skip]	2022-07-18 21:02:34 +03:00
Gliniak	6e1e62378f	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-17 21:27:52 +02:00
Triang3l	14fdf4b270	[GPU] Up to 7x7 resolution scaling	2022-07-17 20:41:50 +03:00
Triang3l	e8652e544a	[GPU] Translucent trace viewer controls	2022-07-17 17:29:41 +03:00
Triang3l	25663827ba	[GPU] Trace viewer Android content URI loading	2022-07-17 16:37:49 +03:00
Triang3l	2a69d1db4d	[Vulkan] Fix a typo in a comment about BC textures [ci skip]	2022-07-14 21:16:23 +03:00
Triang3l	037310f8dc	[Android] Unified xenia-app with windowed apps and build prerequisites	2022-07-11 21:45:57 +03:00
Gliniak	1d00372e6b	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-10 10:50:39 +02:00
Triang3l	b41bb35a20	[SPIR-V] Make interpolators an array to fix Adreno linkage	2022-07-09 17:52:26 +03:00
Triang3l	b3edc56576	[Vulkan] Merge texture and sampler descriptors into a single descriptor set Put all descriptors used by translated shaders in up to 4 descriptor sets, which is the minimum required, and the most common on Android, `maxBoundDescriptorSets` device limit value	2022-07-09 17:10:28 +03:00
Triang3l	e4de8663c4	[Vulkan] All guest draw uniform buffer bindings in a single descriptor set Reduce the number of bound descriptor sets from 10 to 6, which is still above the minimum limit of 4, but closer	2022-07-07 21:05:56 +03:00
Triang3l	88c055eb30	[CPU] Null backend enough for GPU trace viewing	2022-07-06 23:28:06 +03:00
Triang3l	3ee68d79ea	Revert "[GPU] Make Processor optional for GraphicsSystem setup" The Processor is still required in many places, including the GPU command processor worker thread This reverts commit `fd03d886e9`.	2022-07-06 22:43:40 +03:00
Triang3l	fd03d886e9	[GPU] Make Processor optional for GraphicsSystem setup	2022-07-05 21:21:22 +03:00
Triang3l	bdfd410b13	[CPU] Cleanup x64 backend usage conditionals	2022-07-05 21:07:10 +03:00
Triang3l	d263d508cd	[GPU] Make operator< const	2022-07-05 20:47:53 +03:00
Triang3l	536f14d94c	[GPU] Fix a typo in a Neon intrinsic name	2022-07-05 20:47:34 +03:00
Triang3l	feaad639fb	[Vulkan] Destroy all RTs before VulkanRenderTargetCache is destroyed	2022-07-04 11:27:51 +03:00
Gliniak	6e753c6399	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-04 08:11:04 +02:00
Triang3l	2621dabf0f	[Vulkan] Native 24-bit unorm depth where available	2022-07-03 21:21:17 +03:00
Triang3l	ee84f4e267	[Vulkan] Update title bar warning	2022-07-03 19:45:48 +03:00
Triang3l	001f64852c	[Vulkan] VMA for textures	2022-07-03 19:40:48 +03:00
Gliniak	a8df744ea6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-07-03 13:07:15 +02:00
Triang3l	636585e0aa	[Vulkan] Trace viewer	2022-07-01 19:53:41 +03:00
Triang3l	ad1ef84145	Merge branch 'master' into vulkan	2022-07-01 19:53:08 +03:00
Triang3l	e37e3ef382	[GPU] Display swap output in the trace viewer Resolve output is unreliable because resolving may be done to a subregion of a texture and even to 3D textures, and to any color format	2022-07-01 19:50:19 +03:00
Triang3l	c8a4a9504f	[Vulkan] Remove an unneeded scale from RefreshGuestOutput aspect ratio	2022-07-01 12:52:12 +03:00
Triang3l	d174762a40	Merge branch 'master' into vulkan	2022-07-01 12:51:34 +03:00
Triang3l	28670d8ec2	[UI] Presenter: Rename display size to aspect ratio	2022-07-01 12:50:45 +03:00
Triang3l	f8b351138e	[Vulkan] Alpha test	2022-06-30 22:20:51 +03:00
Triang3l	6772c88141	Merge branch 'master' into vulkan	2022-06-30 22:15:29 +03:00
Triang3l	7e691d5ef1	[DXBC] Handle NaN in not equal alpha test as passed	2022-06-30 22:15:01 +03:00
Triang3l	c0c3666e12	[Vulkan] Align texture extents in loading to vector size accessed by the shader Fixes loading of the 1x1 linear 8_8_8_8 texture containing just a single #FFFFFFFF texel in 4D5307E6, which is used for screen fade and the lobby map loading bar background	2022-06-29 23:41:32 +03:00
Triang3l	9392fff369	Merge branch 'master' into vulkan	2022-06-29 23:39:54 +03:00
Triang3l	a11b070fee	[GPU] Align texture extents in loading to host buffer texel size accessed by the shader	2022-06-29 23:38:06 +03:00
Triang3l	7c2df55209	[Vulkan] Cache clear: shared memory, scratch buffer	2022-06-29 13:24:45 +03:00
Triang3l	d5815d9e6a	[Vulkan] Float24 depth range remapping fixes	2022-06-29 13:14:00 +03:00
Gliniak	efe3cd96d6	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-29 09:21:09 +02:00
Triang3l	05ef7a273a	[Vulkan] Samplers (only 1.0 core features for now)	2022-06-28 22:42:18 +03:00
Triang3l	5d9061cf99	Merge branch 'master' into vulkan	2022-06-28 22:05:45 +03:00
Triang3l	243683d2e9	[GPU] Cleanup Texture::MarkAsUsed conditionals	2022-06-28 22:04:26 +03:00
Triang3l	382710bab7	[GPU] Normalize sampler clamp modes	2022-06-28 21:58:58 +03:00
Triang3l	cedc94679b	[GPU] Don't drop the rest of the command list if IssueDraw fails	2022-06-28 21:40:06 +03:00
Gliniak	e6898fda66	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-26 20:11:33 +02:00
Triang3l	9672230d9f	Merge branch 'master' into vulkan	2022-06-26 18:59:49 +03:00
Triang3l	ec008463b6	[GPU] CrYCb/YCrCb border colors	2022-06-26 18:56:50 +03:00
Triang3l	2606fa5709	[GPU] Apply BaseMap MipFilter via samplers as it may be overridden Make it have no effect on the texture resource as a resource may be used with samplers with different overrides. Also make sure magnification vs. minification is not undefined with it on Direct3D 12.	2022-06-26 18:41:38 +03:00
Triang3l	e191430091	Merge branch 'master' into vulkan	2022-06-26 16:58:27 +03:00
Triang3l	086a070fa9	[GPU] Explicitly cast bit field values in std::min/max According to the integral promotion rules https://eel.is/c++draft/conv.prom#5.sentence-1 bit fields can be promoted to `int` if it's wide enough to store their value, and then otherwise, to `unsigned int`. Hopefully fixes Clang building (the `width_div_8` case).	2022-06-26 16:54:11 +03:00
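The bit-field promotion issue from commit 086a070fa9 can be reproduced with a small sketch. The struct layout and helper name are hypothetical (only the `width_div_8` field name comes from the commit). When `width_div_8 + 1` is promoted to `int`, as Clang does for a 14-bit unsigned bit-field, `std::min(size.width_div_8 + 1, limit)` with a `uint32_t` limit fails template argument deduction; casting the bit-field first keeps both arguments `uint32_t`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical fetch-constant-like bit fields.
struct FetchSizeSketch {
  uint32_t width_div_8 : 14;
  uint32_t height : 13;
};

inline uint32_t ClampedWidthBlocks(const FetchSizeSketch& size,
                                   uint32_t limit) {
  // uint32_t(...) + 1 is unsigned on every compiler, so std::min deduces
  // T = uint32_t for both arguments.
  return std::min(uint32_t(size.width_div_8) + 1, limit);
}
```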
Triang3l	e0b890fe5c	[DXBC] Remove alphatest/A2C with [earlydepthstencil]	2022-06-26 15:31:08 +03:00
Triang3l	6688b13773	[Vulkan] PsParamGen	2022-06-26 15:01:27 +03:00
Triang3l	a99a1be880	Merge branch 'master' into vulkan	2022-06-26 15:00:21 +03:00
Triang3l	b787f2dec1	[GPU] GPR count limit is 128, not 64	2022-06-26 14:45:49 +03:00
Triang3l	a5c8df7a37	[Vulkan] Remove UB-based independent blend logic On Vulkan, unlike Direct3D, not writing to a color target in the fragment shader produces an undefined result.	2022-06-25 20:57:44 +03:00
Triang3l	d8b2944caa	[Vulkan] Handle unsupported fillModeNonSolid + fix portability subset feature checks	2022-06-25 20:46:52 +03:00
Triang3l	d30d59883a	[Vulkan] Color exponent bias and gamma conversion	2022-06-25 20:35:13 +03:00
Triang3l	b1be33004a	Merge branch 'master' into vulkan	2022-06-25 20:31:26 +03:00
Triang3l	4812b4ba8b	[D3D12] Fix outdated color system constants comment [ci skip]	2022-06-25 20:31:05 +03:00
Triang3l	5dca11a892	[SPIR-V] Fix fetch constant LOD bias signedness	2022-06-25 16:33:35 +03:00
Triang3l	d8b0227cbd	[SPIR-V] Fix cubemap X axis	2022-06-25 16:25:29 +03:00
Triang3l	758db4ccb3	[Vulkan] Fix textures not loaded if using a shader for the first time	2022-06-25 15:15:06 +03:00
Triang3l	4db445c6f9	Merge branch 'master' into vulkan	2022-06-25 15:13:41 +03:00
Triang3l	aa45d7b47d	[D3D12] More descriptive pipeline creation call comment [ci skip]	2022-06-25 15:13:11 +03:00
Triang3l	c37c05d189	[Vulkan] Remove an outdated fullscreen shader comment [ci skip]	2022-06-25 14:35:15 +03:00
Triang3l	4b4205ba00	[Vulkan] Frontbuffer presentation	2022-06-25 14:33:43 +03:00
Triang3l	3fc7d8753c	Merge branch 'master' into vulkan	2022-06-24 23:38:04 +03:00
Triang3l	f4a634c617	[XeSL] xesl_writeStore > xesl_Store	2022-06-24 23:37:29 +03:00
Triang3l	7a4732e14f	[GPU] XeSL swap shaders	2022-06-24 23:24:30 +03:00
Triang3l	b7737d70ca	[D3D12] Update RequestSwapTexture resource state comment [ci skip]	2022-06-23 22:59:53 +03:00
Gliniak	ce3b159683	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 21:05:45 +02:00
Triang3l	227d495738	Merge branch 'master' into vulkan	2022-06-22 21:19:29 +03:00
Triang3l	e9f129f67f	[GPU] Safer and more correct depth bias conversion Float24-as-float32 depth bias is now in increments of 8, because otherwise, if it's too small, converting the depth to float24 directly in the pixel shaders may qualitatively destroy the bias.	2022-06-22 21:14:40 +03:00
Triang3l	a7885ae1a4	[GPU] Fix CPU-side float24 conversion broken recently	2022-06-22 20:47:44 +03:00
Triang3l	4514050f55	[Vulkan] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest	2022-06-22 13:25:06 +03:00
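The reason truncation is preferred over rounding in commits 4514050f55 and 7869b080d3 can be shown with a simplified sketch. It only narrows the float32 mantissa from 23 to 20 bits (the mantissa width of the 20e4 float24 depth format) and deliberately ignores float24's 4-bit exponent range, so it is not a real float24 converter; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

inline uint32_t Bits(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof(u));
  return u;
}
inline float FromBits(uint32_t u) {
  float f;
  std::memcpy(&f, &u, sizeof(f));
  return f;
}

// Drops the 3 low mantissa bits. For a non-negative value the result is
// never greater than the input.
inline float TruncateMantissaTo20(float depth) {
  return FromBits(Bits(depth) & ~uint32_t(0x7));
}

// Rounds to nearest, ties to even, at 20 mantissa bits: may round up.
// Adds (half ulp - 1) plus the kept LSB, then clears the dropped bits.
inline float RoundMantissaTo20(float depth) {
  uint32_t u = Bits(depth);
  uint32_t lsb = (u >> 3) & 1;
  u += 0x3 + lsb;
  return FromBits(u & ~uint32_t(0x7));
}
```

Because truncating a non-negative value never increases it, a later pass drawing the same depth still passes a "greater or equal" test against the stored value, while a value that was rounded up would fail it.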
Gliniak	e7a122d943	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 12:18:13 +02:00
Triang3l	0d8bd0e0c6	Merge branch 'master' into vulkan	2022-06-22 13:15:50 +03:00
Triang3l	cbf0476d42	[D3D12] Don't round float24 depth when it's known to be exact	2022-06-22 13:14:38 +03:00
Gliniak	83269315d8	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-22 12:06:42 +02:00
Triang3l	7869b080d3	[D3D12] Truncate depth to float24 in EDRAM range ownership transfers and resolves by default Doesn't ruin the "greater or equal" depth test in subsequent rendering passes if precision is lost, unlike rounding to the nearest	2022-06-22 12:53:09 +03:00
Gliniak	87fd772393	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-21 07:54:44 +02:00
Triang3l	c0703e64db	Merge branch 'master' into vulkan	2022-06-20 22:40:19 +03:00
Triang3l	e2f632f8fa	[D3D12] Use udiv by constant tile size + minor transfer cleanup Drivers compile that to a multiplication and a shift anyway.	2022-06-20 22:39:30 +03:00
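Commits e2f632f8fa and c6ec6d8239 note that drivers compile division by a constant tile size to a multiplication and a shift. A sketch of that transformation for 32-bit unsigned values, assuming a tile width of 80 samples purely for illustration; 0xCCCCCCCD is ceil(2^34 / 5), and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// x / 5 for every 32-bit unsigned x, via a 64-bit multiply by the magic
// constant ceil(2^34 / 5) followed by a shift right by 34.
inline uint32_t DivBy5(uint32_t x) {
  return uint32_t((uint64_t(x) * 0xCCCCCCCDu) >> 34);
}

// 80 = 16 * 5, and floor division composes for positive divisors:
// (x / 16) / 5 == x / 80, so a shift handles the power-of-two factor.
inline uint32_t DivBy80(uint32_t x) {
  return DivBy5(x >> 4);
}
```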
Triang3l	0dc480721f	[Vulkan] Render target resolving	2022-06-20 22:29:07 +03:00
Triang3l	c6ec6d8239	[Vulkan] Use UDiv/UMod by constant tile size + minor transfer cleanup Drivers compile that to a multiplication and a shift anyway.	2022-06-20 22:24:07 +03:00
Gliniak	a4ff64c465	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-20 21:07:32 +02:00
Triang3l	61c4c49d76	Merge branch 'master' into vulkan	2022-06-20 12:34:41 +03:00
Triang3l	207e11c8d2	[GPU] Separate range arguments for fixed16 RG and RGBA in GetResolveInfo On Vulkan, when snorm16 is unsupported, these formats may be emulated as float16, which natively can represent a wide range of numbers including -32 to 32 with blending. However, R16G16_SNORM and R16G16B16A16_SNORM are two separate formats, which may have different support on the device.	2022-06-20 12:29:45 +03:00
Triang3l	b61953374e	[GPU] Make resolve EDRAM binding DS 0 and rename it Ordering the descriptor sets by the change frequency on Vulkan, in increasing order (the opposite of D3D12 root signatures). The EDRAM binding never changes there (always one storage buffer), while the destination buffer binding may become changeable in the future (to split dispatches if exceeding `maxStorageBufferRange`, for example).	2022-06-20 12:15:52 +03:00
Triang3l	1200b205cf	Merge branch 'master' into vulkan	2022-06-19 17:52:28 +03:00
Triang3l	9b83d3d0f4	[GPU] XeSL resolve shaders + host depth store width fix	2022-06-19 17:50:21 +03:00
Gliniak	c0483f8bee	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-17 10:58:15 +02:00
Triang3l	166be463be	[XeSL] Metal Shading Language definitions	2022-06-16 21:39:16 +03:00
Gliniak	e8aaddf4d5	Merge remote-tracking branch 'GliniakRepo/patchingSystem' into canary_experimental	2022-06-14 17:50:25 +02:00
Triang3l	127bf34264	[Vulkan] Trace dump tool	2022-06-13 13:03:02 +03:00
Gliniak	945976a31d	Added Premake Files For PatchingSystem	2022-06-12 19:58:12 +02:00
Triang3l	ac268afbe9	[Vulkan] Fix 1<< uint32_t constants	2022-06-12 19:45:12 +03:00
Triang3l	140ed51e9a	[GPU] Fix missing xenia-ui dependency in gpu > gpu-shader-compiler (needed for gmake2)	2022-06-12 19:44:24 +03:00
Triang3l	17c835b245	Merge branch 'master' into vulkan	2022-06-12 18:51:08 +03:00
Triang3l	820b7ba217	[GPU] Fix GetActiveTextureHostSwizzle return type	2022-06-12 18:50:38 +03:00
Triang3l	1a22216e44	[SPIR-V] Texture fetch instructions	2022-06-09 21:42:16 +03:00
Triang3l	f875a8d887	Merge branch 'master' into vulkan	2022-06-09 21:35:12 +03:00
Triang3l	78d1eb8bf8	[GPU] TextureCache::GetActiveTextureHostSwizzle	2022-06-09 21:34:21 +03:00
Triang3l	56f72da137	[GPU] More exact PWL texture/RT gamma conversion	2022-06-07 21:26:34 +03:00
Gliniak	c7da7e1999	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-02 22:19:43 +02:00
Triang3l	a8cfe9bebb	[Vulkan] Unsubsample odd-sized 4:2:2 textures	2022-06-02 23:10:50 +03:00
Triang3l	1ce45ee150	Merge branch 'master' into vulkan	2022-06-02 22:50:14 +03:00
Triang3l	55a91afcc7	[D3D12] Don't decompress unaligned BC textures if supported	2022-06-02 22:48:03 +03:00
Triang3l	84fcd5defa	[GPU] Fix resolve destination offset and extent calculation	2022-06-02 21:47:30 +03:00
Triang3l	a9a072bf00	[GPU] Explain why a 32x32x4bpp linear texture takes 2 pages, not 1 [ci skip]	2022-06-01 13:00:23 +03:00
Triang3l	8bd244f277	[GPU] Better explanation for exact texture memory extent calculation [ci skip]	2022-06-01 12:55:16 +03:00
Gliniak	3169aa2ff3	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-06-01 08:45:21 +02:00
Triang3l	d1ad10b98c	[GPU] Primitive reset comment typo correction [ci skip]	2022-05-31 23:23:53 +03:00
Triang3l	efd7ef212a	[D3D12] 128 megatexel limit explanation based on the spec [ci skip]	2022-05-31 23:23:10 +03:00
Triang3l	25594c918c	[GPU] Fix tiled texture memory extent calculation	2022-05-31 23:17:33 +03:00
Gliniak	d7d26dc1c4	Merge branch 'master' of https://github.com/xenia-project/xenia into canary_experimental	2022-05-25 07:54:16 +02:00
Triang3l	6c9a06b2da	[Vulkan] Texture loading	2022-05-24 22:42:22 +03:00
Triang3l	aac28f19d1	Merge branch 'master' into vulkan	2022-05-24 22:34:40 +03:00
Triang3l	a4840e1992	[GPU] FIXME comment for 1bpb/2bpb texture tiled extent	2022-05-24 22:33:27 +03:00
Triang3l	8701c9f24e	[D3D12] Texture load code cleanup and resolution scaling fixes The resolution scale is now taken into account when copying from the mip tail.	2022-05-24 22:28:42 +03:00
Triang3l	75c185e759	[GPU] Move texture load shader info to common	2022-05-24 22:24:33 +03:00
Triang3l	f994d3ebb3	[Vulkan] Single block-compressed flag for host texture formats, not block sizes	2022-05-23 13:27:43 +03:00
Triang3l	f7b0edee6b	[Vulkan] GBGR/BGRG decompression	2022-05-23 13:18:47 +03:00


2308 Commits