Use the actual PVR2 dithering kernel (standard 4x4 Bayer matrix) used on
real hardware.
Fixes the screen melt effect of Doom 64 (per-pixel only).
Issue #1939
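For reference, a minimal sketch of ordered dithering with the standard 4x4 Bayer matrix. Only the matrix values are standard; the helper name `ditherChannel`, the threshold scaling and the clamp are illustrative assumptions, not the PVR2's exact arithmetic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Standard 4x4 Bayer (ordered dither) threshold matrix, values 0..15.
constexpr std::array<std::array<int, 4>, 4> kBayer4x4 = {{
    {{ 0,  8,  2, 10}},
    {{12,  4, 14,  6}},
    {{ 3, 11,  1,  9}},
    {{15,  7, 13,  5}},
}};

// Hypothetical helper: reduce an 8-bit channel to `bits` bits using the
// threshold for screen position (x, y). The scaling is an assumption.
inline std::uint8_t ditherChannel(std::uint8_t value, int x, int y, int bits)
{
    assert(bits >= 1 && bits <= 8);
    const int shift = 8 - bits;                     // number of bits discarded
    const int threshold = kBayer4x4[y & 3][x & 3];  // 0..15
    const int bias = (threshold << shift) >> 4;     // 0 .. 2^shift - 1
    int v = value + bias;
    if (v > 255)
        v = 255;                                    // clamp before truncating
    return static_cast<std::uint8_t>(v >> shift);
}
```

With this sketch, a value that falls between two 5-bit levels rounds down at low-threshold pixels and up at high-threshold pixels, which produces the familiar 4x4 pattern.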
Using src=OTHER_COLOR with destination secondary accumulator should use
the secondary accumulator color, not the final one.
Fixes dark gun in Doom 64 with Ultra graphics (bump mapping).
Issue #1771
Add subpass dependency from the last subpass to external/top of pipe.
Fix glitches in upper left corner when using OIT on Mali GPUs.
Issue #1014
Issue #1234
Issue #1356
Issue #1497
Issue #1852
Use a series of stable-partitions to sort the list of available formats
to find the best candidate surface-format/color-space that is a non-sRGB
format being presented in an sRGB color-space. Vulkan mandates that all
surface formats that have SRGB forms must also support a UNORM form.
This is basically just RGBA8/BGRA8 on all platforms still, but in a way
that is still capable of falling back to secondary formats in a stable
way in the case that the primary choice is not available. Mobile
devices especially have a LOT of secondary HDR surface formats and other
unusual formats that can be used for presentation, such as RGBA16 or RGB565.
With stable partitions, if we can't get our best option then there is
always a "next best thing" to fall back on rather than relying on the
driver-order.
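A rough sketch of the stable-partition approach described above. The `Format` and `ColorSpace` enums are stand-ins for `VkFormat` and `VkColorSpaceKHR` so the example builds without Vulkan headers, and both `pickSurfaceFormat` and the exact priority order are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stand-ins for VkFormat / VkColorSpaceKHR (hypothetical, for illustration).
enum class Format { RGBA8_UNORM, BGRA8_UNORM, RGBA8_SRGB, BGRA8_SRGB, RGBA16_SFLOAT, RGB565_UNORM };
enum class ColorSpace { SRGB_NONLINEAR, HDR10_ST2084 };

struct SurfaceFormat {
    Format format;
    ColorSpace colorSpace;
};

inline bool is8BitUnorm(const SurfaceFormat& f)
{
    return f.format == Format::RGBA8_UNORM || f.format == Format::BGRA8_UNORM;
}

inline bool isSrgbFormat(const SurfaceFormat& f)
{
    return f.format == Format::RGBA8_SRGB || f.format == Format::BGRA8_SRGB;
}

inline bool isSrgbColorSpace(const SurfaceFormat& f)
{
    return f.colorSpace == ColorSpace::SRGB_NONLINEAR;
}

// Hypothetical helper: returns the preferred entry of a non-empty list.
inline SurfaceFormat pickSurfaceFormat(std::vector<SurfaceFormat> formats)
{
    assert(!formats.empty());
    // Apply the least important criterion first. Each later partition takes
    // precedence while preserving the relative order produced so far.
    std::stable_partition(formats.begin(), formats.end(), is8BitUnorm);
    std::stable_partition(formats.begin(), formats.end(),
                          [](const SurfaceFormat& f) { return !isSrgbFormat(f); });
    std::stable_partition(formats.begin(), formats.end(), isSrgbColorSpace);
    return formats.front();
}
```

If no 8-bit UNORM format is offered, the front of the list is still the best remaining match for the other criteria, so the result does not depend on the driver's ordering alone.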
Modifier volumes should also be clipped when needed.
Implement outside clipping for non-OIT renderers.
OIT renderers are less affected since the shadowed polys themselves are
usually also clipped after shadow is applied.
Fixes overflowing shadows in baserunner cams in WSB 2K1.
Make sure to end the current render pass with the previous setting.
Fix initial layout of color attachments in OIT.
Fix missing initial transition after Term/Init in !OIT.
Issue #1734
Disables the naomi2 vertex input attribute when emitting non-naomi2 pipelines.
This addresses some validation messages involving unused vertex inputs and optimizes the bandwidth of the input assembler a little bit for non-naomi2 games.
Rather than using `VK_FORMAT_R8G8B8A8_UINT` for these vertex attributes and then dividing by `255.0` in each of the shaders, use `VK_FORMAT_R8G8B8A8_UNORM`: it automatically remaps byte components into the `0.0-1.0` range and removes the need for the extra divisions or casts within the shader.
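As a small illustration of the remapping a UNORM format performs, the conversion is equivalent to the division the shaders previously did by hand. The function name `unorm8ToFloat` is hypothetical.

```cpp
#include <cstdint>

// An 8-bit UNORM component c reaches the shader as c / 255,
// so 0 maps to 0.0 and 255 maps to 1.0.
constexpr float unorm8ToFloat(std::uint8_t c)
{
    return static_cast<float>(c) / 255.0f;
}
```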
The full push-constant region is 24 bytes (6 floats), but some of the push-constant writes only wrote 20 bytes of data (5 floats), leaving the 4 bytes at the end undefined.
Resolved by pushing an extra zero.
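One way to make the padding explicit is to always build a full-size buffer before pushing. This is only a sketch: `padPushConstants` and `kPushConstantFloats` are hypothetical names, not the renderer's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kPushConstantFloats = 6;  // full range: 6 floats
static_assert(sizeof(float) * kPushConstantFloats == 24, "range must be 24 bytes");

// Hypothetical helper: copy `count` floats and zero-fill the rest so the
// whole 24-byte range is always defined.
inline std::array<float, kPushConstantFloats> padPushConstants(const float* values, std::size_t count)
{
    assert(count <= kPushConstantFloats);
    std::array<float, kPushConstantFloats> padded{};  // value-initialized to 0.0f
    for (std::size_t i = 0; i < count; ++i)
        padded[i] = values[i];
    return padded;
}
```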
* vk: Add `VK_EXT_provoking_vertex` optimization
The Dreamcast uses the last vertex as the provoking vertex, while Vulkan uses the first vertex.
This requires an additional call to `setFirstProvokingVertex` to reorder the vertices for all incoming geometry.
With `VK_EXT_provoking_vertex`, the pipeline can designate that the provoking vertex is to be the last vertex, which removes the need to re-order incoming geometry on the CPU.
* vk: Propagate physical device API version to VMA
Allows VMA to make assumptions such as using the `*KHR` or non-`KHR` versions of certain function names.
* vk: Refactor libretro device initialization for `VK_EXT_provoking_vertex`
* vk: Cap the Vulkan API version passed to VMA at 1.1
Even if the physical device reports 1.2 or 1.3, we only pass up to 1.1. Otherwise we would be responsible for resolving and loading additional API functions when passing the version to VMA.
* vk: Enable `VK_EXT_provoking_vertex` usage for ModVol and Final(OIT) pipeline
* vk: Enable `VK_EXT_provoking_vertex` for ModVol(OIT) pipeline
Pretty much anything handling Dreamcast geometry should use this extension when available.
* vk: Additional `VK_EXT_provoking_vertex` pipeline fixes
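The API version cap mentioned in the list above can be sketched as a simple clamp. `makeApiVersion` mirrors the bit layout of `VK_MAKE_API_VERSION` with variant 0 so the example builds without Vulkan headers; `vmaApiVersionFor` is a hypothetical helper whose result would be assigned to `VmaAllocatorCreateInfo::vulkanApiVersion`.

```cpp
#include <algorithm>
#include <cstdint>

// Same packing as VK_MAKE_API_VERSION(0, major, minor, patch):
// major in bits 22..28, minor in bits 12..21, patch in bits 0..11.
constexpr std::uint32_t makeApiVersion(std::uint32_t major, std::uint32_t minor, std::uint32_t patch)
{
    return (major << 22) | (minor << 12) | patch;
}

constexpr std::uint32_t kMaxVmaApiVersion = makeApiVersion(1, 1, 0);

// Hypothetical helper: never report more than Vulkan 1.1 to the allocator.
constexpr std::uint32_t vmaApiVersionFor(std::uint32_t deviceApiVersion)
{
    return std::min(deviceApiVersion, kMaxVmaApiVersion);
}
```

A device reporting 1.0 keeps its own version, while 1.2 and 1.3 devices are reported as 1.1.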
This addresses the `BestPractices-Arm-vkCreateSampler-lod-clamping` message from ARM:
65b79bac61/layers/best_practices/bp_descriptor.cpp (L103-L110)
Rather than clamping the LOD in the samplers, instead rely on the Image-View's `vk::ImageSubresourceRange` to limit the number of sampled LODs.
Currently, only game textures actually have mipmaps, so this does not introduce any additional mipmap sampling or filtering anywhere. If any code wants to limit the number of LODs sampled, it should allocate an additional ImageView for the range of mipmaps to be sampled.
Co-authored-by: flyinghead <flyinghead@users.noreply.github.com>
This addresses the `BestPractices-Arm-vkCreateSampler-different-wrapping-modes` message from ARM:
65b79bac61/layers/best_practices/bp_descriptor.cpp (L95-L100)
The `W`-axis for these samplers is always unused, since they are never used for 3D textures.
ARM suggests trying to keep all of the wrapping-modes the same if possible for performance.
`wRepeat` will be set to the same value as `vRepeat` to try and encourage all three wrapping-modes to be the same.
* Uses a utility-lambda for repeated extension-adding logic
* Uses an `std::set` for the list of available extensions for quick queries
* `VK_EXT_DEBUG_REPORT` and `VK_EXT_DEBUG_UTILS` are instance extensions, not device extensions, and don't need to be here
* Each extension that is tested has a corresponding log message stating whether it was enabled or unavailable
```
00:00:162 rend\vulkan\vulkan_context.cpp:427 N[RENDERER]: Device extension enabled: VK_KHR_swapchain
00:00:162 rend\vulkan\vulkan_context.cpp:427 N[RENDERER]: Device extension enabled: VK_KHR_get_memory_requirements2
00:00:162 rend\vulkan\vulkan_context.cpp:427 N[RENDERER]: Device extension enabled: VK_KHR_dedicated_allocation
00:00:162 rend\vulkan\vulkan_context.cpp:430 N[RENDERER]: Device extension unavailable: VK_KHR_portability_subset
00:00:162 rend\vulkan\vulkan_context.cpp:430 N[RENDERER]: Device extension unavailable: VK_EXT_debug_marker
```
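A minimal sketch of the lambda-plus-`std::set` pattern described above. The names `selectDeviceExtensions` and `tryAddExtension` are hypothetical, and `std::clog` stands in for the project's logging.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: enable each wanted extension that the device offers
// and log the outcome either way.
inline std::vector<std::string> selectDeviceExtensions(const std::set<std::string>& available,
                                                       const std::vector<std::string>& wanted)
{
    std::vector<std::string> enabled;
    // Utility lambda shared by every extension check.
    auto tryAddExtension = [&](const std::string& name) {
        assert(!name.empty());
        if (available.count(name) != 0) {  // O(log n) lookup in the set
            enabled.push_back(name);
            std::clog << "Device extension enabled: " << name << '\n';
            return true;
        }
        std::clog << "Device extension unavailable: " << name << '\n';
        return false;
    };
    for (const std::string& name : wanted)
        tryAddExtension(name);
    return enabled;
}
```

Returning a `bool` from the lambda lets a caller branch on optional extensions without a second lookup.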
Rather than picking the first physical device it finds and falling back on the first-listed GPU, a series of stable partitions is now applied so that the "least compromising" GPU is selected based on a series of criteria.
It now tries to find a GPU that (in order of priority):
* Is a discrete GPU
* Supports `fragmentStoresAndAtomics`
* Supports `R5G6B5`/`R5G5B5A1`/`R4G4B4A4`
In the case that a system has two dGPUs and one of them supports optimal formats, the optimal-format one is selected.
In the case that a system has an iGPU and a dGPU and they both support optimal formats, the dGPU is selected.
In the case that a system has an iGPU and a dGPU and the dGPU doesn't support optimal formats, the dGPU is still selected.
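The priority rules above can be sketched with three stable partitions applied from the lowest priority to the highest. `GpuInfo` is a hypothetical summary of the queried device properties and `pickGpu` is an illustrative name, not the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what was queried from each physical device.
struct GpuInfo {
    std::string name;
    bool discrete;
    bool fragmentStoresAndAtomics;
    bool optimalFormats;  // supports the 16-bit packed formats
};

// Hypothetical helper: returns the "least compromising" GPU of a non-empty list.
inline GpuInfo pickGpu(std::vector<GpuInfo> gpus)
{
    assert(!gpus.empty());
    // Lowest priority first: the last partition decides, and earlier ones
    // only break ties because stable_partition keeps relative order.
    std::stable_partition(gpus.begin(), gpus.end(),
                          [](const GpuInfo& g) { return g.optimalFormats; });
    std::stable_partition(gpus.begin(), gpus.end(),
                          [](const GpuInfo& g) { return g.fragmentStoresAndAtomics; });
    std::stable_partition(gpus.begin(), gpus.end(),
                          [](const GpuInfo& g) { return g.discrete; });
    return gpus.front();
}
```

If every device fails every criterion, the list order is unchanged and the first-listed GPU is returned, which matches the old fallback.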
Don't clip modifier volumes but tessellate triangles intersecting the
near plane. Then project clipped vertices onto it in the vertex shader.
Issue #1651