Commit Graph

241 Commits

Author SHA1 Message Date
Matt Borgerson 9020913e29 nv2a: Extract GLSL generation options from {Vsh,Psh}State 2025-06-28 00:18:28 -07:00
Matt Borgerson c575b08b5f nv2a: Extract VshState from ShaderState 2025-06-28 00:18:28 -07:00
Matt Borgerson 9d9c88f71d nv2a: Unset some FF ShaderState if unnecessary 2025-06-28 00:18:28 -07:00
Matt Borgerson 3ad4eb3101 nv2a: Remove colorkey_mask from PshState
It's a uniform, so we don't want it to be part of the PSH setup state,
which is used as the shader cache key.
2025-06-28 00:18:28 -07:00
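The reasoning in the commit above can be illustrated with a small sketch. This is not xemu code: `PshKey`, `compile_shader`, `get_shader`, and the field names are hypothetical. It only shows why a value that is supplied as a uniform is better kept out of the cache key, since otherwise every new value would force a fresh compile.

```python
from dataclasses import dataclass

compile_count = 0  # counts how often the (pretend) compiler runs


@dataclass(frozen=True)
class PshKey:
    """Hypothetical subset of pixel shader setup state used as the cache key."""
    combiner_stages: int
    alpha_test: bool


def compile_shader(key):
    """Stand-in for expensive shader generation and compilation."""
    global compile_count
    compile_count += 1
    return f"shader<{key.combiner_stages},{key.alpha_test}>"


cache = {}


def get_shader(key):
    """Look up a compiled shader by key, compiling only on a miss."""
    if key not in cache:
        cache[key] = compile_shader(key)
    return cache[key]


def draw(key, colorkey_mask):
    """The mask travels as a uniform value next to the cached shader."""
    shader = get_shader(key)
    uniforms = {"colorkey_mask": colorkey_mask}
    return shader, uniforms


key = PshKey(combiner_stages=2, alpha_test=True)
for mask in (0x00FFFFFF, 0xFFFFFFFF, 0x0000FF00):
    draw(key, mask)
```

With the mask outside the key, the three draws above share one cache entry. Had the mask been a field of `PshKey`, each distinct mask would have produced its own entry.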
Erik Abair 634577c753 nv2a: Clamp fog factor to valid float range 2025-06-26 14:20:20 -07:00
Matt Borgerson 1fa242bd00 nv2a/vk: Only include palette in texture key when necessary
Fixes degraded performance due to garbage palette details polluting the
texture cache.
2025-06-23 01:09:51 -07:00
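A minimal sketch of the idea behind the commit above, under stated assumptions: the format names, `PALETTIZED_FORMATS`, and `texture_key` are hypothetical and do not mirror xemu's data structures. The point is that palette details are normalized away when the texture format does not use a palette, so stale register contents cannot produce distinct keys for the same texture.

```python
# Hypothetical set of formats that actually read the palette.
PALETTIZED_FORMATS = {"P8"}


def texture_key(fmt, addr, palette_addr, palette_len):
    """Build a cache key; palette fields count only for palettized formats."""
    if fmt not in PALETTIZED_FORMATS:
        # Ignore whatever happens to be left in the palette registers.
        palette_addr, palette_len = 0, 0
    return (fmt, addr, palette_addr, palette_len)


# Same non-palettized texture, different leftover palette state: one key.
k1 = texture_key("A8R8G8B8", 0x1000, 0xDEAD00, 256)
k2 = texture_key("A8R8G8B8", 0x1000, 0xBEEF00, 64)

# Palettized texture: a different palette is a genuinely different texture.
k3 = texture_key("P8", 0x2000, 0xDEAD00, 256)
k4 = texture_key("P8", 0x2000, 0xBEEF00, 256)
```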
Matt Borgerson a242964793 nv2a/vk: Require fillModeNonSolid feature 2025-06-22 16:44:14 -07:00
Matt Borgerson cbcfd1d1d6 nv2a/vk: Enable wideLines feature before use 2025-06-22 16:44:14 -07:00
Matt Borgerson df1ac31eb2 nv2a/vk: Set line width state dynamically 2025-06-22 16:44:14 -07:00
specialfred453@gmail.com d8afb35c40 nv2a/vk: Scale line width by surface scale 2025-06-22 16:44:14 -07:00
Erik Abair 348b03d6ce nv2a: Handle PGRAPH color keying 2025-06-21 13:25:24 -07:00
Erik Abair 9732026aca nv2a: Ignore unsupported depth funcs to match HW 2025-06-03 12:11:46 -07:00
Erik Abair 8667193001 nv2a: Prevent NaN in specular power factor calculation 2025-05-20 13:28:39 -07:00
coldhex ce936bccdd nv2a/gl: y-flipped rendering to framebuffer object
Render scenes upside-down to framebuffer objects (FBO). The strange thing
about rendering to OpenGL FBO is that it follows the bottom-left triangle
rasterization rule with common PC GPUs. At least Intel and AMD. NVIDIA to
be tested. My raster-rule-test github gist demonstrates this.

This commit flips coordinates in y-direction, which effectively turns the
bottom-left rule into top-left rule needed for Xbox compatibility.

This (together with the previous commit) fixes Midtown Madness 3 Seine
water rectangular seam rendering artifacts (and the remaining seams are
present with Xbox hardware too.) May fix similar artifacts in other games.
2025-05-20 13:15:12 -07:00
coldhex a316d74872 nv2a: Use trunc in vertex rounding instead of floor
Xbox seems to truncate instead of flooring, which can be inferred from
interpolated depth buffer values.
2025-05-20 13:15:12 -07:00
coldhex 11dcae01b9 nv2a: implement screen coordinate rounding to 4 bit fractional precision
Xbox triangle rasterization appears to follow the usual top-left rule.
However, since Xemu renders to an OpenGL framebuffer object (FBO) instead
of directly to the default framebuffer, Xemu actually has what could be
called the bottom-left triangle rasterization rule. I'll address that in
another commit.

Also, note that the ProjAdjacentGeometry_0.5625 test in nxdk_pgraph_tests
is very sensitive to floating point rounding errors. For example, the
nxdk_pgraph_tests commit 66b32a0b1feba32a0db7a95d6358e84f7a6246ad changed
the math library which caused the test result to change also on real Xbox
hardware due to floating point rounding error differences in matrix
inverse computation. Apart from the bottom-left rasterization issue, the
differing result between Xbox and the rounding I am proposing here for
Xemu seems to stem from floating point rounding that happens in screen
coordinate calculations before the rounding to 4 bit precision takes place.
Fixing such rounding issues would require carrying out all preceding floating
point computations in exactly the same order and with the same precision as
Xbox. Note that the Xbox Direct3D library seems to add 0.03125 (1/32) to
screen coordinates by default. Likely the idea there was to make floating
point screen coordinates round to the nearest screen coordinates in
4 bit fixed point precision. So the Xbox Direct3D library (and therefore
games) already mitigate against precarious rounding when exactly
half-integer coordinates are used by games. Actually they would use
integer coordinates because it is Direct3D 8, but since nv2a appears to
rasterize at half-integer coordinates like OpenGL, Xbox Direct3D
also adds 0.5 to screen coordinates in addition to 1/32.
2025-05-20 13:15:12 -07:00
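The rounding described in the commit above can be sketched numerically. This is a plain Python illustration, not the shader code from the commit: the helper names are hypothetical, and the use of truncation follows the separate "Use trunc in vertex rounding instead of floor" entry in this log. The sample values are exact binary fractions so the arithmetic is free of rounding noise.

```python
import math

SUBPIXEL = 16      # 4 fractional bits give steps of 1/16
D3D_BIAS = 1 / 32  # the 0.03125 offset mentioned in the commit message


def snap_trunc(x):
    """Snap to 1/16 steps by discarding extra fraction bits (toward zero)."""
    return math.trunc(x * SUBPIXEL) / SUBPIXEL


def snap_floor(x):
    """Snap to 1/16 steps by rounding toward negative infinity."""
    return math.floor(x * SUBPIXEL) / SUBPIXEL


def snap_nearest(x):
    """Snap to the nearest 1/16 step (ties round up)."""
    return math.floor(x * SUBPIXEL + 0.5) / SUBPIXEL


x = 10.546875                      # 10.5 + 3/64, i.e. 168.75 sixteenths
plain = snap_trunc(x)              # 10.5
biased = snap_trunc(x + D3D_BIAS)  # 10.5625, same as snap_nearest(x)

neg = -0.03125                     # -0.5 sixteenths
neg_trunc = snap_trunc(neg)        # rounds toward zero
neg_floor = snap_floor(neg)        # -0.0625, one step lower
```

Adding 1/32 (half of a 1/16 step) before truncating gives round-to-nearest behaviour for positive coordinates, which matches the motivation the commit message suggests for the Direct3D offset. Truncation and flooring differ only for negative inputs.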
Erik Abair c720af00bb nv2a/vsh: Replace NaN with 1.0 for Bx, Dx, Fog outputs and MUL zero-check 2025-05-15 12:54:56 -07:00
Erik Abair 428c975f09 nv2a: Allow multiframe RenderDoc captures with nv2a traces
Allows multiple frames to be captured at once by holding shift while pressing
F10.

Temporarily toggles nv2a trace messages if control is held while pressing F10.
2025-05-15 08:37:13 -07:00
Erik Abair d593869429 nv2a: Move point params to uniforms
Co-authored-by: Matt Borgerson <contact@mborgerson.com>
2025-04-30 23:43:38 -07:00
Matt Borgerson 6e513ed948 nv2a/psh: Fix 2D texture addressing in DOT_STR_3D mode 2025-04-29 23:41:05 -07:00
Erik Abair 89185e6937 nv2a/psh: Fix default alpha for unbound texture samplers 2025-04-22 20:16:15 -07:00
Erik Abair 270dbe01ea nv2a: Increase MAX_BATCH_LENGTH beyond highest known retail use 2025-04-18 10:46:43 -07:00
Matt Borgerson 5685a6290c nv2a/vk: Set specular power uniform 2025-04-16 20:26:22 -07:00
Erik Abair 679f6d06bd nv2a: Handle LOCALEYE light control 2025-04-16 18:24:46 -07:00
Erik Abair 34ed0f75de nv2a: Handle LOCAL_RANGE 2025-04-16 18:24:46 -07:00
Erik Abair 69c8df2a3e nv2a: Partial implementation of SET_SPECULAR_PARAMS 2025-04-16 18:24:46 -07:00
Erik Abair 7a34eedd6f nv2a: Partially handle SET_LIGHT_CONTROL 2025-04-16 18:24:46 -07:00
Erik Abair 86c85023e6 nv2a: Handle SET_FOG_COORD and SET_WEIGHT* commands 2025-04-16 14:09:13 -07:00
Matt Borgerson 2cc926588b nv2a/gl: Fix COLOR_LE_G8B8 GL surface format type 2025-04-11 04:18:28 -07:00
Erik Abair ebec5e3028 nv2a: Fix assert when setting fog gen mode to fog_x 2025-04-08 16:24:50 -07:00
Erik Abair 672e9cd553 nv2a: Handle SET_SPECULAR_ENABLE 2025-03-28 02:18:42 -07:00
Matt Borgerson 2d73e8aafe nv2a: Use root-relative paths to reference parent dir headers 2025-03-27 23:33:40 -07:00
Matt Borgerson 0e18d11d90 nv2a: Rename methods.h -> methods.h.inc 2025-03-27 23:33:40 -07:00
Matt Borgerson 1893b56c38 nv2a/vk: Fix vertex ram buffer dirty bit check 2025-03-19 02:25:33 -07:00
Matt Borgerson b929d4eced nv2a: Drop surface compat clip constraint 2025-03-17 14:48:47 -07:00
Matt Borgerson c3a8b9569f nv2a: Simplify surface clip to scissor size calculation 2025-03-17 14:48:47 -07:00
Logan Stromberg 860bccb722 nv2a: Fix surface clip to scissor origin 2025-03-17 14:32:40 -07:00
coldhex 02659dd3cc nv2a: Fix cubemap fourth texture coordinate component handling
Xbox hardware ignores fourth texture coordinate component for cubemaps.
2025-03-17 11:37:41 -07:00
wilkovatch a00820746f nv2a: Handle texture dimensions not divisible by 4 in S3TC decoder 2025-03-14 18:44:25 -07:00
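For context on the commit above: S3TC stores pixels in 4x4 blocks, so a texture whose width or height is not a multiple of 4 still occupies whole blocks, and a decoder has to avoid writing past the real texture edge. The sketch below shows that arithmetic only. The function names are hypothetical and this is not the decoder from the commit.

```python
def s3tc_block_grid(width, height):
    """Number of 4x4 blocks per row and per column, rounding up."""
    return (width + 3) // 4, (height + 3) // 4


def dxt1_size_bytes(width, height):
    """Compressed size for DXT1, which uses 8 bytes per 4x4 block."""
    blocks_x, blocks_y = s3tc_block_grid(width, height)
    return blocks_x * blocks_y * 8


def block_pixel_rect(bx, by, width, height):
    """Pixel rectangle (x0, y0, x1, y1) a block may write, clipped to the
    texture so partial edge blocks do not overrun the output buffer."""
    x0, y0 = bx * 4, by * 4
    return x0, y0, min(x0 + 4, width), min(y0 + 4, height)


grid = s3tc_block_grid(10, 6)              # 3 blocks wide, 2 blocks tall
size = dxt1_size_bytes(10, 6)              # 6 blocks * 8 bytes
edge_rect = block_pixel_rect(2, 1, 10, 6)  # bottom-right block covers 2x2 px
```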
Matt Borgerson a143f66ce4 nv2a/psh: Handle 3D textures in PROJECT2D mode 2025-03-10 16:13:09 -07:00
Matt Borgerson 6e3dfb36d8 nv2a/vk: Don't set compressed, swizzled when attribute is uniform 2025-03-10 14:23:43 -07:00
Matt Borgerson 4665515d80 nv2a: Group attributes in pgraph_get_glsl_vtx_header 2025-03-10 12:30:16 -07:00
coldhex 3eb22b6b81 nv2a: Explicit float representation for RCC and vertex shader W range 2025-03-08 14:54:18 -07:00
coldhex 63cb75ce84 nv2a: Fix -0.0 clamping of RCC instruction and vertex shader W-output
Xbox rounds -0.0 to the negative range and 0.0 to the positive range. This
commit also restores RCC instruction clamping to be done on the output of
reciprocal calculation (which the current Xemu release does) with a fix for the
input=Infinity case.
2025-03-08 14:54:18 -07:00
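One possible reading of the behaviour described in the commit above, as a Python sketch. It assumes that the clamp is applied to the magnitude of the reciprocal while keeping its sign bit, and that the bounds are the 2^(-64) and 2^64 values discussed in the w-buffering entry of this log. The function name `rcc` and the NaN pass-through are illustrative choices, not confirmed hardware behaviour.

```python
import math

RCC_MIN = 2.0 ** -64  # lower magnitude bound (assumed, see lead-in)
RCC_MAX = 2.0 ** 64   # upper magnitude bound (assumed, see lead-in)


def rcc(x):
    """Clamped reciprocal: compute 1/x, then clamp the result's magnitude
    while preserving its sign bit, so -0.0 stays on the negative side."""
    if math.isnan(x):
        return x  # NaN handling is left out of this sketch
    if x == 0.0:
        # Python raises on division by zero, so build the signed infinity.
        t = math.copysign(math.inf, x)
    else:
        t = 1.0 / x  # 1/inf is 0.0 and 1/-inf is -0.0
    magnitude = min(max(abs(t), RCC_MIN), RCC_MAX)
    return math.copysign(magnitude, t)


results = {
    "+0.0": rcc(0.0),         # clamps to +2^64
    "-0.0": rcc(-0.0),        # clamps to -2^64
    "+inf": rcc(math.inf),    # reciprocal is 0.0, clamps to +2^-64
    "-inf": rcc(-math.inf),   # reciprocal is -0.0, clamps to -2^-64
}
```

Clamping the output rather than the input means an infinite input, whose reciprocal is a signed zero, still lands on the small bound with the correct sign.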
coldhex 8dc6c90e11 nv2a/vk: Drop unnecessary dirty check for NV_PGRAPH_ZCOMPRESSOCCLUDE
This was used to enable/disable Vulkan depth clamping, but that was
removed in previous commit.
2025-03-08 14:54:18 -07:00
coldhex 854a001063 nv2a: Fix zero-vector input in fixed function vertex shader
If tPosition is a zero-vector, then invViewport matrix had no effect.
Bounding w-coordinate away from zero and infinity must be done before
applying invViewport (which is needed for OpenGL/Vulkan) to emulate
Xbox hardware behaviour properly.
2025-03-08 14:54:18 -07:00
coldhex 798ad30819 nv2a: Perspective-correct interpolation for w-buffering
z_perspective being true implies w-buffering, and then the w-coordinate stored
in the depth buffer should also be interpolated in a perspective-correct
way. We do this by calculating w and setting gl_FragDepth in the fragment
shader.

Since enabling polygon offset and setting values using glPolygonOffset
won't have any effect when manually setting gl_FragDepth for w-buffering,
we introduce the depthOffset variable to obtain similar behaviour (but the
glPolygonOffset factor-argument is currently not emulated.) (Note that
glPolygonOffset is OpenGL implementation-dependent and it might be good to
use depthOffset for z-buffering as well, but this is not done here and we
still use OpenGL/Vulkan zbias functionality.)

This also implements depth clipping and clamping in the fragment shader.
If triangles are clipped, the shadows of the small rocks in Halo 2 Beaver
Creek map can have flickering horizontal lines. The shadows are drawn on
the ground in another pass with the same models as for the ground, but for
some reason with depth clamping enabled. The flickering happens if Xemu
clips the ground triangles, but the exact same shadow triangles are depth
clamped, so there are small differences in the coordinates. The shadows
are drawn with depth function GL_EQUAL so there is no tolerance for any
differences. Clipping in the fragment shader solves the problem because
the ground and shadow triangles remain exactly the same regardless of
depth clipping/clamping. For some performance gain, it might be a good
idea to cull triangles by depth in the geometry shader, but this is not
implemented here.

In the programmable vertex shader we always multiply position output by w
because this improves numerical stability in subsequent floating point
computations by modern GPUs. This usually means that the perspective
divide done by the vertex program gets undone.

The magic bounding constants 5.42101e-020 and 1.884467e+019 are replaced
by 5.421011e-20 and 1.8446744e19, i.e. more decimals added. This makes the
32-bit floating point numbers represent exactly 2^(-64) and 2^64 (raw bits
0x1f800000 and 0x5f800000) which seem more likely the correct values
although testing with hardware was not done to this precision.

Testing indicates that the same RCC instruction magic constants are also
applied to both fixed function and programmable vertex shader w-coordinate
output. This bounding replaces the special test for w==0.0 and abs(w)==inf
which used to set vtx_inv_w=1.0 (which did not match Xbox hardware
behaviour.)
2025-03-08 14:54:18 -07:00
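The claim in the commit above about the bounding constants can be checked with a few lines of Python. The helpers `f32_bits` and `f32` are hypothetical names. They round a Python float to IEEE 754 single precision through `struct` and expose the raw bits.

```python
import struct


def f32_bits(x):
    """Raw 32-bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def f32(x):
    """Value of x after rounding to single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


new_min_bits = f32_bits(5.421011e-20)   # new lower constant
new_max_bits = f32_bits(1.8446744e19)   # new upper constant
old_min_bits = f32_bits(5.42101e-20)    # previous lower constant
old_max_bits = f32_bits(1.884467e19)    # previous upper constant

print(hex(new_min_bits), hex(new_max_bits))
print(hex(old_min_bits), hex(old_max_bits))
```

The new constants round to exactly 2^(-64) and 2^64 in single precision, while the previous ones land on neighbouring or more distant bit patterns.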
Erik Abair f701573d44 nv2a: Use rounded values for alpha testing 2025-03-05 17:12:14 -07:00
Matt Borgerson 7cb7bb68a9 nv2a: Multiversion [un]swizzle to optimize for common bpp 2025-01-26 18:47:46 -07:00
Matt Borgerson eae328dc19 nv2a: Move [un]swizzle_rect to swizzle.h 2025-01-26 18:47:46 -07:00
Matt Borgerson bb5ee6865b nv2a: Drop osdep.h, add stdbool.h to swizzle.c 2025-01-26 18:47:46 -07:00
NZJenkins ae4b5c0695 nv2a: Speed up software swizzling 2025-01-26 14:00:35 -07:00
Matt Borgerson 7eba0d3124 nv2a/gl: Update copyright on recently modified files 2025-01-07 17:37:06 -07:00
Matt Borgerson 510c280b05 nv2a/gl: Unify ShaderBinding and ShaderLruNode 2025-01-07 17:35:06 -07:00
Matt Borgerson 6c389194b6 nv2a/psh: Remove unused arguments in string format 2025-01-07 00:52:51 -07:00
Matt Borgerson d76898f63b nv2a: Fix variable shadowing complaints 2025-01-07 00:52:51 -07:00
Matt Borgerson 57c6d82fa3 nv2a/vk: Simplify debug indent loop to a variable field width format 2025-01-07 00:52:51 -07:00
Matt Borgerson 6ac52147a4 nv2a/psh: Remove function scope variable i, which was being shadowed 2025-01-07 00:52:51 -07:00
Matt Borgerson 3070d6422c mstring: Remove mstring_append_{int,char} 2025-01-07 00:52:51 -07:00
Matt Borgerson 0e50741c28 ui,xbox: Copyright updates on changed files 2025-01-06 23:06:21 -07:00
Matt Borgerson 75ce25c9b5 nv2a: Define DEBUG_NV2A_*=0 ifndef 2025-01-06 23:05:53 -07:00
Matt Borgerson 3106ea97e5 mcpx: Use new bql_[un]lock functions 2025-01-06 23:05:53 -07:00
Matt Borgerson 5cb65d1791 nv2a: Migrate nv2a_get_offsets to new _get_params model 2025-01-06 23:05:53 -07:00
Matt Borgerson 824af3978f meson: Migrate nv2a_vsh_cpu submodule to a subproject 2025-01-02 19:07:25 -07:00
Matt Borgerson 8f478e017a nv2a/psh: Handle 3D textures in BUMPENVMAP[_LUM] modes 2024-12-31 03:17:52 -07:00
Matt Borgerson b6d6a4709d nv2a/gl: Use snake case for line width ranges 2024-12-31 01:37:05 -07:00
Matt Borgerson e67f19d03b nv2a/vk,gl: Fix a couple 64b shift/printing bugs 2024-12-31 01:37:05 -07:00
Matt Borgerson ae3fe91223 nv2a/gl: Rebase line width feature 2024-12-31 01:37:05 -07:00
Matt Borgerson fb7feb7b1f nv2a/vk: Fix missing display surface addr in debug marker 2024-12-31 01:37:05 -07:00
Matt Borgerson 4a09eeb121 nv2a/vk: Use unsigned types for clear scissor calculation for now 2024-12-31 01:37:05 -07:00
Matt Borgerson 1e5cae068a nv2a/vk: Drop unused vertex_buffer_inline field 2024-12-31 01:37:05 -07:00
Matt Borgerson 477d5489ac nv2a/vk: Copy remapped vert data after pre-draw 2024-12-31 01:37:05 -07:00
Matt Borgerson 28c9f5f6ef nv2a/vk: Load 16b float depth textures as unorm to match surface w/a 2024-12-31 01:37:05 -07:00
Matt Borgerson c098b82108 nv2a/vk: VK_CHECK call to vkBindImageMemory 2024-12-31 01:37:05 -07:00
Matt Borgerson 085fb33141 nv2a/vk: Fix external memory handle type on Windows 2024-12-31 01:37:05 -07:00
Matt Borgerson 88835a1019 nv2a/vk,gl: Handle case where pline_offset == 0 2024-12-31 01:37:05 -07:00
Matt Borgerson 58c1daf594 nv2a/vk: Report dirty if no pipeline is bound 2024-12-31 01:37:05 -07:00
Matt Borgerson 8dc3b646a3 nv2a/vk: Move display GL compat after line_offset adjust 2024-12-31 01:37:05 -07:00
Matt Borgerson 7afeda5da0 nv2a/vk: Add regs control_{0,3}, setupraster to shader dirty test 2024-12-31 01:37:05 -07:00
Matt Borgerson 4cd4153937 nv2a/vk: Move reg dirty clear into create_pipeline 2024-12-31 01:37:05 -07:00
Matt Borgerson de1381c932 nv2a/vk: Drop pipeline merge stat 2024-12-31 01:37:05 -07:00
Matt Borgerson 986b18214c nv2a/vk: Drop display update early-out 2024-12-31 01:37:05 -07:00
Matt Borgerson 974b2be87a nv2a/vk: Add command buffer region debug markers 2024-12-31 01:37:05 -07:00
Matt Borgerson c7f82ab79f nv2a/gl: Fix bind_shaders dgroup 2024-12-31 01:37:05 -07:00
Matt Borgerson 580c2e9da4 nv2a/vk: Run full dirty texture check 2024-12-31 01:37:05 -07:00
Matt Borgerson 5527e908b7 nv2a/vk: Process pending surface upload just in time for display 2024-12-31 01:37:05 -07:00
Matt Borgerson e5be3f2714 nv2a/vk: Add missing math.h include 2024-12-31 01:37:05 -07:00
Matt Borgerson d054b366f8 nv2a/vk: Add pvideo support 2024-12-31 01:37:05 -07:00
Matt Borgerson f26b8c32d6 nv2a/vk: Key textures on sampler state for now 2024-12-31 01:37:05 -07:00
Matt Borgerson 69b5318cb5 nv2a/vk: Fix create_pipeline debug marker imbalance 2024-12-31 01:37:05 -07:00
Matt Borgerson 9ab1f96911 nv2a/vk: Make pgraph_vk_insert_debug_marker format strings 2024-12-31 01:37:05 -07:00
Matt Borgerson ca42f0f2df nv2a/vk: Clear in separate renderpass for now 2024-12-31 01:37:05 -07:00
Matt Borgerson 31db8d04b0 nv2a/vk: Ensure queries do not include clears 2024-12-31 01:37:05 -07:00
Matt Borgerson d47fef9467 nv2a/vk: Fix reports 2024-12-31 01:37:05 -07:00
Matt Borgerson 2f910eeacf nv2a/vk: Fixup unaligned attribute data in inline buffer 2024-12-31 01:37:05 -07:00
Matt Borgerson 3096f2a9c8 nv2a/vk: Always bind fragment shader in draw pipeline 2024-12-31 01:37:05 -07:00
Matt Borgerson a2b994d80d nv2a/vk: Only bind clear fragment shader on partial color clear 2024-12-31 01:37:05 -07:00
Matt Borgerson 76e2b779e3 nv2a/psh: Handle rect tex on project3d 2024-12-31 01:37:05 -07:00
Matt Borgerson 62acb2db7e nv2a/psh: Drop rect_tex assertion 2024-12-31 01:37:05 -07:00
Matt Borgerson 1c38a0a42b nv2a/psh: Normalize coords at sample time 2024-12-31 01:37:05 -07:00