- Historically (ever since commit b1e4934 and commit d5bb6fd), VERTLIST_SIZE (based on POLYLIST_SIZE) could reference an index over 66535, but POLY.vertIndexes has always been an unsigned 16-bit value with a max value of 66535, making for a potential overflow. In practice, an overflow has never happened in the past 15 years because a hardware NDS has a limit of 6144 polygons (or 24576 vertices), and that even a VERTLIST_SIZE of 80000 is way more than plenty to accommodate that. Rather than increasing POLY.vertIndexes to 32-bit, it makes more sense to reduce POLYLIST_SIZE to 16383 to keep VERTLIST_SIZE below 66536. Even with this change, POLYLIST_SIZE and VERTLIST_SIZE should be more than plenty in practice.
- Also, now that we're performing polygon clipping for all client 3D renderers (ever since commit e06d11f), we're finally accounting for the fact that the clipped polygon list size is larger than POLYLIST_SIZE. Client 3D renderers have been updated to now reflect this change and avoid theoretical overflow issues that have never actually happened in practice.
- Since the shininess table is only ever used for vertex generation, it makes no sense for the table to included for the rasterization step. The shininess table has been moved to the NDSGeometryEngine class, which is a class dedicated to just vertex and polygon generation. This reorganization seems more sensible.
- There are two significant behavior changes in this commit that will require further testing.
- Behavior change: Before, MTX_LOAD_4x4 and MTX_LOAD_4x3 commands would update the individual values in the current matrix for each command. Now, these commands will batch the values into a temporary matrix until the temp matrix is complete, and then copy the temp matrix into the current matrix. This now matches the batching behavior that the other matrix commands already do.
- Behavior change: Before, there was a single shared temporary multiplication matrix used to batch the values of incoming MTX_MULT_4x4, MTX_MULT_4x3, and MTX_MULT_3x3 commands, theoretically allowing these commands to be used interchangeably and overwrite values from previous commands until the last command made a completed multiplier matrix. Now, there are 3 separate temporary multiplier matrices, one for each of the MTX_MULT_* commands, which means that each command type must now complete its own multiplier matrix before it can perform a matrix multiply.
- The integer-based Box Test should be just as good as the old float-based one, as tested by "American Girl: Julie Finds a Way". Of course, there are still bugs compared to a hardware NDS, but we haven't transitioned to rendering with integer-based vertices yet to make a meaningful test possible.
- Rename a bunch of variables to better reflect their intended usage.
- Add new data types for organizing 3D vectors and coordinates.
- C functions that are called in response to 3D commands now follow the exact same pattern: “static void gfx3d_glFuncName(const u32 param)”
- Be super explicit about the usage of numeric data types and their typecasts to improve their code visibility.
- Remove implementation-specific ambiguity of right bit-shifting signed numerics (Logical-Shift-Right vs Arithmetic-Shift-Right) in the following functions: gfx3d_glLightDirection_cache(), gfx3d_glNormal(), gfx3d_glTexCoord(), gfx3d_glVertex16b(), gfx3d_glVertex10b(), gfx3d_glVertex3_cord(), gfx3d_glVertex_rel(), gfx3d_glVecTest().
- Also remove Viewer3D_State.indexList. This member is obsolete since Viewer3D_State.gList.clippedPolyList is generated in the same order described by indexList.
- Encapsulate and rename some more lists to make their intended purpose more descriptive.
- Most importantly, refactor the GFX3D_Clipper::ClipPoly() method into the standalone GFX3D_GenerateClippedPoly() function, which drops all class member dependencies and is much more straightforward to use.
- Remove the GFX3D_Clipper class. It has been slowly gutted over the years, but the loss of the ClipPoly() method makes the class obsolete, putting the final nail in its coffin.
- Using the new GFX3D_GenerateClippedPoly() function, gfx3d_PerformClipping() operates on the unsorted clipped polygon list directly. This means that the polygon clipping step requires one less buffer copy.
- Clipped polygons no longer retain direct pointers to POLY structs, instead using their index member to reference a POLY struct at a POLY array location. All 3D renderers have been updated to reflect this change.
- Rename a bunch of variables that reference POLY, CPoly, and VERT structs to clearly differentiate them between raw NDS data and our own internally processed data.
- Fix a potential bug in GFX3D_GenerateRenderLists() where sorting clipped polygons would reorder their polygon indices without reordering their other data members along with the indices, causing the data members to desync. OpenGL rendering was immune to this bug, but SoftRasterizer might have possibly seen it. In any case, this fix is the correct behavior.
- Problems only occurred when LOADING a save state. However, SAVING a save state on commit 5426509e should be okay, so such save states should not be broken.
- Viewports are now processed on VIEWPORT register write instead of being processed at render time.
- CLEAR_DEPTH, CLRIMAGE_OFFSET, EDGE_COLOR, FOG_TABLE, and TOON_TABLE register writes are now handled more consistently.
- The fogDensityTable check for force-drawing clear images in gfx3d_VBlankEndSignal() has been removed. Changes done in commit 8438a5a6 will always causes this check to fail, and this commit will always cause this check to fail. Therefore, this check is now obsolete.
- Change a bunch of GFX3D-related structs from C++ style constructed structs into C-style POD structs.
- Better account for UltraSPARC's unique memory page size.
- malloc_alignedCacheLine() no longer returns 16-byte aligned memory if the architecture is neither 32-bit or 64-bit. Now, the function only returns 64-byte alignment for 64-bit architectures OR 32-byte alignment for 32-bit architectures.
- The new code works by pre-swapping big-endian words on disp_fifo.buf write, rather than swapping the big-endian words during disp_fifo.buf read.
- There is a behavior change here. Before, 8-bit and 16-bit writes to disp_fifo.buf would increment disp_fifo.tail. Now, 8-bit and 16-bit writes only increment disp_fifo.tail when the most significant bit within the FIFO value's 32-bit boundary is written to.
- Behavior is unchanged when doing 32-bit writes. In practice, the rare games that use display FIFO have only ever done 32-bit writes, so this scenario is well tested.