- Specifically, if the previous frame is determined to draw the entire HD layer directly over the backdrop layer, then the current frame's entire custom framebuffer is asynchronously cleared using line 0's backdrop color since most games will keep the backdrop color constant for all scanlines. Because this is a common rendering case, many 3D games should see a performance improvement when running very large HD framebuffers (8x or higher).
- Also fix a compiling issue for non-SSE2 systems. (Regression from commit 3890431.)
- Most notably, fix a performance regression where polygon drawing was no longer getting batched due to an incorrect polygon-facing test. (Regression from commit dab414c.)
- New Behavior: In addition to emulating the existing Depth Equals Test Tolerance, NDS-Style Depth Calculation accounts for all NDS depth calculations within the fragment shader. Most notably, disabling this option forgoes the W-depth / Z-depth differentiation that the NDS uses, instead preferring the GPU's native Z-depth calculation. Using the GPU's native depth calculation significantly improves performance, but many games use W-depth calculations or are sensitive to subtleties in the Z-depth calculation, and so this option must remain ON by default for compatibility's sake.
- Also fixes a shader initialization issue on the Windows port. (Regression from commit 7080e21.)
- Fix compiling issues for big-endian systems.
- Fix bug where the Recent ROMs menu and also launching the app while loading a ROM file would fail to load the ROM on macOS v10.5 Leopard.
- Fix bug where GPU main memory display mode would show incorrect pixels on big-endian systems when running at 15-bit color depth.
- As an unintended collateral improvement, GPUEngineA::_HandleDisplayModeMainMemory() now has SSE2-accelerated versions for 18-bit and 24-bit color depths. This was done less for its performance benefit (main memory display mode is an extremely rare feature) and more for better code consistency and code completeness.
- After years of testing, no one has reported running into the assert in gfx3d_ysort_compare() so I think we should be safe in reverting std::stable_sort() back to std::sort().
- For the sorting function, use gfx3d_ysort_compare_orig() since this function compiles down to fewer instructions than gfx3d_ysort_compare_kalven() does, resulting in better sorting performance.
- Of note, I'm pretty sure that SF commit r5132 is what fixed the original bug (see SF#1461 for more details) by getting rid of the NaN comparisons that were tripping up std::sort(). In the future, we should research why we're dividing by 0 in the first place, since r5132 is clearly a hack of a fix.
- Also do a minor performance optimization by only doing the framebuffer clear once for each power-off condition, rather than repeatedly and unnecessarily clearing the framebuffer for each and every V-blank.
- Of note, when running at custom resolutions, we are now being more aggressive in performing early tests for rejecting pixels as soon as possible. This may yield a minor performance improvement in some very specific rendering scenarios that require the window test.