- Added some additional comments so that I'm not tempted to change the native line tracking paradigm ever again.
- Do some refactoring to make GPUEngineBase::_targetDisplay handle more buffer associations itself instead of relying on GPUEngineBase's copies of the associations.
- To maintain a record and make reversions easier, the code has NOT been fully optimized or cleaned up. This will happen over time as the code settles down through testing.
- "Native" buffers are no longer assumed to be in an arbitrary color space; they are now always assumed to be 15-bit. The native buffers are now referenced using uint16_t pointers and are suffixed with "16" to reflect this change.
- Of note, all clients that reference masterNativeBuffer or nativeBuffer via NDSDisplayInfo must now assume that these native buffers will always be in the 16-bit color space.
- Any 18-bit and 24-bit rendering now happens in the custom buffers.
- Also fixes a bug in PixelOperation_SSE2::_unknownEffectMask32() that would cause 3D layers to appear black if the user was running 15-bit color mode. (Regression from commit 0db9872.)
- This fix has the side effect of greatly increasing the code size.
- Quick testing shows that this fix increases overall graphics performance by 2% - 3%. But is this small performance gain worth the massive increase in code size? Hmmm....
- In practice, no games seemed to be affected by this bug, but even so, this fix is correct.
- While technically unnecessary, when the index is singly incremented, it's better to hard reset an overrunning index to zero in order to improve the theoretical stability of the code.
- Byte swapping can now be independently controlled for both input and output data.
- As an application of this new API, VRAM display mode now shows the correct colors on big-endian systems.
- This also uncovers an existing issue with the fog weight calculation code in both OpenGL and SoftRasterizer: Fog Shift can be zero, which causes the calculations to divide by zero. This issue will have to be dealt with at a later time.
- Also rework SoftRasterizerRenderer::_UpdateFogTable() to use the same variable naming scheme as OpenGL. This is done for better code consistency.
- In reality, I'm already looking to scrap this algorithm in OpenGL for something that could be better in every possible way, but I want to commit this SoftRasterizer-esque algorithm first so that we have a working version of it on record.
- Most notably, each version of the manually vectorized code now resides in its own file.
- Depending on the rendering situation, the new AVX2 code may increase rendering performance by anywhere from 5% to 50%.
- Certain functions automatically gain manual vectorization support since the new GPU code makes use of the new general-purpose copy functions that were added in commit e991b16. In other words, AVX-512 and AltiVec builds also benefit from this.
- Also renames "Altivec" to "AltiVec" to remain consistent with Colorspace Handler's naming.
- Also adds an AltiVec accelerated version of the clear image parser.
- Final Release builds still remain as PowerPC 32-bit, Intel 32-bit, and Intel 64-bit. ARM64 is not supported yet.
- PowerPC 32-bit and Intel 32-bit continue to require macOS v10.5 Leopard like before, but the Intel 64-bit binary now requires macOS v10.7 Lion or later. (Now, the Intel 64-bit binary will simply fail to run on Leopard and Snow Leopard.)
- Specifically, texture lookups now respect uniform control flow. Older or stricter drivers consider texture lookups within conditional blocks to be undefined and would otherwise silently fail.
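
For clients adapting to the 15-bit native buffers and the independent input/output byte swapping noted above, the idea can be sketched roughly as follows. This is only an illustration and not the project's actual conversion code: `Convert555To8888`, `Swap16`, `Swap32`, and `Expand5To8` are hypothetical names, and the pixel layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14, output packed as 0xAABBGGRR) is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration only; not functions from the codebase.
static inline uint16_t Swap16(uint16_t v)
{
    return static_cast<uint16_t>((v << 8) | (v >> 8));
}

static inline uint32_t Swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Expand a 5-bit channel to 8 bits by replicating its top bits into the low bits.
static inline uint32_t Expand5To8(uint32_t c)
{
    return (c << 3) | (c >> 2);
}

// Convert 15-bit pixels stored in uint16_t (assumed layout: red in bits 0-4,
// green in bits 5-9, blue in bits 10-14) into opaque 32-bit pixels packed as
// 0xAABBGGRR. Byte swapping is controlled separately for input and output.
void Convert555To8888(const uint16_t *src16, uint32_t *dst32, size_t pixCount,
                      bool swapInput, bool swapOutput)
{
    for (size_t i = 0; i < pixCount; i++)
    {
        const uint16_t c = swapInput ? Swap16(src16[i]) : src16[i];

        const uint32_t r = Expand5To8( c        & 0x1Fu);
        const uint32_t g = Expand5To8((c >>  5) & 0x1Fu);
        const uint32_t b = Expand5To8((c >> 10) & 0x1Fu);

        const uint32_t out = 0xFF000000u | (b << 16) | (g << 8) | r;
        dst32[i] = swapOutput ? Swap32(out) : out;
    }
}
```

A client that reads a native buffer on a big-endian host would presumably pass `swapInput = true` and choose `swapOutput` based on what its output surface expects. Keeping the two flags separate is what allows those two decisions to be made independently.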