- This fix has the side effect of greatly increasing the code size.
- Quick testing shows that this fix increases overall graphics performance by 2% to 3%. But is such a small performance gain worth the massive increase in code size? Hmm...
- In practice, no games appear to be affected by this bug, but the fix is still correct.
- Since the index is only ever incremented by one, hard-resetting an overrunning index to zero is technically unnecessary, but doing so improves the theoretical stability of the code.
- Byte swapping can now be independently controlled for both input and output data.
- As an application of this new API, VRAM display mode now shows the correct colors on big-endian systems.
- This also uncovers an existing issue with the fog weight calculation code in both OpenGL and SoftRasterizer: Fog Shift can be zero, which would cause the calculations to divide by zero. This issue will have to be dealt with at a later time.
- Also reworks SoftRasterizerRenderer::_UpdateFogTable() to use the same variable naming scheme as OpenGL, for better code consistency.
- In reality, I'm already looking into scrapping this algorithm in OpenGL in favor of something that could be better in every possible way, but I want to commit this SoftRasterizer-esque algorithm first so that we have a working version of it on record.
- Most notably, each version of the manually vectorized code now resides in its own file.
- Depending on the rendering situation, the new AVX2 code may increase rendering performance by anywhere from 5% up to 50%.
- Certain functions automatically gain manual vectorization support because the new GPU code uses the new general-purpose copy functions that were added in commit e991b16. As a result, AVX-512 and AltiVec builds also benefit.
- Also renames "Altivec" to "AltiVec" to remain consistent with the Colorspace Handler's naming.
- Also adds an AltiVec accelerated version of the clear image parser.
- Final Release builds are still produced for PowerPC 32-bit, Intel 32-bit, and Intel 64-bit. ARM64 is not supported yet.
- PowerPC 32-bit and Intel 32-bit continue to require macOS v10.5 Leopard or later, as before, but the Intel 64-bit binary now requires macOS v10.7 Lion or later. (The Intel 64-bit binary will therefore simply fail to run on Leopard and Snow Leopard.)
- Specifically, we now perform texture lookups only in uniform control flow. Older or stricter drivers silently fail otherwise, because they treat texture lookups within conditional blocks as undefined.
- This change partially reverts commit 87cb2f6 but still preserves the elimination of the destructor, which was probably the code simplification that was originally wanted.
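The note about hard-resetting an overrunning index can be illustrated with a minimal C++ sketch. It assumes the alternative being compared against is wrapping by subtraction; all helper names are hypothetical and not taken from the codebase. For a singly incremented index both approaches give the same result, but only the hard reset always returns an in-range value if the index has somehow been pushed further past the end.

```cpp
#include <cstddef>

// Hypothetical helper: wrap by subtracting the capacity.
// Correct only while the index never exceeds the capacity by more than one step.
constexpr std::size_t WrapBySubtract(std::size_t index, std::size_t capacity)
{
    return (index >= capacity) ? index - capacity : index;
}

// Hypothetical helper: hard reset to zero on any overrun.
// Always yields a valid index, even if the index was corrupted elsewhere.
constexpr std::size_t WrapByReset(std::size_t index, std::size_t capacity)
{
    return (index >= capacity) ? 0 : index;
}

// Advance a ring-buffer index by one and hard-reset it on overrun.
constexpr std::size_t AdvanceIndex(std::size_t index, std::size_t capacity)
{
    return WrapByReset(index + 1, capacity);
}
```

With a capacity of 4, an index of 9 wraps to 5 by subtraction (still out of range) but to 0 with the hard reset.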
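Independent control of input and output byte swapping can be sketched as a color conversion with two compile-time flags. This is an illustration under stated assumptions, not the project's actual API: the function names, the RGB555-to-RGBA8888 conversion, and the bit layout (red in bits 0-4, green in bits 5-9, blue in bits 10-14, bit 15 ignored) are all hypothetical choices for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Swap the two bytes of a 16-bit value.
constexpr std::uint16_t Swap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

// Reverse the four bytes of a 32-bit value.
constexpr std::uint32_t Swap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

// Expand a 5-bit channel to 8 bits by replicating its high bits.
constexpr std::uint32_t Expand5To8(std::uint32_t x)
{
    return (x << 3) | (x >> 2);
}

// Pack a host-order 555 color into an opaque RGBA8888 value (R in the low byte).
constexpr std::uint32_t Pack8888(std::uint32_t c)
{
    return Expand5To8(c & 0x1F) |
           (Expand5To8((c >> 5) & 0x1F) << 8) |
           (Expand5To8((c >> 10) & 0x1F) << 16) |
           0xFF000000u;
}

// Hypothetical conversion: input and output byte swapping are chosen independently.
template <bool SWAP_INPUT, bool SWAP_OUTPUT>
constexpr std::uint32_t Convert555To8888(std::uint16_t raw)
{
    return SWAP_OUTPUT ? Swap32(Pack8888(SWAP_INPUT ? Swap16(raw) : raw))
                       : Pack8888(SWAP_INPUT ? Swap16(raw) : raw);
}

// Buffer version built on the per-pixel conversion.
template <bool SWAP_INPUT, bool SWAP_OUTPUT>
void ConvertBuffer555To8888(const std::uint16_t *src, std::uint32_t *dst, std::size_t pixCount)
{
    for (std::size_t i = 0; i < pixCount; i++)
    {
        dst[i] = Convert555To8888<SWAP_INPUT, SWAP_OUTPUT>(src[i]);
    }
}
```

Which flag combination a given host needs depends on the byte order of the source data and on what the consumer of the output expects; the sketch leaves that decision to the caller.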
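The remark that certain functions gain manual vectorization "automatically" by calling shared copy functions can be illustrated with a hypothetical copy routine whose vector path is selected at compile time. The function name, the `__AVX2__` check, and the use of `_mm256_loadu_si256`/`_mm256_storeu_si256` are assumptions for illustration; only an AVX2 path is shown, and every other build falls back to `std::memcpy`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical general-purpose copy function. Any caller inherits whichever
// vector path the build enables, without containing SIMD code itself.
inline void CopyBuffer32(std::uint32_t *dst, const std::uint32_t *src, std::size_t count)
{
    assert(count == 0 || (dst != nullptr && src != nullptr));

    std::size_t i = 0;
#if defined(__AVX2__)
    // AVX2 path: copy eight 32-bit elements (256 bits) per iteration.
    for (; i + 8 <= count; i += 8)
    {
        const __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(src + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i *>(dst + i), v);
    }
#endif
    // Scalar tail, or the entire copy when no vector path was compiled in.
    if (i < count)
    {
        std::memcpy(dst + i, src + i, (count - i) * sizeof(std::uint32_t));
    }
}
```

A rendering function that calls `CopyBuffer32()` gets the AVX2 loop in AVX2 builds; adding, say, an AltiVec path to this one function would benefit every caller in the same way.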