GPU: Make the overall functionality of CopyLineExpand() and CopyLineReduce() more complete. Also do some small optimizations to GPUEngineBase::_LineCopy() while I'm at it.
- GPUEngineBase::_LineCopy() optimizations only apply to 2x, 3x, and 4x scaling.
- Add SSE2 version of 3x CopyLineExpand() when using ELEMENTSIZE==1.
- Add SSE2 versions of CopyLineReduce() and add specific 2x/3x/4x versions of CopyLineReduce_*() algorithms.
- CopyLineExpand() now supports vertical scaling in addition to horizontal scaling.
- GPU buffers that were previously only cache-aligned are now page-aligned if appropriate.
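
For readers unfamiliar with what expanding and reducing a line means here, the following is a minimal scalar sketch of the general idea, not the code from this commit. The names ExpandLineSketch() and ReduceLineSketch() are hypothetical, the integer scale factor is passed at run time rather than specialized for 2x/3x/4x, and the reduce step assumes that keeping the first pixel of each group is acceptable. The real CopyLineExpand() and CopyLineReduce() may differ in signature, sampling rule, and SIMD usage.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch, not the DeSmuME implementation.
// Expands one source line of srcWidth pixels by an integer factor `scale`.
// Horizontal scaling: each source pixel is repeated `scale` times.
// Vertical scaling: the expanded line is written `scale` times in a row.
// dst must have room for (srcWidth * scale) * scale elements.
template <typename T>
void ExpandLineSketch(const T *src, std::size_t srcWidth, std::size_t scale, T *dst)
{
    const std::size_t dstWidth = srcWidth * scale;

    // Build the first output line by repeating each source pixel.
    for (std::size_t x = 0; x < srcWidth; x++)
    {
        for (std::size_t i = 0; i < scale; i++)
        {
            dst[(x * scale) + i] = src[x];
        }
    }

    // Duplicate the first output line to fill the remaining output lines.
    for (std::size_t y = 1; y < scale; y++)
    {
        std::copy(dst, dst + dstWidth, dst + (y * dstWidth));
    }
}

// Hypothetical sketch, not the DeSmuME implementation.
// Reduces one scaled line back to dstWidth pixels by keeping the first
// pixel of every group of `scale` pixels (an assumed sampling rule).
template <typename T>
void ReduceLineSketch(const T *src, std::size_t dstWidth, std::size_t scale, T *dst)
{
    for (std::size_t x = 0; x < dstWidth; x++)
    {
        dst[x] = src[x * scale];
    }
}
```

The element type T stands in for the different element sizes the commit mentions (for example, a 1-byte element for the ELEMENTSIZE==1 case). Fixed 2x/3x/4x variants would let the compiler unroll the inner loop, which is presumably why the commit adds scale-specific versions.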
parent 7c80205a40
commit acb140209a