GPU: Make the overall functionality of CopyLineExpand() and CopyLineReduce() more complete. Also do some small optimizations to GPUEngineBase::_LineCopy() while I'm at it.

- GPUEngineBase::_LineCopy() optimizations only apply to 2x, 3x, and 4x scaling.
- Add SSE2 version of 3x CopyLineExpand() when using ELEMENTSIZE==1.
- Add SSE2 versions of CopyLineReduce() and add specific 2x/3x/4x versions of CopyLineReduce_*() algorithms.
- CopyLineExpand() now supports vertical scaling in addition to horizontal scaling.
- GPU buffers that were previously only cache-aligned are now page-aligned if appropriate.
This commit is contained in:
rogerman 2018-09-19 16:06:39 -07:00
parent 7c80205a40
commit acb140209a
1 changed files with 817 additions and 138 deletions

File diff suppressed because it is too large Load Diff