mirror of https://github.com/PCSX2/pcsx2.git
27 lines
2.3 KiB
Plaintext
Table 2. Throughput of Native Arithmetic Instructions (Number of Operations per Clock Cycle per Multiprocessor)

Compute capability (architecture) , FER , FER , KPL , MAX
32-bit floating-point add / multiply / multiply-add , 32 , 48 , 192 , 128
64-bit floating-point add / multiply / multiply-add , 16 , 4 , 8 , 1
32-bit floating-point reciprocal / reciprocal square root / log2f / exp2f / sine / cosine , 4 , 8 , 32 , 32
32-bit integer add / extended-precision add / subtract / extended-precision subtract , 32 , 48 , 160 , 128
32-bit integer multiply / multiply-add / extended-precision multiply-add , 16 , 16 , 32 , Multiple instructions
32-bit integer shift , 16 , 16 , 32 , 64
compare / minimum / maximum , 32 , 48 , 160 , 64
32-bit integer bit reverse / bit field extract/insert , 16 , 16 , 32 , 64
32-bit bitwise AND / OR / XOR , 32 , 160 , 160 , 128
count of leading zeros / most significant non-sign bit , 16 , 16 , 32 , Multiple instructions
population count , 16 , 16 , 32 , 32
warp shuffle , N/A , N/A , 32 , 32
sum of absolute difference , 16 , 16 , 32 , 64
SIMD video instructions vabsdiff2 , N/A , N/A , 160 , Multiple instructions
SIMD video instructions vabsdiff4 , N/A , N/A , 160 , Multiple instructions
All other SIMD video instructions , 16 , 16 , 32 , Multiple instructions
Type conversions from 8/16-bit integer to 32-bit types , 16 , 16 , 128 , 32
Type conversions from and to 64-bit types , 16 , 4 , 8 , 4
All other type conversions , 16 , 16 , 32 , 32

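The table entries are per-clock, per-multiprocessor rates, so turning one into a device-wide peak rate takes two more numbers that this file does not give: the multiprocessor count and the clock frequency. The following is a minimal C++ sketch of that arithmetic. The function name is hypothetical, and the device parameters in the usage note are illustrative assumptions, not figures from any real GPU.

```cpp
#include <cstdint>

// Peak operations per second for one instruction class.
// ops_per_clock_per_sm is a numeric entry from Table 2.
// sm_count and clock_hz describe a specific device and are NOT part of
// this document; the caller has to supply them (assumed inputs).
constexpr std::uint64_t peak_ops_per_second(std::uint64_t ops_per_clock_per_sm,
                                            std::uint64_t sm_count,
                                            std::uint64_t clock_hz) {
    return ops_per_clock_per_sm * sm_count * clock_hz;
}
```

For example, with the KPL column's 32-bit floating-point rate of 192 and an assumed device of 8 multiprocessors at 1 GHz, `peak_ops_per_second(192, 8, 1'000'000'000)` gives 1,536,000,000,000 operations per second. The same column's 64-bit rate of 8 yields a value 24 times lower, which is just the ratio 192 / 8 from the table. Entries marked "Multiple instructions" or "N/A" have no single rate and cannot be used this way.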
Some tips:
* Bit field operations (extract/insert) are as fast as shift operations: both rows of Table 2 list the same throughput in every column (16, 16, 32, 64).
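To make the bit field tip concrete, here is a minimal C++ sketch of what "bit field extract/insert" computes, written with ordinary shifts and masks. The function names are hypothetical and this is plain host code illustrating the semantics only, not the GPU instruction the table measures. It assumes `pos + width <= 32`.

```cpp
#include <cstdint>

// Mask with the low `width` bits set (0 <= width <= 32).
constexpr std::uint32_t low_mask(unsigned width) {
    // Shifting a 32-bit value by 32 is undefined, so handle it explicitly.
    return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

// Extract `width` bits of `value` starting at bit `pos` (unsigned result).
constexpr std::uint32_t bit_field_extract(std::uint32_t value,
                                          unsigned pos, unsigned width) {
    return (value >> pos) & low_mask(width);
}

// Return `base` with the `width` bits at `pos` replaced by the low bits of `field`.
constexpr std::uint32_t bit_field_insert(std::uint32_t base, std::uint32_t field,
                                         unsigned pos, unsigned width) {
    const std::uint32_t mask = low_mask(width) << pos;
    return (base & ~mask) | ((field << pos) & mask);
}
```

Written by hand, an extract costs a shift plus an AND, and an insert costs several operations more. Where the table lists bit field extract/insert at the same rate as a shift, using the dedicated operation instead of a manual shift-and-mask sequence should cost no more than the shift alone.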