More efficient code can be generated if the shift amount is known at
compile time. We can once again take advantage of shifts with the shift
amount in an 8-bit immediate to eliminate ECX as a scratch register,
reducing register pressure and removing the occasional spill. We can
also do 32-bit shifts instead of 64-bit operations.
We recognize four distinct cases:
- The special case where we're dealing with the PowerPC's quirky shift
amount masking. If the shift amount is a number from 32 to 63, all
bits are shifted out and the result it either all zeroes or all ones.
Before:
B9 F0 FF FF FF mov ecx,0FFFFFFF0h
8B F7 mov esi,edi
48 C1 E6 20 shl rsi,20h
48 D3 FE sar rsi,cl
8B C6 mov eax,esi
48 C1 EE 20 shr rsi,20h
85 F0 test eax,esi
0F 95 45 58 setne byte ptr [rbp+58h]
After:
8B F7 mov esi,edi
C1 FE 1F sar esi,1Fh
0F 95 45 58 setne byte ptr [rbp+58h]
- The shift amount is zero. Not calculation needs to be done, just clear
the carry flag.
Before:
B9 00 00 00 00 mov ecx,0
49 C1 E5 20 shl r13,20h
49 D3 FD sar r13,cl
41 8B C5 mov eax,r13d
49 C1 ED 20 shr r13,20h
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]
After:
C6 45 58 00 mov byte ptr [rbp+58h],0
- The carry flag doesn't need to be computed. Just do the arithmetic
shift.
Before:
B9 02 00 00 00 mov ecx,2
48 C1 E7 20 shl rdi,20h
48 D3 FF sar rdi,cl
48 C1 EF 20 shr rdi,20h
After:
C1 FF 02 sar edi,2
- The carry flag must be computed. In addition to the arithmetic shift,
we do a shift to the left and and them together to know if any ones
were shifted out. It's still better than before, because we can do
32-bit shifts.
Before:
B9 02 00 00 00 mov ecx,2
49 C1 E5 20 shl r13,20h
49 D3 FD sar r13,cl
41 8B C5 mov eax,r13d
49 C1 ED 20 shr r13,20h
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]
After:
41 8B C5 mov eax,r13d
41 C1 FD 02 sar r13d,2
C1 E0 1E shl eax,1Eh
44 85 E8 test eax,r13d
0F 95 45 58 setne byte ptr [rbp+58h]