Replace the instruction with the scalar variant FCVT instruction.
FCVT{N,L} 8 cycles latency on the Cortex A57
FCVT has five cycle latency and slightly higher throughput
On the A72 all three of these instructions will have three cycle latency,
While FCVT{N,L} will have half the throughput.