In my opinion, this implementation is pretty efficient. Besides, "As long as we aren't falling back to interpreter we're winning a lot" applies to basically every instruction to some degree anyway.