My new favourite AArch64 CPU instruction: rotate then merge in to flags (RMIF)

I find myself writing some CPU emulators at the moment, which has caused the AArch64 (aka. ARM64) RMIF instruction to become my new favourite instruction. It takes a 64-bit general purpose register, rotates it right by a specified number of bits, then selectively merges the low four bits into the flags register. A 6-bit field in the instruction gives the rotate amount, and a 4-bit field in the instruction gives a mask of which flag bits to overwrite versus which to leave unchanged.

One use of rmif is to emulate the x86 bt reg, imm instruction, which extracts one bit from a general purpose register, writes that bit to the C flag, and leaves other flags unchanged. Thus bt reg, imm in x86 becomes rmif reg, #((imm - 1) & 63), #2 in AArch64.

At the other end of the spectrum is the x86 inc instruction, which adds 1 to a general purpose register, and then sets most flags based on this addition, but leaves the C flag unchanged. To emulate inc reg, we can first save off the old value of the C flag (via csinc tmp, wzr, wzr, cc or adc tmp, wzr, wzr), then do adds reg, reg, #1 to perform the addition and set all the flags, then rmif tmp, #63, #2 to restore the old value of the C flag.

As another example, the AArch32 muls instruction sets the N and Z flags based on the result of the multiplication, but leaves the C and V flags unchanged. To emulate this on AArch64, we can save off all the flags (mrs tmp, NZCV), then do the multiplication, then set N and Z based on the result but also clobber C and V (ands wzr, dst, dst or adds wzr, dst, #0), then restore the old values of C and V (rmif tmp, #28, #3).