Windows ARM64 Frame Unwind Code Details
This post is intended to supplement MSDN's ARM64 exception handling information. To start, here's a fleshed-out version of a table listing the available unwind codes:
Name | Encoding | Prolog Instruction | Epilog Instruction | Unwind Effect |
---|---|---|---|---|
alloc_s | 000iiiii | sub sp, sp, i*16 | add sp, sp, i*16 | Emulate epilog instruction |
alloc_m | 11000iii iiiiiiii | sub sp, sp, i*16 | add sp, sp, i*16 | Emulate epilog instruction |
alloc_l | 11100000 i:u24 | sub sp, sp, i*16 | add sp, sp, i*16 | Emulate epilog instruction |
add_fp | 11100010 i:u8 | add fp, sp, i*8 (NB: not 16) | sub sp, fp, i*8 (NB: not 16) | Emulate epilog instruction |
set_fp | 11100001 | mov fp, sp | mov sp, fp | Emulate epilog instruction |
pac_sign_lr | 11111100 | pacibsp (sign lr using sp) | autibsp (authenticate lr using sp) | Emulate autibsp or xpaclri |
save_fplr | 01iiiiii | stp fp, lr, [sp+i*8] | ldp fp, lr, [sp+i*8] | Emulate epilog instruction |
save_lrpair | 1101011n nniiiiii | stp x(19+2*n), lr, [sp+i*8] | ldp x(19+2*n), lr, [sp+i*8] | Emulate epilog instruction |
save_fplr_x | 10iiiiii | stp fp, lr, [sp-(i+1)*8]! | ldp fp, lr, [sp], (i+1)*8 | Emulate epilog instruction |
save_regp | 110010nn nniiiiii | stp x(19+n), x(20+n), [sp+i*8] | ldp x(19+n), x(20+n), [sp+i*8] | Emulate epilog instruction |
save_regp_x | 110011nn nniiiiii | stp x(19+n), x(20+n), [sp-(i+1)*8]! | ldp x(19+n), x(20+n), [sp], (i+1)*8 | Emulate epilog instruction |
save_r19r20_x | 001iiiii | stp x19, x20, [sp-i*8]! (NB: not i+1) | ldp x19, x20, [sp], i*8 (NB: not i+1) | Emulate epilog instruction |
save_next | 11100110 | stp similar to previous non-lr stp † | ldp similar to next non-lr ldp † | Extend next effect by two † |
save_reg | 110100nn nniiiiii | str x(19+n), [sp+i*8] | ldr x(19+n), [sp+i*8] | Emulate epilog instruction |
save_reg_x | 1101010n nnniiiii | str x(19+n), [sp-(i+1)*8]! | ldr x(19+n), [sp], (i+1)*8 | Emulate epilog instruction |
save_fregp | 1101100n nniiiiii | stp d(8+n), d(9+n), [sp+i*8] | ldp d(8+n), d(9+n), [sp+i*8] | Emulate epilog instruction |
save_fregp_x | 1101101n nniiiiii | stp d(8+n), d(9+n), [sp-(i+1)*8]! | ldp d(8+n), d(9+n), [sp], (i+1)*8 | Emulate epilog instruction |
save_freg | 1101110n nniiiiii | str d(8+n), [sp+i*8] | ldr d(8+n), [sp+i*8] | Emulate epilog instruction |
save_freg_x | 11011110 nnniiiii | str d(8+n), [sp-(i+1)*8]! | ldr d(8+n), [sp], (i+1)*8 | Emulate epilog instruction |
save_any_reg | 11100111 x:u16 * | One str or stp to stack * | One ldr or ldp from stack * | Varies * |
custom | 11101xxx | No instruction | No instruction | Bespoke |
reserved | 11110xxx | No instruction | No instruction | Unwind fails |
reserved | 11111000 x:u8 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111001 x:u16 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111010 x:u24 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111011 x:u32 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111101 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111110 | Any one instruction | Any one instruction | No effect (yet) |
reserved | 11111111 | Any one instruction | Any one instruction | No effect (yet) |
nop | 11100011 | Any one instruction | Any one instruction | No effect |
end_c | 11100101 | No instruction, start of prolog ‡ | Any one instruction, then end of epilog ‡ | No effect |
end | 11100100 | No instruction, start of prolog | ret (or tailcall b), then end of epilog | Unwind complete |
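The encoding column above is enough to work out how many bytes each unwind code occupies, which is the first thing any tool walking a list of codes needs. What follows is a minimal sketch derived only from that table; the function names unwind_code_length and split_unwind_codes are made up for illustration, and the byte 11011111 (which the table does not list) is treated as an error rather than guessed at.

```python
def unwind_code_length(first_byte):
    """Total encoded length, in bytes, of the unwind code starting with first_byte.

    Derived from the encoding column of the table; not an official API.
    """
    b = first_byte & 0xFF
    if b < 0xC0:
        # alloc_s, save_r19r20_x, save_fplr, save_fplr_x: single byte
        return 1
    if b < 0xDF:
        # alloc_m (11000iii) up to save_freg_x (11011110): one payload byte
        return 2
    if b == 0xDF:
        # 11011111 does not appear in the table, so refuse to guess
        raise ValueError("encoding 0xDF is not listed in the table")
    if b == 0xE0:
        return 4  # alloc_l: 24-bit payload
    if b == 0xE2:
        return 2  # add_fp: 8-bit payload
    if b == 0xE7:
        return 3  # save_any_reg: 16-bit payload
    if 0xF8 <= b <= 0xFB:
        # reserved codes with u8 / u16 / u24 / u32 payloads
        return b - 0xF8 + 2
    # set_fp, nop, end, end_c, save_next, custom, remaining reserved codes
    return 1


def split_unwind_codes(data):
    """Split a byte string of unwind codes into one bytes object per code."""
    out = []
    pos = 0
    while pos < len(data):
        n = unwind_code_length(data[pos])
        if pos + n > len(data):
            raise ValueError("truncated unwind code at offset %d" % pos)
        out.append(data[pos:pos + n])
        pos += n
    return out
```

For example, the five bytes E1 81 C0 20 E4 split into set_fp, save_fplr_x, a two-byte alloc_m, and end.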
(*) save_any_reg was added for Arm64EC. The 16-bit payload is made from a number of fields, packed as rpxnnnnn mmiiiiii. This gives rise to many different variants:
Name | Encoding | Prolog Instruction | Epilog Instruction | Unwind Effect |
---|---|---|---|---|
save_any_reg | 11100111 000nnnnn 00iiiiii | str x(0+n), [sp+i*8] | ldr x(0+n), [sp+i*8] | Emulate epilog instruction |
save_any_reg | 11100111 001nnnnn 00iiiiii | str x(0+n), [sp-(i+1)*16]! | ldr x(0+n), [sp], (i+1)*16 | Emulate epilog instruction |
save_any_reg | 11100111 010nnnnn 00iiiiii | stp x(0+n), x(1+n), [sp+i*16] | ldp x(0+n), x(1+n), [sp+i*16] | Emulate epilog instruction |
save_any_reg | 11100111 011nnnnn 00iiiiii | stp x(0+n), x(1+n), [sp-(i+1)*16]! | ldp x(0+n), x(1+n), [sp], (i+1)*16 | Emulate epilog instruction |
save_any_reg | 11100111 000nnnnn 01iiiiii | str d(0+n), [sp+i*8] | ldr d(0+n), [sp+i*8] | Emulate epilog instruction |
save_any_reg | 11100111 001nnnnn 01iiiiii | str d(0+n), [sp-(i+1)*16]! | ldr d(0+n), [sp], (i+1)*16 | Emulate epilog instruction |
save_any_reg | 11100111 010nnnnn 01iiiiii | stp d(0+n), d(1+n), [sp+i*16] | ldp d(0+n), d(1+n), [sp+i*16] | Emulate epilog instruction |
save_any_reg | 11100111 011nnnnn 01iiiiii | stp d(0+n), d(1+n), [sp-(i+1)*16]! | ldp d(0+n), d(1+n), [sp], (i+1)*16 | Emulate epilog instruction |
save_any_reg | 11100111 000nnnnn 10iiiiii | str q(0+n), [sp+i*16] (NB: not 8) | ldr q(0+n), [sp+i*16] (NB: not 8) | Emulate epilog instruction |
save_any_reg | 11100111 001nnnnn 10iiiiii | str q(0+n), [sp-(i+1)*16]! | ldr q(0+n), [sp], (i+1)*16 | Emulate epilog instruction |
save_any_reg | 11100111 010nnnnn 10iiiiii | stp q(0+n), q(1+n), [sp+i*16] | ldp q(0+n), q(1+n), [sp+i*16] | Emulate epilog instruction |
save_any_reg | 11100111 011nnnnn 10iiiiii | stp q(0+n), q(1+n), [sp-(i+1)*16]! | ldp q(0+n), q(1+n), [sp], (i+1)*16 | Emulate epilog instruction |
reserved | 11100111 1xxxxxxx xxxxxxxx | Any one instruction | Any one instruction | Unwind fails |
reserved | 11100111 xxxxxxxx 11xxxxxx | Any one instruction | Any one instruction | Unwind fails |
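The variants above follow a regular pattern, which can be captured in a few lines. This is only a sketch built from the table: decode_save_any_reg is a hypothetical helper, it assumes the two payload bytes have been combined with the rpxnnnnn byte in the high 8 bits, and it prints the epilog instruction using the same notation as the table rather than strict assembler syntax.

```python
def decode_save_any_reg(payload):
    """Describe the epilog instruction for a 16-bit save_any_reg payload.

    Assumes payload = (first_payload_byte << 8) | second_payload_byte,
    i.e. the bits read rpxnnnnn mmiiiiii from most to least significant.
    """
    r = (payload >> 15) & 1    # must be zero
    p = (payload >> 14) & 1    # 1 = register pair (ldp), 0 = single (ldr)
    x = (payload >> 13) & 1    # 1 = sp is adjusted (pre-indexed store in the prolog)
    n = (payload >> 8) & 0x1F  # first register number
    m = (payload >> 6) & 3     # 0 = x, 1 = d, 2 = q, 3 = reserved
    i = payload & 0x3F         # scaled offset

    if r or m == 3:
        raise ValueError("reserved save_any_reg encoding; unwind fails")

    kind = "xdq"[m]
    # Per the table: scale by 16 for pairs, for sp-adjusting forms,
    # and for q registers; otherwise scale by 8.
    scale = 16 if (p or x or m == 2) else 8
    op = "ldp" if p else "ldr"
    regs = "%s%d" % (kind, n)
    if p:
        regs += ", %s%d" % (kind, n + 1)
    if x:
        return "%s %s, [sp], %d" % (op, regs, (i + 1) * scale)
    return "%s %s, [sp+%d]" % (op, regs, i * scale)
```

For instance, a payload of 0x4002 (p set, n of 0, i of 2) describes a pair load of x0 and x1 from sp+32.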
(†) The save_next unwind code requires further explanation. It causes the next unwind code to load four registers rather than two. It can be stacked; a sequence of N save_next unwind codes causes the next unwind code thereafter to load (N+1)*2 registers. Said next unwind code must be one of save_regp / save_regp_x / save_r19r20_x / save_fregp / save_fregp_x (notably, this excludes pair instructions with lr in their name). As examples:
- The combined effect of save_next save_regp is to load x(19+n) from sp+i*8, x(20+n) from sp+i*8+8, x(21+n) from sp+i*8+16, x(22+n) from sp+i*8+24.
- The combined effect of save_next save_regp_x is to load x(19+n) from sp, x(20+n) from sp+8, x(21+n) from sp+16, x(22+n) from sp+24, then increment sp by (i+1)*8.
- The combined effect of save_next save_r19r20_x is to load x19 from sp, x20 from sp+8, x21 from sp+16, x22 from sp+24, then increment sp by i*8.
- The combined effect of save_next save_fregp is to load d(8+n) from sp+i*8, d(9+n) from sp+i*8+8, d(10+n) from sp+i*8+16, d(11+n) from sp+i*8+24.
- The combined effect of save_next save_fregp_x is to load d(8+n) from sp, d(9+n) from sp+8, d(10+n) from sp+16, d(11+n) from sp+24, then increment sp by (i+1)*8.
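The pattern in these examples generalises to any number of stacked save_next codes. Here is a small sketch of that generalisation; expand_save_next is a hypothetical name, and the caller is assumed to have already worked out the first register number and the base offset (zero for the _x variants, whose sp adjustment happens after the loads).

```python
def expand_save_next(prefix, first_reg, base_offset, save_next_count):
    """List the (register, sp offset) loads performed by save_next_count
    save_next codes followed by one pair code.

    prefix is "x" or "d"; both kinds of register occupy 8 bytes on the stack.
    """
    total = (save_next_count + 1) * 2
    return [("%s%d" % (prefix, first_reg + k), base_offset + 8 * k)
            for k in range(total)]
```

With one save_next before save_regp with n of 2 and i of 4, expand_save_next("x", 21, 32, 1) lists x21 through x24 at sp+32 through sp+56, matching the first example above.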
The MSDN documentation suggests that a sufficiently long sequence of save_next codes can overflow from x registers to d registers, but it is best not to try this; stop at x31 or d15.
(‡) In a prolog, end_c corresponds to no instruction, and also causes subsequent codes to correspond to no instruction. In an epilog, end_c corresponds to ret (or b, or any other one instruction that has no unwind effect), and causes subsequent codes to correspond to no instruction. In either case, subsequent codes (up to the first end) are still executed during unwind.
Moving on, each .xdata record describes a contiguous region of machine instructions. There is typically a 1:1 correspondence between regions and functions, though this needn't be true. A region contains:
- At most 2^18-1 machine instructions.
- At most one non-trivial prolog (at the start of the region), with associated unwind effects.
- At most 2^16-1 non-trivial epilogs (anywhere in the region), each with associated unwind effects.
- One list of unwind effects covering the entire region minus prologs/epilogs.
- At most 1020 bytes of encoded unwind effects.
- At most one exception handler address, covering the entire region minus prologs/epilogs (though the logic within the handler can make decisions based on ControlPc and ControlPcIsUnwound).
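The numeric limits in this list are easy to check mechanically before emitting a record. A minimal sketch, with hypothetical names and covering only the three numeric limits (the "at most one prolog" and "at most one handler" rules are structural and not checked here):

```python
MAX_INSTRUCTIONS = 2**18 - 1  # machine instructions per region
MAX_EPILOGS = 2**16 - 1       # non-trivial epilogs per region
MAX_UNWIND_BYTES = 1020       # encoded unwind effects per region


def region_fits(instruction_count, epilog_count, unwind_bytes):
    """True if a candidate region stays within all three numeric limits."""
    return (instruction_count <= MAX_INSTRUCTIONS
            and epilog_count <= MAX_EPILOGS
            and unwind_bytes <= MAX_UNWIND_BYTES)


def min_regions_for_length(instruction_count):
    """Lower bound on the number of regions needed for the instruction
    limit alone; the other limits may force further splits."""
    return max(1, -(-instruction_count // MAX_INSTRUCTIONS))  # ceiling division
```

A code generator could call region_fits on each candidate region and fall back to splitting when it returns False.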
If any of the above limits would be exceeded, then the region needs to be artificially split into multiple smaller regions, such that no limit is exceeded. The MSDN documentation suggests that split boundaries should be chosen so as not to be in the middle of an epilog (this suggestion would not be required if end_c was treated as "No instruction, end of epilog", but alas it is treated as "Any one instruction, then end of epilog").
The length of the prolog is not explicitly given; it is calculated by analysing the unwind codes strictly preceding the first end or end_c and counting how many of them correspond to one instruction. Said unwind codes, once reversed, should correspond 1:1 with instructions in the prolog (nop codes can be used to make things line up, if there are instructions in the prolog which do not need an unwind effect). If unwinding from within the prolog, we calculate how many instructions we are away from the end of the prolog, and skip that many unwind codes from the start of the list. To indicate a lack of prolog, the first unwind code should be end_c (or end if no unwind effects are required by the main body of the region). Note that unwind effects after end_c are still executed if unwinding from within the prolog (though there's a Wine bug here).
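The prolog rules can be sketched on a list of already-decoded code names. This is an illustration under assumptions rather than a description of any real unwinder: the function names are hypothetical, codes are plain strings, custom is the only name treated as "no instruction", and the skip is applied literally as "that many unwind codes" as described above.

```python
NO_INSTRUCTION = {"custom"}    # codes that correspond to no instruction
TERMINATORS = {"end", "end_c"}


def prolog_length(codes):
    """Number of prolog instructions implied by a list of unwind code names."""
    count = 0
    for code in codes:
        if code in TERMINATORS:
            break
        if code not in NO_INSTRUCTION:
            count += 1
    return count


def codes_to_run_from_prolog(codes, executed):
    """Unwind codes to execute when `executed` prolog instructions have
    already run at the PC being unwound from."""
    length = prolog_length(codes)
    if not 0 <= executed <= length:
        raise ValueError("PC is not within the prolog")
    to_skip = length - executed  # instructions remaining until end of prolog
    result = []
    for code in codes[to_skip:]:
        if code == "end":
            break
        result.append(code)      # end_c is kept, but has no effect of its own
    return result
```

For a prolog of stp fp, lr / mov fp, sp / stp x19, x20, the codes are listed in reverse as save_regp, set_fp, save_fplr_x, end; after only the first instruction has executed, just save_fplr_x needs undoing.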
The length of each epilog is not explicitly given; it is calculated by analysing the unwind codes starting at the epilog-specific offset and continuing up to and including the first end or end_c thereafter. Said unwind codes should correspond 1:1 with instructions in the epilog (nop codes can be used to make things line up, if there are instructions in the epilog which do not need an unwind effect). If unwinding from within an epilog, we calculate how many instructions we are away from the start of the epilog, and skip that many unwind codes from the start of the list. Note that unwind effects after end_c are still executed if unwinding from within an epilog (though, again, there is a Wine bug here).
If the E bit is set in the .xdata header, then the end of the epilog is equal to the end of the region (the start of the epilog is found by calculating the length of the epilog and subtracting this from the end). If the E bit is not set, then the start of each epilog is explicitly specified.
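The epilog rules, including the E bit case, can be sketched in the same style. As before this is a hedged illustration: the function names are hypothetical, codes are decoded names in a plain list, start_index is the position of the epilog's first code within that list (however it was obtained), and positions are measured in instructions from the start of the region.

```python
NO_INSTRUCTION = {"custom"}    # codes that correspond to no instruction
TERMINATORS = {"end", "end_c"}


def epilog_length(codes, start_index):
    """Number of epilog instructions, counting up to and including the
    first end or end_c (each of which is one instruction in an epilog)."""
    count = 0
    for code in codes[start_index:]:
        if code not in NO_INSTRUCTION:
            count += 1
        if code in TERMINATORS:
            return count
    raise ValueError("epilog codes are not terminated by end or end_c")


def epilog_start_with_e_bit(region_instruction_count, codes, start_index):
    """With the E bit set, the epilog ends where the region ends, so its
    start is the region length minus the epilog length."""
    return region_instruction_count - epilog_length(codes, start_index)


def codes_to_run_from_epilog(codes, start_index, executed):
    """Unwind codes to execute when `executed` epilog instructions have
    already run at the PC being unwound from."""
    result = []
    for code in codes[start_index + executed:]:
        if code == "end":
            break
        result.append(code)
    return result
```

With codes save_regp, set_fp, save_fplr_x, end shared between prolog and epilog, the epilog is four instructions long (three restores plus ret), so in a 100-instruction region with the E bit set it starts at instruction 96.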
If unwinding from a PC not within a prolog or epilog, unwind codes are executed until the first end is reached. Note that these codes are shared with those describing the prolog; this is not normally a problem, but if it is, the prolog can be split off to a separate region consisting solely of prolog, leaving no prolog in the original region.
If the X bit is set in the .xdata header, and PC is not within a prolog or epilog, then the specified function is called during exception handler search and during exception unwinding. See __C_specific_handler for the prototype of this function, and see LuaJIT for a concrete example.