Jumping with DynASM
Directly continuing from the first DynASM example, one obvious optimisation would be to write the remaining loop of run_job in assembly, thereby avoiding a function call on every iteration. This idea leads to the following version of transcode.dasm:
```
|.arch x64
|.actionlist transcode_actionlist
|.section code
|.globals GLOB_

// On entry: rcx = input, rdx = output, r8 = record count (Windows x64 calling convention).
static void emit_transcoder(Dst_DECL, transcode_job_t* job)
{
  | jmp ->loop_test
  |->loop_body:
  | dec r8
  for(int f = 0; f < job->num_fields; ++f)
  {
    field_info_t* field = job->fields + f;
    switch(field->byte_width)
    {
    case 4:
      | mov eax, [rcx + field->input_offset]
      if(field->input_endianness != field->output_endianness) {
        | bswap eax
      }
      | mov [rdx + field->output_offset], eax
      break;
    case 8:
      | mov rax, [rcx + field->input_offset]
      if(field->input_endianness != field->output_endianness) {
        | bswap rax
      }
      | mov [rdx + field->output_offset], rax
      break;
    default:
      throw std::exception("TODO: Other byte widths");
    }
  }
  | add rcx, job->input_record_size
  | add rdx, job->output_record_size
  |->loop_test:
  | test r8, r8
  | jnz ->loop_body
  | ret
}
```
In order, the changes to note are:
- The addition of the following directive:
  ```
  |.globals GLOB_
  ```
- The addition of the following loop head:
  ```
  | jmp ->loop_test
  |->loop_body:
  | dec r8
  ```
- The addition of the following loop tail:
  ```
  | add rcx, job->input_record_size
  | add rdx, job->output_record_size
  |->loop_test:
  | test r8, r8
  | jnz ->loop_body
  ```
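To make the control flow of the new loop head and tail concrete, here is a plain C++ sketch of what the emitted code computes, with goto standing in for jmp and jnz. It is a reference model under stated assumptions, not the article's code: the struct layouts, the endianness_t enum and the name transcode_reference are hypothetical (the article only shows the member names that emit_transcoder reads), and decisions that emit_transcoder makes while generating code (such as whether to emit bswap) are made at run time here.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

// Hypothetical definitions: only the member names are taken from the article.
enum endianness_t { LITTLE, BIG };

struct field_info_t {
    int byte_width;                  // 4 or 8 in this sketch
    int input_offset;                // byte offset within an input record
    int output_offset;               // byte offset within an output record
    endianness_t input_endianness;
    endianness_t output_endianness;
};

struct transcode_job_t {
    const field_info_t* fields;
    int num_fields;
    int input_record_size;           // bytes per input record
    int output_record_size;          // bytes per output record
};

// Reverse the order of n bytes in place, as bswap does for a register.
static void swap_bytes(unsigned char* p, int n) {
    for (int i = 0; i < n / 2; ++i) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

// in, out and n play the roles of rcx, rdx and r8 in the generated code.
void transcode_reference(const transcode_job_t* job, const void* input, void* output, int n) {
    assert(n >= 0);
    // emit_transcoder rejects other widths while generating code; check up front here.
    for (int f = 0; f < job->num_fields; ++f)
        if (job->fields[f].byte_width != 4 && job->fields[f].byte_width != 8)
            throw std::runtime_error("TODO: Other byte widths");

    const unsigned char* in = static_cast<const unsigned char*>(input);
    unsigned char* out = static_cast<unsigned char*>(output);

    goto loop_test;                  // | jmp ->loop_test
loop_body:                           // |->loop_body:
    --n;                             // | dec r8
    for (int f = 0; f < job->num_fields; ++f) {
        const field_info_t* field = job->fields + f;
        unsigned char value[8];      // stands in for eax / rax
        std::memcpy(value, in + field->input_offset, field->byte_width);
        if (field->input_endianness != field->output_endianness)
            swap_bytes(value, field->byte_width);  // | bswap eax / rax
        std::memcpy(out + field->output_offset, value, field->byte_width);
    }
    in += job->input_record_size;    // | add rcx, job->input_record_size
    out += job->output_record_size;  // | add rdx, job->output_record_size
loop_test:                           // |->loop_test:
    if (n != 0) goto loop_body;      // | test r8, r8 / | jnz ->loop_body
}                                    // | ret
```

Placing the test at the bottom and jumping to it first means a record count of zero performs no iterations, while every subsequent iteration costs only a single conditional branch.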
In the loop head and tail, the -> prefix is DynASM's notation for so-called global labels. Apart from that prefix, the syntax is the same as in any other assembler: labels are introduced by suffixing them with a colon, and are jumped to by being used as an operand to a jump instruction. As well as global labels, DynASM also supports so-called local labels. The defining difference between the two is that an assembly fragment containing a global label can only be emitted once, whereas local labels can be emitted an unlimited number of times. As a consequence, when jumping to a local label, you need to specify whether to jump backwards to the nearest previous emission of that label, or forwards to the next subsequent emission of that label. As global labels can only be emitted once, no such specification is needed for them.
Label type | Syntax | Usage | Available names | Maximum emissions | Retrievable in C |
---|---|---|---|---|---|
Global | ->name: | jmp ->name | Any C identifier | 1 | Yes |
Local | name: | jmp >name (forward) or jmp <name (backward) | Integers between 1 and 9 | ∞ | No |
PC | =>expr: | jmp =>expr | Any C expression | N/A | No |
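The forward/backward rule for local labels can be modelled in a few lines of C++. This is a simplified sketch, not how DynASM resolves labels internally: it treats the emitted code as a hypothetical list of Item records, each either defining a local label or jumping to one, and returns the index of the definition a jump lands on. The point it illustrates is that the same fragment can be emitted twice and each copy's jump still finds its own copy of the label.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model of an emitted instruction stream: each item either
// defines local label N (written "N:") or jumps to it ("<N" or ">N").
struct Item {
    enum Kind { Define, JumpBack, JumpFwd } kind;
    int label;  // 1..9, as for DynASM local labels
};

// Index of the definition targeted by the jump at jump_index,
// or std::nullopt if no definition of that label exists in that direction.
std::optional<std::size_t> resolve_local(const std::vector<Item>& items, std::size_t jump_index) {
    const Item& jump = items[jump_index];
    assert(jump.kind != Item::Define);
    assert(jump.label >= 1 && jump.label <= 9);
    if (jump.kind == Item::JumpBack) {
        // <N: nearest previous emission of N.
        for (std::size_t i = jump_index; i-- > 0;)
            if (items[i].kind == Item::Define && items[i].label == jump.label)
                return i;
    } else {
        // >N: next subsequent emission of N.
        for (std::size_t i = jump_index + 1; i < items.size(); ++i)
            if (items[i].kind == Item::Define && items[i].label == jump.label)
                return i;
    }
    return std::nullopt;
}
```

For example, if the fragment "1: ... jmp <1" is emitted twice, the first jump resolves to the first 1: and the second jump to the second one. A global label would make the second emission invalid, which is why global labels need no direction.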
That leaves the .globals directive: its effect is to emit a C enumeration with the names of all global labels. For this example, it causes the following to be written to transcode.h:
```cpp
//|.globals GLOB_
enum {
  GLOB_loop_test,
  GLOB_loop_body,
  GLOB__MAX
};
```
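Because GLOB__MAX is always the last enumerator, it equals the number of global labels, which is what lets it size arrays indexed by the GLOB_ names. As an illustration, the enumeration can be paired with a parallel table of label names. The table and the helpers global_label_name and find_global_label are hypothetical debugging aids, not something DynASM generates.

```cpp
#include <cassert>
#include <cstring>

// The enumeration that .globals wrote to transcode.h for this example.
enum {
    GLOB_loop_test,
    GLOB_loop_body,
    GLOB__MAX
};

// Hypothetical debugging aid: one printable name per global label, in enum order.
static const char* const global_label_names[] = {
    "loop_test",
    "loop_body",
};

// Compilation fails if the table and the enumeration disagree on the label count.
static_assert(sizeof(global_label_names) / sizeof(global_label_names[0]) ==
                  static_cast<unsigned>(GLOB__MAX),
              "need exactly one name per global label");

const char* global_label_name(int label) {
    assert(label >= 0 && label < GLOB__MAX);
    return global_label_names[label];
}

// Reverse lookup from a name to its GLOB_ index; -1 if there is no such label.
int find_global_label(const char* name) {
    for (int i = 0; i < GLOB__MAX; ++i)
        if (std::strcmp(global_label_names[i], name) == 0)
            return i;
    return -1;
}
```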
Now that we're using labels, we need to do slightly more initialisation work. In particular, between calling dasm_init and dasm_setup, we need to do the following:

```cpp
void* global_labels[GLOB__MAX];
dasm_setupglobal(&state, global_labels, GLOB__MAX);
```
After calling dasm_encode, the absolute address of ->loop_test: will be stored in global_labels[GLOB_loop_test], and likewise the absolute address of ->loop_body: will be stored in global_labels[GLOB_loop_body].
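Since the stored values are absolute addresses inside the buffer that was passed to dasm_encode, subtracting the buffer's base address gives each label's offset within the generated code, which can be handy when lining a disassembly up with the labels. The sketch below does not call DynASM: it fills global_labels by hand with made-up addresses to stand in for what dasm_encode would write, and label_offset is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>

// The enumeration that .globals wrote to transcode.h for this example.
enum { GLOB_loop_test, GLOB_loop_body, GLOB__MAX };

// Byte offset of a global label from the start of the encoded code buffer.
std::ptrdiff_t label_offset(void* const global_labels[], int label, const void* code) {
    assert(label >= 0 && label < GLOB__MAX);
    return static_cast<const char*>(global_labels[label]) - static_cast<const char*>(code);
}
```

In real use, code would be the pointer passed to dasm_encode and global_labels the array registered with dasm_setupglobal.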
For completeness, the final C code is as follows:
```cpp
void (*make_transcoder(transcode_job_t* job))(const void*, void*, int)
{
  dasm_State* state;
  int status;
  void* code;
  size_t code_size;
  void* global_labels[GLOB__MAX];

  dasm_init(&state, DASM_MAXSECTION);
  // Must come between dasm_init and dasm_setup.
  dasm_setupglobal(&state, global_labels, GLOB__MAX);
  dasm_setup(&state, transcode_actionlist);

  // Record the instructions, then resolve labels and compute the final code size.
  emit_transcoder(&state, job);
  status = dasm_link(&state, &code_size);
  assert(status == DASM_S_OK);

  // Encode into executable memory; this also fills in global_labels.
  code = VirtualAlloc(nullptr, code_size, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
  assert(code != nullptr);
  status = dasm_encode(&state, code);
  assert(status == DASM_S_OK);

  dasm_free(&state);
  return (void(*)(const void*, void*, int))code;
}

void run_job(transcode_job_t* job)
{
  void (*transcode_n_records)(const void*, void*, int) = make_transcoder(job);
  transcode_n_records(job->input, job->output, job->num_input_records);
}
```
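The declaration of make_transcoder reads as "make_transcoder takes a transcode_job_t* and returns a pointer to a function taking (const void*, void*, int) and returning void". A type alias makes that easier to read. The sketch below shows the idea with an ordinary compiled function standing in for the generated code; transcoder_fn, count_records, make_counter and make_counter_verbose are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

// Same type as the return type of make_transcoder.
using transcoder_fn = void (*)(const void* input, void* output, int num_records);

// Hypothetical stand-in for generated code: writes the record count to *output.
static void count_records(const void* /*input*/, void* output, int num_records) {
    *static_cast<int*>(output) = num_records;
}

// With the alias, the factory's signature reads left to right.
transcoder_fn make_counter() {
    return count_records;
}

// The same return type spelled without the alias, as make_transcoder does.
void (*make_counter_verbose())(const void*, void*, int) {
    return count_records;
}

// Both spellings name exactly the same type.
static_assert(std::is_same<transcoder_fn, decltype(make_counter_verbose())>::value,
              "alias and declarator spell the same type");
```

With such an alias, make_transcoder could be declared as transcoder_fn make_transcoder(transcode_job_t* job), and run_job could hold the result in a transcoder_fn variable.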