Callbacks with the LuaJIT FFI
The foreign function interface (FFI) present in the latest beta releases of LuaJIT is really nice for when you need to do things outside the Lua world. At runtime you pass it some C declarations, after which everything you've declared becomes callable from Lua, and these calls are then JIT-compiled into efficient native code rather than going through the Lua C API. For example, we can declare and then call some Windows API functions:
ffi = require "ffi"
ffi.cdef [[
typedef void* HWND;
HWND FindWindowA(const char* lpClassName, const char* lpWindowName);
int GetWindowTextA(HWND hWnd, char* lpString, int nMaxCount); ]]
len = 300
buffer = ffi.new("char[?]", len)
window = ffi.C.FindWindowA("Notepad++", nil)
len = ffi.C.GetWindowTextA(window, buffer, len)
print(ffi.string(buffer, len)) --> C:\Lua\ffi_example.lua - Notepad++
This is fine and dandy for calling C from Lua, but things get rather more complicated with callbacks from C back to Lua. For example, the Windows EnumChildWindows function accepts a callback, which gets called for every child of the given window. LuaJIT will happily accept and understand the definition of this function:
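ffi.cdef [[
typedef void* HWND;
typedef intptr_t LPARAM;  /* LONG_PTR: 32 bits on x86, 64 bits on x64 */
typedef int BOOL;         /* Win32 BOOL is an int, not a C bool */
typedef BOOL (*WNDENUMPROC)(HWND, LPARAM);
BOOL EnumChildWindows(HWND hWndParent, WNDENUMPROC lpEnumFunc, LPARAM lParam); ]]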
You quickly run into a problem if you try to call it, though, as you realise that the LuaJIT FFI currently lacks support for turning Lua functions into something which can be called from C. At this point, most people would acknowledge that the FFI isn't yet complete, and then go on to write their own C glue around EnumChildWindows using the traditional (slow) Lua C API. On the other hand, if you're feeling foolhardy, then you can fight the FFI to get callbacks working, and do so without resorting to any external C code. Naturally, this is what we'll do.
Our strategy will be to perform some control-flow contortions so that when EnumChildWindows calls the callback, it in fact returns to Lua, and Lua then calls back in to resume the enumeration. If we could write it in Lua, then it might look something like:
EnumChildWindows = coroutine.wrap(function()
    while true do
        ffi.C.EnumChildWindows(coroutine.yield(), function(hWnd)
            coroutine.yield(hWnd)
            return true -- keep enumerating
        end, nil)
    end
end)
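Naturally we cannot write this in Lua. The FFI has no way to turn the inner function into a C function pointer, and even if it did, coroutine.yield could not yield across the C frames of EnumChildWindows. We can, however, write it in machine code, and then use the FFI to load and execute that machine code. The coroutine trickery will be done by the Windows fiber API, as fibers are fairly similar to coroutines: each fiber has its own stack, and SwitchToFiber transfers control from one fiber to another.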
To start with, ConvertThreadToFiber can be called to convert the currently running thread into a fiber and return the handle to the fiber. However, if the thread is already a fiber, ConvertThreadToFiber fails with ERROR_ALREADY_FIBER, and we would instead need GetCurrentFiber. That is a macro rather than a function, and hence is not callable by the FFI. For now we'll ignore this issue, but it will be addressed later. Next we can call VirtualAlloc to allocate some executable memory, use ffi.copy to copy some machine code into said executable memory, and then call the equivalent of coroutine.wrap, which is CreateFiber. In code, this looks like:
ffi.cdef [[
void* ConvertThreadToFiber(void* lpParameter);
typedef void (*LPFIBER_START_ROUTINE)(void*);
void* CreateFiber(size_t dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, void* lpParameter);
void* VirtualAlloc(void* lpAddress, size_t dwSize, uint32_t flAllocationType, uint32_t flProtect); ]]
our_fiber = ffi.C.ConvertThreadToFiber(nil)
machine_code = "TODO"
-- 0x3000 = MEM_COMMIT | MEM_RESERVE and 0x40 = PAGE_EXECUTE_READWRITE;
-- the + 1 leaves room for the NUL terminator, which ffi.copy also copies.
procs = ffi.C.VirtualAlloc(nil, #machine_code + 1, 0x3000, 0x40)
ffi.copy(procs, machine_code)
-- Like coroutine.wrap: a new fiber (with a 1024-byte stack) that starts executing at procs.
contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)
The next task is to replace the TODO with the machine code equivalent of the following pseudo-C:
for(;;) {
    EnumChildWindows(coroutine.yield(), EnumerationProcedure, 0);
}
BOOL EnumerationProcedure(HWND hWnd, void* lpParam) {
    coroutine.yield(hWnd);
    return TRUE;
}
First of all we need to make the pseudo-C slightly more C-like. In particular, the above still uses a hypothetical coroutine.yield. The fiber API provides a SwitchToFiber function, which differs from coroutine.yield in two ways: it doesn't support parameters or return values, and it has to be told which fiber to switch to. We thus end up with something like:
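#include <windows.h>

void* our_fiber;        // The result of ConvertThreadToFiber.
void* transfer_slot[2]; // To yield a value, put the value in [0] and a non-NULL value in [1].
                        // To yield nothing, put anything in [0] and NULL in [1].
                        // When starting an enumeration, the Lua side puts the parent window in [0].

BOOL CALLBACK EnumerationProcedure(HWND hWnd, LPARAM lParam);

void CALLBACK fiber_proc(void* lpParameter) {
    for(;;) {
        EnumChildWindows((HWND)transfer_slot[0], EnumerationProcedure, 0);
        transfer_slot[1] = NULL;
        SwitchToFiber(our_fiber);
    }
}

BOOL CALLBACK EnumerationProcedure(HWND hWnd, LPARAM lParam) {
    transfer_slot[0] = hWnd; // [1] is still non-NULL, as set by the Lua side.
    SwitchToFiber(our_fiber);
    return TRUE;
}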
Next we need to translate this into assembly code, first for x86:
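fiber_proc:
    push 0                                  ; lParam
    push enum_proc                          ; lpEnumFunc
    mov eax, dword ptr [transfer_slot]
    push eax                                ; hWndParent = transfer_slot[0]
    call EnumChildWindows
    mov dword ptr [transfer_slot + 4], 0    ; transfer_slot[1] = NULL: end of iteration
    push our_fiber
    call SwitchToFiber
    jmp fiber_proc
enum_proc:
    mov eax, dword ptr [esp+4]              ; hWnd
    mov dword ptr [transfer_slot], eax      ; transfer_slot[0] = hWnd
    push our_fiber
    call SwitchToFiber
    mov eax, 1                              ; return TRUE to keep enumerating
    retn 8                                  ; stdcall: callee pops hWnd and lParam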
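When this listing is turned into bytes, push imm32 and the moffs form of mov take absolute addresses. By contrast, call (opcode E8) takes a rel32 displacement measured from the end of its 4-byte operand, which is the start of the next instruction. The following C sketch mirrors the fixup helper in the full program below. Addresses are plain integers, and base is a hypothetical load address, so the sketch runs on any host.

```c
#include <stdint.h>

/* Store value little-endian, as x86 expects, whatever the host byte order. */
static void put_le32(uint8_t *p, uint32_t value) {
    p[0] = (uint8_t)value;
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)(value >> 16);
    p[3] = (uint8_t)(value >> 24);
}

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Patch the 4-byte operand at code[offset]; base is the (hypothetical)
 * address at which code will run. With isrelative set, store the rel32
 * displacement target - (base + offset + 4). Unsigned arithmetic wraps
 * modulo 2^32, which matches how the CPU adds the displacement. */
static void fixup32(uint8_t *code, uint32_t base, uint32_t offset,
                    uint32_t target, int isrelative) {
    uint32_t value = isrelative ? target - (base + offset + 4) : target;
    put_le32(code + offset, value);
}
```

For example, patching the call at offset 13 (operand at 14) in a buffer loaded at 0x00400000, targeting 0x77D00000, stores 0x778FFFEE, which is 0x77D00000 - (0x00400000 + 18).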
And secondly for x64:
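fiber_proc:
    sub rsp, 28h                                ; 32 bytes of shadow space, plus 8 to realign rsp
after_prologue:
    mov rcx, qword ptr [rip->transfer_slot]     ; hWndParent = transfer_slot[0]
    lea rdx, qword ptr [rip->enum_proc]         ; lpEnumFunc (r8 = lParam is not set; enum_proc ignores it)
    call qword ptr [rip->EnumChildWindows]
    mov qword ptr [rip->transfer_slot + 8], 0   ; transfer_slot[1] = NULL: end of iteration
    mov rcx, qword ptr [rip->our_fiber]
    call qword ptr [rip->SwitchToFiber]
    jmp after_prologue                          ; skip the prologue so rsp is adjusted only once
enum_proc:
    sub rsp, 28h
    mov qword ptr [rip->transfer_slot], rcx     ; transfer_slot[0] = hWnd
    mov rcx, qword ptr [rip->our_fiber]
    call qword ptr [rip->SwitchToFiber]
    mov rax, 1                                  ; return TRUE to keep enumerating
    add rsp, 28h
    ret
transfer_slot:    dq ?
                  dq ?
EnumChildWindows: dq ?                          ; address of EnumChildWindows
our_fiber:        dq ?
SwitchToFiber:    dq ?                          ; address of SwitchToFiber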
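On x64 every reference goes through a RIP-relative operand. Its 32-bit displacement is measured from the address of the next instruction, that is, after any trailing immediate. This sketch computes those displacements from offsets alone, assuming the layout used by the full program below: fiber_proc at 0, enum_proc at 56, and five 8-byte data slots from offset 128. The enum names are hypothetical.

```c
#include <stdint.h>

/* Offsets within the code block, following the layout of the full program. */
enum {
    AFTER_PROLOGUE = 4,
    ENUM_PROC = 56,
    SLOT_TRANSFER_0 = 128,
    SLOT_TRANSFER_1 = 136,
    SLOT_ENUM_CHILD_WINDOWS = 144,
    SLOT_OUR_FIBER = 152,
    SLOT_SWITCH_TO_FIBER = 160
};

/* Displacement for an instruction starting at insn_offset that is insn_len
 * bytes long (including any immediate) and refers to target_offset. The same
 * formula gives the rel8 of a short jmp. */
static int32_t rip_disp(int32_t insn_offset, int32_t insn_len, int32_t target_offset) {
    return target_offset - (insn_offset + insn_len);
}
```

For instance, rip_disp(4, 7, SLOT_TRANSFER_0) is 0x75, the displacement encoded in mov rcx, [rip->transfer_slot]. rip_disp(25, 11, SLOT_TRANSFER_1) is 0x64 because the trailing imm32 counts towards the instruction length. The short jmp back to after_prologue, rip_disp(50, 2, AFTER_PROLOGUE), is -48, encoded as the byte 0xD0.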
At this point, we return to our earlier problem of GetCurrentFiber being a macro. It simply reads the FiberData field of the current thread's information block, which boils down to the following assembly code, first for x86:
mov eax, dword ptr fs:[10h]
ret
And similarly for x64:
mov rax, qword ptr gs:[20h]
ret
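The constants 10h and 20h are not arbitrary. fs (on x86) and gs (on x64) point at the thread's TEB, which begins with an NT_TIB whose fifth pointer-sized field is FiberData. The sketch below uses a hypothetical mirror of the start of winnt.h's NT_TIB, so that it compiles anywhere, and shows that the offset is simply four pointers.

```c
#include <stddef.h>

/* Hypothetical mirror of the fields of NT_TIB from winnt.h. */
struct nt_tib_mirror {
    void *ExceptionList;   /* really struct _EXCEPTION_REGISTRATION_RECORD * */
    void *StackBase;
    void *StackLimit;
    void *SubSystemTib;
    union {
        void *FiberData;   /* what GetCurrentFiber returns */
        unsigned int Version;
    } u;
    void *ArbitraryUserPointer;
    struct nt_tib_mirror *Self;
};

/* Offset of FiberData for the pointer size of the current build target:
 * four pointers in, i.e. 0x10 with 4-byte pointers and 0x20 with 8-byte ones. */
static size_t fiber_data_offset(void) {
    return offsetof(struct nt_tib_mirror, u.FiberData);
}
```

Built for a 32-bit target this gives 0x10, and for a 64-bit target 0x20, matching the two listings above.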
Now we can convert the assembly down to machine code, and put everything together:
local ffi = require "ffi"
-- The definitions we want to use.
ffi.cdef [[
typedef void* HWND;
typedef intptr_t LPARAM;  /* LONG_PTR: 32 bits on x86, 64 bits on x64 */
typedef int BOOL;         /* Win32 BOOL is an int, not a C bool */
typedef BOOL (*WNDENUMPROC)(HWND, LPARAM);
BOOL EnumChildWindows(HWND hWndParent, WNDENUMPROC lpEnumFunc, LPARAM lParam);
int GetWindowTextA(HWND hWnd, char* lpString, int nMaxCount); ]]
-- Extra definitions we need for performing contortions with fibers.
ffi.cdef [[
void* ConvertThreadToFiber(void* lpParameter);
void SwitchToFiber(void* lpFiber);
typedef void (*LPFIBER_START_ROUTINE)(void*);
void* CreateFiber(size_t dwStackSize, LPFIBER_START_ROUTINE lpStartAddress, void* lpParameter);
uint32_t GetLastError(void);
void* VirtualAlloc(void* lpAddress, size_t dwSize, uint32_t flAllocationType, uint32_t flProtect);
bool RtlAddFunctionTable(void* FunctionTable, uint32_t EntryCount, void* BaseAddress); ]]
local EnumChildWindows
do
    local GetLastError = ffi.C.GetLastError
    local contortion_fiber
    local procs
    local transfer_slot
    local init_callbacks
    if ffi.arch == "x86" then
        init_callbacks = function()
            -- Ensure that the thread is a fiber, converting if required.
            -- 1280 is ERROR_ALREADY_FIBER: the thread is already a fiber.
            local our_fiber = ffi.C.ConvertThreadToFiber(nil)
            if our_fiber == nil and GetLastError() ~= 1280 then
                error("Unable to convert thread to fiber")
            end
            transfer_slot = ffi.new("void*[2]")
            -- fiber_proc: for(;;) {
            --     EnumChildWindows(transfer_slot[0], enum_proc, 0);
            --     transfer_slot[1] = 0; // to mark end of iteration
            --     SwitchToFiber(our_fiber);
            -- }
            local asm = "\x6A\x00"                  -- push 0
                .. "\x68????"                       -- push ????
                .. "\xA1????\x50"                   -- mov eax, dword ptr [????], push eax
                .. "\xE8????"                       -- call ????
                .. "\xC7\x05????\x00\x00\x00\x00"   -- mov dword ptr [????], 0
                .. "\x68????"                       -- push ????
                .. "\xE8????"                       -- call ????
                .. "\xEB\xD8"                       -- jmp fiber_proc (rel8 -40)
            -- enum_proc: transfer_slot[0] = *(esp+4); // the HWND
            --     SwitchToFiber(our_fiber);
            --     return TRUE;
                .. "\x8B\x44\x24\x04"               -- mov eax, dword ptr [esp+4]
                .. "\x3E\xA3????"                   -- mov dword ptr ds:[????], eax (3E = redundant DS prefix)
                .. "\x68????"                       -- push ????
                .. "\xE8????"                       -- call ????
                .. "\x33\xC0\x40"                   -- xor eax, eax; inc eax (eax = 1)
                .. "\xC2\x08"                       -- retn 8 (*)
            -- (*) retn takes a 16-bit immediate, so its encoding is C2 08 00;
            -- the final 00 byte is the NUL terminator that ffi.copy appends.
            procs = ffi.C.VirtualAlloc(nil, #asm + 1, 0x3000, 0x40)
            if our_fiber == nil then
                -- GetCurrentFiber(), via a tiny stub that asm overwrites below.
                ffi.copy(procs, "\x64\xA1\x10\x00\x00\x00\xC3") -- return __readfsdword(0x10)
                our_fiber = ffi.cast("void*(*)(void)", procs)()
            end
            ffi.copy(procs, asm)
            -- Patch the 4-byte operand at procs + offset with an absolute address,
            -- or with a rel32 displacement measured from the end of the operand.
            local function fixup(offset, ptr, isrelative)
                local dst = ffi.cast("char*", procs) + offset
                ptr = ffi.cast("char*", ptr)
                if isrelative then
                    ptr = ffi.cast("char*", ptr - (dst + 4))
                end
                ffi.cast("char**", dst)[0] = ptr
            end
            fixup( 3, ffi.cast("char*", procs) + 40) -- push enum_proc
            fixup( 8, transfer_slot)                 -- mov eax, [transfer_slot[0]]
            fixup(14, ffi.C.EnumChildWindows, true)  -- call EnumChildWindows
            fixup(20, transfer_slot + 1)             -- mov [transfer_slot[1]], 0
            fixup(29, our_fiber)                     -- push our_fiber
            fixup(34, ffi.C.SwitchToFiber, true)     -- call SwitchToFiber
            fixup(46, transfer_slot)                 -- mov [transfer_slot[0]], eax
            fixup(51, our_fiber)                     -- push our_fiber
            fixup(56, ffi.C.SwitchToFiber, true)     -- call SwitchToFiber
            contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)
            init_callbacks = function() end
        end
    elseif ffi.arch == "x64" then
        init_callbacks = function()
            -- Ensure that the thread is a fiber, converting if required.
            -- 1280 is ERROR_ALREADY_FIBER: the thread is already a fiber.
            local our_fiber = ffi.C.ConvertThreadToFiber(nil)
            if our_fiber == nil and GetLastError() ~= 1280 then
                error("Unable to convert thread to fiber")
            end
            -- fiber_proc: for(;;) {
            --     EnumChildWindows(transfer_slot[0], enum_proc, 0);
            --     transfer_slot[1] = 0; // to mark end of iteration
            --     SwitchToFiber(our_fiber);
            -- }
            local asm = "\x48\x83\xEC\x28"                          -- sub rsp, 28h
                .. "\x48\x8B\x0D\x75\x00\x00\x00"                   -- mov rcx, [rip->transfer_slot_0]
                .. "\x48\x8D\x15\x26\x00\x00\x00"                   -- lea rdx, [rip->enum_proc]
                .. "\x48\xFF\x15\x77\x00\x00\x00"                   -- call [rip->EnumChildWindows]
                .. "\x48\xC7\x05\x64\x00\x00\x00\x00\x00\x00\x00"   -- mov [rip->transfer_slot_1], 0
                .. "\x48\x8B\x0D\x6D\x00\x00\x00"                   -- mov rcx, [rip->our_fiber]
                .. "\x48\xFF\x15\x6E\x00\x00\x00"                   -- call [rip->SwitchToFiber]
                .. "\xEB\xD0"                                       -- jmp after_prologue (rel8 -48)
                .. "\x90\x90\x90\x90"                               -- pad to 8
            -- enum_proc: transfer_slot[0] = rcx; // the HWND
            --     SwitchToFiber(our_fiber);
            --     return TRUE;
                .. "\x48\x83\xEC\x28"                               -- sub rsp, 28h
                .. "\x48\x89\x0D\x3D\x00\x00\x00"                   -- mov [rip->transfer_slot_0], rcx
                .. "\x48\x8B\x0D\x4E\x00\x00\x00"                   -- mov rcx, [rip->our_fiber]
                .. "\x48\xFF\x15\x4F\x00\x00\x00"                   -- call [rip->SwitchToFiber]
                .. "\x48\xC7\xC0\x01\x00\x00\x00"                   -- mov rax, 1
                .. "\x48\x83\xC4\x28"                               -- add rsp, 28h
                .. "\xC3"                                           -- ret
                .. "\x90\x90\x90"                                   -- pad to 8
            -- unwind data (offset 96): two RUNTIME_FUNCTION entries
            -- {BeginAddress, EndAddress, UnwindInfoAddress}, written with Lua decimal escapes
                .. "\0\0\0\0\52\0\0\0\120\0\0\0"                    -- fiber_proc: [0, 52)
                .. "\56\0\0\0\93\0\0\0\120\0\0\0"                   -- enum_proc: [56, 93)
            -- UNWIND_INFO (offset 120), shared by both: version 1, 4-byte prolog,
            -- one UNWIND_CODE at offset 4: UWOP_ALLOC_SMALL of 0x28 bytes
                .. "\1\4\1\0\4\66"
            -- pad to 8 (the NUL copied by ffi.copy, then zero-filled memory)
            -- mutable data (offset 128), one 8-byte slot each:
            --   transfer_slot_0
            --   transfer_slot_1
            --   EnumChildWindows
            --   our_fiber
            --   SwitchToFiber
            -- 0x103000 = MEM_COMMIT | MEM_RESERVE | MEM_TOP_DOWN; #asm + 42 = 128 + 5 * 8
            procs = ffi.C.VirtualAlloc(nil, #asm + 42, 0x103000, 0x40)
            if our_fiber == nil then
                -- GetCurrentFiber(), via a tiny stub that asm overwrites below.
                ffi.copy(procs, "\x65\x48\x8B\x04\x25\x20\x00\x00\x00\xC3") -- return __readgsqword(0x20)
                our_fiber = ffi.cast("void*(*)(void)", procs)()
            end
            ffi.copy(procs, asm)
            transfer_slot = ffi.cast("void**", ffi.cast("char*", procs) + 128)
            transfer_slot[2] = ffi.cast("void*", ffi.C.EnumChildWindows)
            transfer_slot[3] = ffi.cast("void*", our_fiber)
            transfer_slot[4] = ffi.cast("void*", ffi.C.SwitchToFiber)
            -- Register the unwind data so that the system can unwind through
            -- fiber_proc and enum_proc.
            ffi.C.RtlAddFunctionTable(ffi.cast("void*", ffi.cast("char*", procs) + 96), 2, procs)
            contortion_fiber = ffi.C.CreateFiber(1024, ffi.cast("LPFIBER_START_ROUTINE", procs), nil)
            init_callbacks = function() end
        end
    else
        error("Only x86 and x64 are supported")
    end
    EnumChildWindows = function(wnd)
        init_callbacks()
        transfer_slot[0] = wnd
        transfer_slot[1] = ffi.cast("void*", 1)
        local results = {}
        while true do
            -- Resume the contortion fiber: it switches back once per child window,
            -- and once more (with transfer_slot[1] == NULL) when enumeration ends.
            ffi.C.SwitchToFiber(contortion_fiber)
            if transfer_slot[1] == nil then
                return results
            else
                results[#results + 1] = transfer_slot[0]
            end
        end
    end
end
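The x64 path also hands RtlAddFunctionTable a small table of unwind data. The following C sketch shows what those bytes mean by decoding them with the documented x64 layouts of RUNTIME_FUNCTION (three 32-bit offsets from the base address) and UNWIND_INFO. Only the UWOP_ALLOC_SMALL operation (code 2) used here is interpreted. The names unwind_blob, unwind_summary and decode_unwind_info are hypothetical.

```c
#include <stdint.h>

/* Bytes 96..125 of the x64 code block from the Lua listing, rewritten as
 * C decimal initialisers (the Lua string uses decimal escapes such as \52). */
static const uint8_t unwind_blob[] = {
    0, 0, 0, 0,   52, 0, 0, 0,  120, 0, 0, 0,  /* RUNTIME_FUNCTION: fiber_proc */
    56, 0, 0, 0,  93, 0, 0, 0,  120, 0, 0, 0,  /* RUNTIME_FUNCTION: enum_proc  */
    1, 4, 1, 0,   4, 66                        /* UNWIND_INFO (shared)         */
};

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

struct unwind_summary {
    unsigned version, flags, prolog_size, code_count, frame_register;
    unsigned alloc_size; /* total bytes from UWOP_ALLOC_SMALL codes */
};

/* Decode an UNWIND_INFO header and its UNWIND_CODE array. Assumes every
 * code occupies a single 2-byte slot; only UWOP_ALLOC_SMALL is interpreted. */
static struct unwind_summary decode_unwind_info(const uint8_t *info) {
    struct unwind_summary s = {0};
    s.version = info[0] & 0x7;          /* low 3 bits */
    s.flags = info[0] >> 3;             /* high 5 bits */
    s.prolog_size = info[1];
    s.code_count = info[2];
    s.frame_register = info[3] & 0xF;
    for (unsigned i = 0; i < s.code_count; i++) {
        const uint8_t *code = info + 4 + 2 * i; /* {CodeOffset, UnwindOp | OpInfo << 4} */
        unsigned op = code[1] & 0xF, op_info = code[1] >> 4;
        if (op == 2) /* UWOP_ALLOC_SMALL: OpInfo * 8 + 8 bytes */
            s.alloc_size += op_info * 8 + 8;
    }
    return s;
}
```

Decoding the shared UNWIND_INFO at offset 120 gives version 1, a 4-byte prolog and a single UWOP_ALLOC_SMALL with OpInfo 4. That is 4 * 8 + 8 = 0x28 bytes, exactly the sub rsp, 28h at the start of both fiber_proc and enum_proc.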
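With this mass of heavy, complicated machinery in place, we can finally achieve our original goal of enumerating windows. Passing nil as the parent makes EnumChildWindows behave like EnumWindows, so this prints the title of every top-level window that has one: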
local buffer = ffi.new("char[?]", 300)
for _, window in ipairs(EnumChildWindows(nil)) do
    local len = ffi.C.GetWindowTextA(window, buffer, 300)
    if len ~= 0 then
        print(ffi.string(buffer, len))
    end
end
I freely admit that this solution isn't at all elegant, but it does show that callbacks are possible with the current LuaJIT FFI, without resorting to additional C libraries.