Dynamically loadable shared libraries typically come in one of a few formats:
- As Mach-O files with the .dylib extension on OS X.
- As ELF files with the .so extension on Linux.
- As PE files with the .dll extension on Windows.
The whole point of dynamically loadable shared libraries is to export symbols,
and these formats typically store exported symbol information as a list of
exported symbols or a hash table of exported symbols. One nice property of
lists and hash tables is that they're finite by default: unless you deliberately
try to make them infinite, they'll be finite.
One oddity of the Mach-O format is that exported symbol information can be
represented as a trie. The term trie is meant to allude to tree, and trees
are also finite by default. However, a trie can also be thought of as a
directed rooted graph, and if that graph were to have a cycle, then the number
of paths in the graph would be infinite.
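To make the graph view concrete, here is a rough sketch of a name lookup over such a trie. It is not taken from dyld's source. It assumes the usual export trie node layout (a ULEB128 terminal-info size, the terminal info itself, a one-byte child count, then for each child a NUL-terminated edge label followed by the ULEB128 offset of the child node), and the names read_uleb128, trie_exports and example_cyclic_trie are made up for illustration, as are the terminal-info bytes in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one unsigned LEB128 value starting at `pos`, advancing `pos`.
static uint64_t read_uleb128(const std::vector<uint8_t>& t, size_t& pos) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        uint8_t byte = t.at(pos++);
        value |= uint64_t(byte & 0x7f) << shift;
        if (!(byte & 0x80)) break;
    }
    return value;
}

// Walk the trie for `name`; returns true if the walk ends on a node that
// carries terminal (export) information. Out-of-range reads throw.
bool trie_exports(const std::vector<uint8_t>& trie, const std::string& name) {
    assert(!trie.empty());  // there must at least be a root node
    size_t node = 0;        // offset of the current node; the root is at 0
    size_t matched = 0;     // characters of `name` consumed so far
    for (;;) {
        size_t pos = node;
        uint64_t terminal_size = read_uleb128(trie, pos);
        if (matched == name.size()) return terminal_size != 0;
        pos += terminal_size;  // skip the export info
        unsigned child_count = trie.at(pos++);
        bool advanced = false;
        for (unsigned i = 0; i < child_count && !advanced; ++i) {
            // Compare the NUL-terminated edge label with the rest of `name`.
            size_t m = matched;
            bool ok = true;
            while (trie.at(pos) != 0) {
                if (ok && m < name.size() && name[m] == char(trie.at(pos))) ++m;
                else ok = false;
                ++pos;
            }
            ++pos;  // skip the NUL
            uint64_t child_offset = read_uleb128(trie, pos);
            // Only follow edges that consume input, so the walk takes at most
            // name.size() steps even if the graph has a cycle.
            if (ok && m > matched) {
                node = child_offset;
                matched = m;
                advanced = true;
            }
        }
        if (!advanced) return false;
    }
}

// A hand-made 18-byte trie: root -"_corsix"-> node at offset 11, which has a
// "_" edge pointing back at itself.
std::vector<uint8_t> example_cyclic_trie() {
    return {
        0x00,                                     // root: no terminal info
        0x01,                                     // root: one child
        '_', 'c', 'o', 'r', 's', 'i', 'x', 0x00,  // edge label "_corsix"
        0x0B,                                     // child node is at offset 11
        0x02, 0x00, 0x10,                         // offset 11: 2 bytes of (made-up) terminal info
        0x01,                                     // one child
        '_', 0x00,                                // edge label "_"
        0x0B                                      // pointing back at offset 11
    };
}
```

The walk never asks whether it has seen a node before: each step just consumes some of the name and jumps to whatever offset the edge gives. With the back edge in place, _corsix, _corsix_, _corsix__ and so on all end on the same node, so a lookup for any of them succeeds.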
Let us begin with a file called finite.c:
void corsix() {}
void corsix_() {}
#define C2(x) void corsix_##x() {}
#define C1(x) C2(x##a) C2(x##b) C2(x##c) C2(x##d) C2(x##e)
#define C0(x) C1(x##a) C1(x##b) C1(x##c) C1(x##d) C1(x##e)
C0(a) C0(b) C0(c) C0(d) C0(e)
We can compile this to a shared library like so:
$ clang finite.c -shared -o finite.dylib
This gives us a shared library called finite.dylib which exports
127 symbols: corsix, corsix_, and the 125 symbols matching the
regex corsix_[a-e][a-e][a-e]. These symbols aren't overly interesting,
and the sheer number of symbols is merely to ensure that the exported
symbol trie in finite.dylib occupies sufficiently many bytes.
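As a quick check of that count, the following sketch spells out what the C0/C1/C2 macros expand to. The function name finite_symbol_names is made up for illustration, and the names are the C-level ones (without the leading underscore that appears in the trie).

```cpp
#include <cassert>
#include <string>
#include <vector>

// The two hand-written functions plus every name the macros generate:
// corsix_ followed by three letters, each drawn from a-e.
std::vector<std::string> finite_symbol_names() {
    std::vector<std::string> names = {"corsix", "corsix_"};
    const std::string letters = "abcde";
    for (char x : letters)
        for (char y : letters)
            for (char z : letters)
                names.push_back(std::string("corsix_") + x + y + z);
    assert(names.size() == 2 + 5 * 5 * 5);  // 127 in total
    return names;
}
```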
The exported symbol trie in finite.dylib looks something like the
following diagram:
                                         +-"a"-> corsix_a ...
                                         |
                                         +-"b"-> corsix_b ...
                                         |
root -"_corsix"-> corsix -"_"-> corsix_ -+-"c"-> corsix_c ...
                                         |
                                         +-"d"-> corsix_d ...
                                         |
                                         +-"e"-> corsix_e ...
Our aim is to replace the exported symbol trie with something like the
following diagram:
                +- <---"_"---+
                |            |
root -"_corsix"-+-> corsix --+
                |            |
                +- <---"a"---+
                |            |
                +- <---"b"---+
                |            |
                +- <---"c"---+
                |            |
                +- <---"d"---+
                |            |
                +- <---"e"---+
With such a trie, the symbol originally called corsix should now be
exported under all the names matching the regex corsix[_a-e]*. We
could also go slightly further, adding more looping edges to the trie,
in order to reach corsix[_a-z0-9]*.
We'll use the following transform.lua program to do the dirty work of
trie replacement:
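dylib = io.read"*a"
-- nof: offset of the corsix node (the byte after the "_corsix" edge label);
-- pos: where that node is assumed to start; tsz: its terminal-info size.
nof, pos, tsz = dylib:match"_corsix%z(.)()(.)"
-- Keep the terminal info, then add 37 one-character edges that all point back at nof.
node = dylib:sub(pos, pos + tsz:byte()) .. "\37" ..
  ("_abcdefghijklmnopqrstuvwxyz0123456789"):gsub(".", "%0\0" .. nof)
-- Overwrite in place so that the file size stays the same.
io.write(dylib:sub(1, pos-1) .. node .. dylib:sub(pos + #node))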
Running the program like so will generate a file called infinite.dylib:
$ lua transform.lua <finite.dylib >infinite.dylib
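For readers who don't speak Lua, the replacement node that transform.lua writes can be sketched in C++ as follows. This is only an illustration of the byte layout, not a replacement for the script: build_looping_node is a made-up name, and the sketch assumes (as the script does) that the child count and the node offset each fit in a single byte.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build the bytes of a trie node that keeps `terminal` (the size byte plus
// the export info copied from the original node) and gives every character
// in `alphabet` a one-character edge pointing at `target_offset`.
std::vector<uint8_t> build_looping_node(const std::vector<uint8_t>& terminal,
                                        const std::string& alphabet,
                                        uint8_t target_offset) {
    // Single-byte child count and single-byte ULEB128 offset only.
    assert(alphabet.size() <= 255 && target_offset < 0x80);
    std::vector<uint8_t> node(terminal);
    node.push_back(static_cast<uint8_t>(alphabet.size()));  // child count
    for (char c : alphabet) {
        node.push_back(static_cast<uint8_t>(c));  // edge label: one character
        node.push_back(0);                        // NUL terminator of the label
        node.push_back(target_offset);            // child offset: back to the same node
    }
    return node;
}
```

With the 37-character alphabet from the script, the new node takes 1 + 37 * 3 = 112 bytes on top of the terminal info, which is why finite.c needs enough symbols for the original trie to have that much room to overwrite.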
We'll then use the following client.cpp program to query the exported
symbols of the two .dylib files:
#include <dlfcn.h>
#include <stdio.h>

void check_dylib(const char* path) {
  // RTLD_LAZY is stated explicitly: POSIX requires RTLD_LAZY or RTLD_NOW.
  void* dylib = dlopen(path, RTLD_LAZY | RTLD_LOCAL);
  if (!dylib) {
    fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return;
  }
  printf("\nName lookup results in %s:\n", path);
  const char* names[] = {
    "foobar23", "corsix", "corsix_aaa", "corsix_abc",
    "corsix_xyz", "corsix_foobar23", "corsix_dot_org"
  };
  for (const char* name : names) {
    printf("%-15s -> %p\n", name, dlsym(dylib, name));
  }
}

int main() {
  check_dylib("./finite.dylib");
  check_dylib("./infinite.dylib");
  return 0;
}
Compiling and running gives the following output:
$ clang -std=c++11 client.cpp && ./a.out
Name lookup results in ./finite.dylib:
foobar23        -> 0x0
corsix          -> 0x1076347b0
corsix_aaa      -> 0x1076347d0
corsix_abc      -> 0x107634840
corsix_xyz      -> 0x0
corsix_foobar23 -> 0x0
corsix_dot_org  -> 0x0

Name lookup results in ./infinite.dylib:
foobar23        -> 0x0
corsix          -> 0x1076377b0
corsix_aaa      -> 0x1076377b0
corsix_abc      -> 0x1076377b0
corsix_xyz      -> 0x1076377b0
corsix_foobar23 -> 0x1076377b0
corsix_dot_org  -> 0x1076377b0
I don't know of any particularly useful reason for exporting an infinite number of symbols,
but it does trip up Apple's dyldinfo tool,
and it might also trip up other tools of a similar nature:
$ dyldinfo -export infinite.dylib
export information (from trie):
Segmentation fault: 11
$ dyldinfo -export_dot infinite.dylib
digraph {
node000;
node000 -> node011 [ label=_corsix ] ;
node011 [ label=_corsix,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix__,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix___,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix____,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_____,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix______,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix_______,addr0x000007B0 ];
node011 -> node011 [ label=_ ] ;
node011 [ label=_corsix________,addr0x000007B0 ];
... 15000 lines of output omitted ...
Segmentation fault: 11
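I haven't looked at why dyldinfo crashes, but the output above is at least consistent with a recursive trie walk that never checks for revisited nodes and eventually runs out of stack. As a sketch of how a tool could defend against that, the following enumerator remembers which node offsets it has already entered. It assumes the same node layout as the rest of this post (ULEB128 terminal-info size, terminal info, one-byte child count, then NUL-terminated edge labels each followed by a ULEB128 child offset); the names read_uleb128, collect and list_exports are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Decode one unsigned LEB128 value starting at `pos`, advancing `pos`.
static uint64_t read_uleb128(const std::vector<uint8_t>& t, size_t& pos) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        uint8_t byte = t.at(pos++);
        value |= uint64_t(byte & 0x7f) << shift;
        if (!(byte & 0x80)) break;
    }
    return value;
}

// Depth-first walk that records the name under which each node is first
// reached and refuses to re-enter a node offset it has already visited.
static void collect(const std::vector<uint8_t>& trie, size_t node,
                    const std::string& prefix, std::set<size_t>& visited,
                    std::vector<std::string>& out) {
    if (!visited.insert(node).second) return;  // seen before: cycle or shared node
    size_t pos = node;
    uint64_t terminal_size = read_uleb128(trie, pos);
    if (terminal_size != 0) out.push_back(prefix);
    pos += terminal_size;  // skip the export info
    unsigned child_count = trie.at(pos++);
    for (unsigned i = 0; i < child_count; ++i) {
        std::string label;
        while (trie.at(pos) != 0) label += char(trie.at(pos++));
        ++pos;  // skip the NUL
        size_t child = read_uleb128(trie, pos);
        collect(trie, child, prefix + label, visited, out);
    }
}

std::vector<std::string> list_exports(const std::vector<uint8_t>& trie) {
    assert(!trie.empty());  // there must at least be a root node
    std::set<size_t> visited;
    std::vector<std::string> out;
    collect(trie, 0, "", visited, out);
    return out;
}
```

The recursion depth is bounded by the number of distinct node offsets, so a looping trie yields a finite list (each node reported once, under the first name that reaches it) rather than a crash. A stricter tool might instead reject any trie in which an offset is reached twice, since a well-formed trie is a tree.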