From 13be151e6c08fafeaa0077d350b7d035214887f4 Mon Sep 17 00:00:00 2001
From: Jan-Erik Rediger
Date: Tue, 2 Jan 2024 13:38:28 +0200
Subject: [PATCH] new post: oelf - Mach-O support for sqlelf

---
 _posts/2024-01-02-oelf.md | 162 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)
 create mode 100644 _posts/2024-01-02-oelf.md

diff --git a/_posts/2024-01-02-oelf.md b/_posts/2024-01-02-oelf.md
new file mode 100644
index 0000000..6846458
--- /dev/null
+++ b/_posts/2024-01-02-oelf.md
@@ -0,0 +1,162 @@
+---
+permalink: "/{{ year }}/{{ month }}/{{ day }}/oelf"
+title: "oelf - Mach-O support for sqlelf"
+published_date: "2024-01-02 13:00:00 +0200"
+layout: post.liquid
+data:
+  route: blog
+  tags:
+    - rust
+---
+
+[sqlelf] lets you explore ELF objects through the power of SQL.
+It turns any executable into a queryable database.
+Any? No, just those in ELF format, the standard binary file format on most Unix and Linux systems.
+But not on macOS.
+macOS relies on the Mach object file format, Mach-O for short, and sqlelf doesn't support that.
+In theory the library it uses for parsing does, but that failed on my machine.
+It depends on a heavy C++ library and I didn't want to bother figuring out how to build and change that.
+
+I still wanted [sqlelf for Mach-O binaries][macho-support].
+Luckily the hardest part was settled early on: [naming].
+
+> <@fnordfish@mastodon.social>
+> @jer sollte auf jeden Fall „Ölf“ heißen!
+> (_translation:_ it should definitely be called „Ölf“)
+
+So [oelf][oelf-py] exists now.
+The [source code is on GitHub][oelf]
+and [my fork of sqlelf adds it to sqlelf][macho-support] for easy use.
+
+### Install
+
+I have released pre-built versions of oelf,
+but nothing is upstreamed to sqlelf yet.
+You can install my fork from git:
+
+```shell
+pip install git+https://github.com/badboy/sqlelf@with-macho-support#egg=sqlelf
+```
+
+On my M1 MacBook sqlelf doesn't work out of the box.
+sqlelf depends on [Capstone](https://www.capstone-engine.org/)
+and the installed library coming with the Python wrapper is `x86_64` only,
+so it won't load.
+
+That's fixable.
+Assuming you installed into a Python venv `.venv`,
+install capstone from Homebrew, remove the bundled library and link to the global one instead:
+
+```shell
+brew install capstone
+rm .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
+ln -s $(brew --cellar capstone)/5.0.1/lib/libcapstone.5.dylib .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
+```
+
+### Usage
+
+Invoke `sqlelf` and pass any number of Mach-O binaries as arguments.
+This gives you an SQLite REPL, or you can specify SQL commands with `--sql`.
+
+For example, sqlelf knows about the libraries referenced by the binary:
+```
+$ sqlelf /usr/bin/grep --sql 'select * from macho_libs'
+┌───────────────┬────────────────────────────┐
+│ path          │ lib                        │
+│ /usr/bin/grep │ self                       │
+│ /usr/bin/grep │ /usr/lib/libbz2.1.0.dylib  │
+│ /usr/bin/grep │ /usr/lib/liblzma.5.dylib   │
+│ /usr/bin/grep │ /usr/lib/libz.1.dylib      │
+│ /usr/bin/grep │ /usr/lib/libSystem.B.dylib │
+└───────────────┴────────────────────────────┘
+```
+
+None of those `/usr/lib/*.dylib` files actually exists in the filesystem though,
+because Apple now ships those as a big bundled cache file instead.
+
+I have not yet documented the schema, nor is it anywhere near complete.
+Use `.schema` to get an overview.
+
+```
+$ sqlelf /usr/bin/grep --sql '.schema macho_headers'
+CREATE TABLE macho_headers(
+  path,
+  magic,
+  cputype,
+  cpusubtype,
+  filetype,
+  ncmds,
+  sizeofcmds,
+  flags,
+  reserved
+);
+```
+
+The data is copied once into persisted tables, so everything is in memory.
+Most values are the raw values read from the file,
+so you will have to look up what those values mean.
+
+For example, the headers include all sorts of magic numbers and file types as integers:
+
+```
+$ sqlelf /usr/bin/grep --sql 'select * from macho_headers'
+┌───────────────┬────────────┬──────────┬────────────┬──────────┬───────┬────────────┬─────────┬──────────┐
+│ path          │ magic      │ cputype  │ cpusubtype │ filetype │ ncmds │ sizeofcmds │ flags   │ reserved │
+│ /usr/bin/grep │ 4277009103 │ 16777223 │ 3          │ 2        │ 21    │ 1688       │ 2097285 │ 0        │
+└───────────────┴────────────┴──────────┴────────────┴──────────┴───────┴────────────┴─────────┴──────────┘
+```
+
+You can slice and dice the data as you wish[^1].
+
+```
+$ sqlelf /usr/bin/grep --sql "select name, type, global, n_value from macho_symbols where path = '/usr/bin/grep' limit 3"
+┌─────────────────────┬────────┬────────┬────────────┐
+│ name                │ type   │ global │ n_value    │
+│ radr://5614542      │ N_PBUD │ 0      │ 90260802   │
+│ __mh_execute_header │ N_SECT │ 1      │ 4294967296 │
+│ _BZ2_bzRead         │ N_UNDF │ 1      │ 0          │
+└─────────────────────┴────────┴────────┴────────────┘
+```
+
+### Status
+
+I hacked together `oelf` in a matter of days.
+I'm using the excellent [pyo3] to wrap [goblin]'s functionality into a Python package, built with [maturin].
+It works reliably (yay for great tooling written in and for Rust!),
+but so far I haven't documented much.
+`oelf` itself is a bit inconsistent in how it exposes different data.
+The `sqlelf` integration is really simple,
+thanks to the project's nicely extensible code structure.
+Now every newly exposed piece of functionality in `oelf` only requires defining the schema of a table
+and mapping the retrieved data to its columns.
+
+I have yet to actually _use_ sqlelf more myself to explore binaries and all the data in there.
+I also have only a bare understanding of the ELF format and much, much less of the Mach-O format.
+I'm just barely good at plugging together existing things.
+
+Some things that might be good to do:
+
+* Documentation (of course!)
+* Add more sections and tables and "translate" magic values
+  * e.g. load commands (`macho_load_commands`) pretty-print the Rust object right now; this should be proper data in the column, maybe just JSON to begin with?
+* Can we extract and parse system libraries from the shared dyld cache?
+  * Others [have built stuff](https://github.com/keith/dyld-shared-cache-extractor)
+* Can we fetch data more lazily instead of copying it into a persisted table once?
+* Upstream changes or fork it so `sqlelf` actually works out of the box on any non-Linux machine
+
+---
+
+_Footnotes_:
+
+[^1]: `radr://5614542`: [this is a workaround for a workaround because of a bug in the old classic static linker.][radr]
+
+[sqlelf]: https://fzakaria.com/2023/03/19/sqlelf-and-20-years-of-nix.html
+[oelf]: https://github.com/badboy/oelf
+[oelf-py]: https://pypi.org/project/oelf/
+[my-toot]: https://hachyderm.io/@jer/111470860656151925
+[naming]: https://hachyderm.io/@fnordfish@mastodon.social/111476474716125707
+[macho-support]: https://github.com/badboy/sqlelf/tree/with-macho-support
+[pyo3]: https://pyo3.rs/
+[radr]: https://github.com/PureDarwin/PureDarwin/blob/a9f762d321016242bb95542301a91ecb4eb9bfd3/tools/cctools/misc/strip.c#L3789-L3817
+[goblin]: https://crates.io/crates/goblin
+[maturin]: https://maturin.rs/
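
The post notes that `macho_headers` holds raw integers and lists "translating" magic values as a to-do. As a rough illustration of what such a translation could look like, here is a minimal Python sketch that decodes the `/usr/bin/grep` row shown in the post. The helper names (`lookup`, `decode_flags`, `decode_header`) are hypothetical and not part of sqlelf or oelf; the constants are a small subset taken from Apple's `<mach-o/loader.h>` and `<mach/machine.h>`, and anything not listed simply falls back to its hex form.

```python
# Hypothetical helper, not part of sqlelf or oelf.
# Constants are a small subset of Apple's <mach-o/loader.h> and <mach/machine.h>.
MAGIC = {
    0xFEEDFACE: "MH_MAGIC",
    0xFEEDFACF: "MH_MAGIC_64",
}
CPU_TYPE = {
    7: "CPU_TYPE_X86",
    0x01000007: "CPU_TYPE_X86_64",
    12: "CPU_TYPE_ARM",
    0x0100000C: "CPU_TYPE_ARM64",
}
FILETYPE = {
    1: "MH_OBJECT",
    2: "MH_EXECUTE",
    6: "MH_DYLIB",
    8: "MH_BUNDLE",
}
FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x80: "MH_TWOLEVEL",
    0x200000: "MH_PIE",
}


def lookup(table, value):
    """Return the symbolic name for value, or its hex form if unknown."""
    return table.get(value, hex(value))


def decode_flags(value):
    """Split a flags bitmask into known names plus any leftover bits as hex."""
    names = [name for bit, name in FLAGS.items() if value & bit]
    leftover = value
    for bit in FLAGS:
        leftover &= ~bit
    if leftover:
        names.append(hex(leftover))
    return names


def decode_header(row):
    """Translate the raw integers of one macho_headers row."""
    return {
        "magic": lookup(MAGIC, row["magic"]),
        "cputype": lookup(CPU_TYPE, row["cputype"]),
        "filetype": lookup(FILETYPE, row["filetype"]),
        "flags": decode_flags(row["flags"]),
    }


# Values taken from the /usr/bin/grep row in the post.
row = {"magic": 4277009103, "cputype": 16777223, "filetype": 2, "flags": 2097285}
decoded = decode_header(row)
print(decoded)
```

For that row, `magic` comes out as `MH_MAGIC_64`, `cputype` as `CPU_TYPE_X86_64`, `filetype` as `MH_EXECUTE`, and `flags` as `MH_NOUNDEFS`, `MH_DYLDLINK`, `MH_TWOLEVEL` and `MH_PIE`. A lookup table like this could also live in SQL as a small side table to join against, which might fit the sqlelf model better than post-processing in Python.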