---
permalink: "/{{ year }}/{{ month }}/{{ day }}/oelf"
title: "oelf - Mach-O support for sqlelf"
published_date: "2024-01-02 13:00:00 +0100"
layout: post.liquid
data:
  route: blog
tags:
- rust
excerpt: |
  sqlelf lets you explore ELF objects through the power of SQL.
  It turns any executable into a queryable database.
  I added support for Mach-O binaries & libraries.
---
[sqlelf] lets you explore ELF objects through the power of SQL.
It turns any executable into a queryable database.
Any? No, just those in ELF format, the standard binary file format on most Unix and Linux systems.
But not on macOS.
macOS relies on the Mach object file format, Mach-O for short, and sqlelf doesn't support that.
The library it uses for parsing in theory does, but that failed on my machine.
It depends on a heavy C++ library and I didn't want to bother figuring out how to build and change that.
I still wanted [sqlelf for Mach-O binaries][macho-support].
Luckily the hardest part was settled early on: [naming].

> <@fnordfish@mastodon.social>
> @jer sollte auf jeden Fall „Ölf“ heißen!
> (_translation:_ it should definitely be called „Ölf“)

So [oelf][oelf-py] exists now.
The [source code is on GitHub][oelf]
and [my fork of sqlelf adds it to sqlelf][macho-support] for easy use.
### Install
I have released pre-built versions of oelf,
but nothing is upstreamed to sqlelf yet.
For now you can install my fork of sqlelf from git:
```shell
pip install git+https://github.com/badboy/sqlelf@with-macho-support#egg=sqlelf
```
On my M1 MacBook sqlelf doesn't work out of the box.
sqlelf depends on [Capstone](https://www.capstone-engine.org/)
and the library bundled with its Python wrapper is `x86_64` only,
so it won't load.
That's fixable.
Assuming you installed into a Python venv named `.venv`,
install capstone from Homebrew, remove the bundled library and link to the global one instead:
```shell
brew install capstone
# The paths below match my setup (Python 3.11, capstone 5.0.1); adjust them to your versions.
rm .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
ln -s "$(brew --cellar capstone)/5.0.1/lib/libcapstone.5.dylib" .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
```
### Usage
Invoke `sqlelf` and pass any number of Mach-O binaries as arguments.
This gives you an SQLite REPL, or you can pass SQL commands directly with `--sql`.
For example, sqlelf knows about the libraries referenced by the binary:
```
$ sqlelf /usr/bin/grep --sql 'select * from macho_libs'
┌───────────────┬────────────────────────────┐
│ path          │ lib                        │
│ /usr/bin/grep │ self                       │
│ /usr/bin/grep │ /usr/lib/libbz2.1.0.dylib  │
│ /usr/bin/grep │ /usr/lib/liblzma.5.dylib   │
│ /usr/bin/grep │ /usr/lib/libz.1.dylib      │
│ /usr/bin/grep │ /usr/lib/libSystem.B.dylib │
└───────────────┴────────────────────────────┘
```
None of those `/usr/lib/*.dylib` files actually exist in the filesystem though,
because Apple now ships them bundled in one big cache file (the shared dyld cache) instead.
I have not yet documented the schema, nor is it anywhere near complete.
Use `.schema` to get an overview.
```
$ sqlelf /usr/bin/grep --sql '.schema macho_headers'
CREATE TABLE macho_headers(
path,
magic,
cputype,
cpusubtype,
filetype,
ncmds,
sizeofcmds,
flags,
reserved
);
```
Tables are persisted views over the data, so everything is in memory.
Most values are the raw values read from the file,
so you will have to look up what those values mean.
For example, the headers include all sorts of magic numbers and file types as integers:
```
$ sqlelf /usr/bin/grep --sql 'select * from macho_headers'
┌───────────────┬────────────┬──────────┬────────────┬──────────┬───────┬────────────┬─────────┬──────────┐
│ path          │ magic      │ cputype  │ cpusubtype │ filetype │ ncmds │ sizeofcmds │ flags   │ reserved │
│ /usr/bin/grep │ 4277009103 │ 16777223 │ 3          │ 2        │ 21    │ 1688       │ 2097285 │ 0        │
└───────────────┴────────────┴──────────┴────────────┴──────────┴───────┴────────────┴─────────┴──────────┘
```
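Those integers are constants from Apple's `<mach-o/loader.h>` and `<mach/machine.h>` headers.
As a minimal sketch of such a lookup (the `decode_header` helper is hypothetical, not part of sqlelf or oelf, and it only knows a handful of constants), the row above could be translated like this:

```python
# Small subset of constants from <mach-o/loader.h> and <mach/machine.h>.
MAGIC = {0xFEEDFACE: "MH_MAGIC", 0xFEEDFACF: "MH_MAGIC_64"}
CPU_TYPE = {0x00000007: "x86", 0x01000007: "x86_64", 0x0000000C: "arm", 0x0100000C: "arm64"}
FILE_TYPE = {0x1: "MH_OBJECT", 0x2: "MH_EXECUTE", 0x6: "MH_DYLIB", 0x8: "MH_BUNDLE"}
FLAG_BITS = {0x1: "MH_NOUNDEFS", 0x4: "MH_DYLDLINK", 0x80: "MH_TWOLEVEL", 0x200000: "MH_PIE"}


def decode_header(magic, cputype, filetype, flags):
    """Hypothetical helper: translate raw header integers into names."""
    names = [name for bit, name in FLAG_BITS.items() if flags & bit]
    # Keep any bits this sketch does not know about visible as hex.
    unknown = flags & ~sum(FLAG_BITS)
    if unknown:
        names.append(hex(unknown))
    return {
        "magic": MAGIC.get(magic, hex(magic)),
        "cputype": CPU_TYPE.get(cputype, hex(cputype)),
        "filetype": FILE_TYPE.get(filetype, hex(filetype)),
        "flags": names,
    }


# Values taken from the macho_headers row shown above.
print(decode_header(4277009103, 16777223, 2, 2097285))
```

For that row the sketch reports `MH_MAGIC_64`, `x86_64`, `MH_EXECUTE` and the flags `MH_NOUNDEFS`, `MH_DYLDLINK`, `MH_TWOLEVEL` and `MH_PIE`.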
You can slice and dice the data as you wish[^1].
```
$ sqlelf /usr/bin/grep --sql "select name, type, global, n_value from macho_symbols where path = '/usr/bin/grep' limit 3"
┌─────────────────────┬────────┬────────┬────────────┐
│ name                │ type   │ global │ n_value    │
│ radr://5614542      │ N_PBUD │ 0      │ 90260802   │
│ __mh_execute_header │ N_SECT │ 1      │ 4294967296 │
│ _BZ2_bzRead         │ N_UNDF │ 1      │ 0          │
└─────────────────────┴────────┴────────┴────────────┘
```
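To try out queries without a Mach-O binary at hand, here is a rough sketch that uses Python's built-in `sqlite3` instead of sqlelf itself.
It assumes a `macho_symbols` table with only the columns used above (the real table may well have more) and fills it with the three rows from the output.
The query lists global symbols that are undefined, which are the ones the binary expects a library to provide:

```python
import sqlite3

# The three rows shown in the sqlelf output above.
ROWS = [
    ("/usr/bin/grep", "radr://5614542", "N_PBUD", 0, 90260802),
    ("/usr/bin/grep", "__mh_execute_header", "N_SECT", 1, 4294967296),
    ("/usr/bin/grep", "_BZ2_bzRead", "N_UNDF", 1, 0),
]


def undefined_globals(rows):
    """Return the names of global, undefined symbols."""
    db = sqlite3.connect(":memory:")
    # Assumed column set; "global" is quoted to be on the safe side.
    db.execute('CREATE TABLE macho_symbols(path, name, type, "global", n_value)')
    db.executemany("INSERT INTO macho_symbols VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT name FROM macho_symbols
        WHERE type = 'N_UNDF' AND "global" = 1
        ORDER BY name
    """
    return [name for (name,) in db.execute(query)]


print(undefined_globals(ROWS))
```

The same `WHERE` clause should work when passed to `sqlelf` via `--sql`.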
### Status
I hacked together `oelf` in a matter of days.
I'm using the excellent [pyo3] to wrap [goblin]'s functionality into a Python package, built with [maturin].
It works reliably (yay for great tooling written in and for Rust!),
but so far I haven't documented much.
`oelf` itself is a bit inconsistent in how it exposes different data.
The `sqlelf` integration is really simple,
thanks to a nice extensible code structure of the project.
Now each piece of functionality newly exposed by `oelf` only needs two things:
a table schema and a mapping of the retrieved data to its columns.
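Those two steps can be sketched with Python's built-in `sqlite3`.
This is not sqlelf's actual plugin interface: the `Section` class stands in for whatever object `oelf` would return, the `macho_sections` table name is made up, and so are the sizes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical stand-in for an object parsed by oelf."""
    segment: str
    name: str
    size: int


# Step 1: define the schema of the table.
SCHEMA = "CREATE TABLE macho_sections(path, segment, name, size)"


# Step 2: map the retrieved data to the table's columns.
def section_rows(path, sections):
    for section in sections:
        yield (path, section.segment, section.name, section.size)


def load(db, path, sections):
    """Create the table and copy all rows into it once."""
    db.execute(SCHEMA)
    db.executemany(
        "INSERT INTO macho_sections VALUES (?, ?, ?, ?)",
        section_rows(path, sections),
    )


db = sqlite3.connect(":memory:")
# Made-up example data, not read from a real binary.
load(db, "/usr/bin/grep", [Section("__TEXT", "__text", 4096), Section("__DATA", "__data", 512)])
print(db.execute("SELECT name, size FROM macho_sections ORDER BY size").fetchall())
```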
I have yet to actually _use_ sqlelf more myself to explore binaries and all the data in there.
I also have only a bare understanding of the ELF format and much, much less of the Mach-O format.
I'm just barely good at plugging together existing things.
Some things that might be good to do:
* Documentation (of course!)
* Add more sections and tables and "translate" magic values
  * e.g. load commands (`macho_load_commands`) are pretty-printed Rust objects right now. This should be proper data in the column, maybe just JSON to begin with?
* Can we extract and parse system libraries from the shared dyld cache?
  * Others [have built stuff](https://github.com/keith/dyld-shared-cache-extractor)
* Can we more lazily fetch data instead of copying into a persisted table once?
* Upstream changes or fork it so `sqlelf` actually works out of the box on any non-Linux machine
---
_Footnotes_:
[^1]: `radr://5614542`: [this is a workaround for a workaround because of a bug in the old classic static linker.][radr]
[sqlelf]: https://fzakaria.com/2023/03/19/sqlelf-and-20-years-of-nix.html
[oelf]: https://github.com/badboy/oelf
[oelf-py]: https://pypi.org/project/oelf/
[my-toot]: https://hachyderm.io/@jer/111470860656151925
[naming]: https://hachyderm.io/@fnordfish@mastodon.social/111476474716125707
[macho-support]: https://github.com/badboy/sqlelf/tree/with-macho-support
[pyo3]: https://pyo3.rs/
[radr]: https://github.com/PureDarwin/PureDarwin/blob/a9f762d321016242bb95542301a91ecb4eb9bfd3/tools/cctools/misc/strip.c#L3789-L3817
[goblin]: https://crates.io/crates/goblin
[maturin]: https://maturin.rs/