---
permalink: "/{{ year }}/{{ month }}/{{ day }}/oelf"
title: "oelf - Mach-O support for sqlelf"
published_date: "2024-01-02 13:00:00 +0100"
layout: post.liquid
data:
  route: blog
tags:
  - rust
excerpt: |
  sqlelf lets you explore ELF objects through the power of SQL.
  It turns any executable into a queryable database.
  I added support for Mach-O binaries & libraries.
---

[sqlelf] lets you explore ELF objects through the power of SQL.
It turns any executable into a queryable database.
Any? No, just those in ELF format, the standard binary file format on most Unix and Linux systems.
But not on macOS.
macOS relies on the Mach object file format, Mach-O for short, and sqlelf doesn't support that.
In theory the library it uses for parsing does, but that failed on my machine.
It depends on a heavy C++ library and I didn't want to bother figuring out how to build and change that.

I still wanted [sqlelf for Mach-O binaries][macho-support].
Luckily the hardest part was settled early on: [naming].

> <@fnordfish@mastodon.social>
> @jer sollte auf jeden Fall „Ölf“ heißen!
> (_translation:_ it should definitely be called „Ölf“)

So [oelf][oelf-py] exists now.
The [source code is on GitHub][oelf]
and [my fork of sqlelf adds it to sqlelf][macho-support] for easy use.

### Install

I have released pre-built versions of oelf,
but nothing is upstreamed to sqlelf yet.
You can install my fork of sqlelf from git:

```shell
pip install git+https://github.com/badboy/sqlelf@with-macho-support#egg=sqlelf
```

On my M1 MacBook sqlelf doesn't work out of the box.
sqlelf depends on [Capstone](https://www.capstone-engine.org/)
and the library bundled with its Python wrapper is `x86_64` only,
so it won't load.

That's fixable.
Assuming you installed into a Python venv named `.venv`,
install Capstone from Homebrew, remove the bundled library and link to the global one instead:

```shell
brew install capstone
# The paths below assume Python 3.11 and Capstone 5.0.1; adjust them to your installed versions.
rm .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
ln -s "$(brew --cellar capstone)/5.0.1/lib/libcapstone.5.dylib" .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
```

### Usage

Invoke `sqlelf` and pass any number of Mach-O binaries as arguments.
This gives you an SQLite REPL, or you can specify SQL commands with `--sql`.

For example, sqlelf knows about the libraries referenced by the binary:
```
$ sqlelf /usr/bin/grep --sql 'select * from macho_libs'
┌───────────────┬────────────────────────────┐
│ path          │ lib                        │
│ /usr/bin/grep │ self                       │
│ /usr/bin/grep │ /usr/lib/libbz2.1.0.dylib  │
│ /usr/bin/grep │ /usr/lib/liblzma.5.dylib   │
│ /usr/bin/grep │ /usr/lib/libz.1.dylib      │
│ /usr/bin/grep │ /usr/lib/libSystem.B.dylib │
└───────────────┴────────────────────────────┘
```

None of those `/usr/lib/*.dylib` files actually exist in the filesystem though,
because Apple now ships them in one big bundled cache file, the dyld shared cache, instead.

I have not yet documented the schema, nor is it anywhere near complete.
Use `.schema` to get an overview.

```
$ sqlelf /usr/bin/grep --sql '.schema macho_headers'
CREATE TABLE macho_headers(
  path,
  magic,
  cputype,
  cpusubtype,
  filetype,
  ncmds,
  sizeofcmds,
  flags,
  reserved
);
```

Each table is filled once with a copy of the parsed data, so everything is held in memory.
Most values are the raw values read from the file,
so you will have to look up what those values mean.

For example, the headers include all sorts of magic numbers and file types as integers:

```
$ sqlelf /usr/bin/grep --sql 'select * from macho_headers'
┌───────────────┬────────────┬──────────┬────────────┬──────────┬───────┬────────────┬─────────┬──────────┐
│ path          │ magic      │ cputype  │ cpusubtype │ filetype │ ncmds │ sizeofcmds │ flags   │ reserved │
│ /usr/bin/grep │ 4277009103 │ 16777223 │ 3          │ 2        │ 21    │ 1688       │ 2097285 │ 0        │
└───────────────┴────────────┴──────────┴────────────┴──────────┴───────┴────────────┴─────────┴──────────┘
```

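As a rough illustration of that lookup, here is a small Python sketch that decodes the header row above. The constants come from Apple's `<mach-o/loader.h>` and `<mach/machine.h>` headers. `decode_header` and its lookup tables are hypothetical helpers for this example, not part of sqlelf or oelf, and they cover only a handful of values.

```python
# Constants from Apple's <mach-o/loader.h> and <mach/machine.h>.
# Only a few values are listed; anything else is reported as a raw number.
MAGICS = {0xFEEDFACE: "MH_MAGIC", 0xFEEDFACF: "MH_MAGIC_64"}
CPU_TYPES = {
    7: "x86",
    0x01000007: "x86_64",  # CPU_TYPE_X86 | CPU_ARCH_ABI64
    12: "arm",
    0x0100000C: "arm64",  # CPU_TYPE_ARM | CPU_ARCH_ABI64
}
FILETYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB", 8: "MH_BUNDLE"}
FLAGS = {
    0x1: "MH_NOUNDEFS",
    0x4: "MH_DYLDLINK",
    0x80: "MH_TWOLEVEL",
    0x200000: "MH_PIE",
}


def decode_header(magic, cputype, filetype, flags):
    """Hypothetical helper: translate raw macho_headers integers into names."""
    names = [name for bit, name in FLAGS.items() if flags & bit]
    # Collect all bits this sketch knows about, so unknown ones stay visible as hex.
    known = 0
    for bit in FLAGS:
        known |= bit
    leftover = flags & ~known
    if leftover:
        names.append(hex(leftover))
    return {
        "magic": MAGICS.get(magic, hex(magic)),
        "cputype": CPU_TYPES.get(cputype, hex(cputype)),
        "filetype": FILETYPES.get(filetype, str(filetype)),
        "flags": names,
    }


# Values copied from the /usr/bin/grep row above.
decoded = decode_header(4277009103, 16777223, 2, 2097285)
for column, value in decoded.items():
    print(f"{column}: {value}")
```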
You can slice and dice the data as you wish[^1].

```
$ sqlelf /usr/bin/grep --sql "select name, type, global, n_value from macho_symbols where path = '/usr/bin/grep' limit 3"
┌─────────────────────┬────────┬────────┬────────────┐
│ name                │ type   │ global │ n_value    │
│ radr://5614542      │ N_PBUD │ 0      │ 90260802   │
│ __mh_execute_header │ N_SECT │ 1      │ 4294967296 │
│ _BZ2_bzRead         │ N_UNDF │ 1      │ 0          │
└─────────────────────┴────────┴────────┴────────────┘
```

### Status

I hacked together `oelf` in a matter of days.
I'm using the excellent [pyo3] to wrap [goblin]'s functionality into a Python package, built with [maturin].
It works reliably (yay for great tooling written in and for Rust!),
but so far I haven't documented much.
`oelf` itself is a bit inconsistent in how it exposes different data.
The `sqlelf` integration is really simple,
thanks to the nicely extensible code structure of the project.
Now every newly exposed piece of functionality in `oelf` only needs a table schema defined
and the retrieved data mapped to its columns.

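To make that last point concrete, here is a minimal sketch of the pattern using Python's built-in `sqlite3` module. It is not sqlelf's actual code: `load_libs` is a made-up helper, and the rows are hard-coded from the `macho_libs` example earlier in this post, standing in for whatever oelf would return.

```python
import sqlite3


def load_libs(conn, path, libs):
    """Hypothetical helper: define the table schema, then map data to its columns."""
    # Step 1: the schema. Columns are untyped, like the tables shown in this post.
    conn.execute("CREATE TABLE IF NOT EXISTS macho_libs(path, lib)")
    # Step 2: the mapping. One row per library referenced by the binary.
    conn.executemany(
        "INSERT INTO macho_libs(path, lib) VALUES (?, ?)",
        [(path, lib) for lib in libs],
    )


# Hard-coded stand-in for data that would really come from parsing the binary.
grep_libs = [
    "self",
    "/usr/lib/libbz2.1.0.dylib",
    "/usr/lib/liblzma.5.dylib",
    "/usr/lib/libz.1.dylib",
    "/usr/lib/libSystem.B.dylib",
]

conn = sqlite3.connect(":memory:")  # the copied data lives in memory
load_libs(conn, "/usr/bin/grep", grep_libs)

rows = conn.execute(
    "SELECT lib FROM macho_libs WHERE lib != 'self' ORDER BY lib"
).fetchall()
for (lib,) in rows:
    print(lib)
```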
I have yet to actually _use_ sqlelf more myself to explore binaries and all the data in there.
I also have only a bare understanding of the ELF format and much, much less of the Mach-O format.
I'm just barely good at plugging together existing things.

Some things that might be good to do:

* Documentation (of course!)
* Add more sections and tables and "translate" magic values
  * e.g. load commands (`macho_load_commands`) pretty-print the Rust object right now. This should be proper data in the column, maybe just JSON to begin with?
* Can we extract and parse system libraries from the shared dyld cache?
  * Others [have built stuff](https://github.com/keith/dyld-shared-cache-extractor)
* Can we fetch data more lazily instead of copying it into a persisted table once?
* Upstream the changes or fork it so `sqlelf` actually works out of the box on any non-Linux machine

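The JSON idea for `macho_load_commands` could look roughly like this. It is a sketch only: the `data` column and the shape of the load command dictionary are assumptions made for illustration, not oelf's current output. The serialized command is stored as text and decoded again on the Python side.

```python
import json
import sqlite3

# Assumed shape of one load command; oelf currently pretty-prints a Rust object instead.
load_command = {"cmd": "LC_LOAD_DYLIB", "name": "/usr/lib/libz.1.dylib"}

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one JSON document per load command in the `data` column.
conn.execute("CREATE TABLE macho_load_commands(path, data)")
conn.execute(
    "INSERT INTO macho_load_commands(path, data) VALUES (?, ?)",
    ("/usr/bin/grep", json.dumps(load_command)),
)

# Read the column back and decode it into a dictionary again.
(raw,) = conn.execute("SELECT data FROM macho_load_commands").fetchone()
decoded = json.loads(raw)
print(decoded["cmd"], decoded["name"])
```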
---

_Footnotes_:

[^1]: `radr://5614542`: [this is a workaround for a workaround because of a bug in the old classic static linker.][radr]

[sqlelf]: https://fzakaria.com/2023/03/19/sqlelf-and-20-years-of-nix.html
[oelf]: https://github.com/badboy/oelf
[oelf-py]: https://pypi.org/project/oelf/
[my-toot]: https://hachyderm.io/@jer/111470860656151925
[naming]: https://hachyderm.io/@fnordfish@mastodon.social/111476474716125707
[macho-support]: https://github.com/badboy/sqlelf/tree/with-macho-support
[pyo3]: https://pyo3.rs/
[radr]: https://github.com/PureDarwin/PureDarwin/blob/a9f762d321016242bb95542301a91ecb4eb9bfd3/tools/cctools/misc/strip.c#L3789-L3817
[goblin]: https://crates.io/crates/goblin
[maturin]: https://maturin.rs/