new post: oelf - Mach-O support for sqlelf
This commit is contained in:
parent
899e806e53
commit
13be151e6c
162
_posts/2024-01-02-oelf.md
Normal file
162
_posts/2024-01-02-oelf.md
Normal file
|
@ -0,0 +1,162 @@
|
|||
---
|
||||
permalink: "/{{ year }}/{{ month }}/{{ day }}/oelf"
|
||||
title: "oelf - Mach-O support for sqlelf"
|
||||
published_date: "2024-01-02 13:00:00 +0200"
|
||||
layout: post.liquid
|
||||
data:
|
||||
route: blog
|
||||
tags:
|
||||
- rust
|
||||
---
|
||||
|
||||
[sqlelf] lets you explore ELF objects through the power of SQL.
|
||||
It turns any executable into a queryable database.
|
||||
Any? No, just those in ELF format, the standard binary file format on most Unix and Linux systems.
|
||||
But not on macOS.
|
||||
macOS relies on the Mach object file format, short Mach-O and sqlelf doesn't support that.
|
||||
The library it uses for parsing in theory does, but that failed on my machine.
|
||||
It depends on a heavy C++ library and I didn't want to bother figuring out how to build and change that.
|
||||
|
||||
I still wanted [sqlelf for mach-o binaries][macho-support].
|
||||
Luckily the hardest part was settled early on: [naming].
|
||||
|
||||
> <@fnordfish@mastodon.social>
|
||||
> @jer sollte auf jeden Fall „Ölf“ heißen!
|
||||
> (_translation:_ it should definitely be called „Ölf“)
|
||||
|
||||
So [oelf][oelf-py] exists now.
|
||||
The [source code is on GitHub][oelf]
|
||||
and [my fork of sqlelf adds it to sqlelf][macho-support] for easy use.
|
||||
|
||||
### Install
|
||||
|
||||
I have released pre-built versions of oelf,
|
||||
but nothing is upstreamed to sqlelf yet.
|
||||
You can install it from git:
|
||||
|
||||
```shell
|
||||
pip install git+https://github.com/badboy/sqlelf@with-macho-support#egg=sqlelf
|
||||
```
|
||||
|
||||
On my M1 MacBook sqlelf doesn't work out of the box.
|
||||
sqlelf depends on [Capstone](https://www.capstone-engine.org/)
|
||||
and the installed library coming with the Python wrapper is `x86_64` only,
|
||||
so it won't load
|
||||
|
||||
That's fixable.
|
||||
Assuming you installed into a Python venv `.venv`:
|
||||
Install capstone from Homebrew, remove the bundled library and link to the global one instead:
|
||||
|
||||
```shell
|
||||
brew install capstone
|
||||
rm .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
|
||||
ln -s $(brew --cellar capstone)/5.0.1/lib/libcapstone.5.dylib .venv/lib/python3.11/site-packages/capstone/lib/libcapstone.dylib
|
||||
```
|
||||
|
||||
### Usage
|
||||
|
||||
Invoke `sqlelf` and pass any number of Mach-O binaries as arguments.
|
||||
This gives you an SQLite REPL, or you specify SQL commands with `--sql`.
|
||||
|
||||
For example sqlelf knows about libraries references by the binary:
|
||||
```
|
||||
$ sqlelf /usr/bin/grep --sql 'select * from macho_libs'
|
||||
┌───────────────┬────────────────────────────┐
|
||||
│ path │ lib │
|
||||
│ /usr/bin/grep │ self │
|
||||
│ /usr/bin/grep │ /usr/lib/libbz2.1.0.dylib │
|
||||
│ /usr/bin/grep │ /usr/lib/liblzma.5.dylib │
|
||||
│ /usr/bin/grep │ /usr/lib/libz.1.dylib │
|
||||
│ /usr/bin/grep │ /usr/lib/libSystem.B.dylib │
|
||||
└───────────────┴────────────────────────────┘
|
||||
```
|
||||
|
||||
None of those `/usr/lib/*.dylib` actually exists in the filesystem though,
|
||||
because Apple now ships those as a big bundled cache file instead.
|
||||
|
||||
I have not yet documented the schema nor is it anywhere near complete.
|
||||
Use `.schema` to get an overview.
|
||||
|
||||
```
|
||||
$ sqlelf /usr/bin/grep --sql '.schema macho_headers'
|
||||
CREATE TABLE macho_headers(
|
||||
path,
|
||||
magic,
|
||||
cputype,
|
||||
cpusubtype,
|
||||
filetype,
|
||||
ncmds,
|
||||
sizeofcmds,
|
||||
flags,
|
||||
reserved
|
||||
);
|
||||
```
|
||||
|
||||
Tables are persisted views over the data, so everything is in memory.
|
||||
Most values are the raw values read from the file,
|
||||
so you will have to look up what those values mean.
|
||||
|
||||
For example the headers include all sorts of magic numbers and file types as integers:
|
||||
|
||||
```
|
||||
$ sqlelf /usr/bin/grep --sql 'select * from macho_headers'
|
||||
┌───────────────┬────────────┬──────────┬────────────┬──────────┬───────┬────────────┬─────────┬──────────┐
|
||||
│ path │ magic │ cputype │ cpusubtype │ filetype │ ncmds │ sizeofcmds │ flags │ reserved │
|
||||
│ /usr/bin/grep │ 4277009103 │ 16777223 │ 3 │ 2 │ 21 │ 1688 │ 2097285 │ 0 │
|
||||
└───────────────┴────────────┴──────────┴────────────┴──────────┴───────┴────────────┴─────────┴──────────┘
|
||||
```
|
||||
|
||||
You can slice and dice the data as you wish[^1].
|
||||
|
||||
```
|
||||
$ sqlelf /usr/bin/grep --sql "select name, type, global, n_value from macho_symbols where path = '/usr/bin/grep' limit 3"
|
||||
┌─────────────────────┬────────┬────────┬────────────┐
|
||||
│ name │ type │ global │ n_value │
|
||||
│ radr://5614542 │ N_PBUD │ 0 │ 90260802 │
|
||||
│ __mh_execute_header │ N_SECT │ 1 │ 4294967296 │
|
||||
│ _BZ2_bzRead │ N_UNDF │ 1 │ 0 │
|
||||
└─────────────────────┴────────┴────────┴────────────┘
|
||||
```
|
||||
|
||||
### Status
|
||||
|
||||
I hacked together `oelf` in a matter of days.
|
||||
I'm using the excellent [pyo3] to wrap [goblin]'s functionality into a Python package, built with [maturin].
|
||||
It works reliably (yey for great tooling written in and for Rust!),
|
||||
but so far I haven't documented much.
|
||||
`oelf` itself is a bit inconsistent in how it exposes different data.
|
||||
The `sqlelf` integration is really simple,
|
||||
thanks to a nice extensible code structure of the project.
|
||||
Now every newly exposed functionality in `oelf` needs only defining the schema of a table
|
||||
and mapping the retrieved data to its columns.
|
||||
|
||||
I have yet to actually _use_ sqlelf myself more to explore binaries and all the data in there.
|
||||
I also have only a bare understanding of the ELF format and much much less of the Mach-O format,
|
||||
I'm just barely good at plugging together existing things.
|
||||
|
||||
Some things that might be good to do:
|
||||
|
||||
* Documentation (of course!)
|
||||
* Add more sections and tables and "translate" magic values
|
||||
* e.g. load commands (`macho_load_commands`) pretty-print the Rust object right now, this should be proper data in the column, maybe just JSON to begin with?
|
||||
* Can we extract and parse system libraries from the shared dyld cache?
|
||||
* Others [have built stuff](https://github.com/keith/dyld-shared-cache-extractor)
|
||||
* Can we more lazily fetch data instead of copying into a persisted table once?
|
||||
* Upstream changes or fork it so `sqlelf` actually works out of the box on any non-Linux machine
|
||||
|
||||
---
|
||||
|
||||
_Footnotes_:
|
||||
|
||||
[^1]: `adr://5614542`: [this is a workaround for a workaround because of a bug in the old classic static linker.][radr]
|
||||
|
||||
[sqlelf]: https://fzakaria.com/2023/03/19/sqlelf-and-20-years-of-nix.html
|
||||
[oelf]: https://github.com/badboy/oelf
|
||||
[oelf-py]: https://pypi.org/project/oelf/
|
||||
[my-toot]: https://hachyderm.io/@jer/111470860656151925
|
||||
[naming]: https://hachyderm.io/@fnordfish@mastodon.social/111476474716125707
|
||||
[macho-support]: https://github.com/badboy/sqlelf/tree/with-macho-support
|
||||
[pyo3]: https://pyo3.rs/
|
||||
[radr]: https://github.com/PureDarwin/PureDarwin/blob/a9f762d321016242bb95542301a91ecb4eb9bfd3/tools/cctools/misc/strip.c#L3789-L3817
|
||||
[goblin]: https://crates.io/crates/goblin
|
||||
[maturin]: https://maturin.rs/
|
Loading…
Reference in a new issue