---
permalink: "/{{ year }}/{{ month }}/{{ day }}/a-lil-data-processing"
title: "A Lil data processing"
published_date: "2024-12-30 10:00:00 +0100"
layout: post.liquid
data:
  route: blog
excerpt: |
  I used Lil to process some JSON data in a git repository and turn that into SQL statements.
---

As I [mentioned before](/2024/12/20/a-lil-advent-of-code/), I've been playing around with [Lil] and I like it so far.
So much so that I wrote yet another small one-off script in Lil for a side project.

[Lil]: https://beyondloom.com/decker/lil.html

## The scenario

I have a git repository with various files, each of which contains coordinates of a given trip.
Early on in this project's lifetime a script fetched new data, converted it to JSON, updated the corresponding file and committed the changes.
Only later did I extend it to save additional metadata, such as a timestamp per data point.

So for these early tracking points I now want to restore _some_ sort of timeline, not an exact one, but as close as I can get.
The best I can do is to associate every tracking point added by a commit with that commit's timestamp.
Oh, and also I will end up storing that data in SQLite, not in JSON anymore.

## The data

Every commit diff looks something like this[^1]:

```diff
diff --git trip001.json trip001.json
index 99af4c9..e0c1dea 100644
--- trip001.json
+++ trip001.json
@@ -42,4 +42,6 @@
 ,{"type":"Feature", "properties":{}, "geometry": { "type": "Point", "coordinates": [-28.01577,46.74132] }}
 ,{"type":"Feature", "properties":{}, "geometry": { "type": "Point", "coordinates": [-28.01572,46.74132] }}
 ,{"type":"Feature", "properties":{}, "geometry": { "type": "Point", "coordinates": [-28.01574,46.74126] }}
+,{"type":"Feature", "properties":{}, "geometry": { "type": "Point", "coordinates": [-28.01568,46.74132] }}
+,{"type":"Feature", "properties":{}, "geometry": { "type": "Point", "coordinates": [-28.01571,46.74704] }}
 ] }
```

I will look for each line starting with a `+` and read the rest of that line as JSON (after stripping the `+` and the leading `,`).
From the parsed JSON I then pull out the coordinates[^2].

## The script

Let's start by getting all commits for a particular file:

```lil
commits:"\n" split shell["git log --follow --pretty=format:'%H' -- trip001.json"].out
commits:extract value orderby index desc from commits
```

The second line sorts the resulting list in reverse, ensuring I have the earliest commit first[^3].
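
As a standalone aside (not part of the import script), the same query can be tried on a toy list. The three strings below are made-up placeholders standing in for commit hashes, ordered newest first the way `git log` prints them by default.

```lil
# Toy stand-in for the git log output; these names are hypothetical.
hashes:("newest","middle","oldest")

# Same query as in the script: sort by the implicit row index, descending.
reversed:extract value orderby index desc from hashes

# Should now list "oldest" first.
show[reversed]
```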

```lil
valuefmt:"(%i, %f, %f, '%s', %i);"
insertstmt:"INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES"
```

Just some globals to use later: The SQL statement to generate and a formatting string for the values to put in.
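
To see how these two strings combine, here is a minimal standalone sketch. The ID and coordinates are invented sample values, not real trip data; only the format string and the statement prefix are taken from the script.

```lil
valuefmt:"(%i, %f, %f, '%s', %i);"
insertstmt:"INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES"

# Hypothetical sample values: id, latitude, longitude, ISO timestamp, UNIX timestamp.
values:valuefmt format (1, 46.5, 28.25, "2024-06-06T21:37:15Z", 1717709835)

# Join the prefix and the formatted values with a single space and print the statement.
print[" " fuse (insertstmt, values)]
```

Note that `'%s'` inserts the string as-is, so this only works for values that contain no single quotes, which holds for ISO 8601 timestamps.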

```lil
cmd:" " fuse "git log -1 --pretty=format:%at",commits[0]
id:"%i" parse shell[cmd].out
```

I need _some_ ID and I decided to use the earliest commit's UNIX timestamp to start off with.
This gets incremented to generate a new unique ID per data point.

```lil
each commit in commits
```

Now iterate over each commit in the list.

```lil
  cmd:" " fuse "git show --pretty=format:%aI",commit
  diff:"\n" split shell[cmd].out
  dateTime:diff[0]
  unixTime:"%e" parse dateTime
```

To get the diff for every commit, invoke `git show`.
The first line will be the commit date in strict ISO 8601 format (that's the `%aI` in that command).
Lil's parsing functionality can turn this into a UNIX timestamp for me.
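
The conversion can be checked in isolation with the timestamp that appears in the sample output further down. This is a standalone sketch, not part of the script.

```lil
# Timestamp copied from the sample output in this post.
dateTime:"2024-06-06T21:37:15Z"

# %e reads an ISO 8601 date-time and yields seconds since the UNIX epoch.
unixTime:"%e" parse dateTime

# Should print 1717709835, the unixTime paired with this dateTime in the output.
show[unixTime]
```

This sample ends in `Z`. For commits made in a non-UTC timezone, `%aI` emits a numeric offset such as `+01:00` instead; how `%e` treats those is not covered by this sketch.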

```lil
  each line in diff
```

The rest of the diff contains the actual text diff.
Go through it line by line.

```lil
    if line[0] = "+"
      s:1 drop line
      if s[0] = ","
        s:1 drop s
      end
      j:"%j" parse s
```

Only lines that start with a `+` (added lines in the diff) are processed further:
the `+` and a leading `,` are stripped, and the rest is parsed as JSON.
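
As a standalone illustration, the same steps can be run on a single hard-coded line. The string below is one added line from the sample diff above, shortened to its `geometry` part for readability; that shortening is an assumption made for this sketch.

```lil
# One "+" line from the sample diff, trimmed to the geometry object.
line:"+,{\"geometry\":{\"type\":\"Point\",\"coordinates\":[-28.01568,46.74132]}}"

s:1 drop line        # drop the leading "+"
if s[0] = ","
  s:1 drop s         # drop the "," that separates array elements
end

j:"%j" parse s       # parse the remainder as JSON

# Should show a two-element list: longitude first, then latitude.
show[j.geometry.coordinates]
```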

```lil
      coord:j.geometry.coordinates
      long:coord[0]
      lat:coord[1]
```

`j` is already the parsed JSON, so I can access its fields.
Should those fields not exist, the lookup coerces to `0`. No error is thrown.

```lil
      if long = 0
      else
        values:valuefmt format (id, lat, long, dateTime, unixTime)
        stmt:" " fuse (insertstmt, values)
        print[stmt]
        id:id + 1
      end
```

The coordinates might still be `0`, e.g. for the `+++ trip001.json` header line, which also starts with a `+` but will never be valid JSON.
No need to insert null coordinates.
Last but not least, I print the final statement and increment the ID.

```lil
    end
  end
end
```

And it's done.
Now for every data point it outputs an SQL statement:

```shell
; lilt import-trip001.lil
INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES (1717709835, 46.74132, 28.01577, '2024-06-06T21:37:15Z', 1717709835);
INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES (1717709836, 46.74132, 28.01572, '2024-06-06T21:37:15Z', 1717709835);
INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES (1717709837, 46.74126, 28.01574, '2024-06-06T21:37:15Z', 1717709835);
INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES (1717709838, 46.74132, 28.01568, '2024-06-06T21:37:15Z', 1717709835);
INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES (1717709839, 46.74704, 28.01571, '2024-06-06T21:37:15Z', 1717709835);
```

Here's the full script once again:

```lil
commits:"\n" split shell["git log --follow --pretty=format:'%H' -- trip001.json"].out
commits:extract value orderby index desc from commits

valuefmt:"(%i, %f, %f, '%s', %i);"
insertstmt:"INSERT INTO messages (id, latitude, longitude, dateTime, unixTime) VALUES"

cmd:" " fuse "git log -1 --pretty=format:%at",commits[0]
id:"%i" parse shell[cmd].out

each commit in commits
  cmd:" " fuse "git show --pretty=format:%aI",commit
  diff:"\n" split shell[cmd].out
  dateTime:diff[0]
  unixTime:"%e" parse dateTime

  each line in diff
    if line[0] = "+"
      s:1 drop line
      if s[0] = ","
        s:1 drop s
      end
      j:"%j" parse s
      coord:j.geometry.coordinates
      long:coord[0]
      lat:coord[1]

      if long = 0
      else
        values:valuefmt format (id, lat, long, dateTime, unixTime)
        stmt:" " fuse (insertstmt, values)
        print[stmt]
        id:id + 1
      end
    end
  end
end
```

---

_Footnotes:_

[^1]: The coordinates are intentionally obscured.  
[^2]: They are in `longitude, latitude` order. In case you were wondering. Like me every time I stare at them.
[^3]: Sure, I could have used `--reverse`, but where's the fun in that?