1
Fork 0
blog/_posts/2022-02-25-your-personal-glean-data-pipeline.md

68 lines
3 KiB
Markdown
Raw Permalink Normal View History

---
permalink: "/{{ year }}/{{ month }}/{{ day }}/your-personal-glean-data-pipeline"
title: "This Week in Glean: Your personal Glean data pipeline"
published_date: "2022-02-25 15:00:00 +0100"
layout: post-yt.liquid
data:
route: blog
tags:
- mozilla
excerpt: |
On February 11th, 2022 I gave a lightning talk titled
"Your personal Glean data pipeline", presenting a little side project for
ingesting, transforming and analyzing data collected from Glean-powered applications myself.
---
(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work.
They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean.)
All "This Week in Glean" blog posts are listed in the [TWiG index](https://mozilla.github.io/glean/book/appendix/twig.html)
(and on the [Mozilla Data blog](https://blog.mozilla.org/data/category/glean/)).
This article is [cross-posted on the Mozilla Data blog][datablog].
[datablog]: https://blog.mozilla.org/data/2022/02/25/this-week-in-glean-your-personal-glean-data-pipeline
---
On February 11th, 2022 we hosted a Data Club Lightning Talk session.
2022-02-25 13:45:10 +00:00
There I presented my small side project of setting up a minimal data pipeline & data storage for Glean.
The premise:
> Can I build and run a small pipeline & data server to collect telemetry data from my own usage of tools?
To which I can now answer: Yes, it's possible.
The complete ingestion server is a couple hundred lines of Rust code.
It's able to receive pings conforming to the Glean ping schema, transform them and store it into an SQLite database.
It's very robust, not crashing once on me (except when I created an infinite loop within it).
You can watch the lightning talk here:
<br>
<lite-youtube videoid="V5FgVbxm-cc" playlabel="Play: Your personal Glean data pipeline">
<a href="https://youtube.com/watch?v=V5FgVbxm-cc" class="lty-playbtn" title="Play Video">
<span class="lyt-visually-hidden">Play Video: Your personal Glean data pipeline</span>
</a>
</lite-youtube>
Instead of creating some slides for the talk I created an interactive report.
The full report [can be read online][report].
[report]: https://data.fnordig.de/glean/report.html
Besides actually writing a small pipeline server this was also an experiment
in trying out [Irydium][] and [Datasette][]
to produce an interactive & live-updated data report.
Irydium is a set of tooling designed to allow people to create interactive documents using web technologies, started by [wlach][] a while back.
Datasette is an open source multi-tool for exploring and publishing data, created and maintained by [simonw][].
Combining both makes for a nice experience, even though there's still some things that could be simplified.
My pipeline server is currently not open source.
I might publish it as an example at a later point.
[irydium]: https://irydium.dev/
[wlach]: https://twitter.com/wrlach
[datasette]: https://datasette.io/
[simonw]: https://twitter.com/simonw