In an effort to relaunch tychoish with a more contemporary theme and a
publishing tool that (hopefully) will support a more regular posting
schedule, I also wrote a nifty go library for dealing with
reStructuredText, which may be useful and I think illustrates
something about build systems.
In my (apparently still) usual style, there’s some narrative lead-in
that takes a bit to get through.
Over the past couple of weeks, I redesigned and redeployed my blog. The
system it replaced was somewhat cobbled together, was missing a number
of features (e.g. archives, RSS feeds, social features, etc.) and, to add
insult to injury, publishing was pretty slow, and it was
difficult to manage a pipeline of posts.
In short, I didn’t post much. I’ve written things from time to
time, but I haven’t done a great job of actually posting them, and it
was hard to actually get people to read them, which was further
demotivating. I’ve been reading a lot of interesting things, and I’m
not writing that much for work any more, and I’ve been doing enough
things recently that I want to write about them. See this twitter
strand I had a bit ago on the topic.
So I started playing around again. Powering this blog is hard, because I
have a lot of content and I very much want to use reStructuredText.
There’s this thing called hugo which seems to be
pretty popular. I’ve been using static site generators for years, and
prefer the approach. It’s also helpful that I worked with Steve
(hugo’s original author) during its initial development, and either by
coincidence or as a result of our conversations and a couple of very small
early contributions, a number of things I cared about were included in
its design:
- support for multiple text markup formats, including
reStructuredText. (I cobbled together rst support.)
- customizable page metadata formats. (I think I pushed for support of
alternate front-matter formats, specifically YAML, and might have made
a few prototype commits on this project.)
- the ability to schedule posts in the future. (I think we talked about
this.)
I think I also whinged a bunch in those days about performance. I’ve
written about this here before, but one of the classic problems with
static site generators is that no one expects sites with one or two
thousand posts/content atoms, and so they’re developed against
relatively small corpora and then have performance that doesn’t really
scale.
Hugo is fast, but mostly because go is fast, which I think is, in most
cases, good enough, but not in my case, and particularly not with the
rst implementation as it stood. After all this preamble, we’ve gotten
to the interesting part: a tool I’m calling
shimgo.
The initial support for rst in hugo is straightforward. Every time hugo
encounters an rst file, it calls the rst2html shell utility that is
installed when you install docutils, passing it the content of the file
on standard input and parsing the content we need from the output.
It’s not pretty, it’s not smart, but it works.
Slowly: to publish all of tychoish it took about 3 minutes.
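To make the shape of that approach concrete, here is a minimal sketch in go of rendering one document by starting a fresh process for it. This is not hugo's actual code: the function name renderWithCommand is made up for illustration, and the command may be installed as rst2html or rst2html.py depending on your docutils version.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// renderWithCommand (a hypothetical helper) starts a new process for the
// given command, writes src to its standard input, and returns whatever
// the process prints on standard output.
func renderWithCommand(name string, args []string, src []byte) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = bytes.NewReader(src)

	var out, errOut bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errOut

	// Run pays the full cost of process creation and teardown on every call.
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("%s failed: %w: %s", name, err, errOut.String())
	}
	return out.Bytes(), nil
}

func main() {
	src := []byte("Hello\n=====\n\nSome *emphasized* text.\n")

	// Assumes docutils is installed and rst2html is on the PATH.
	out, err := renderWithCommand("rst2html", nil, src)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("rendered %d bytes of HTML\n", len(out))
}
```

Calling a function like this once per file means one interpreter startup per post, which is cheap for a handful of files and adds up quickly for a couple thousand.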
I attempted an rst-to-markdown translation of my existing content and
then ran that through the markdown parsers in hugo, just to get
comparative timings: 3ish seconds.
reStructuredText is a bit slower to parse than markdown, on account of
its comparative strictness and the fact that the toolchain is in
python and not go, but this difference seemed absurd.
There’s a go-rst project to write
a pure-go implementation of reStructuredText, but I’ve kept my eye on
that project for a couple of years, and it’s a lot of work that is
pretty far off. While I do want to do more to support this project, I
wanted to get a new blog up and running in a few weeks, not years.
Based on the differences in timing, and some intuition from years of
writing build systems, I made a wager with myself: while the python rst
implementation is likely really slow, it’s not that slow, and I was
losing a lot of time to process creation, teardown, and context
switching: processing a single file is pretty quick, but the overhead
gets to be too much at scale.
I built a little prototype where I ran a very small HTTP service that
took rst as a POST request and returned processed HTML. Now there was
one process running, and instead of calling fork/exec a bunch, we just
had a little bit of (local) network overhead.
Faster: 20 seconds.
I decided I could deal with it.
What remains is making it production worthy for hugo. While it was good
enough for me, I very much don’t want to get into the position of
needing to maintain a single-feature fork of a software project in
active development, and frankly the existing rst support has a
difficult-to-express external dependency. Adding an HTTP service would
be a hard sell.
This brings us to shimgo: the idea is to package everything needed to
implement the above solution in an external go package, and hide it
behind a functional interface, so that hugo maintainers don’t need to
know anything about its workings.
Isn’t abstraction wonderful?
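As a rough illustration of what "behind a functional interface" can look like in go, here is a sketch where callers see one function and the backend is started lazily on first use. Every name here (ConvertRst, backend, startBackend, escapeBackend) is hypothetical and is not taken from shimgo's actual API, and the placeholder backend only escapes its input rather than parsing rst.

```go
package main

import (
	"fmt"
	"html"
	"os"
	"sync"
)

// backend is whatever does the real work. In the design described above it
// would manage a long-running service; callers never see this type.
type backend interface {
	render(src []byte) ([]byte, error)
}

// escapeBackend is a placeholder that only escapes its input, so the
// sketch runs without python or docutils installed.
type escapeBackend struct{}

func (escapeBackend) render(src []byte) ([]byte, error) {
	return []byte("<pre>" + html.EscapeString(string(src)) + "</pre>"), nil
}

var (
	startOnce sync.Once
	active    backend
	startErr  error
)

// startBackend is where a real implementation would launch and health-check
// its helper process. Here it just returns the placeholder.
func startBackend() (backend, error) {
	return escapeBackend{}, nil
}

// ConvertRst is the only thing a caller needs to know about. The first
// call starts the backend; later calls reuse it. sync.Once makes the lazy
// start safe when many goroutines render at the same time.
func ConvertRst(src []byte) ([]byte, error) {
	startOnce.Do(func() {
		active, startErr = startBackend()
	})
	if startErr != nil {
		return nil, fmt.Errorf("starting rst backend: %w", startErr)
	}
	return active.render(src)
}

func main() {
	out, err := ConvertRst([]byte("a < b"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(string(out))
}
```

With this shape, swapping the per-file process for a service (or, someday, a pure-go parser) changes nothing for the caller.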
So here we are. I’m still working on getting this patch mainlined, and
there is some polish for shimgo itself (mostly the README file and some
documentation), but it works, and if you’re doing anything with
reStructuredText in go, then you ought to give shimgo a try.