In an effort to relaunch tychoish with a more contemporary theme and
a publishing tool that (hopefully) will support a more regular posting
schedule, I also wrote a nifty go library for dealing with
reStructuredText, which may be useful and I think illustrates
something about build systems.
In my (apparently still) usual style, there's some narrative lead-in
that takes a bit to get through.
Over the past couple of weeks, I redesigned and redeployed my
blog. The system it replaced was somewhat cobbled together, was
missing a number of features (e.g. archives, rss feeds, social
features, etc.) and, to add insult to injury, publishing
was pretty slow, and it was difficult to manage a pipeline of posts.
In short, I didn't post much. I've written things from time to
time, but I haven't done a great job of actually posting them, and it
was hard to get people to read them, which was further
demotivating. I've been reading a lot of interesting things, and I'm
not writing that much for work any more, and I've been doing enough
things recently that I want to write about them. See this twitter
strand I had a bit ago on the topic.
So I started playing around again. Powering this blog is hard, because
I have a lot of content and I very much want to use
reStructuredText.
There's this thing called hugo which seems
to be pretty popular. I've been using static site generators for
years, and prefer the approach. It's also helpful that I worked with
Steve (hugo's original author) during its initial development, and
either by coincidence, or as a result of our conversations and a couple
of very small early contributions, a number of things I cared about were
included in its design:
- support for multiple text markup formats, including
reStructuredText. (I cobbled together rst support.)
- customizable page metadata formats. (I think I pushed for support
of alternate front-matter formats, specifically YAML, and might have
made a few prototype commits on this project.)
- the ability to schedule posts in the future. (I think we talked
about this.)
I think I also whinged a bunch in those days about performance. I've
written about this here before, but one of the classic problems with
static site generators is that no one expects sites with one or two
thousand posts/content atoms, and so they're developed against
relatively small corpora and then have performance that doesn't really
scale.
Hugo is fast, but mostly because go is fast, which I think is, in
most cases, good enough, but not in my case, and particularly not with
the rst implementation as it stood. After all this preamble, we've
gotten to the interesting part: a tool I'm calling shimgo.
The initial support for rst in hugo is straightforward. Every time
hugo encounters an rst file, it calls the shell rst2html utility
that is installed when you install docutils, passing it the content of
the file on standard input, and parsing the content we need out of the
output. It's not pretty, it's not smart, but it works.
Slowly: to publish all of tychoish it took about 3 minutes.
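To make the cost concrete, here's a minimal sketch of the one-process-per-file approach described above. It is not hugo's actual code: the function names are mine, the body extraction is a simplification, and it assumes the docutils front end is on your PATH as rst2html (some installs name it rst2html.py).

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// extractBody returns the text between the opening <body ...> tag and the
// closing </body> tag of a full HTML document. If no body element is found,
// it returns the input unchanged.
func extractBody(html string) string {
	start := strings.Index(html, "<body")
	if start == -1 {
		return html
	}
	// Skip past the end of the opening tag, which may carry attributes.
	open := strings.Index(html[start:], ">")
	if open == -1 {
		return html
	}
	start += open + 1
	end := strings.LastIndex(html, "</body>")
	if end == -1 || end < start {
		return html
	}
	return strings.TrimSpace(html[start:end])
}

// renderRST starts a fresh rst2html process for every call, writes the
// source to its standard input, and returns the body of the HTML it prints.
// The process startup and teardown is paid once per file.
func renderRST(source []byte) (string, error) {
	cmd := exec.Command("rst2html") // assumption: binary name on this system
	cmd.Stdin = bytes.NewReader(source)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr
	out, err := cmd.Output()
	if err != nil {
		return "", fmt.Errorf("rst2html failed: %w: %s", err, stderr.String())
	}
	return extractBody(string(out)), nil
}

func main() {
	html, err := renderRST([]byte("Hello\n=====\n\nSome *emphasis*.\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(html)
}
```

Call renderRST a couple of thousand times and you've started a couple of thousand Python interpreters.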
I attempted an rst-to-markdown translation of my existing content and
then ran that through the markdown parsers in hugo, just to get
comparative timings: 3ish seconds.
reStructuredText is a bit slower to parse than markdown, on account of
its comparative strictness and the fact that the toolchain is in
python and not go, but this difference seemed absurd.
There's a go-rst project to
write a pure-go implementation of reStructuredText, but I've kept my
eye on that project for a couple of years, and it's a lot of work that
is pretty far off. While I do want to do more to support this project,
I wanted to get a new blog up and running in a few weeks, not years.
Based on the differences in timing, and some intuition from years of
writing build systems, I made a wager with myself: while the python
rst implementation is likely really slow, it's not that slow, and
I was losing a lot of time to process creation, teardown, and context
switching: processing a single file is pretty quick, but the overhead
gets to be too much at scale.
I built a little prototype where I ran a very small HTTP service that
took rst as a POST request and returned processed HTML. Now there
was one process running, and instead of calling fork/exec a bunch, we
just had a little bit of (local) network overhead.
Faster: 20 seconds.
I decided I could deal with it.
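The client side of that prototype can be sketched roughly like this. This is an illustration, not the code I ran: the convert function, the text/x-rst content type, and the stand-in server are all assumptions. The stand-in doesn't parse rst at all; it only wraps the posted text in a pre element so the example runs without docutils, where the real service would hand the text to the Python rst implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"html"
	"io"
	"net/http"
	"net/http/httptest"
)

// convert POSTs rst source to a long-running conversion service and returns
// the response body, which is assumed to be the rendered HTML.
func convert(client *http.Client, url string, source []byte) ([]byte, error) {
	resp, err := client.Post(url, "text/x-rst", bytes.NewReader(source))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("conversion service returned %s: %s", resp.Status, body)
	}
	return body, nil
}

// newStandInServer starts a local server that plays the role of the
// conversion service. It is a placeholder: it escapes the posted text and
// wraps it in <pre> instead of running a real rst parser.
func newStandInServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		src, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "<pre>%s</pre>", html.EscapeString(string(src)))
	}))
}

func main() {
	srv := newStandInServer()
	defer srv.Close()

	// One server process handles every document; each conversion is a
	// local HTTP round trip rather than a fork/exec.
	for _, doc := range []string{"first post", "second post"} {
		out, err := convert(srv.Client(), srv.URL, []byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(out))
	}
}
```

The point is the shape: the expensive startup happens once, and every file after that only costs a request.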
What remains is making it production worthy for hugo. While it was
good enough for me, I very much don't want to get into the position of
needing to maintain a single-feature fork of a software project in
active development, and frankly the existing rst support has a
difficult-to-express external dependency. Adding an HTTP service would
be a hard sell.
This brings us to shimgo: the idea is to package everything needed
to implement the above solution in an external go package, and hide
it behind a functional interface, so that hugo maintainers don't need
to know anything about its workings.
Isn't abstraction wonderful?
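To show what I mean by a functional interface, here's a hypothetical sketch. This is not shimgo's actual API: Converter, ConvertFunc, and NewConverter are names invented for this example. The caller sees one method that takes bytes and returns bytes; whether a service gets started behind it, and when, is not the caller's problem.

```go
package main

import (
	"fmt"
	"sync"
)

// ConvertFunc turns markup source into rendered output.
type ConvertFunc func(source []byte) ([]byte, error)

// Converter hides a lazily started backend behind a single method.
// (Hypothetical type for illustration only.)
type Converter struct {
	start   func() (ConvertFunc, error) // starts the backend, e.g. a local service
	once    sync.Once
	convert ConvertFunc
	err     error
}

// NewConverter records how to start the backend but does not start it yet.
func NewConverter(start func() (ConvertFunc, error)) *Converter {
	return &Converter{start: start}
}

// Convert starts the backend on first use, exactly once, and then passes
// every call through to it. A failed start is reported on every call.
func (c *Converter) Convert(source []byte) ([]byte, error) {
	c.once.Do(func() { c.convert, c.err = c.start() })
	if c.err != nil {
		return nil, fmt.Errorf("starting backend: %w", c.err)
	}
	return c.convert(source)
}

func main() {
	starts := 0
	// A fake backend that wraps its input in <p> tags, standing in for a
	// real rst conversion service.
	c := NewConverter(func() (ConvertFunc, error) {
		starts++
		return func(src []byte) ([]byte, error) {
			return []byte("<p>" + string(src) + "</p>"), nil
		}, nil
	})

	for _, doc := range []string{"one", "two", "three"} {
		out, err := c.Convert([]byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(out))
	}
	fmt.Println("backend starts:", starts)
}
```

The sync.Once keeps the startup cost to a single payment no matter how many files get converted, or from how many goroutines.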
So here we are. I'm still working on getting this patch mainlined, and
there is some polish for shimgo itself (mostly the README file and
some documentation), but it works, and if you're doing anything with
reStructuredText in go, then you ought to give shimgo a try.