Shimgo Hugo

In an effort to relaunch tychoish with a more contemporary theme and a publishing tool that (hopefully) will support a more regular posting schedule, I also wrote a nifty go library for dealing with reStructuredText, which may be useful and I think illustrates something about build systems.

In my (apparently still) usual style, there's some narrative lead-in that takes a bit to get through.

Over the past couple of weeks, I redesigned and redeployed my blog. The system it replaced was somewhat cobbled together, was missing a number of features (e.g. archives, rss feeds, social features, etc.) and, to add insult to injury, publishing was pretty slow, and it was difficult to manage a pipeline of posts.

In short, I didn't post much. I've written things from time to time, but I haven't done a great job of actually posting them, and it was hard to actually get people to read them, which was further demotivating. I've been reading a lot of interesting things, I'm not writing that much for work any more, and I've been doing enough things recently that I want to write about them. See this twitter strand I had a bit ago on the topic.

So I started playing around again. Powering this blog is hard, because I have a lot of content [1] and I very much want to use reStructuredText. [2] There's this thing called hugo which seems to be pretty popular. I've been using static site generators for years, and prefer the approach. It's also helpful that I worked with Steve (hugo's original author) during its initial development, and either by coincidence, or as a result of our conversations and a couple of very small early contributions, a number of things I cared about were included in its design:

  • support for multiple text markup formats, including reStructuredText. (I cobbled together rst support.)
  • customizable page metadata formats. (I think I pushed for support of alternate front-matter formats, specifically YAML, and might have made a few prototype commits on this project.)
  • the ability to schedule posts in the future. (I think we talked about this.)

I think I also whinged a bunch in those days about performance. I've written about this here before, but one of the classic problems with static site generators is that no one expects sites with one or two thousand posts/content atoms, and so they're developed against relatively small corpora and then have performance that doesn't really scale.

Hugo is fast, but mostly because go is fast, which I think is, in most cases, good enough, but not in my case, and particularly not with the rst implementation as it stood. After all this preamble, we've gotten to the interesting part: a tool I'm calling shimgo.

The initial support for rst in hugo is straightforward. Every time hugo encounters an rst file, it calls the shell rst2html utility that is installed when you install docutils, passing it the content of the file on standard input, and parsing the content we need out of the output. It's not pretty, it's not smart, but it works.

Slowly: to publish all of tychoish it took about 3 minutes.
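
To make the shape of that approach concrete, here's a minimal sketch of the one-process-per-file pattern. This is not hugo's actual code: the function names are mine, I'm assuming the docutils front end is on the PATH as rst2html (some installs call it rst2html.py) and reads standard input when given no arguments, and I'm assuming "the content we need" is whatever sits between the body tags.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// extractBody returns whatever sits between <body> and </body>.
// If either tag is missing it falls back to the trimmed input.
// (Hypothetical helper; the real extraction rules may differ.)
func extractBody(html []byte) []byte {
	start := bytes.Index(html, []byte("<body>"))
	end := bytes.LastIndex(html, []byte("</body>"))
	if start < 0 || end < start {
		return bytes.TrimSpace(html)
	}
	return bytes.TrimSpace(html[start+len("<body>") : end])
}

// convertWithProcess starts a fresh rst2html process for a single
// document, feeds the rst on stdin, and trims the output down to
// the body. Every call pays the full process startup cost.
func convertWithProcess(rst []byte) ([]byte, error) {
	cmd := exec.Command("rst2html") // assumed name of the docutils tool
	cmd.Stdin = bytes.NewReader(rst)
	out, err := cmd.Output()
	if err != nil {
		return nil, fmt.Errorf("running rst2html: %w", err)
	}
	return extractBody(out), nil
}

func main() {
	html, err := convertWithProcess([]byte("Hello\n=====\n\nSome *rst* text.\n"))
	if err != nil {
		// Most likely docutils isn't installed; report it and move on.
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%s\n", html)
}
```

The thing to notice is that convertWithProcess gets called once per post, so a site with a couple thousand posts means a couple thousand interpreter startups.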

I attempted an rst-to-markdown translation of my existing content and then ran that through the markdown parsers in hugo, just to get comparative timings: 3ish seconds.

reStructuredText is a bit slower to parse than markdown, on account of its comparative strictness and the fact that the toolchain is in python and not go, but this difference seemed absurd.

There's a go-rst project to write a pure-go implementation of reStructuredText, but I've kept my eye on that project for a couple of years, and it's a lot of work that is pretty far off. While I do want to do more to support this project, I wanted to get a new blog up and running in a few weeks, not years.

Based on the differences in timing, and some intuition from years of writing build systems, I made a wager with myself: while the python rst implementation is likely really slow, it's not that slow, and I was losing a lot of time to process creation, teardown, and context switching: processing a single file is pretty quick, but the overhead gets to be too much at scale.

I built a little prototype where I ran a very small HTTP service that took rst as a POST request and returned processed HTML. Now there was one process running, and instead of calling fork/exec a bunch, we just had a little bit of (local) network overhead.

Faster: 20 seconds.

I decided I could deal with it.
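
For the curious, the calling side of that prototype looks roughly like the sketch below. The names are hypothetical, and so is the content type. So that the example runs on its own, the converter here is a stub written in go that just wraps its input in a pre tag; the real service would be a long-running process that hands the request body to docutils.

```go
package main

import (
	"bytes"
	"fmt"
	"html"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// convertOverHTTP posts rst to an already-running converter and
// returns the response body. No process is created per document.
func convertOverHTTP(client *http.Client, url string, rst []byte) ([]byte, error) {
	resp, err := client.Post(url, "text/x-rst", bytes.NewReader(rst))
	if err != nil {
		return nil, fmt.Errorf("posting rst: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("converter returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newStubConverter stands in for the real rst service. It does no
// rst parsing at all: it escapes the input and wraps it in <pre>.
func newStubConverter() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST only", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "<pre>%s</pre>", html.EscapeString(string(body)))
	}))
}

func main() {
	srv := newStubConverter()
	defer srv.Close()

	out, err := convertOverHTTP(srv.Client(), srv.URL, []byte("hello"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%s\n", out)
}
```

The startup cost is paid once, when the service comes up, and each post after that costs one local round trip.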

What remains is making it production worthy for hugo. While it was good enough for me, I very much don't want to get into the position of needing to maintain a single-feature fork of a software project in active development, and frankly the existing rst support has a difficult-to-express external dependency. Adding an HTTP service would be a hard sell.

This brings us to shimgo: the idea is to package everything needed to implement the above solution in an external go package, and package it behind a functional interface, so that hugo maintainers don't need to know anything about its working.

Isn't abstraction wonderful?
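
As a rough illustration of what "behind a functional interface" can mean, here's a sketch in which the caller sees exactly one function and the backend is started lazily on first use. None of these names are shimgo's real API (ConvertRST, backend, and startBackend are all made up for the example), and the backend is a stub so the block runs without docutils.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// backend is whatever actually does the conversion: a local HTTP
// service, a subprocess, or (here) a stub.
type backend interface {
	convert(rst []byte) ([]byte, error)
}

// stubBackend pretends to render rst by wrapping it in a paragraph.
type stubBackend struct{}

func (stubBackend) convert(rst []byte) ([]byte, error) {
	return []byte("<p>" + strings.TrimSpace(string(rst)) + "</p>"), nil
}

var (
	startOnce sync.Once
	active    backend
	startErr  error
	starts    int // counts startups, only to show it happens once
)

// startBackend is where a real implementation would launch and
// health-check its helper process.
func startBackend() (backend, error) {
	starts++
	return stubBackend{}, nil
}

// ConvertRST is the entire public surface: bytes in, bytes out.
// Callers never learn how the work gets done.
func ConvertRST(rst []byte) ([]byte, error) {
	startOnce.Do(func() { active, startErr = startBackend() })
	if startErr != nil {
		return nil, fmt.Errorf("starting rst backend: %w", startErr)
	}
	return active.convert(rst)
}

func main() {
	for _, doc := range []string{"first post", "second post"} {
		out, err := ConvertRST([]byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s\n", out)
	}
	fmt.Println("backend starts:", starts)
}
```

With a shape like this, the calling project only depends on a function with a boring signature, and the HTTP service becomes an implementation detail that can be swapped out later.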

So here we are. I'm still working on getting this patch mainlined, and there is some polish for shimgo itself (mostly the README file and some documentation), but it works, and if you're doing anything with reStructuredText in go, then you ought to give shimgo a try.

[1] While I think it would be reasonable to start afresh, I think the whole point of having archives is that you mostly just leave them around.
[2] It's not the most popular markup language, but I've used it more than any other text markup, and I find the fact that other languages (e.g. markdown) vary a lot between implementations to be distressing. Admittedly the fact that there aren't other implementations of rst is also distressing, but on balance is somewhat less distressing.