Large Sphinx Deployments

I use Sphinx a lot: I have easily a dozen active (or reasonably so) projects that I maintain and work with on a regular basis. Sphinx is great, and I feel safe asserting that it's probably the best documentation toolkit in existence and, more generally, the best toolkit for the production of structured text.

There are flaws. I've written here before, in greater and lesser detail, about the pain points of Sphinx. All of the problems are fixable, though some fixes are delicate and some small fixes require major work. In all, the challenges aren't insurmountable and Sphinx remains extremely usable and effective.

Like any large and complex system, there are ways to manage resources with Sphinx with greater flexibility and ease. This post explores several ways that have helped me (and my collaborators!) manage big Sphinx deployments.

These suggestions fall into two general categories: suggestions for making Sphinx projects with large volumes of content manageable and strategies for handling and managing larger groups of Sphinx deployments.

Single Source Content

To avoid duplicated content, when possible, it makes sense to reuse content. reStructuredText has an include directive for this purpose. In general, the best strategy is:

  • Maintain a directory of included content that's distinct from the other directories that hold content. In our projects we use source/includes, where source/ holds all content.
  • Use a different extension for the included text than you use for your content files. For example, if all of your Sphinx-processed rst files use .rst extensions, include files should have .txt extensions.
  • Smaller, more restricted resources are typically more effective. Longer bits of text are more difficult to slipstream into the text. If your include snippet requires a section heading, it's probably too big.

This is most crucial for larger technical resources, and less crucial for other kinds of content, but in general, avoid duplicating common sections when possible.

To make this really awesome you might want to add tooling for yourself/writers so that you can:

  • see where a file is included in the larger text.
  • see if any include files are not being used.
  • detect if any two include files are substantially similar.
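
The first two checks can be sketched in a few lines of Python. This is a minimal sketch, not the tooling we actually use: the function name map_includes is made up for illustration, it assumes the source/includes layout and the .rst/.txt extensions described above, and it matches include files by base name only, which is a simplification.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches lines such as ".. include:: /includes/note.txt"
INCLUDE_RE = re.compile(r"^\s*\.\.\s+include::\s+(\S+)", re.MULTILINE)


def map_includes(source_dir, include_dir="includes",
                 content_ext=".rst", include_ext=".txt"):
    """Return (usage, unused) for the include files under source_dir/include_dir."""
    source = Path(source_dir)
    usage = defaultdict(list)  # include file name -> files that include it

    # Scan content files and include files, because includes can nest.
    candidates = list(source.rglob("*" + content_ext))
    candidates += list(source.rglob("*" + include_ext))
    for path in candidates:
        text = path.read_text(encoding="utf-8")
        for target in INCLUDE_RE.findall(text):
            # Simplification: match on the base name, not the full path.
            usage[Path(target).name].append(str(path.relative_to(source)))

    available = {p.name for p in (source / include_dir).rglob("*" + include_ext)}
    unused = sorted(available - set(usage))
    return dict(usage), unused


if __name__ == "__main__":
    usage, unused = map_includes("source")
    for name, users in sorted(usage.items()):
        print(name, "<-", ", ".join(users))
    print("unused:", ", ".join(unused))
```

Detecting near-duplicate include files is a separate problem; something like difflib.SequenceMatcher from the standard library over each pair of files would be one place to start.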

Pre-process and Generate Content

reStructuredText is great for providing a human-friendly way to edit structured text documents. However, for some kinds of structures it's probably better to build the reStructuredText from some other common structure. For instance: repeated content, tables of contents, image declarations, and different output for different content types are all good cases for building content yourself.

My main mode of doing this is to use rstcloth to write scripts that read YAML files that contain the content and can be converted into rst content. As an alternative you could write extensions to the reStructuredText processors (docutils) to handle this content, but that may make the production of the documents more (differently?) fragile.
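
To give a sense of the shape of such a script, here is a minimal sketch that renders a toctree from a small data structure. It makes two substitutions to stay within the standard library: plain string formatting stands in for rstcloth, and JSON stands in for YAML. The render_toctree function and the sample data are hypothetical.

```python
import json


def render_toctree(spec):
    """Render a toctree directive from a dict with 'maxdepth' and 'entries'."""
    lines = [
        ".. toctree::",
        "   :maxdepth: {}".format(spec.get("maxdepth", 2)),
        "",
    ]
    for entry in spec["entries"]:
        # "Title <target>" is the explicit-title form that toctree accepts.
        if "title" in entry:
            lines.append("   {} <{}>".format(entry["title"], entry["file"]))
        else:
            lines.append("   {}".format(entry["file"]))
    return "\n".join(lines) + "\n"


# Hypothetical data; in practice this would be loaded from a data file.
SPEC = json.loads("""
{
  "maxdepth": 1,
  "entries": [
    {"file": "install"},
    {"title": "Reference", "file": "reference/index"}
  ]
}
""")

if __name__ == "__main__":
    print(render_toctree(SPEC))
```

The generated text can be written to a file in the includes directory and pulled in with an include directive, so writers edit the data file and never touch the generated rst.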

Minimize Configuration Differences

The best thing about Sphinx configuration is that the configuration file itself is a Python module, which means you can inject pretty much whatever logic you want into the configuration and, via html_theme_options, the template. This is an awesome power, but if you need to manage more than one similar Sphinx site, the more complex your configuration is, the harder everything becomes. Therefore, tend toward minimal configurations.

I've been experimenting with a number of solutions, and I don't have a "Sphinx Configuration Toolkit" established (yet), but I've been trending toward keeping the canonical information about the project (URLs, theme data, etc.) in a configuration object constructed from a metadata file. Then, to populate site-specific lists (intersphinx inventories, PDFs, manpages, etc.), I read from other data files. Keeping site-specific data separate from the configuration code seems to work well.
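
As a rough illustration of that split, the sketch below turns a metadata document into Sphinx settings. The metadata layout, the build_settings helper, and the sample values are all assumptions made for this example; only the setting names (project, version, release, html_theme, html_theme_options, extensions, intersphinx_mapping) are standard Sphinx options.

```python
import json

# Hypothetical metadata; in a real conf.py this would be read from a file
# that sits next to the configuration.
META = json.loads("""
{
  "project": "Example Manual",
  "version": "1.2",
  "theme": {"name": "alabaster", "options": {"page_width": "960px"}},
  "intersphinx": {"python": "https://docs.python.org/3"}
}
""")


def build_settings(meta):
    """Translate site metadata into Sphinx configuration values."""
    return {
        "project": meta["project"],
        "version": meta["version"],
        "release": meta["version"],
        "html_theme": meta["theme"]["name"],
        "html_theme_options": meta["theme"].get("options", {}),
        "extensions": ["sphinx.ext.intersphinx"],
        # intersphinx_mapping maps a name to (base URL, inventory location).
        "intersphinx_mapping": {
            name: (url, None)
            for name, url in meta.get("intersphinx", {}).items()
        },
    }


# In conf.py, this exposes each setting as a module-level name for Sphinx.
globals().update(build_settings(META))
```

With this shape, two similar sites can share the same conf.py verbatim and differ only in their metadata files.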

Take Advantage of the Theme System

Sphinx's HTML output uses Jinja, which is incredibly flexible. To be honest, I kind of wish that the LaTeX builder were also Jinja-based, but I'll take what I can get. Sphinx gives you full access to build and customize really sophisticated display systems. If you're using default templates, then you can skip this tip.

If you do have custom display code, then take some time to read through the Jinja Documentation and the Sphinx Templating Documentation so you know what's possible. When developing a template for Sphinx (or in general), remember the following:

  • minimize the amount of runtime logic required to render each template. Some template logic is unavoidable, and for some projects the performance hit may not be noticeable.

    However, putting logic elsewhere (e.g. in the values passed to the template) makes the data handled by the template a compile-time rather than a run-time cost (plus or minus the memory cost of a larger template context).

  • use template inheritance.

    Basically, Jinja makes it possible to describe a complete template composed of "blocks." The blocks don't have any impact on the output by themselves; however, you can take this template and use it as a "base" and then describe a new template that is like the base except that it overrides some or all of the blocks. This is awesome, and makes it possible to reuse a lot of template code without needing to duplicate anything or drive yourself crazy.

    It's easy to forget about this capability when you're trying to hack something together, and template inheritance, like class inheritance in object-oriented programming, can add complexity and fragility. So you're probably well justified in being wary of using inheritance, but give it a shot!
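
For the first point above, one way to move logic out of the template is to compute values in conf.py and hand them over through html_context, a standard Sphinx option whose keys become template variables. This is a minimal sketch: the version list, the URL, and the version_menu helper are placeholders invented for the example.

```python
# Hypothetical site data; the names and URLs are placeholders.
VERSIONS = ["1.0", "1.1", "2.0"]
CURRENT = "2.0"
BASE_URL = "https://docs.example.com"


def version_menu(versions, current, base_url):
    """Precompute the menu entries so the template only loops and prints."""
    return [
        {
            "label": v,
            "url": "{}/{}/".format(base_url, v),
            "current": v == current,
        }
        # Plain string sorting is naive; it would misorder "1.10" and "1.9".
        for v in sorted(versions, reverse=True)
    ]


# Each key in html_context is available by name inside the HTML templates.
html_context = {"version_menu": version_menu(VERSIONS, CURRENT, BASE_URL)}
```

The template then needs nothing more than a loop over version_menu that prints item.url and item.label, with no sorting, URL building, or comparisons of its own.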

Evaluate Build System Requirements

Sphinx is a documentation toolkit, and it's very extensible, and awesome. However, in the base configuration it's not a complete end-to-end publishing system: it doesn't have built-in version control or maintenance; it's generally only aware of the current build (i.e. the HTML version of your documentation is unaware of the PDF version, and so forth); and once Sphinx compiles a site you still have to deploy it somewhere.

In short, to build a website or resource using Sphinx, there are other things the build system must do before and after running Sphinx to get the product you need. Handling those steps in a shared build system can reduce a great deal of complexity and provide a number of common points to synchronize multiple projects.
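
A small driver script is one possible home for those steps. The sketch below is an assumption about how such a driver might look, not a description of any particular project's build: the function names and directory layout are hypothetical, and the pre-build and post-build steps are left as comments. The only external piece it relies on is the sphinx-build command with its -b (builder) option.

```python
import subprocess


def sphinx_commands(source, build_root, builders):
    """Return one sphinx-build invocation per builder."""
    return [
        ["sphinx-build", "-b", builder, source,
         "{}/{}".format(build_root, builder)]
        for builder in builders
    ]


def build(source="source", build_root="build",
          builders=("html", "latex"), run=subprocess.check_call):
    """Run pre-build steps, each Sphinx builder, then post-build steps."""
    # Pre-build: generate content, fetch inventories, and so on (omitted).
    for command in sphinx_commands(source, build_root, builders):
        run(command)
    # Post-build: compile PDFs, link outputs together, deploy (omitted).


if __name__ == "__main__":
    build()
```

Passing run as a parameter keeps the driver easy to dry-run: build(run=print) lists the commands without executing them.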

Also, avoid doing crazy things with Makefiles.

Other Useful Optimizations

  1. Download intersphinx inventories independently of Sphinx.

    Sphinx will attempt to download each intersphinx inventory each time you build your site and it will download each inventory serially. It's trivial to do better on your own. Here's what we do:

  2. The index objects (which Sphinx uses for all special objects) live in a flat namespace, and collisions are not well handled.
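
For the first item, downloading inventories outside of Sphinx might look something like the following. This is a sketch under assumptions rather than the script referred to above: the inventory list, the destination directory, and the function names are hypothetical, and it assumes each site serves its inventory as objects.inv under the base URL, which is the Sphinx default.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Hypothetical inventory list: local file name -> base URL of the remote docs.
INVENTORIES = {
    "python.inv": "https://docs.python.org/3",
}


def fetch(url):
    """Download a URL and return the response body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def download_inventories(inventories, dest, fetcher=fetch, workers=8):
    """Fetch every objects.inv in parallel and write each one under dest."""
    os.makedirs(dest, exist_ok=True)

    def save(item):
        name, base_url = item
        path = os.path.join(dest, name)
        with open(path, "wb") as handle:
            handle.write(fetcher(base_url.rstrip("/") + "/objects.inv"))
        return path

    # Threads are enough here because the work is network-bound.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(save, inventories.items()))


if __name__ == "__main__":
    download_inventories(INVENTORIES, "build/inventories")
```

The second element of each intersphinx_mapping entry can then point at the downloaded file instead of None, so that Sphinx reads the local copy during the build.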
