I spent a lot of time at the end of the summer finishing out the basic buildcloth functionality, but I haven't really gotten the chance to use it properly. There were some flaws:
a dumb oversight means that the hash-based dependency checking doesn't work.
Buildcloth is a bit complicated because it's designed to be general purpose. In practical terms, I built buildcloth to perform a task that I've since been able to accomplish with 10% or less of the code.
There's no good separation between "the management of a build system" and "the build system data" in the system as it currently exists.
Buildcloth is a nifty idea, and one that I'd like to expand upon. Also, since the project is still pre-1.0, it seems reasonable to take these lessons and work on building a more usable implementation.
This post is a collection of thoughts on what I'd like to accomplish for 0.3:
pull out the job queue/running system from the build data organization.
collect more state in the dependency checking system/infrastructure.
separate data ingestion from build system organization.
remove ingestion and processing logic from the command line tool.
impose sub-module structure to make the interfaces for all of the different aspects of the program.
No clue about time frame. Feel free (and encouraged) to leave comments if you're interested in helping or have a feature that you'd really like to see.
Onward and Upward!
I've spent a little bit of time building some non-critical tools for my teammates on my work project, which has got me thinking about tooling for documentation systems.
This collection of tools is something that we've started to take for granted, but I think it's pretty novel and worth talking about a bit more.
Documentation toolkits traditionally refer to the systems that deal with the production of documentation for end use, which typically means taking the source text and rendering it into web sites, PDFs, ebooks, and embedded "online help" text.
These toolkits are really important, and one of the best things you can do for a documentation project is to produce documentation using a tool specifically designed to address the needs of documentation projects and technical writers.
Unfortunately, documentation production mostly serves the readers and business owners of documentation, and not really the writers. Tooling for documentation, particularly the kind that I've been spending time on recently, is about making documentation easier to maintain and easier to improve at scale.
I was going to write this up as a blog post, but I think it makes sense as a collection of wiki pages:
Onward and Upward!
I use Sphinx a lot: I have easily a dozen active (or reasonably so) projects that I maintain and work with on a regular basis. Sphinx is great, and I feel safe asserting that it's probably the best documentation toolkit in existence, and more generally the best toolkit for the production of structured text.
There are flaws. I've written here before, in greater and lesser detail, about the pain points of Sphinx. All of the problems are fixable, though some fixes are delicate and some small fixes require major work. In all, the challenges aren't insurmountable and Sphinx remains extremely usable and effective.
Like any large and complex system, there are ways to manage resources in Sphinx with greater flexibility and ease. This post explores several ways that have helped me (and my collaborators!) manage big Sphinx deployments.
These suggestions fall into two general categories: suggestions for making Sphinx projects with large volumes of content manageable and strategies for handling and managing larger groups of Sphinx deployments.
To avoid duplicated content, it makes sense to reuse content when possible. reStructuredText has an include directive for this purpose. In general, the best strategy is:
maintain a directory of included content that's distinct from the other directories that hold content. In our projects we keep include files in a dedicated subdirectory (e.g. source/includes/), while source/ holds all content.
use a different extension for the included text than you use for your content files. For example, if all of your Sphinx-processed files have .rst extensions, include files should use something else (e.g. .txt), so Sphinx doesn't try to render them as standalone pages.
Smaller, more restricted resources are typically more effective. Longer bits of text are more difficult to slipstream into the surrounding text. If your include snippet requires a section heading, it's probably too big.
This is most crucial for larger technical resources, and less crucial for other kinds of content, but in general, avoid duplicating common sections when possible.
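To make this concrete, an include reference in a content file looks like the following (the path here is a placeholder for your own layout):

   .. include:: /includes/example-fact.txt

In Sphinx, include paths that begin with a slash are resolved relative to the source directory, which keeps references stable no matter where the including file lives.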
To make this really awesome you might want to add tooling for yourself/writers so that you can:
see where a file is included in the larger text.
see if any include files are not being used.
detect if any two include files are substantially similar.
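Here's a rough sketch of the second check, finding unused include files; the directory layout and extensions are assumptions based on the conventions above:

   import os
   import re

   SOURCE = 'source'
   INCLUDES = os.path.join(SOURCE, 'includes')

   # Collect every include reference from the content files.
   referenced = set()
   for root, _, files in os.walk(SOURCE):
       for fn in files:
           if fn.endswith('.rst'):
               with open(os.path.join(root, fn)) as f:
                   text = f.read()
               referenced.update(re.findall(r'^\.\. include:: (\S+)', text, re.M))

   # Report include files that no content file references.
   for fn in os.listdir(INCLUDES):
       if '/includes/' + fn not in referenced:
           print('unused include:', '/includes/' + fn)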
reStructuredText is great for providing a human-friendly way to edit structured text documents. However, for some kinds of structures it's probably better to build the reStructuredText from some other common structure. For instance: repeated content, tables of contents, image declarations, and dealing with different output for different content types are all good cases for generating content yourself.
My main mode of doing this is to use rstcloth to write scripts that read YAML files containing the content and convert them into rst. As an alternative, you could write extensions to the reStructuredText processor (docutils) to handle this content, but that may make the production of the documents more (differently?) fragile.
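As a rough sketch of this pattern (the YAML layout and file names are my own inventions, and rstcloth's API has shifted some between versions):

   import yaml
   from rstcloth import RstCloth

   # data.yaml might contain:
   #   title: Supported Platforms
   #   items: [Linux, 'OS X', FreeBSD]
   with open('data.yaml') as f:
       data = yaml.safe_load(f)

   doc = RstCloth()             # some versions take an output stream here
   doc.title(data['title'])
   doc.newline()
   for item in data['items']:
       doc.li(item)             # emit a bullet list item
   doc.write('source/includes/platforms.rst')

Regenerating this file as part of the build means the list only has to be maintained in one place.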
The best thing about Sphinx configuration is that the configuration file itself is a Python module, which means you can inject pretty much whatever logic you want into the configuration and, via html_context, into the templates. This is an awesome power, but if you need to manage more than one similar Sphinx site, the more complex your configuration is, the harder everything becomes. Therefore, tend toward minimal configurations.
I've been experimenting with a number of solutions, and I don't have a "Sphinx Configuration Toolkit" established (yet,) but I've been trending toward keeping the canonical information about the project (urls, theme data, etc.) in a configuration object constructed from a metadata file. Then, to populate site-specific lists (intersphinx inventories; pdfs, manpages, etc.), I read from other data files. Keeping site-specific data separate from the configuration code seems to work well.
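A minimal sketch of this approach, assuming a meta.yaml file of my own invention sitting next to conf.py:

   # conf.py -- pull canonical project data from a metadata file
   # instead of hard-coding it, so sites can share configuration code.
   import os
   import yaml

   with open(os.path.join(os.path.dirname(__file__), 'meta.yaml')) as f:
       meta = yaml.safe_load(f)

   project = meta['project']
   copyright = meta['copyright']
   version = meta['version']
   release = meta['release']

   # Pass site-specific values through to the templates.
   html_context = {'canonical_url': meta['url']}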
Sphinx's HTML output uses Jinja, which is incredibly flexible. To be honest, I kind of wish that the LaTeX builder were also Jinja-based, but I'll take what I can get. Sphinx gives you full access to build and customize really sophisticated display systems. If you're using default templates, then you can skip this tip.
If you do have custom display code, then take some time to read through the Jinja Documentation and the Sphinx Templating Documentation so you know what's possible. When developing a template for Sphinx (or in general,) remember the following:
minimize the amount of runtime logic required to render each template. Some template logic is unavoidable, and for some projects the performance hit may not be noticeable.
However, putting logic elsewhere (e.g. in the values passed to the template) makes that work a compile-time rather than a run-time cost (plus or minus the memory cost of passing larger contexts to the template).
use template inheritance.
Basically, Jinja makes it possible to describe a complete template composed of "blocks." The block markers themselves don't have any impact on the output; however, you can take this template, use it as a "base," and then describe a new template that is like the base except that you can override some or all of the blocks. This is awesome, and makes it possible to reuse a lot of template code without needing to duplicate anything or drive yourself crazy.
It's easy to forget about this capability when you're trying to hack something together, and template inheritance, like class inheritance in object-oriented programming, can add complexity and fragility. So you're probably well justified in being wary of using inheritance, but give it a shot!
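Here's a self-contained sketch of the idea, using Jinja's in-memory loader so it runs anywhere; in a real Sphinx project these templates would live as files in your _templates directory:

   from jinja2 import Environment, DictLoader

   templates = {
       # The base template defines named blocks with default content.
       'base.html': ('<html><body>'
                     '{% block header %}<h1>{{ title }}</h1>{% endblock %}'
                     '{% block body %}{% endblock %}'
                     '</body></html>'),
       # The child inherits everything and overrides a single block.
       'page.html': ('{% extends "base.html" %}'
                     '{% block body %}<p>{{ content }}</p>{% endblock %}'),
   }

   env = Environment(loader=DictLoader(templates))
   print(env.get_template('page.html').render(title='Example', content='Hello'))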
Sphinx is a documentation toolkit, and it's very extensible, and awesome. However, in the base configuration it's not a complete end-to-end publishing system: it doesn't have built-in version control/maintenance, it's in general only aware of the current build (i.e. the HTML version of your documentation is unaware of the PDF version, and so forth), and once Sphinx compiles a site you still have to deploy it somewhere.
In short, to build a website or resource using Sphinx, there are other things the build system must do before and after running Sphinx to get the product you need. Handling these steps in one unified build system reduces a great deal of complexity and provides a number of common points to synchronize multiple projects.
Also, avoid doing crazy things with Makefiles.
Download intersphinx inventories independently of Sphinx.
Sphinx will attempt to download each intersphinx inventory every time you build your site, and it downloads them serially. It's trivial to do better on your own. Here's what we do: intersphinx.py
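For illustration, a stripped-down version of that approach might look like this (the inventory list and output directory are placeholders; the real intersphinx.py does a bit more):

   import os
   import urllib.request
   from concurrent.futures import ThreadPoolExecutor

   # Inventories to mirror locally; stand-ins for your own list.
   inventories = {
       'python': 'https://docs.python.org/3/objects.inv',
   }

   def fetch(name, url, directory='build/inventories'):
       os.makedirs(directory, exist_ok=True)
       path = os.path.join(directory, name + '.inv')
       urllib.request.urlretrieve(url, path)
       return path

   with ThreadPoolExecutor() as pool:
       futures = [pool.submit(fetch, name, url)
                  for name, url in inventories.items()]
       for f in futures:
           print('fetched', f.result())

Then point intersphinx_mapping at the local copies; the second element of each tuple tells Sphinx to read a local inventory file instead of downloading one: intersphinx_mapping = {'python': ('https://docs.python.org/3', 'build/inventories/python.inv')}.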
The index objects (which Sphinx uses for all special objects) live in a flat namespace, and collisions are not well handled, so if you're indexing items with Sphinx, take care to keep object names unique across your project.
Today I did a (slightly) more formal release of a software project that I've been working on pretty consistently for the last two months. It's an extension, or elaboration, on buildcloth, and is the groundwork for some other projects I've been working on.
While there are some new fixes and improvements to the initial meta-build tool components of the project that I'd been working on 5 months ago, this release goes even further and includes a complete build automation tool.
The deal with this is that I'd been running a build system for months that had a bunch of very small tasks, and performance was awful for no really good reason. Well, for one reason: process creation. Each task needed to create its own shell, run, and exit, which added a lot of overhead. The solution to this problem was to run each task (which was ultimately just a Python function) in a Python multiprocessing pool. The new version of buildcloth is an attempt to build some common infrastructure around this practice.
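To illustrate the core idea (the task list and build_task function here are stand-ins, not buildcloth's actual interface):

   import multiprocessing

   def build_task(args):
       source, target = args
       # ... do the real work here: read source, write target ...
       return target

   if __name__ == '__main__':
       # A stand-in for a build graph made of many small tasks.
       tasks = [('a.txt', 'a.html'), ('b.txt', 'b.html')]

       # One pool of worker processes replaces per-task shell creation.
       with multiprocessing.Pool() as pool:
           for target in pool.map(build_task, tasks):
               print('built', target)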
It still needs some real world testing, and there are some missing features that I'd like to add, and always more documentation, but it's good enough that I wanted to get it out there so that people could start using it and giving feedback.
I'll post more later on the experience and lessons learned here. While I work on that, see:
Onward and Upward!
See the rhizome archive for more posts.