In many ways this is the follow up to hard is
good and the post I promised recounting the
lessons of the buildcloth v0.2.0
release
This release of buildcloth is in some ways, the first real piece of
software I’ve written from the ground up. I’ve written a bunch of
code, and I’ve implemented a decent amount of functionality as
extensions and additions to other programs, written some very small
programs, and written an endless number of throw away scripts, but never
something quite on this scale. The remainder of this post is
Whats the coolest thing about Buildcloth 0.2.0?#
The buildsystem feature is pretty awesome, mostly because it makes it
possible to have legitimate, honest-to-goodness build systems running
with an Python project. Integration is a sweet thing.
I started working on this for two reasons, first because the MongoDB
documentation build process was lagging under some process creation
overhead and using buildcloth as a meta-build system was clearly not
holding up.
Second, I wanted to write a static site generator that used a fully
concurrent internal model. My initial plan was to use the buildcloth
meta-build system, but that clearly wouldn’t hold up at scale so I
needed something like buildcloth.
Finally, there aren’t actually a lot of generalized build automation
tools: Make, Ninja, SCons, Waf, and Rake plus a small cluster of Java
tools (Ant, Maven, sbt, Gradle). See the wikipedia list of build
automation
tools.
How can I use Buildcloth? Is Buildcloth right for my project?#
If you want, you can use Buildcloth as a Make replacement, using the
buildc
front end. For build systems that already have Python code to
wrap or implement build steps, Buildcloth may be much more efficient
than using something like Make. For other kinds of builds, the benefits
may be less pronounced.
You can also use Buildcloth as a library in your own Python programs if
you need a way of ruining build-jobs in a parallel, dependency aware
mode.
I’ve started to think think buildcloth is really a sort of embedable,
small-scale, local version of something like
celery. Or maybe it’s just a
collections of decent wrappers around
multiprocessing.Pool.
Regardless, there aren’t a lot of really intuitive tools around that
make async processing/parallel execution easy and fun in Python, so if
you need to do this kind of processing work, take a look.
How well tested is Buildcloth?#
There’s a complete unit test suite with 400 tests (last count) that
should ensure that things stay stable as the product continues to
develop. In this respect buildcloth is really well tested
(particularly for a project of its age.) In other respects, its less
well tested.
Now that things are comparatively stable, I’m pretty eager to begin
using the buildsystem and make sure all of the higher-level aspects work
well.
What aspects need the most improvement?#
I need to devise a way to save some state between builds so that
buildcloth can check to see if a target needs to be rebuilt by seeing if
the content of the dependent files has changed. Currently, (like many
tools,) buildcloth must rebuild things based on a comparison of modified
times from the file system, which do not necessarily indicate a required
rebuild.
This isn’t really a buildcloth problem, but I also find myself
frustrated by the error reporting of tasks running inside the
multiprocessing pool. I’m thinking of wrapping tasks right before
calling them in a way that will capture output and make it easier to
kill zombie tasks and dead pools.
What would you do differently next time?#
I wrote the implementation in a very bottom-up sort of way, and as a
result the design and testing suite feels a little bottom up. In the
long term I think it was the right decision, but I think that in the
medium term it will lead to some awkwardness.
Furthermore, build systems are fundamentally static and there’s no good
way to “add jobs to the top of the pipe.” I don’t yet have a good
answer to this problem (yet,) but shouldn’t be insurmountable.