Weekend Accomplishments

(Note: Because I’m terrible at remembering to post entries during the week, this post is actually from last week. But it’s still interesting!)

The past few weeks have been somewhat disjointed for me. I’d been working a lot to wrap up a long-expected release, followed by a vacation without a project plan, and a few more busy weeks. On top of this, I spent a bunch of time wrapping up, or at least releasing, a few personal projects to assuage some guilt.

After all that, I found myself at loose ends: I didn’t have new projects lined up because I hadn’t had enough time to think about them, or, more importantly, because I’d been so focused on finishing things that I’d been trying to suppress thinking about new projects.

Well, that was a great idea. Not. Now, finally, after spending too long lolling about and trying to restart the creative (and project-planning) engines, I’ve actually done some things:

RstCloth

Basically, this is a first crack at a very, very simple API for generating reStructuredText. It’s modeled on the interface of my buildcloth project, which does the same sort of thing for generating Makefiles.

reStructuredText exists to make it easier for humans to write well-formed documents, which is great and useful for about 95% of use cases: human-editable text formats for machine parsing are an amazing boon to documentation productivity.

There are cases, though, where it makes more sense to store content in a regular format like JSON or YAML and build the output programmatically: tabular data, for example, or content integrated from external sources. If most of your tool chain uses reStructuredText, then something like RstCloth is probably exactly what you need.

And because it’s a second-generation *Cloth tool, I already have most of the awkwardness worked out.
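To give a flavor of the approach, here’s a minimal sketch of what a programmatic reST generator can look like; the class and method names below are illustrative assumptions for this post, not the actual RstCloth interface:

```python
# Hypothetical sketch of a tiny reStructuredText builder in the spirit of
# RstCloth. Names and behavior are assumptions for illustration only.

class RstDoc:
    def __init__(self):
        self.lines = []

    def title(self, text):
        # reST document titles are over- and underlined with punctuation.
        bar = "=" * len(text)
        self.lines.extend([bar, text, bar, ""])

    def heading(self, text, char="-"):
        self.lines.extend([text, char * len(text), ""])

    def bullet(self, text):
        self.lines.append("- " + text)

    def newline(self):
        self.lines.append("")

    def render(self):
        return "\n".join(self.lines) + "\n"

doc = RstDoc()
doc.title("Release Notes")
doc.heading("Changes")
for item in ("full documentation", "improved ninja support"):
    doc.bullet(item)
print(doc.render())
```

The point of the pattern is that the loop at the bottom can just as easily iterate over a JSON or YAML document as a hard-coded tuple, which is the whole reason to generate the reST rather than write it by hand.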

It’s still in development, and I’ll be getting documentation, a readme, and some examples nailed down in the next few weeks. In the meantime:

tumblr -- m.tycho.co

I reactivated my tumblr account, hooked it up to the awesome tumblesocks plugin for emacs, and have attached it to the concise m.tycho.co domain. I’m also mirroring the content at tychoish.com/micro.

I’m going to try to avoid overthinking this, but:

  • While I’ve had some struggles with the emacs integration, it’s generally really idiomatic.

  • I really like that you can queue posts. This is the feature that I miss the most about systems that I’ve used in the past to host tychoish.

  • I like that there are some community features, and that the tagging isn’t worthless, since you can use tags to jump to relevant posts from other people.

    While I like self-hosted websites, and am kind of freaked out by the whole “my blog is a service” model, I think that the connection to a community/audience is useful and powerful, and is not to be underrated.

  • I like that tumblr does automatic integration with facebook and twitter. You can sort of do this manually, but baking things in leads to a better experience.

New Knitting Project: Cardigan

I mentioned a few months ago that I was working on a new sweater, but I’ve neglected to post or write about the project at all since. Let’s change that now:

In most respects it’s just like a number of existing sweaters that I’ve made: two-color patterns, using a combination of mid-sized extrapolations of Scandinavian mitten patterns, with some influence from Turkish stocking patterns, arranged in panels to convey strong vertical lines. The yarn is Harrisville Shetland, plus another unidentified Shetland from a cone I got years ago and have now used in three sweaters. The plan is a simple fisherman’s-style drop-shoulder construction with a simple short crew neck collar.

The plan diverges somewhat from “tychoish standard” in two respects:

The biggest change is that it’s going to be a cardigan. I’ve never made a cardigan that I’d call a rocking success. I can do it, but the finishing always leaves something to be desired and it hangs funny or flares in a way that I don’t want.

The plan for finishing the cardigan opening this time around is to use the steek (the bit that you cut open) as the facing for a hem. The idea is minimal prep, letting the yarn do its thing. For closure, I’ll do an attached i-cord band with room for buttons.

The slightly smaller change is that, rather than use a hem, I used a “purl-when-you-can-and-want-to” approach for the bottom edge treatment. The idea is that if you purl occasionally for the first few inches, you can counteract the tendency of knitted fabric to roll. It’s not perfect yet, but I haven’t steamed it, so we’ll see.

It’s fun to knit so far, and I look forward to finally conquering my fear/avoidance of cardigans and perhaps finding the perfect lower-edge finishing approach for stranded sweaters.

Onward and Upward!

While I've Been Gone...

… from blogging. See this post for the background.

I sometimes look at other people’s blogs and think “wow, that’s sharp,” and while I really like the current tychoish theme, there’s a distinct lack of gradients, really polished typography, strong crisp lines, and elegant sidebars here.

Not that I have a clue what I’d put in a sidebar: hell, I can’t even find good things to put in the Cyborg Institute sidebar. But it’s not so much that my design has grown dated (I don’t think it has, that much,) as that the practice of blogging has changed in a few ways:

The State of Blogging

  • self-hosted blogs are the exception rather than the rule.

  • it’s become increasingly difficult to aggregate content. The demise of Google Reader, both the removal of the product and the declining trend in its use, points to the idea that RSS isn’t a user-facing transmission method anymore.

    People are getting content through other means, and publishers probably can’t depend on users polling for content, which changes the role of the publishing system.

  • In fact, I don’t have a real clue what the current state of the art for publishing tools for blogs is these days. My sense is that a greater portion of blogs are hosted on services like Tumblr and WordPress.com.

    The big “advancements” in blogging technology are probably related to integrating and distributing content to third-party systems, which hosted services can probably do better than self-hosted setups.

  • There are fewer long-lived personal blogs, and even fewer that stray beyond a single niche.

Are there blogs that you read regularly? How do you know when there’s a new post?

Changes Afoot

Given these changes, and the chance to rethink how I approach this blog:

  • I’m curious as to the state of commenting and discourse related to blogs. Do people actually comment, in anything other than exceptional situations? Are most conversations happening on hacker-news/reddit, in other domain-specific common spaces, and on other blogs?

    I’ve been thinking about the prospect of turning off the discussion/discourse pages here entirely. They don’t get used, they’re kind of weird, and people don’t really know how to use them. At the same time, providing a space for conversation seems essential. More on this in a later post.

    Edit: I totally did this, and while I have some regrets, I think it’s generally a good move.

  • While it’d be nice to automate submitting content to various aggregation sites and social networks, I’ve instead added various browser extensions to do these submissions by hand. It’s a pain in the ass, but I guess auto-submission makes for a less useful content aggregator.

  • Just as tagging systems are inefficient and broken for wikis and “real” technical resources (see /posts/taxonomic-failure/ for my thoughts,) they’re not all that great for blogs. I’m considering completely removing the tagging system on tychoish, and just letting the search tool (which is pretty good) make content easy to discover.

Onward and Upward!

Doing versus Talking

In the On My Return to Blogging post I attributed the break I’d taken from blogging to wanting to get out and do things rather than just spending my free time writing and thinking about things.

A Critique

The problem with this kind of statement is that it evokes a certain kind of anti-intellectualism: thinking isn’t as good as doing things. That framing is counterproductive. Action and creation feed and grow out of thinking (and vice versa.)

In light of this, it’s difficult to re-calibrate one’s practice without, on the one hand, taking an anti-intellectual stance or, on the other, becoming too ungrounded in practice.

Cogitative Side Effects

I read an article a while back (source lost to the depths of the internet,) that mentioned the following effect: when you talk about something publicly, the recognition and validation you get from talking about it is pretty much the same as the recognition and validation you’d get from actually doing it. The result is that if you talk about doing something, you become less likely to actually do it, because you’ve already experienced most of the gratification.

(Sorry for the poor translation.)

In any case, it seems plausible, and certainly worth testing. So when I say “I want to spend time doing things,” I mean it: rather than theorizing about possible future projects or talking about things I want to work on, as has been my wont, I’m just not.

This is an interesting conundrum for free software/open source: how do you start developing a project in a community-centered way without shooting yourself in the proverbial foot? Sometimes it works (e.g. GNU MediaGoblin,) but often people hack together a working prototype (and often a lot more) before talking about the project. There are too many examples to list.

There are also a large number of projects that languish because they were clearly announced too soon. On the other hand, maybe early public discussion or announcement is purely epiphenomenal: it’s just a symptom of an always already weak project, a sign that you’re more interested in talking about something than doing it. (Which might just prove the point?)

The Take Away

  • Don’t blog about something until it exists, and is in a form that you’d be willing to share and discuss.

    Corollary: code names are probably the same as real names.

  • Strive for balance between “project work,” and meta-work. The ideal proportions are unclear.
  • Avoid anti-intellectualism when possible.

Buildcloth Release, No. 1

Today I released the first version of Buildcloth, which is a tool that I’ve been using at work to programmatically (and in some cases dynamically) generate build systems (i.e. Makefiles.)

Background

It’s obviously been “production ready” in some sense for a while, but I recently finished the API documentation, and a lot of the infrastructure for packaging and distribution, so it seemed like this was a good starting point.

The initial idea was basically that while Make syntax can be really powerful, in a number of situations:

  • to specify conditional elements,
  • to generate build targets and procedures based on system configuration or project state,
  • for large numbers of similar targets, and
  • for builds where single targets have a group of related rules,

defining build systems programmatically ends up producing a much more reliable result. The wins are pretty big in terms of maintainability, clarity, and flexibility.

The idea, and the naming, is sort of: do for build system generation what fabric does for shell scripts and deployment. Maybe this is exactly what you’re looking for.
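As a rough illustration of what “generating a Makefile programmatically” means in practice, here’s a small sketch; it captures the general shape of the idea rather than Buildcloth’s actual interface, and the target names and commands are assumptions for the example:

```python
# Illustrative sketch: emit Makefile rules from a Python data structure.
# This shows the general idea, not Buildcloth's actual API; the target
# names and commands are assumptions for the example.

targets = {
    "html": {
        "deps": ["source/index.txt"],
        "cmd": "sphinx-build -b html source build/html",
    },
    "clean": {
        "deps": [],
        "cmd": "rm -rf build/",
    },
}

def emit_makefile(targets):
    lines = []
    for name, spec in targets.items():
        lines.append("{0}: {1}".format(name, " ".join(spec["deps"])))
        lines.append("\t" + spec["cmd"])  # Make recipes must be tab-indented.
        lines.append("")
    return "\n".join(lines)

# Write the generated rules out; a build script would regenerate this file
# whenever the configuration or project state changes.
with open("Makefile.generated", "w") as f:
    f.write(emit_makefile(targets))
```

Because the targets live in an ordinary data structure, conditional rules, per-configuration targets, and large families of similar targets become a loop or an if statement rather than copied-and-pasted Make syntax.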

More Information

Check it out:

Bugs go here, and patches/pull requests are always welcome.

Cool Improvements:

  • full documentation.
  • support for specifying targets/dependencies as a list.
  • a build-rule abstraction called RuleCloth.
  • improved ninja support.

The Roadmap

  • making the tutorial and high level documentation better.
  • improving the “RuleCloth.”
  • adding some preliminary tools for managing data interactions.
  • pypy support (why not?)

On My Return to Blogging

I’ve been a blogging slacker in the last few months. I’ve been working a lot (software releases! content migrations!) and spending my free time singing and working on a few odds-and-ends projects. And not blogging.

But I did this /posts/delegated-builds project and it seemed like blogging about it would be good.

And it was…

I’ve had this blog, in one form or another, for 10 years, and my relationship to it has grown and changed a lot in that time. I don’t think it’s useful to really think about all the turns too much, but the recent developments are novel:

  • I write for a living, and have pretty consistently for the last 4-ish years (holy crap!) It’s not exactly that I’m burnt out on writing, but it does mean that I write differently now, which is a good thing; it also means I don’t always have the same ability to sit down after work and want to write more, for fun.

  • I noticed that blogging rigorously meant that I didn’t really have time to work on “research” projects, which is to say, I was putting a lot of energy into writing about ideas and theories but not too much time into actually doing things.

    I love theory. I love working on theory, but I’m not sure I see much use for theory that doesn’t interact with the world outside of it. For example, before I started my current job I wrote some about technical writing here, and those theories definitely guide what I’m working on today, and I think I can stand by what I said, but my understanding and knowledge of documentation has grown a lot from the experience of working on it.

  • To continue on this theme, the programming projects I’ve been working on, which I haven’t blogged about too much here, have taught me a lot about software development (which I’ve long been fascinated by,) and that improves the documentation I write and the way I approach problems.

    Furthermore, it means that the things I used to blog about with a “wouldn’t it be nice if a tool that did this existed?” are now things I spend a lot of time thinking about how to make exist. Which isn’t to say that I’m a really fluent programmer (yet,) but I’m not helpless.

So in short, I’m back, and I’m hopeful that in the coming weeks and months I can use this space to talk about the things I’m working on and help build a little bit of (mostly personal) momentum behind these projects.

Onward and Upward!

Delegated Build Questions

This post accumulates what I thought would be the common questions about the /posts/delegated-builds post/tool. For more background see the /posts/build-woes post.

Couldn’t you just have a separate build-only repository?

Sure, but you’d still have to manage that repository, which would probably require a non-trivial amount of code, and it wouldn’t support building/testing topic branches. Furthermore, unless you linked the build directories in some way, which this solution does, you’d end up chronically overbuilding.

Doesn’t this use a lot of disk space?

Sure, some. But I think in most competitions between disk space and improved productivity, productivity always wins. That notwithstanding:

1. Little-known fact: when you run git clone and specify the remote as a “local” repository, git uses hardlinks, if possible, for its object database. This means that you’re only really copying the source tree, and indeed there are already two or three copies of the source tree lying around as it is.

2. Most build processes aren’t terribly space-intensive: the second checkout is only about 7 megs; our .git directory is 7.3 megs (packed), which translates to a 5.4 meg source directory. By contrast, the output of a full build of a branch is about 150 megabytes, plus production staging.

At least in our case, the additional space costs are effectively trivial both given the size of contemporary hard drives and scale of other size requirements.

Source code may be larger: the MongoDB source tree is about 16 megabytes (not counting 50+ megs of in-tree third-party libraries) and becomes tens of gigabytes with build artifacts. Even so, for a project of that scale the space costs wouldn’t be hard to justify. Having said that, most software build problems (that I’m aware of,) don’t face this kind of contention, so it’s pretty irrelevant.

This doesn’t make anything faster, so how does it help?

Indeed, it probably makes things slower (tests are not yet conclusive,) but it means that any build process can happen entirely in the background, without any possibility of affecting your current work.

Sometimes the best way to optimize an inefficient process is to apply intelligence and actually make the slow thing faster. This is great, but it’s also quite hard (and time-consuming,) and often intelligence can only increase performance by a few percentage points. As a caveat, always make sure that things aren’t slow for a dumb and simple reason, but if an improvement isn’t obvious or there isn’t a simple, easy-to-fix source of slowness, intelligence is often overrated in this regard.

Other times, perhaps even often, the best way to optimize an inefficient process is to make it not matter that it’s slow. Some things take a long time to do, and while it’s great to do things synchronously, it’s not always a real requirement.

This is a smart hack that falls into the second category: if builds are going to take 4 to 6 minutes to run, I don’t want that to prevent other things from happening in that time, and I don’t want to have to think about coordinating activities around a given period of dead time. This hack solves that handily.
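For a sense of the general shape of the approach, here’s a minimal sketch, assuming a Sphinx-style project built with make html; the paths, branch handling, and build command are assumptions for illustration, not the actual delegated-builds script:

```python
#!/usr/bin/env python
# Minimal sketch of a "delegated build": clone the working repository into a
# sibling directory (git hardlinks the object database for local clones), sync
# it to the current branch, and run the build there in the background so the
# working tree stays untouched. Paths and the build command are assumptions.

import os
import subprocess

SOURCE = os.path.abspath(".")
BUILD_CLONE = os.path.abspath("../docs-build")

# Figure out which branch the working repository currently has checked out.
branch = subprocess.check_output(
    ["git", "rev-parse", "--abbrev-ref", "HEAD"]).decode().strip()

if not os.path.exists(BUILD_CLONE):
    # Cloning from a local path shares objects via hardlinks where possible,
    # so the extra copy costs roughly one checked-out source tree.
    subprocess.check_call(["git", "clone", SOURCE, BUILD_CLONE])

# Bring the clone up to date with the working repository's committed state.
subprocess.check_call(["git", "fetch", "origin"], cwd=BUILD_CLONE)
subprocess.check_call(
    ["git", "checkout", "-B", branch, "origin/" + branch], cwd=BUILD_CLONE)

# Kick off the build without blocking; keep working in the original checkout.
subprocess.Popen(["make", "html"], cwd=BUILD_CLONE)
```

Since the clone only sees committed state, you can keep editing (or even switch branches) in the working repository while the build runs.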

Four to six minutes isn’t that long, but it’s starting to get to a point where it’s too long to maintain focus on a task while waiting around for a build to finish, particularly at the longer end.

With this, I think we could tolerate ~15 minute builds without really causing a problem. Beyond that, we might need to reopen this case.

Build Woes

I thought I’d back up after the /posts/delegated-builds post and expand on the nature of the build engineering problem that I’ve been dealing with (at my day job) on a documentation project.

  • we do roughly continuous deployment. All active development and editing happens in topic branches, and there’s no really good reason to leave typos and whatnot on the site any longer than we need to.
  • we publish and maintain multiple versions of the same resource in parallel, and often backport basic changes to maintenance branches. This is great for users, and great for clarity, but is awful practically, because to deploy continuously, you have to be rebuilding constantly.
  • the build is entirely self-contained. This isn’t strictly a requirement, and we do use some internal continuous integration tools for internal development, but at the core, for a number of reasons, I think it’s important that all writers be able to build the project locally:
  • because this is an open source project, it’s important that users can easily contribute at any level. We do lots of things to make it easy for people to submit patches, but if the build isn’t portable (within reason,) then it’s difficult for developers to work as peers.
  • if it’s difficult to view rendered content while developing, it’s hard to develop well or efficiently. While I think the what-you-see-is-what-you-get model (WYSIWYG) is the wrong answer, good feedback loops are important and being able to build locally, after you make changes, whenever you want, regardless of the availability of a network connection, is terribly important.
  • the tool we use, Sphinx, in combination with the size of our resource, is a bottleneck. A single publication run takes anywhere from 4:30 to 6 minutes depending on the hardware, and grows by an average of 30 seconds every six months. I could rant about parallelism in documentation tools, but basically, if you want a system that handles cross-referencing and internal links, and you want to generate static content, long compile times are mostly unavoidable.

Now, there are a number of tricks that we’ve established to fight this underlying truth: Sphinx does some dependency checking to avoid “overbuilding,” which helps some, and I’ve done a lot of mangling in the Makefiles to make the build process more efficient for the most common cases, but even so, long and growing build times are inevitable.

The Sphinx part of the build has two qualities that are frustrating:

  • each build is single-threaded, so it has to read all the files one by one, and then write each file one by one. You can build other output formats in parallel (with a small hack from the default makefile; see the sketch after this list,) but you can’t get around the speed of a single build. There is a patch in consideration for the next version that would allow the write stage of the build to run concurrently, but that’s not live yet.
  • during the read stage of the build, you can’t touch the source files, and extra files in the source tree can affect or break the build, which means that, for the most part, you can’t build and work at the same time. Until now.
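As a sketch of the “other output formats in parallel” point above: the builder names and directory layout here are assumptions, and this Python stand-in just illustrates the idea behind the makefile hack rather than reproducing it:

```python
# Run several Sphinx builders at once; each builder is its own process, so the
# single-threaded read/write limitation applies per builder, not across them.
# Builder names and paths are assumptions for the example.

import subprocess

builders = ["html", "latex", "epub"]

procs = [
    subprocess.Popen(["sphinx-build", "-b", builder, "source", "build/" + builder])
    for builder in builders
]

# Wait for all of the builds to finish and surface any failures.
for builder, proc in zip(builders, procs):
    if proc.wait() != 0:
        print("build failed for: " + builder)
```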

The solutions are often not much better than the problem:

  • use a different build tool, one that was built to do incremental builds. The problem is that there aren’t a lot of good options in this area, and the speed is really the primary objectionable feature of the current build.

  • improve the build tool, or wait for it to improve. The aforementioned patch to let the write phase run concurrently will help a lot. Having said that, it’s important to keep the project on a standard public release of Sphinx, and it’s difficult to modify core Sphinx behavior from the extension system.

    Perhaps I have Stockholm Syndrome with the build, but I tend to think that on some level this is a pretty difficult problem: building a safe concurrent build system is hard, and there aren’t a lot of extant solutions. At the same time, this blog is about 2.5 times as large as the documentation project and can do a complete rebuild in 20% of the time or less. While the blog is probably a little less complex, they’re largely similar, and it’s not 5-6 times less complex.

    I think the problem is that people writing new documentation systems have to target and support new users and smaller test projects, so that by the time people hit serious road blocks, the faulty designs are too baked in.

  • brute force the problem by using non-local build infrastructure that has faster, less contended processor and disk resources. This sounds good, except our test machines are pretty fast, and the gains made by better hardware don’t keep up with continued growth. While we might gain some speed by moving builds off of our local machines, the experience is quite a bit worse. Furthermore, we do build non-locally, and that’s great, but it’s not a replacement.

There aren’t a lot of solutions and most of them seem to come down to “deal with it and build less,” which is hardly a solution.

This is the foundation of the /posts/delegated-builds script that I wrote, which addresses the problem by making it less intrusive. I’m also working on a brief FAQ, which might help address some of the big questions about this project.