Novel Automation

This post is a follow-up to the interlude in the /posts/programming-tutorials post, which is part of an ongoing series of posts on programmer training and related issues in technological literacy and education.

In short, creating novel automations is hard. The process would have to look something like:

  1. Realize that you have an unfulfilled software need.
  2. Decide what the proper solution to that need is. Make sure the solution is sufficiently flexible to be able to support all required complexity.
  3. Then sit down, open an empty buffer and begin writing code.

Not easy. [1]

Something I've learned in the past few years is that the above process is relatively uncommon for actual working programmers: most of the time you're adding a few lines here and there, testing various changes, or adding small features built upon existing systems and features.

If this is how programming work is actually done, then the methods we use to teach programmers should bear some resemblance to the actual work that programmers do. As an attempt at a case study, consider my own recent experience:

I've been playing with Buildbot for a few weeks now out of personal curiosity, and because it may be useful for automating some tasks for the Cyborg Institute. Buildbot has its merits and frustrations, but this post isn't really about Buildbot. Rather, the experience of doing Buildbot work has taught me something about programming and about "building things," including:

  • When you set up buildbot, it generates a python configuration file where all buildbot configuration and "programming" goes.

    As a bit of a sidebar, I've been using a base configuration derived from the Buildbot configuration for Buildbot itself, and because the default configuration is less clean and a bit of a mess, I'd assumed that I was configuring Buildbot in the "normal way."

    Turns out I haven't, and this hurts my (larger) argument slightly.

    I like the idea of having a very programmatic interface for systems that must integrate with other components, and I really like the idea of a system that produces a good starting template. I'm not sure what this does for overall maintainability in the long term, but it makes getting started and using the software in a meaningful way much more possible.

  • Organizing my Buildbot configuration as I have, modeled on the "metabuildbot," has nicely illustrated the idea that software is just a collection of modules that interact with each other in a defined way. Nothing more, nothing less.

  • Distributed systems are incredibly difficult for anyone to conceptualize properly, and I think most of the frustration with Buildbot stems from this.

  • Buildbot provides an immediate object lesson on the trade-offs between simplicity and terseness on the one hand and maintainability and complexity on the other.

    This point relates to the previous one. Because distributed systems are hard, it's easy to end up with a Buildbot configuration that's too complex, and that isn't what you want at all, before you realize that what you actually need is something else entirely.

    This isn't to say that there aren't nightmarish Buildbot configs (there are), but the lesson is quite valuable.

  • There's something interesting and instructive in the way that Buildbot's user experience lies somewhere between "an application" that you install and use, and a program that you write using a toolkit.

    It's clearly not exactly either, and somehow both at the same time.

I suspect some web-programming systems may be similar, but I have relatively little experience with systems like these. And frankly, I have little need for them in any of my current projects.

Thoughts?

[1]Indeed this may explain the incidence of people writing code, getting it working, and then rewriting it from the ground up: writing things from scratch is objectively hard, while rewriting and iterating is considerably easier. And the end result is often, but not always, better.

In Favor of PDF

This is really a short rant, and should come as a surprise to no one.

I hate DOC files and RTF files, to say nothing of ODF, DOCX, and their ilk, because they have two necessarily conflicting properties:

1. They're oriented at producing documents on paper. Which is crazy: paper is one output, but it's not the only output in common use, so it's nuts that generic document representation formats would be so tightly coupled to paper.

2. The rendering of the content is editor-specific, particularly with regard to display options. If I compile a document and send it to you, I have no guarantee whatsoever about the presentation or display of the document on your system if I'm not certain that your system is similarly configured, particularly with respect to fonts, page breaks, and so forth.

This is particularly idiotic in light of point 1.

It's not that PDF is great, or especially usable, but it's consistent and behaves as expected. Furthermore, it does a good job of appropriately expressing the limitations of paper.

So use PDF, and accept no substitutes.

In Favor of Fast Builds

This is an entry in my loose series of posts about build systems.

I've been thinking recently about why I've come to think that build systems are so important, and this post is mostly just me thinking aloud about this issue and related questions.

Making Builds Efficient

Writing a build system for a project is often relatively trivial: once you capture the process and figure out the base dependencies, you can write scripts and makefiles to automate it. The problem is that the most rudimentary build systems aren't terribly efficient, for two main reasons:

1. It's difficult to stumble into a build process that is easy to parallelize, so these rudimentary solutions often depend on a series of steps happening in a specific order.

2. For subsequent builds, it's easier to write a build system that rebuilds too much rather than too little. From the perspective of build tool designers, this is the correct behavior; but it means that it takes more work to ensure that you only rebuild what you need to.

As a corollary, you need to test build systems and approaches against significantly large projects, where "rebuilding too much" is actually detectable.

Making a build system efficient isn't too hard, but it does require some testing and experimentation, and it often centers on having explicit dependencies, so that the build tool (i.e. Make, SCons, Ninja, etc.) can build output files in the correct order and only build when a dependency changes. [1]
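To make that concrete, here's a minimal sketch in shell of the staleness check that a build tool performs for you; the file names and the use of pandoc are illustrative, not from any real project:

# Rebuild each output only when its input has changed, judged by mtime.
# This is the heart of what Make does with an explicit dependency.
mkdir -p build
for src in docs/*.md; do
  out="build/$(basename "${src%.md}").html"
  # "-nt" is true if $src is newer than $out, or if $out doesn't exist yet
  if [ "$src" -nt "$out" ]; then
    pandoc "$src" -o "$out"
  fi
done

Make generalizes this check over an arbitrary dependency graph, which is what lets it rebuild only what's stale and, once the dependencies are explicit, run independent jobs in parallel.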

The Benefits of a Fast Build

  1. Fast builds increase overall personal productivity.

    You don't have to wait for a build to complete, and you're not tempted to context switch during the build, so you stay focused on your work.

  2. Fast builds increase quality.

    If your build system (and to a similar extent, your test suite) runs efficiently, it's possible to detect errors earlier in the development process, which prevents errors and defects. A tighter feedback loop on the code you write is helpful.

  3. Fast builds democratize the development process.

    If builds are easy to run, and require minimal cajoling and intervention, it becomes much more likely that many people will run them and contribute to the project.

    While this is most obvious in open source communities and projects, it's probably true of all development teams.

  4. Fast builds promote freshness.

    If the build process is frustrating, then anyone who might run the build will avoid it and run it less frequently, and on the whole the development effort loses important feedback and data.

    Continuous integration systems help with this, but they require significant resources, are clumsy solutions, and above all attempt to solve a slightly different problem.

Optimizing Builds

Steps you can take to optimize builds:

(Note: I'm by no means an expert in this, so feel free to add or edit these suggestions.)

  • A large number of small jobs that can complete independently of each other are easy to run in parallel (see the sketch after this list). If the jobs that create a product take longer and are difficult to split into components, then the build will be slower, particularly relative to what more powerful, multi-core hardware could deliver.
  • Incremental builds are a huge win, particularly for larger projects. Most of the reasons you want "fast builds" only require fast rebuilds and partial builds, not fast full "clean builds." While fast initial builds are not unimportant, they account for a small percentage of use.
  • Manage complexity.

There are a lot of things you can do to make builds smarter, which should theoretically make builds faster.

Examples of this kind of complexity include storing dependency information in a database, or using hashing rather than "mtime" to detect staleness, or integrating the build automation with other parts of the development tool chain, or using a more limited method to specify build processes.

The problem, or the potential problem, is that you lose simplicity, and it's possible that something in this "smarter and more complex" system will break or slow down under certain pressures, or carry enough overhead to render the optimization unproductive.
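As a sketch of the parallelism point from the list above: independent jobs can simply run concurrently, which is the same scheduling that "make -j" does for you once dependencies are explicit. The pandoc commands and file names here are hypothetical:

# Start independent jobs in the background; none depends on another's output.
pandoc ch1.md -o build/ch1.html &
pandoc ch2.md -o build/ch2.html &
pandoc ch3.md -o build/ch3.html &
wait  # block until all background jobs have finished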

[1]It's too easy to use wild-cards so that the system must rebuild a given output if any of a number of input files change. Some of this is unavoidable, and generally there are more input files than output files, but particularly with builds that have intermediate stages, or more complex relationships between files, it's important to attend to these dependencies.

On Build Processes

I've found myself writing a fair number of Makefiles in the last few weeks: in part because Make was a tool, hell a class of tools, that I didn't really understand and I'm a big sucker for learning new things, and in part because I had a lot of build process-related tasks to automate. But I think my interest is a bit deeper than that.

Make and related tools provide a good metaphor for thinking about certain kinds of tasks and processes. Build systems are less about making something more efficient (though they often do that) and more about making processes reproducible and consistent. In some respects I think it's appropriate to think of build tools as a way of formalizing a process.

I've written here before about the merits of /technical-writing/compilation for documentation, and I think that still holds true: build processes add necessary procedural structure. Indirectly, having a formalized build process also makes it very easy to extend and develop processes as needs change. There's some up-front work, but it nearly always pays off.

While I want to avoid thinking that everything is a Makefile-shaped nail, it's probably also true that a lot of common tasks in general-purpose computing are make-shaped: format conversion, extracting and importing data, typesetting (and all sorts of publication-related tasks,) archiving, system configuration, and so forth. Perhaps more generic build tools need to be part of basic computer literacy. That's another topic for a much larger discussion.
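To pick one concrete example from that list: archiving is make-shaped in exactly this sense, with inputs, an output, and a recipe you want run the same way every time. The path here is hypothetical:

# Produce a dated archive of a directory of notes: same inputs,
# same recipe, reproducible output.
tar -czf "notes-$(date +%F).tar.gz" ~/notes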

Finally, I want to raise (or re-raise) the point that another function of build systems is to reduce friction on common tasks, increase the likelihood that tasks will get done, and lower the technical background people need to do fundamentally mundane tasks. Build systems are absolutely essential for producing output from any really complex process, because it's hard to reliably produce builds without them; for less complex processes they're essential because no one (or fewer people) would do those tasks without some kind of support.

Rough thoughts as always.

Making Things Easier

I've spent a lot of time in the past few months thinking about "automation" as a project to turn things that take a long time and require a lot of human intervention into things that just do themselves, and I think this is the wrong approach.

While total automation is an admirable goal, it's difficult to achieve: both because it requires more complex software to deal with edge cases, and because it's hard to iterate into a fully automated solution.

Let's back up for a moment and talk about automation in general.

Computers are great at automating things. When you figure out exactly how to accomplish something digitally (i.e. polling an information source for an update, transforming data, testing a system or tool,) writing a program to perform this function is a great idea, if only because it reduces the workload on actual people (i.e. you.) I think the difference between people who are "good with computers" and people who are "great with computers" is the ability to spot opportunities for these kinds of automations, and potentially implement them.

To my mind the most important reason to automate tasks is to ensure consistency and to make it more likely that tedious tasks get done.

Having said this, rather than developing complete automations for common tasks, the better solution is probably to approach automation from the bottom up: instead of automating a complete process, automate smaller pieces, particularly the most repetitive and invariable parts, and then provide a way for people to trigger the (now simplified) task.

The end result is a system that's more flexible, easier to write, and less prone to failure under weird edge cases. Perhaps this is a manifestation of "worse is better," too.
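As a minimal sketch of what this bottom-up approach can look like (the commands, paths, and host name are all hypothetical): automate the invariant middle of a task, and leave the trigger, and the judgment, to a person:

#!/usr/bin/env bash
# publish.sh -- automate the repetitive core of publishing a site;
# a human decides when to run it and sanity-checks the result.
set -e

make html                            # rebuild only what changed
rsync -az build/ web-host:/srv/www/  # push the fresh output
echo "site published at $(date)"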

Thoughts?

Onward and Upward!

Loops and Git Automation

This post provides a few quick overviews of cool bits of shell script that I've written or put together recently. Nothing earth shattering, but perhaps interesting nonetheless.

Commit all Git Changes

For a long time, I used the following bit of code to provide the inverse operation of "git add .". Where "git add ." adds all uncommitted changes to the staging area for the next commit, the following snippet automatically removes all files that are no longer present on the file-system from the staging area for the next commit.

# only bother with "git rm" when there are deleted files to remove
if [ "`git ls-files -d | wc -l`" -gt "0" ]; then
  git rm --quiet `git ls-files -d`
fi

This is great: if you forget to use "git mv" or you delete a file using rm, you can run this operation and pretty quickly have git catch up with the state of reality. In retrospect I'm not really sure why I put the error-checking if statement in there.

There are two other implementations of this basic idea that I'm aware of:

# remove each deleted file from the index, one file at a time
for i in `git ls-files -d`; do
  git rm --quiet "$i"
done

Turns out you can do pretty much the same thing with the following statement, using the xargs command, and you end up with something that's a bit more succinct:

git ls-files --deleted -z | xargs -0 git rm --quiet

I'm not sure why, but I think it's because I started being a Unix nerd after Linux dropped the argument-number limit, and as a result I've never really gotten the chance to become familiar with xargs. While I sometimes sense that a problem is xargs-shaped, I almost never run into "too many arguments" errors, and always attempt other solutions first.

A Note About xargs

If you're familiar with xargs skip this section. Otherwise, it's geeky story time.

While this isn't currently an issue on Linux, some older UNIX systems (including older versions of Linux,) had this limitation where you could only pass a limited number of arguments to a command. If you had too many, the command would produce an error, and you had to find another way.

I'm not sure what the number was, and the specific number isn't particularly important to the story. Generally, I understand that this problem would crop up when attempting to take the output of a command like find and pipe or pass it to another command like grep. I'm not sure if you can trigger "too many arguments" errors with globbing (i.e. *), but like I said, this kind of thing is pretty uncommon these days.

One of the "other ways" was to use the xargs command, which takes a very long list of arguments and passes them to another command in batches, each small enough to fit on a command line. My gut feeling is that xargs can do some things, like the above, a bit more robustly, but that isn't experimentally grounded. Thoughts?
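For the curious, the batching is easy to see directly; the -n flag caps the batch size, so the splitting becomes visible:

# xargs reads items from stdin and appends them to the command in batches
# sized to fit the argument-length limit (here artificially capped at 2).
printf '%s\n' a b c d e | xargs -n 2 echo
# prints:
# a b
# c d
# e

And the NUL-delimited variant (-0, paired with find's -print0 or git's -z) is what makes the pipeline above safe for file names containing spaces or newlines, which is one place where xargs genuinely is more robust than a shell for loop.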

Onward and Upward!

A Life Changing Laptop Riser

*tl;dr* I got one of those nifty laptop risers that puts your laptop up closer to eye level, and it has improved pretty much all of my interactions with computers a thousandfold, and made it possible for me to effectively use two screens. This post explores that.


One of my coworkers had a laptop stand she wasn't using and I asked to borrow it for an afternoon, and my neck stopped hurting. I never thought my neck hurt before, but apparently it does.

Or did.

But there's more: for years now I've kept an extra monitor around (and had one at work) but the truth is that I have never really felt like I've been able to get the most out of an external monitor.

Somehow, putting my laptop 4 inches in the air was the little change that made everything better. The laptop is generally on the left of the external monitor, and I have task lists, notes buffers, the chat window, and my status logging window on the laptop, and then three windows (emacs buffer, terminal, emacs buffer) on the external monitor to the right. My primary focus centers between the monitors, but probably edges slightly toward the external most of the time.

Also, I discovered that I--apparently--have a slight processing/attention defect whereby I find it painful and difficult to focus on things that are happening on the right side of the screen for any amount of time. Which is weird because my right eye has always been noticeably stronger. I'll ponder this more later.

My virtual desktops for email and web browsing are a bit less rigid, but follow the same basic idea. Somehow it seems to work. I've done a little bit of work recently to get the layouts right, to minimize the window-management cost of most context switching (scripting various transitions, saving layouts, etc.) In all, things are going great.


It strikes me that I've not posted here even a little about my setup in a while. The truth is that this isn't terribly surprising: I've not changed very much recently. I'm back to one laptop, and as anxious as having one laptop makes me sometimes (I fear the lack of redundancy,) not having to keep it synced makes life easier. I've put some time into polishing all the little bits of configuration/code that make my computing world go around, but mostly it's pretty good.

It's nice, and I'd write more about it, but I want to get back to getting things done around here. Exporting and exploring some of this stuff in greater depth is definitely on my list, so hang in there, and if there's something you particularly want to see, be in touch.

Aeron Woes

Confession: I have an Aeron chair at my desk at home.

I got it in April when I moved to New York City. The only piece of furniture I had that I couldn't move in my (now former) car was my desk chair. I found a good deal on an Aeron chair, and I rationalized to myself that the cost of the chair was about the cost of movers. Savings, right?

It also helped that I was leaving a job where I had an Aeron chair in my office, and I knew that in the short term I would be working from home. While my old desk chair was (and is) quite nice, it's not quite the same. Sit in an Aeron chair for a couple of years, and it's hard to go back. I've sat in other chairs since then, and it's never quite the same.

Having said that, after a cleaning incident today, I would like to collect a few gripes about the Aeron chair for your consideration.

  • The assembly right beneath the chair collects dust and dirt in a proportion that doesn't seem quite possible. It's clearly an artifact of the mesh, and likely a commentary on the air circulation of my apartment.

    Regardless, dusting nightmare.

  • The arms scuff and scratch on desks, if the bottom of the desk isn't completely smooth. This isn't an actual problem: the chair still works fine and is as comfortable as ever, but it's annoying.

    I've never seriously looked at the underside of a desk before. With every other chair, I've either ordered a variant sans arms, or taken the arms off as soon as possible.

    The Aeron arms are low enough that they've never bothered me, so I thought "might as well." But it's still annoying.

That's all.