Today's Bottleneck

Computers are always getting faster. From the perspective of the casual observer, it may seem like every year all of the various specs keep going up and systems get faster.1 In truth, progress isn't uniform across systems and subsystems, and thinking about this progression of technology gives us a chance to think about the constraints that developers2 and other people who build technology face.

For most of the past year, I've used a single laptop for all of my computing work, and while it's been great, in that time I lost touch with the comparative speed of systems. No great loss, but I found myself surprised to learn that not all computers are the same speed: it wasn't until I started using other machines on a regular basis that I remembered that hardware could affect performance.

For most of the past decade, processors have been fast. While some processors are theoretically faster and some have other features, like virtualization extensions and better multitasking capacities (e.g., hyperthreading and multi-core systems), the improvements have been incremental at best.

Memory (RAM) manages to mostly keep up with the processors, so there's no real bottleneck between RAM and the processor. Although RAM capacities are growing, at this point extra RAM mostly means that services and systems which once had to be distributed across machines for want of memory can now all run on one server. In general: “ho hum.”

Disks are another story altogether.

While disks did get faster over this period, they didn't get much faster, and so for a long time disks were the bottleneck in computing speed. To address this problem, a number of things changed:

  • We designed systems for asynchronous operation.

Basically, folks spilled a lot of blood and energy to make sure that systems could continue to do work while waiting for the disk to read or write data. This involves a lot of event loops, queuing systems, and so forth.

These systems are really cool; the only problem is that they require us to be smarter about some aspects of software design and deployment. This doesn't fix the tons of legacy code sitting around, or the fact that a lot of tools and programmers are struggling to keep up.
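To make the event-loop idea concrete, here's a minimal sketch using Python's asyncio (a modern convenience, not something from the era this post describes). The file names and delays are hypothetical stand-ins for slow disk operations.

    import asyncio

    async def slow_read(name, seconds):
        # Stand-in for a slow disk read: asyncio.sleep() yields control
        # to the event loop instead of blocking the whole program.
        await asyncio.sleep(seconds)
        return "%s: done" % name

    async def main():
        # Both "reads" are in flight at once; the loop is free to do
        # other work while each waits, so this takes ~2 seconds, not 3.
        results = await asyncio.gather(
            slow_read("file-a", 2),
            slow_read("file-b", 1),
        )
        print(results)

    asyncio.run(main())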

  • We started to build more distributed systems so that any individual spinning disk is responsible for writing/reading less data.

  • We hacked disks themselves to get better performance.

    There are some ways you can eke out a bit of extra performance from spinning disks: namely RAID-10, hardware RAID controllers, and smaller platters. RAID approaches use multiple drives (typically four) to provide simple redundancy and roughly double performance. Smaller platters require less movement of the disk arm, so you get a bit more out of the hardware.

    Now, with affordable solid state disks (SSDs), all of these disk-related speed problems are basically moot. So what are the next bottlenecks for computers and performance?

  • Processors. It may be that processors become the next slow-to-develop bottleneck. There are a lot of expectations on processors these days: high speed, low power consumption, low temperature, and a high degree of parallelism (cores and hyperthreading). These expectations necessarily conflict.

    The main route to innovation is to make the processors themselves smaller, which does increase performance and helps control heat and power consumption, but there is a practical limit to how small a processor can get.

    Also, no matter how fast you make the processor, it's irrelevant unless the software can take advantage of its features.

  • Software.

    We’re still not great at building software with asynchronous components. “Non-blocking” systems do make it easier to build systems that work well with slower disks. Still, we don’t have a lot of software that does a great job of using the parallelism of a processor, so some operations are slow and will remain slow, because a single-threaded process must grind through a long task and can’t share the work. (A small sketch of this appears at the end of this list.)

  • Network overhead.

    While I think better software is a huge problem, network throughput could be a huge issue too. Internet endpoints (your connection) have gotten much faster in the past few years. That’s a good thing, indeed, but there are a number of problems:

  • Transfer speeds aren’t keeping up with data growth or data storage, and if that trend continues, we’re going to end up with a lot of data that exists in only one physical location, which invites catastrophic data loss.

    I think we’ll get back to a point where moving physical media around will begin to make sense. Again.

  • Wireless data speeds and architectures (particularly 802.11x, but also wide-area wireless) have become ubiquitous, but aren’t really sufficient for serious use. The fact that our homes, public places, and even offices (in some cases) aren’t wired to give people the opportunity to plug in will begin to hurt.
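Returning to the software point above: here's a minimal sketch of the difference between a single-threaded grind and parallel execution, using Python's multiprocessing module. The workload and chunk sizes are arbitrary placeholders, not a benchmark.

    import multiprocessing

    def busy(n):
        # Stand-in for a long CPU-bound task.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        chunks = [5_000_000] * 4
        # Serial: a single process grinds through everything in turn.
        serial = [busy(n) for n in chunks]
        # Parallel: the same work spread across four processes, which
        # only helps because the task can actually be split.
        with multiprocessing.Pool(4) as pool:
            parallel = pool.map(busy, chunks)
        assert serial == parallel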

Thoughts? Other bottlenecks? Different reading of the history?


  1. By contrast, software seems like it’s always getting slower, and while this is partially true, there are additional factors at play, including feature growth, programmer efficiency, and legacy support requirements. ↩︎

  2. Because developers control, at least to some extent, how everyone uses and understands technology, the constraints on the way they use computers are important to everyone. ↩︎

Github without Github

Github is great, and I think they’ve done a lot--for the better--to change and shape the way that everyone uses and does really awesome things with git.

But I worry about lock-in, I worry about having a project that relies on some feature of github that can’t be easily accomplished on another platform.

This post is an index of “git ecosystem” tools that let you get something that looks a bit like github on your own servers. Feel free to edit this page (it’s a wiki!) if you have other tools you like or can recommend!

Permissions Control

Wiki / Pages

Github has a wiki system that’s open source. I’ve never played around with it, because there’s ikiwiki, which is better anyway.

Their pages functionality is also open source: it’s Jekyll. There’s no particular shortage of programs that do this kind of thing, and most aren’t that good, but that’s an orthogonal point.

Hosted Solutions

There are about a million different repository viewers, but the magic of the github website is that there’s a lot of other integrated functionality (bug tracking, merge request queues, automatic forking/branching, etc.).

  • gitorious - functional but inelegant.
  • gitlab - promising but untested.
  • repo.or.cz - functional but not practical for casual administration.

Web hooks

I’ve only recently found notify-webhook, but it basically implements something like github’s service hooks as a traditional git post-receive hook.
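For illustration, here's a minimal sketch of what such a post-receive hook can look like. The endpoint URL and the payload shape are hypothetical placeholders, not notify-webhook's actual output or github's schema.

    #!/usr/bin/env python
    import json
    import sys
    import urllib.request

    # Hypothetical endpoint; a real hook would read this from git config.
    WEBHOOK_URL = "https://example.com/hooks/git"

    def main():
        # git passes one "<old-sha> <new-sha> <refname>" line per
        # updated ref on standard input.
        for line in sys.stdin:
            old, new, ref = line.strip().split()
            payload = json.dumps({"before": old, "after": new, "ref": ref})
            request = urllib.request.Request(
                WEBHOOK_URL,
                data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request)

    if __name__ == "__main__":
        main()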

The Rest

There’s no great standalone merge (pull) request system. Other code review tools are uneven, but the truth is that pull requests are not a sufficiently advanced code review tool.

Patchwork might work, but it’s a bit rustic for contemporary workflows.

Integrated issue tracking--hell, any kind of issue tracking--remains an unsolved problem, but I think github’s approach is a good start, and that feature set isn’t as easily available from other projects/products/tools.

Knitting Shopping

… and planning

I did a little bit of holiday knitting shopping. Given how infrequently I buy yarn and knitting supplies, and the fact that this kind of shopping correlates strongly with my project planning, it seems worth sharing:

  • I got a cone of merino/tencel lace weight yarn in a steel blue color to knit a long plain tube to wear as a neck tube/scarf thing. I bought one of these a few months ago knit out of a jersey tencel knit, and I adore it, so it makes sense to knit something similar.

    Hopefully knitting these scarves will prove successful and useful. I’m not much of a knitted-sock wearer, I find most flat scarves dreadful to knit, I find shawls difficult to pull off, and I enjoy knitted hats but don’t find them windproof enough for common use. Having good, small, lightweight, and plain knitting projects would probably be a very good thing indeed.

  • I bought a couple of carbon fiber knitting needles, in sizes 2.5 (the size that my sweaters have been and will be for a little while) and 0 (for the scarves and hem facings as needed).

    I’m a chronic needle bender and like sharp points and reasonably slippery needles. I also have a set of carbon fiber needles for socks which are great. Very much looking forward to trying these out.

  • Finally, I’ve procured a few cones of HD Shetland to complement some of my leftovers. On the sweater queue:

    • a few cardigans. One for my mother, and a second one for me?
    • maybe something with shoulders/sleeves in a different color? I’ve used this kind of shading on otherwise plain sweaters, but it seems interesting to see how it might look on a two color sweater.

Sweater Evolutions

I’ve been knitting! Here’s an update:

  • I finished a sweater. Still need to block it, but it looks great so far. It’s a (near) duplicate of a sweater that I made a few years ago in blues. The biggest difference in construction is that I did the hem in a very slightly different way. Other than that, it’s very much the exemplar of “the default tychoish sweater.”

    Note to self, it would be good to have a version of this sweater in brown.

  • I started another sweater. This one is a cardigan in a medium gray and a light blue-gray. Rather than do hems, the idea with this one is:

    • purl occasionally instead of working a ribbed or hemmed border.

      This is an Elizabeth Zimmermann and Meg Swansen technique for colorwork that avoids ribbing or hems. If you purl occasionally in the first few inches, you can prevent rolling. It seems to work well enough, and it makes the bottom edge less bulky and more integrated into the sweater.

    • Crocheted front steek with knitted cord (i-cord) edge, using the steek as facing.

      Another Meg Swansen technique where you use the steek as facing: crochet along the edges and then knit an i-cord to cause the steek to “fold” under and act as a facing. Blocking does the rest. Again, the end result is lightweight, flexible, and easy to handle.

    I have about five inches done, and I expect slow but steady progress on this over the next few months.

Documentation Maximalism

You may hear people, particularly people who don’t like to write documentation, say something like:

Users need minimalist documentation that only answers their questions, and there’s no point in overwhelming users with bloated, maximalist documentation that they’ll never read.

Which sounds great, but doesn’t reflect reality or best practice. Consider the following:

  • Documentation is as much for the producers of the software as it is for the users. Having extensive documentation contributes to, and reflects, a sane design process. Collecting and curating the documentation helps ensure that the software is usable and knowable.
  • Having complete documentation reduces support costs, both by reducing the volume of support requests and by lowering the complexity of the work associated with support.
  • Good extensive documentation drives adoption of software. Products with better documentation will always see better adoption than comparable products with worse documentation.

Without users, software is useless.

Does this mean that you shouldn’t value minimalism, particularly conceptual and visual minimalism? Does this mean you should avoid customizing the documentation to fit the needs and patterns of your users?

No, of course not.

But minimalism for the sake of minimalism, without a particular strategy, is an awful and ill-founded ideology.

Even if it looks good, and particularly if it sounds good.

Information Debts

Like technical debt, information debt is a huge problem for all kinds of organizations, and one that all technical writers need to be aware of and able to combat directly. Let’s back up a little…

Information debt is what happens when there aren’t proper systems, tools, and processes in place to maintain and create high quality information resources. A number of unfortunate and expensive things result:

  • People spend time recreating documents, pages, and research that already exist. This is incredibly inefficient and leads to the following:
  • Inaccurate information propagates throughout the organization and to the public.
  • Information and style “drift” when information and facts exist in many places.
  • Organizations spend more money on infrastructure and tools as a band-aid when data is poorly organized.
  • People lose confidence in information resources and stop relying on them, preferring to ask other people for information. This increases communication overhead and noise, and takes longer for everyone than consulting a good resource would.

To help resolve information debt:

  • Dedicate resources to paying back information debts. It takes time to build really good resources, to collect and consolidate information, and to keep it all up to date. But given the costs of the debt, it’s often worth it.
  • Documents must be “living” and usefully versioned, and there must be a process for updating them. Furthermore, while it doesn’t make sense to actually limit editing privileges, it’s important that responsibility for editing and maintaining documents isn’t diffused and thus neglected.
  • Information resources must have an “owner” within an organization or group who is responsible for keeping them up to date and making sure that people know they exist. You can have the best repository of facts, but if no one uses it and the documents are not up to date, it’s worthless.
  • Minimize the number of information resources. While it doesn’t always make sense to keep all information in the same resource or system, the more “silos” where a piece of information or document might live, the less likely a reader/user is to find it.

… and more.

I’m working on adding a lot of writing on information debt in the technical writing section of the wiki. I’ll blog more about this, while I continue to work through some of these ideas, but I’m quite interested in hearing your thoughts on this post and on the information-debt pages as well.

Onward and Upward!

Novel Automation

This post is a follow up to the interlude in the /posts/programming-tutorials post, which is part of an ongoing series of posts on programmer training and related issues in technological literacy and education.

In short, creating novel automations is hard. The process would have to look something like:

  1. Realize that you have an unfulfilled software need.
  2. Decide what the proper solution to that need is. Make sure the solution is sufficiently flexible to be able to support all required complexity.
  3. Then sit down, open an empty buffer and begin writing code.

Not easy.1

Something I’ve learned in the past few years is that the above process is relatively uncommon for actual working programmers: most of the time you’re adding a few lines here and there, testing various changes or adding small features built upon other existing systems and features.

If this is how programming work is actually done, then the kinds of methods we use to teach programmers how to program should hold some resemblance to the actual work that programmers do. As an attempt at a case study, my own recent experience:

I’ve been playing with Buildbot for a few weeks now out of personal curiosity, and because it may be useful to automate some stuff for the Cyborg Institute. Buildbot has its merits and frustrations, but this post isn’t really about buildbot. Rather, the experience of doing buildbot work has taught me something about programming and about “building things,” including:

  • When you set up buildbot, it generates a Python configuration file where all buildbot configuration and “programming” goes. (A minimal sketch of such a configuration appears at the end of this post.)

    As a bit of a sidebar: I’ve been using a base configuration derived from the buildbot configuration for buildbot itself, which is cleaner than the default configuration, and I’d assumed that I was configuring a buildbot in the “normal” way.

    Turns out I wasn’t, and this hurts my (larger) argument slightly.

    I like the idea of having a very programmatic interface for systems that must integrate with other components, and I really like the idea of a system that produces a good starting template. I’m not sure what this does for overall maintainability in the long term, but it makes getting started and using the software in a meaningful way much more approachable.

  • Organizing my buildbot configuration as I have, modeled on the “metabuildbot,” has nicely illustrated the idea that software is just a collection of modules that interact with each other in a defined way. Nothing more, nothing less.

  • Distributed systems are incredibly difficult for anyone to conceptualize properly, and I think most of the frustration with buildbot stems from this.

  • Buildbot provides an immediate object lesson on the trade-offs between simplicity and terseness on the one hand and maintainability and complexity on the other.

    This point relates to the previous one. Because distributed systems are hard, it’s easy to configure something that’s too complex and that isn’t what you want at all in your Buildbot before you realize that what you actually need is something else entirely.

    That’s not to say there aren’t nightmarish Buildbot configs--there are--but the lesson is quite valuable.

  • There’s something interesting and instructive in the way that Buildbot’s user experience lies somewhere between “an application,” that you install and use, and a program that you write using a toolkit.

    It’s clearly not exactly either, and both at the same time.

I suspect some web-programming systems may be similar, but I have relatively little experience with systems like these. And frankly, I have little need for these kinds of systems in any of my current projects.
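As promised above, here's a minimal sketch of what a buildbot master configuration can look like. Buildbot's configuration API has changed over the years, so this assumes a recent release with the buildbot.plugins interface; the worker name, password, and repository URL are placeholders, not anything from my actual setup.

    # master.cfg -- a minimal, hypothetical buildbot configuration.
    from buildbot.plugins import schedulers, steps, util, worker

    c = BuildmasterConfig = {}

    # One worker that connects to the master.
    c['workers'] = [worker.Worker("example-worker", "example-password")]
    c['protocols'] = {'pb': {'port': 9989}}

    # A build factory: clone the repository, then run the test suite.
    factory = util.BuildFactory()
    factory.addStep(steps.Git(repourl="https://example.com/project.git",
                              mode="incremental"))
    factory.addStep(steps.ShellCommand(command=["make", "test"]))

    c['builders'] = [
        util.BuilderConfig(name="tests",
                           workernames=["example-worker"],
                           factory=factory),
    ]

    # Rebuild whenever a change lands on the default branch.
    c['schedulers'] = [
        schedulers.SingleBranchScheduler(
            name="on-commit",
            change_filter=util.ChangeFilter(branch="master"),
            treeStableTimer=60,
            builderNames=["tests"],
        ),
    ]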

Thoughts?


  1. Indeed, this may be why people so often write code, get it working, and then rewrite it from the ground up: writing things from scratch is an objectively hard thing, while rewriting and iterating is considerably easier. And the end result is often, but not always, better. ↩︎

Programming Tutorials

This post is a follow up to my /posts/coding-pedagogy post. This “series” addresses how people learn how to program, the state of the technical materials that support this education process, and the role of programming in technology development.

I’ve wanted to learn how to program for a while, and I’ve been perpetually frustrated by pretty much every lesson or document I’ve ever encountered in this search. This is hyperbolic, but it’s pretty close to the truth. Teaching people how to program is hard, and the materials are written by people who either:

  • don’t really remember how they learned to program.

Many programming tutorials were written by these kinds of programmers, and the resulting materials tend to be decent in and of themselves, but they fail to teach people who don’t already know how to program.

If you already know how to program, or have learned to program in a few different languages, it’s easy to substitute “learning how to program” with “learning how to program in a new language,” because that experience is fresher and easier to understand.

These kinds of materials will teach the novice programmer a lot about programming languages and fundamental computer science topics, but not the things you actually need in order to learn how to write code.

  • don’t really know how to program.

People who don’t know how to program tend to assume that you can teach by example, using guided tutorials. You can’t, really. Examples are good for demonstrating syntax and procedure, and for answering tactical questions, but they aren’t sufficient for teaching the required higher-order problem solving skills. Focusing on the concrete aspects of programming syntax, the standard library, and the process for executing code isn’t enough.

These kinds of documents can be very instructive, and outsider perspectives are quite useful, but if the document can’t convey how to solve real problems with code, you’ll be hard pressed to learn how to write useful programs from these guides.

In essence, we have a chicken and egg problem.


Interlude:

Even six months ago, when people asked me “are you a programmer?” (or engineer,) I’d often object strenuously. Now, I wave my hand back and forth and say “sorta, I program a bit, but I’m the technical writer.” I don’t write code on a daily basis and I’m not very nimble at starting to write programs from scratch, but sometimes when the need arises, I know enough to write code that works, to figure out the best solution to fix at least some of the problems I run into.

I still ask other people to write programs or fix problems I’m having, but it’s usually more because I don’t have time to figure out an existing system that I know they’re familiar with and less because I’m incapable of making the change myself.

Despite these advances, I still find it hard to sit down with a blank buffer and write code from scratch, even if I have a pretty clear idea of what it needs to do. Increasingly, I’ve come to believe that this is the case for most people who write code, even very skilled engineers.

This will be the subject of an upcoming post.


The solution(s):

1. Teach people how to code by forcing people to debug programs and make trivial modifications to code.

People pick up syntax pretty easily, but struggle more with the problem-solving aspects of code. While there are some subtle aspects of syntax, the compiler or interpreter does enough to teach people syntax. The larger challenge is getting people to understand the relationship between their changes and the resulting behavior, and between any single change and the rest of a piece of code.
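As a concrete (and entirely hypothetical) example of the kind of exercise this suggests, consider handing a learner a short function with a planted bug:

    # Exercise: this function should return the average of a list of
    # numbers, but it has a one-line bug. Explain why average([2, 4, 6])
    # returns 6.0 instead of 4.0, then fix it.
    def average(numbers):
        total = 0
        for n in numbers:
            total += n
        return total / (len(numbers) - 1)  # bug: should be len(numbers)

    print(average([2, 4, 6]))

The point of the exercise isn't the arithmetic; it's forcing the learner to trace the relationship between one line and the program's behavior.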

2. Teach people how to program by getting them to solve actual problems using actual tools, libraries, and packages.

Too often, programming tutorials and examples attempt to be self-contained or unrealistically simple. While this makes sense from a number of perspectives (easier to create, easier to explain, fewer dependency problems for users), such self-contained work is incredibly uncommon in practice, and it probably leads people to think that a lot of programming revolves around re-implementing solutions to solved problems.

I’m not making an argument about computer science education or formal engineering training, with which I have very little experience or interest. Programming is relevant for most people as contemporary, technically literate actors in digital systems.

I’m convinced that many people do a great deal of work that is effectively programming: manipulating tools, identifying and recording procedures, collecting information about the environment, performing analysis, and taking action based on collected data. Editing macros, mail filtering systems, and spreadsheets are obvious examples, though there are others.
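To see why this counts as programming, consider what a mail-filtering rule looks like written down as code. This is a hypothetical sketch, not any particular mail client's API: the same conditional logic people routinely build with a filtering UI.

    # A mail filter expressed as a function: given a message, decide
    # which folder it belongs in.
    def route(message):
        if "lists@example.com" in message.get("to", ""):
            return "Lists"
        if message.get("subject", "").startswith("[URGENT]"):
            return "Priority"
        return "Inbox"

    # The message is a plain dict here; a real system would parse email.
    print(route({"to": "lists@example.com", "subject": "weekly digest"}))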

Would teaching these people how programming worked and how they could use programming tools improve their digital existences? Possibly.

Would general productivity improve if more people knew how to think about automation and were able to do some of their own programming? Almost certainly.

Would having more casual programmers create additional problems and challenges in technology? Yes. These would be interesting problems to solve as well.