Today's Bottleneck

Computers are always getting faster. From the perspective of the casual observer it may seem like every year all of the various specs keep going up, and systems are faster. [1] In truth, progress isn't uniform across all systems and subsystems, and thinking about this progression of technology gives us a chance to think about the constraints that developers [2] and other people who build technology face.

For most of the past year, I've used a single laptop for all of my computing work, and while it's been great, in that time I lost touch with the comparative speed of systems. No great loss, but I found myself surprised to learn that not all computers run at the same speed: it wasn't until I started using other machines on a regular basis that I remembered that hardware could affect performance.

For most of the past decade, processors have been fast. While some processors are theoretically faster, and some have other features like virtualization extensions and better multitasking capacities (i.e. hyperthreading and multiple cores), the improvements have been incremental at best.

Memory (RAM) manages to mostly keep up with the processors, so there's no real bottleneck between RAM and the processor. Although RAM capacities are growing, at current volumes extra RAM just means services/systems that had to be distributed given RAM density can all run on one server. In general: "ho hum."

Disks are another story altogether.

While disks got faster over this period, they didn't get much faster, and so for a long time disks were the bottleneck in computing speed. To address this problem, a number of things changed:

  • We designed systems for asynchronous operation.

    Basically, folks spilled a lot of blood and energy to make sure that systems could continue to do work while waiting for the disk to read or write data. This involves using a lot of event loops, queuing systems, and so forth.

    These systems are really cool. The only problem is that we have to be smarter about some aspects of software design and deployment. This doesn't fix the tons of legacy software sitting around, or the fact that a lot of tools and programmers are struggling to keep up.

  • We started to build more distributed systems so that any individual spinning disk is responsible for writing/reading less data.

  • We hacked disks themselves to get better performance.

    There are some ways you can eke out a bit of extra performance from spinning disks: namely RAID-10, hardware RAID controllers, and using smaller platters. RAID-10 uses multiple drives (4) to provide simple redundancy and roughly double performance. Smaller platters require less movement of the disk arm, and you get a bit more out of the hardware.
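
As a rough illustration of the "keep working while waiting on the disk" idea from the first item above, here is a minimal shell sketch. It uses background jobs rather than a real event loop, and slow_read is a made-up stand-in that just sleeps, so treat it as a picture of the idea and not as how any particular system does it.

```shell
# Hypothetical stand-in for a slow disk read: sleep, then write a result.
slow_read() {
    sleep 1
    echo "data from $1" > "$2"
}

tmpdir=$(mktemp -d)

# Start both "reads" in the background instead of waiting for each in turn.
slow_read disk-a "$tmpdir/a" &
slow_read disk-b "$tmpdir/b" &

# The main process is free to do other work while the reads are in flight.
other_work="done"

# Block only at the point where the results are actually needed.
wait
result_a=$(cat "$tmpdir/a")
result_b=$(cat "$tmpdir/b")
rm -rf "$tmpdir"

echo "$result_a / $result_b / other work: $other_work"
```

Run one after the other, the two reads would take about two seconds; started together they overlap, and the script only pays for the slowest one.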

Now, with affordable solid state disks (SSDs), all of these disk-related speed problems are basically moot. So what are the next bottlenecks for computers and performance?

  • Processors. It might be the case that processors are going to be the slow-to-develop bottleneck. There are a lot of expectations on processors these days: high speed, low power consumption, low temperature, and a high amount of parallelism (cores and hyperthreading). But these expectations necessarily conflict with one another.

    The main route to innovation is to make the processors themselves smaller, which does increase performance and helps control heat and power consumption, but there is a practical limit to the size of a processor.

    Also, no matter how fast you make the processor, it's irrelevant unless the software is capable of taking advantage of the feature.

  • Software.

    We're still not great at building software with asynchronous components. "Non-blocking" systems do make it easier to have systems that work better with slower disks. Still, we don't have a lot of software that does a great job of using the parallelism of a processor, so it's possible to get some operations that are slow and will remain slow because a single-threaded process must grind through a long task and can't share it.

  • Network overhead.

    While I think better software is a huge problem, network throughput could be a huge issue. The internet endpoints (your connection) have gotten much faster in the past few years. That's a good thing, indeed, but there are a number of problems:

    • Transfer speeds aren't keeping up with data growth or data storage, and if that trend continues, we're going to end up with a lot of data that only exists in one physical location, which risks catastrophic data loss.

      I think we'll get back to a point where moving physical media around will begin to make sense. Again.

    • Wireless data speeds and architectures (particularly 802.11x, but also wide area wireless) have become ubiquitous, but aren't really sufficient for serious use. The fact that our homes, public places, and even offices (in some cases) aren't wired correctly to provide opportunities to plug in will begin to hurt.
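
To put a number on the "moving physical media around" point, here is a back-of-the-envelope calculation in shell arithmetic. The figures are assumptions picked for illustration (10 TB of data, a fully saturated 100 Mbit/s connection), not measurements of anything.

```shell
# Assumed figures, for illustration only.
data_bytes=10000000000000        # 10 TB of data
link_bits_per_sec=100000000      # a 100 Mbit/s connection, fully saturated

# Integer arithmetic, so the hour and day figures are rounded down.
transfer_secs=$(( data_bytes * 8 / link_bits_per_sec ))
transfer_hours=$(( transfer_secs / 3600 ))
transfer_days=$(( transfer_hours / 24 ))

echo "Network transfer: ${transfer_secs}s (about ${transfer_hours} hours, or ${transfer_days} days)"
```

Under those assumptions the transfer takes 800000 seconds, a bit over nine days, so a drive in a padded envelope only has to beat that.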

Thoughts? Other bottlenecks? Different reading of the history?

[1]By contrast, software seems like it's always getting slower, and while this is partially true, there are additional factors at play, including feature growth, programmer efficiency, and legacy support requirements.
[2]Because developers control, at least to some extent, how everyone uses and understands technology, the constraints on the way they use computers are important to everyone.

ThinkPad x220 Review

My Decision

Throughout this spring I've been eagerly waiting for the announcement and arrival of the new X-series laptops from Lenovo. I've been incredibly happy with every Thinkpad I've ever had, and while my existing laptop--a very swell T510--has been great, it was time:

  • I needed a system with a bit more power. The power of my existing system was beginning to frustrate me. Things took too long to compile, I was having some annoying networking processing issues, and to make matters worse...

  • The thing was huge. I think 15 inch laptops are a great size for doing actual work, and I'm not getting rid of this one, but it's not the kind of thing I want to lug on my back. Which I was doing a lot.

  • I needed more redundancy. Most of my work in the world--writing, hacking, communicating--happens with a computer. While my data is backed up (never well enough, of course, but it's there), I worry more about the case where I'm stranded for a period of time without a working system.

    A second machine not only provides peace of mind, but also makes it possible to do things like upgrade the T510 from 32 to 64 bits. (Don't ask.)

  • In the long run, the older laptop might need to go to R., whose personal system bit the dust a few months ago.

What Happened

But when the new x230s came out, I found myself unimpressed. The revision got a different keyboard and I adore the old keyboard. To make matters worse, the screen on the new model wasn't any better than the one on the old: the pixel density is somewhat crappy.

In light of this, and mostly for the older keyboard, I decided to buy the older model. In short: it's great.

I bought the RAM and hard drive aftermarket, and replaced them before booting the first time. Having 16 gigs of RAM is pretty much amazing, and I'm sold on the notion that SSDs are now a must for most common personal computing work.

Incidentally I discovered that this computer is about the same weight as the 13 inch Macbook Air (and I have the larger battery), for those of you keeping score at home. And way beefier. Thicker obviously, but still...

Point by Point

Pros:

  • The keyboard is the same great Thinkpad keyboard we've always had. I'm sure eventually I'll give in and learn to enjoy the new keyboard, but for now, I'm going to stick with the old.
  • It's way fast. Because the speed of my old computer defined "the speed of computers" in my mind, it was kind of nifty to learn that computers had actually gotten faster.
  • It's way small. Turns out, if I'm lugging a sub-3 pound laptop around, I can totally use my awesome shoulder bag. I also don't feel like my wrist is going to give out if I need to walk 30 feet holding the laptop in one hand.

Cons:

  • The screen could be so much better than it is, and there's really no excuse. It's not enough of a deal breaker for me, but...
  • That's really it. I think 12 inch wide screen laptops don't have quite enough wrist-rest area on them, but that's really an unavoidable problem: if you have a wide screen (and thus a full keyboard), the wrist area is short and narrow. If you have a more square screen and a squished keyboard, then you have enough wrist area. One adjusts.

Computing Literacy Project

I'm working on the final touches of a treatise on Systems Administration that I've mentioned in passing here before. I hope to have this project up on a web-server near you (near me?) in a few months. Because I've had a lot of fun working on this project, I decided that it would be cool to do a similar project on another topic dear to my technical interest: Computing Literacy.

Computing literacy? Isn't that a little *early nineties* for you and/or everyone?

Well yes, I suppose, but I don't think that the improvements that have aided the adoption of technology in the last 20 years have necessarily raised the amount of computing literacy. In fact, it would not surprise me if all of the abstraction and friendly user interfaces make it more difficult for users to understand how their computers actually work.

Because computers play such an important role in everyone's life and work, I suspect that there are a lot of folks who don't know how these things work but want to learn. And the truth is that there aren't a lot of resources around for people who aren't already tinkerers and hackers.

This is where I fit in: I can write a book-like object that provides useful information in an easy to understand way, something that technical people can give to their friends and family when they ask things like "what's a database?" or "what do you mean compiled?" or "what's a server?" or "what is this file system thing?"

Are you so arrogant as to think that you can add something new to this subject?

I'm not going to develop new technology here, or suddenly make things easier to understand, but I think there's a niche for computer users between people like me and most people. People like me download putty on other people's computers to ssh into a VPS which hosts a VPN that our laptops are connected to, in order to ssh into those laptops and kill misbehaving X11 sessions (no really, I just did that). Most people just know how to open MS Word, send emails, and browse the web.

And I think that giving folks who are technically creative, intuitive, and curious a way to learn about their computers and technology would be great. Particularly in a way that doesn't assume too much prior knowledge, or an interest in complicated math.

Ok, so you're convinced that it's a good idea, what's going to be special about this?

First, while the material will not be so complicated or so novel, I think the presentation may be. Additionally, I envision this document as a useful reference for describing and defining basic computing concepts, to support more technical blog posts and articles.

Finally, I think it would be fun to do this book-like object in a more iterative style, relative to the Systems Administration book: I'll sketch out the basics, put a disclaimer at the top about links breaking, publish it, and then publish changes and make the whole thing accessible by git.

Sound cool to anyone?


So there you have it.

Expect to hear more in the late summer or fall...

9 Awesome Git Tricks

I'm sure that most "hacker bloggers" have probably done their own "N Git Tricks" post at this point. But git is one of those programs that has so much functionality, and everyone uses it so differently, that there is a never ending supply of fresh posts on this topic. My use of git changes enough that I could probably write this post annually and come up with a different 9 things. That said, here's the best list right now.

See Staged Differences

The git diff command shows you the changes in the current working directory that you haven't staged yet (if nothing is staged, that's the difference from the last commit). That's really useful and you might not use it as much as you should. The --cached option shows you just the differences that you've staged, compared to the last commit.

This provides a way to preview your own patch, to make sure everything is in order. Crazy useful. See below for the example:

git diff --cached
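
To see the two views side by side, here is a small sketch that builds a throwaway repository in a temporary directory. The file name and the commit identity are placeholders I made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"

printf 'one\n' > notes.txt
git add notes.txt
git commit --quiet -m "first"

printf 'two\n' >> notes.txt
git add notes.txt                 # "two" is now staged
printf 'three\n' >> notes.txt     # "three" is only in the working copy

unstaged=$(git diff)              # working copy vs. staging area
staged=$(git diff --cached)       # staging area vs. last commit

echo "$staged"
```

Here the staged diff contains only the "two" line, which is exactly what the next commit would record, and the plain diff contains only "three."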

Eliminate Merge Commits

In most cases, if two or more people publish commits to a shared repository, and everyone commits to their local repositories more frequently than they publish changes, then when they pull, git has to make "meta commits" (merge commits) that make it possible to view a branching (i.e. "tree-like") commit history in a linear form. This is good for making sure that the tool works, but it's kind of messy, and you get histories with these artificial events in them that you really ought to remove (but no one does). The "--rebase" option to "git pull" avoids them automatically: it subtly rewrites your own unpublished history on top of the fetched commits, which removes the need for merge commits. It's way clever and it works. Use the following command:

git pull --rebase

There are caveats:

  • You can't have uncommitted changes in your working copy when you run this command or else it will refuse to run. Make sure everything's committed, or use "git stash".
  • Sometimes the output isn't as clear as you'd want it to be, particularly when things don't go right. If you don't feel comfortable rescuing yourself in a hairy git rebase, you might want to avoid this one.
  • If your changes conflict with what you pulled, the rebase stops partway through. You have to resolve the conflicts by hand and run "git rebase --continue" (or back out with "git rebase --abort").
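
Here is a sketch of the stash, pull, and pop dance from the first caveat, played out between two throwaway repositories in a temporary directory. The repository names ("upstream" and "mine"), file names, and identity are all invented for the example.

```shell
work=$(mktemp -d)

# "upstream" stands in for the shared repository.
git init --quiet "$work/upstream"
cd "$work/upstream"
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
echo base > base.txt
git add base.txt
git commit --quiet -m "base"

# "mine" is my clone of it.
git clone --quiet "$work/upstream" "$work/mine"

# A collaborator publishes a commit...
echo theirs > theirs.txt
git add theirs.txt
git commit --quiet -m "their change"

# ...while I commit locally and leave some work uncommitted.
cd "$work/mine"
git config user.email "you@example.com"
git config user.name "Example"
echo mine > mine.txt
git add mine.txt
git commit --quiet -m "my change"
echo "half finished" >> base.txt

git stash > /dev/null          # set the uncommitted work aside
git pull --quiet --rebase      # replay "my change" on top of "their change"
git stash pop > /dev/null      # bring the uncommitted work back

total_commits=$(git rev-list --count HEAD)
merge_commits=$(git rev-list --count --merges HEAD)
echo "$total_commits commits, $merge_commits merge commits"
```

The history ends up as three commits in a straight line, with no merge commit, and the half-finished edit is back in the working copy.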

Amend the Last Commit

This is a recent one for me.

If you commit something, but realize that you forgot to save one file, use the "--amend" switch (as below) and you get to add whatever changes you have staged to the previous commit.

git commit --amend

Note: if you amend a commit that you've published, you might have to do a forced update (i.e. git push -f) which can mess with the state of your collaborators and your remote repository.
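
A quick sketch of the forgotten-file case, again in a throwaway repository with made-up file names. The "--no-edit" flag keeps the original commit message so that no editor opens.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"

echo "part one" > a.txt
echo "part two" > b.txt
git add a.txt
git commit --quiet -m "Add both parts"    # oops: b.txt was never staged

# Stage the forgotten file and fold it into the previous commit.
git add b.txt
git commit --quiet --amend --no-edit

commit_count=$(git rev-list --count HEAD)
committed_files=$(git ls-tree --name-only HEAD | tr '\n' ' ')
echo "$commit_count commit containing: $committed_files"
```

There is still only one commit, and it now contains both files.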

Stage all of Current State

I've been using a version of this function for years now as part of my download mail scheme. For some reason in my head, it's called "readd." In any case, the effect of this is simple:

  • If a file is deleted from the working copy of the repository, remove it (git rm) from the next commit.
  • Add all changes in the working copy to the next commit.
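
git-stage-all(){
   # Drop files that were deleted from the working copy from the next commit.
   if [ -n "$(git ls-files -d)" ]; then
      git ls-files -d -z | xargs -0 git rm --quiet --
   fi
   # Stage every remaining change in the working copy.
   git add .
}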

So the truth of the matter is that you probably don't want to be this blasé about commits, but it's a great time saver if you use the rm/mv/cp commands on a git repo and want to commit those changes, or have a lot of small files that you want to process in one way and then snapshot the tree with git.

Editor Integration

The chances are that your text editor has some kind of git integration that makes it possible to interact with git without needing to drop into a shell.

If you use something other than emacs I leave this as an exercise for the reader. If you use emacs, get "magit," possibly from your distribution's repository, or from the upstream.

As an aside you probably want to add the following to your .emacs somewhere.

(setq magit-save-some-buffers nil)
(add-hook 'before-save-hook 'delete-trailing-whitespace)

Custom Git Command Aliases

In your user account's "~/.gitconfig" file or in a per-repository ".git/config" file, it's possible to define aliases that add bits of functionality to your git command. This is useful for defining shortcuts, combinations, and for triggering arbitrary scripts. Consider the following:

[alias]
all-push  = "!git push origin master; git push secondary master"
secondary = "!git push secondary master"

Then from the command line, you can use:

git secondary
git all-push

Git Stash

"git stash" takes all of the uncommitted changes to tracked files (staged or not) and stores them away somewhere. This is useful if you want to break apart a number of changes into several commits, or have changes that you don't want to get rid of (i.e. "git reset --hard") but also don't want to commit. "git stash" pushes the changes onto the stash and "git stash pop" applies the most recent entry to the current working copy. In the default operation it works as a FILO (i.e. "First In, Last Out") stack.

To be honest, I'm not a git stash power user. For me it's just a stack that I put patches on and pull them off later. Apparently it's possible to pop things off the stash in any order you like, and I'm sure I'm missing other subtlety.

Everyone has room for growth.
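
For the record, here is the stack behavior in a throwaway repository. The file names are made up; the point is only the order in which things come back.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"

echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit --quiet -m "base"

echo "change to a" >> a.txt
git stash > /dev/null          # pushed first
echo "change to b" >> b.txt
git stash > /dev/null          # pushed second, so it sits on top

git stash pop > /dev/null      # the most recent entry comes back first
after_first_pop=$(git diff --name-only)

git stash pop > /dev/null      # then the older one
after_second_pop=$(git diff --name-only | tr '\n' ' ')

echo "first pop restored: $after_first_pop"
echo "after second pop:   $after_second_pop"
```

The first pop brings back the change to b.txt (the last thing stashed), and the second brings back the change to a.txt.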

Ignore Files

You can add files and directories to a .gitignore file in the top level of your repository, and git will automatically ignore these files. One "ignore pattern" per line, and it's possible to use shell-style globbing.

This is great for avoiding accidentally committing temporary files, but I also sometimes put entire sub-directories in it if I need to nest git repositories within git repositories. Technically, you ought to use git's submodule support for this, but this is easier. Here's the list of temporary files that I use:

.DS_Store
*.swp
*~
\#*#
.#*
\#*
*fasl
*aux
*log
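
If you're not sure whether a pattern does what you think, "git check-ignore" will tell you. The sketch below tries a few of the patterns from the list above against some made-up file names in a throwaway repository; the files don't even need to exist.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# A few of the patterns from the list above.
printf '%s\n' '*~' '\#*#' '*.swp' > .gitignore

# check-ignore exits 0 when a path matches an ignore pattern.
report=$(
    for name in 'notes.txt~' '#draft#' '.notes.txt.swp' 'notes.txt'; do
        if git check-ignore -q -- "$name"; then
            echo "ignored: $name"
        else
            echo "kept: $name"
        fi
    done
)
echo "$report"
```

The backup file, the auto-save file, and the swap file are all ignored, and the real notes.txt is kept.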

Host Your Own Remotes

I've accidentally said "git" when I meant "github" (or vice versa) only once or twice. With github providing public git-hosting services and a great complement of additional tooling, it's easy to forget how easy it is to host your own git repositories.

The problem is that, aside from making git dependent on one vendor, this ignores the "distributed" parts of git and all of the independence and flexibility that comes with that. If you're familiar with how Linux/GNU/Unix works, git hosting is entirely paradigmatic.

Issue the following commands on your server to create a repository:

mkdir -p /srv/git/repo.git
cd /srv/git/repo.git
git init --bare

Edit the .git/config file in your existing repository to include a remote block that resembles the following:

[remote "origin"]
fetch = +refs/heads/*:refs/remotes/origin/*
url = [username]@[hostname]:/srv/git/repo.git

If you already have a remote named origin, replace the word origin in the above snippet with a different name for the new remote. (In multi-remote situations, I prefer to use descriptive identifiers like "public" or the machines' hostnames.)

Then issue "git push origin master" on the local machine, and you're good. You can use a command in the following form to clone this repository at any time.

git clone [username]@[hostname]:/srv/git/repo.git
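
If you'd rather not edit the config file by hand, "git remote add" writes the same kind of remote block for you. The sketch below tries the whole round trip on one machine: a bare repository in a temporary directory plays the part of /srv/git/repo.git, and the remote name "backup" is just an example.

```shell
work=$(mktemp -d)

# Stand-in for the bare repository on the server.
git init --quiet --bare "$work/repo.git"

# An existing local repository with one commit.
git init --quiet "$work/project"
cd "$work/project"
git config user.email "you@example.com"   # placeholder identity
git config user.name "Example"
echo "hello" > README
git add README
git commit --quiet -m "first"

# Same effect as adding a [remote "backup"] block to .git/config by hand.
git remote add backup "$work/repo.git"
git push --quiet backup HEAD

remote_url=$(git config --get remote.backup.url)
remote_fetch=$(git config --get remote.backup.fetch)
remote_commits=$(git --git-dir="$work/repo.git" rev-list --count --all)
echo "$remote_url now holds $remote_commits commit(s)"
```

With a real server you would replace the local path with the [username]@[hostname]:/srv/git/repo.git form shown above.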

Does anyone have git tricks that they'd like to share with the group?

The Future of File Organization and Security

I was having a conversation with a (now former) coworker (a while ago) about the future of shared file systems, unstructured organization and management, and access control. What follows is a collection of notes and thoughts on the subject that have stuck with me.

Let's start with some general assumptions, premises, ideas:

  • File system hierarchies are dead or dying. To have a useful file system hierarchy the following qualities are essential:

    • Every piece of data needs to belong in one location and only one location.

    • Every container (e.g. directory or folder) needs to hold at least two objects.

    • Hierarchy depth ought to be minimized. Every system can use two levels. After the second level, each additional level should be added only when a very large number of additional objects are added to the system. If you have 3 functional levels and fewer than 1000 objects, you might be in trouble.

    As you might imagine, this is very difficult to achieve, and the difficulty is compounded by huge amounts of legacy systems, and the fact that "good enough is good enough," particularly given that file organization is secondary to most people's core work.

    While there are right ways to build hierarchical structure for file system data, less structure is better than more structure, and I think that groups will tend toward less over time.

  • Access control is a lost cause. Legacy data and legacy practices will keep complex ACL-based systems for access control in place for a long time, but I think it's pretty clear that for any sort of complex system, access control isn't an effective paradigm. In some ways, access control is the last really good use of file system hierarchies. Which is to say, by now the main use of strong containers (as opposed to tags) is access control.

    I don't think that "enterprise content management"-style tools are there, yet. I suspect that the eventual solution to "how do I control access to content" will either be based on a cryptographic key system that controls access and file integrity, or be a class of application, a la ECMS, with some sort of more advanced abstracted file system interface that's actually usable.

    I'm betting on encryption.

  • Tagging and search are the ways forward. In many cases, the location of a file in a hierarchy helps indicate the contents of that file. If there are no hierarchies then you need something more useful and more flexible to provide this level of insight.

    • Great search is a necessity. Luckily it's also easy. Apache Solr/Lucene, Xapian, and, hell, even Google Search Appliances make great search really easy.

    • Some sort of tagging system. In general, only administrators should be able to create tags, and I think a single tag per object (i.e. categories) versus multiple tags per object should be configurable on a collection-by-collection basis.

    Tag systems would be great for creating virtualized file system interfaces, obviating the need for user-facing links, and leveraging existing usage patterns and interfaces. It's theoretically possible to hang access control off of tag systems but that's significantly more complicated.

    One of the biggest challenges with tag systems is avoiding recapitulating the problems with hierarchical organization.

The most difficult (and most interesting!) problem in this space is probably access control. The organizational practices will vary a lot and there aren't right and wrong answers. This isn't true in the access control space.

Using public key infrastructure to encrypt data may be an effective access control method. It's hard to replicate contemporary access control in encryption schemes. Replicating these schemes may not be desirable either. Here are some ideas:

  • By default all files will be encrypted such that only the creator can read them. All data can then be "world readable," as far as the storage medium and underlying file systems are concerned.

  • The creator can choose to re-encrypt objects such that other users and groups of users can access the data. For organizations this might mean a tightly controlled central certificate authority-based system. For the public internet, this will either mean a lot of duplicated encrypted data, or a lot of key chains.

  • We'll need to give up on using public keys as a method of identity testing and verification. Key signing is cool, but at present it's complex, difficult to administer, and presents a significant barrier to entry. Keys need to be revocable, particularly group keys within organizations.

    For the public internet, some sort of social capital or network analysis based certification system will probably emerge to supplement strict web-of-trust based identity testing.

  • If all data is sufficiently encrypted, VPNs become obsolete, at least as methods for securing file repositories. Network security is less of a concern when content is actually secure. The processing overhead of encryption isn't a serious concern on contemporary hardware.

Thoughts?

Is Dropbox the Mobile File System Standard?

I've started using Dropbox on my Android devices recently (and my laptop as a result [1]), and I'm incredibly impressed with the software and with the way that this service is a perfect example of the kind of web services that we need to see more of. While I have some fairly uninteresting concerns about data security and relying on a service that I'm not administrating personally, I think it's too easy to get caught up in the implications of where the data lives and forget the implications of having file syncing that "just works" between every computer.

I used to think that the thing that kept mobile devices from being "real" was the fact that they couldn't sell "post-file system" computer use. I'm not sure that we're ready to do away with the file system metaphor yet. I think Dropbox is largely successful because it brings files back and makes them available in a way that makes sense for mobile devices.

The caveat is that it provides a file system in a way that makes sense in the context of these kinds of "file systemless" platforms. Dropbox provides access to files, but in a way that doesn't require applications (or users) to have a firm awareness of "real files." Best of all, Dropbox (or similar) can handle all of the synchronization, so that every application doesn't need to have its own system.

This might mean that Dropbox is the first functionally Unix-like mobile application. I think (and hope) that Dropbox's success will prove to be an indicator for future development. Not that there will be more file syncing services, but that mobile applications and platforms will have applications that "do one thing well," and provide a functionality upon which other applications can build awesome features.


This isn't to say that there aren't other important issues with Dropbox. Where your data lives does matter, who controls the servers that your data lives on is important. Fundamentally, Dropbox isn't doing anything technologically complicated. When I started writing the post, I said "oh, it wouldn't be too hard to get something similar set up," and while Dropbox does seem like the relative leader, it looks like there is a fair amount of competition. That's probably a good thing.

So despite the concerns about relying on a proprietary vendor and about trusting your data on someone else's server, data has to go somewhere. As long as users have choices and options, and there are open ways of achieving the same ends, I think that these issues are less important than many others.

[1]To be fair, I'm using it to synchronize files to the Android devices, and not really to synchronize files between machines: I have a server for simple file sharing, and git repositories for the more complex work. So it's not terribly useful for desktop-to-desktop sharing. But for mobile devices? Amazing.

Tablet Interfaces and Intuition

I've been using FBReaderJ to read .epub files on my tablet recently, and I discovered a nifty feature: you can adjust the screen's brightness by dragging your finger up or down the left side of the screen. Immediately this felt like discovering a new keybinding or a new function in emacs that I'd been wishing for for a long time. Why, I thought, aren't there more tricks like this?

The iPhone (and the iPad by extension) as well as Android make two major advances over previous iterations of mobile technology. First, they're robust enough to run "real" programs written in conventional programming environments. Better development tools make for better applications and more eager developers (which also makes for better applications). Second, the interfaces are designed to be used with fingers rather than a stylus (thanks to capacitive touch screens) and the design aesthetic generally reflects minimalist values and simplicity. The mobile applications of today's app stores would not work if they were visually complex and had multi-tiered menus and hard to activate buttons.

The tension between these two features in these platforms makes it difficult to slip nifty features into applications. Furthermore, the economy of application market places does not create incentives for developers to build tools with enduring functionality. The .epub reader I mentioned above is actually free software. [1] I wrote a couple of posts a while back on innovation (one and two) that address the relationship between free software and technological development, but that's beside the point.

Given this, there are two major directions that I see tablet interfaces moving toward:

1. Tablet interfaces will slowly begin to acquire a more complete gestural shorthand and cross-app vocabulary that will allow us to become more effective users of this technology. Things like Swype are part of this, but I think there is more to come.

2. There will be general purpose systems for tablets that partially or wholly expect a keyboard, and then some sort of key-command system will emerge. This follows from my thoughts in the "Is Android the Future of Linux?" post.

I fully expect that both lines of development can expand in parallel.

[1]I also found the base configuration of FBReader (for the tablet, at least) to be horrible, but with some tweaking, it's a great app.

Interfaces in Enterprise Software

This post is a continuation of my human solution to IT and IT policy issues series. It discusses a couple of ideas about "enterprise" software, and its importance to the kind of overall analysis of technology that this post (and others on this site) engages in. In many ways this is a different angle on some of the same questions addressed in my "Caring about Java" post: boring technologies are important, if not outright interesting.

There are two likely truths about software that seem a bit weird at first, but make sense upon reflection:

  1. The majority of software is used by a small minority of users. This includes software that's written for and used by other software developers, infrastructure, and the applications which are written for "internal use": the various databases, CRM systems, administrative tools, and other portals and tools that enterprises use.
  2. Beautiful and intuitive interfaces are only worth constructing if your software has a large prospective userbase or if you're writing software where a couple of competing products share a set of common features. Otherwise there's no real point to designing a really swanky user interface.

I'm pretty sure that these theories hold up pretty well, and are reasonably logical. The following conclusions are, I think, particularly interesting:

  • People, even non-technical users, adjust to really horrible user interfaces that are non-intuitive all the time.

  • We think that graphical user interfaces are required for technological intelligibility, while the people who design software use GUIs as minimally as possible, and for the vast majority of software the user interface is the weakest point.

    The obvious question, then, is: why don't we trust non-technical users with command lines? Thoughts?