We'll Always Have Debian

I know I just wrote a long piece about Arch Linux, and for most things I've pretty much switched to Arch as my primary, day-to-day distribution. In fact, when an Arch Linux issue comes up at work, my coworkers call me first. And I suppose it's well earned. But if you were to ask me what my favorite Linux distribution project was, I'd probably say Debian as often as not.

I run a lot of Debian: straight up, (mostly) unmolested Debian Stable. There are a lot of practical reasons for this: I have faith that it's going to work and do what I need it to. Aside from keeping on top of normal security issues, the system is stable and doesn't require attention to keep up to date. And in nearly every case the package manager can be trusted to do the right thing. There are also a ton of little niceties in the distribution: debconf, the management tools for Apache, and the sheer diversity of the packages. It all adds up.

I mean, I have gripes with some things that Debian does, but they're always little. I find myself asking "Why didn't you enable mod_rewrite by default? Really?" or "Would it have killed you to include software that was less than 3 years old in this release?" But never "Why is this broken by default?"

With projects like Ubuntu getting press, attention, and energy (and money!) I can't help but think that an outsider might think of Debian as being a bit... put upon? Or not good enough in its pure form? The Ubuntu folks are pretty good about acknowledging their Debian roots, and it's totally clear to anyone who really takes a good look at Ubuntu that most of its awesomeness is due to being Debian-derived. Even if that isn't terribly clear from the outside.

I also really enjoy the ways in which Debian has managed to grow and sustain itself, and create something that is so magnificent in scope. The Linux kernel project is huge, and the desktop projects are massive in terms of what they carry under their umbrellas. Distribution projects that start from nothing and control and build the entire stack are, I think, particularly intense because of the sheer size of the undertaking.

This of course holds true for all distribution projects, and doesn't make Debian particularly special, I suppose. The thing is that Debian's coverage is massive compared to other projects. Arch provides a great framework for an operating system and makes it really easy to do a number of things, but it has nowhere near as many packages or contributors. Ubuntu, by contrast, is a great project but is mostly a process of "tuning" Debian into a system that's more targeted at specific applications. Again, these aren't criticisms, but they do make Debian all the more impressive a proposition.

And I guess, because of this, even though most of the time when I interact with a Linux system it isn't actually Debian, I almost automatically categorize myself as a "Debian person."

Shrug.

technology as infrastructure, act three

Continued from Technology as Infrastructure, Act Two.

Act Three

All my discussions of "technology as infrastructure" thus far have been fairly high level: discussions of the business strategies of major players (eg. Google and Amazon), discussions of approaches to "the cloud," and so forth. As is my way, however, I've noticed that the obvious missing piece of this puzzle is how users--like you and me--are going to use the cloud, and how thinking about technology as infrastructure changes the way we interact with our technology, among other related issues.

One of my introductory interludes was a new use-case that I've developed for myself: I run my chat clients on a server and access them using GNU screen, which is an incredibly powerful, clever, and impossible-to-describe application. I've written about it before, but let's just describe its functionality as such:

Screen allows users to begin a persistent (terminal/shell/console) session on one computer, and then "detach" that session and continue it on another machine, where it runs virtually indistinguishable from a "native" session.

So my chat programs are running on a server "inside of" a screen session and when I want to talk to someone, I trigger something on my local machine that connects to that screen session, and a second later, the program is up and running just as I left it.
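
In concrete terms, the workflow looks something like the following sketch (the hostname and session name here are placeholders, and the client could just as easily be mcabber):

    # on the server: start a named screen session, detached, with irssi inside it
    ssh user@example.net
    screen -S chat -d -m irssi

    # later, from any machine: reattach to that session over ssh
    # (-d detaches it wherever it's currently attached, -r reattaches it here)
    ssh -t user@example.net screen -d -r chat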

Screen can, of course, be used locally (and I do use it in this mode every waking moment of my day), but there's something fundamentally different about how this specific use case affects the way I think about my connection.

This is just one, and one very geeky, example of what infrastructural computing--the cloud--is all about. We (I) can talk till we're (I'm) blue in the face, but I think the interesting questions arise not from thinking about how the infrastructure and the software will develop, but rather from thinking about what this means to people on the ground.

At a(n apparently) crucial moment in the development of "the cloud" my personal technological consumption went from "quirky but popular and mainstream" to fiercely independent, hackerish, and free-software-based. As a result, my examples in this area may not be concretely helpful in figuring out the path of things to come.

I guess the best I can do at the moment is to pose a series of questions; we can discuss the answers, if any seem apparent, in the comments:

  • Does "the cloud" provide more--on any meaningful way--than a backup service? It seems like the key functionality that cloud services provide is hosting for things like email and documents, that is more reliable than saving and managing backups for the ordinary consumer>
  • Is there functionality in standards and conventions, underutilized in desktop computing, that infrastructural approaches could take advantage of without building proprietary layers on top of JavaScript and HTTP?
  • Is it more effective to teach casual users advanced computing techniques (eg. using SSH), or to develop solutions that make advanced infrastructural computing easier for casual users (eg. front ends for git, more effective remote-desktop services)?
  • Is it more effective for connections to "the cloud" to be baked into current applications (more or less the current approach), or to bake connections to the cloud into the operating system (eg. mounting infrastructural resources as file systems; see the sketch after this list)?
  • Is the browser indeed the prevailing modality, or simply the most convenient tool for network interaction?
  • Do we have enough conceptual experience with using technology to collaborate (eg. wikis, source control systems like git, email) to be able to leverage the potential of the cloud, in ways that reduce total workloads rather than increase said workloads?
  • Does infrastructural computing grow out of the problem of limited computing power (we might call this "vertical complexity"), or out of the problem of managing computing resources in multiple contexts (eg. work, home, laptop, desktop, cellphone; we might call this "horizontal complexity")? And does this affect the kinds of solutions that we are able to think about and use?
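
On the operating-system question above, tools like sshfs already hint at what that might look like: a remote machine mounted as though it were a local directory, with no browser in sight. A minimal sketch (the hostname and paths are hypothetical):

    # mount a directory from a remote server as a local filesystem
    sshfs user@example.net:/home/user/documents ~/remote-docs

    # work with the files as if they were local, then unmount
    fusermount -u ~/remote-docs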

Perhaps the last question isn't quite user-centric, but I think it leads to a lot of interesting thinking about possible technologies. In a lot of ways the most useful "cloud" tool that I use is Google's BlackBerry sync tool, which keeps my calendar and address book synced (perfectly! so much that I don't even notice) between my computer, the phone, and the web. Git, for me, solves the horizontal problem. I'm not sure that there are many "vertical problems" other than search and data filtering, but it's going to be interesting to think about.

In any case, I look forward to discussing the answers and implications of these issues with you all, so if you're feeling shy, don't, and leave a comment.

Cheers!

technology as infrastructure, act two

Continued from Technology as Infrastructure, Act One.

Act Two

CNET's Matt Asay, covering this post by RedMonk's Stephen O'Grady, suggests that an "open source cloud" is unlikely because superstructure (hardware/concrete power) matters more than infrastructure (software)--though in IT "infrastructure" means something different, so go read Stephen's article.

It's my understanding that, in a manner of speaking, open source has already "won" this game. Though Google's code is proprietary, it runs on a Linux/JavaScript/Python platform. Amazon's "cloud" (EC2) runs on Xen (the open source virtualization platform), and nearly all of the operating system choices are Linux-based (Solaris and Windows are options).

I guess the question of "what cloud?" might seem trite, but I think clarifying "which cloud" is crucial at this point, particularly with regard to openness. There seem to be several:

  • Cloud infrastructure. Web servers, hosting, email servers. Traditionally these are things an institution ran its own servers for; these days that same institution might run its servers on some sort of virtualized hardware, for which there are many providers.

    How open? Open. There are certainly proprietary virtualization tools (VMware, Windows-whatever, etc.), and you can virtualize Windows, and I suppose HP-UX and AIX are getting virtualized as well. But Linux-based operating systems are likely virtualized at astonishing rates compared to non-open-source OSes. And much of the server infrastructure (sendmail, postfix/exim, Apache, etc.) is open source at some point in the stack.

    In point of fact, this cloud is more or less the way it's always been and is, I'd argue, open-source's "home turf."

  • Cloud applications: consumer. This would be stuff like Gmail, Flickr, Wikipedia, Twitter, Facebook, Ubuntu One, Google Docs, Google Wave, and other "application services" targeted at non-commercial/enterprise consumers and very small groups of people. This cloud consists entirely of software, provided as services, and is largely dominated by Google and other big players (Microsoft, Yahoo, etc.).

    How open? Not very. This space looks very much like the desktop computing world looked in the mid-90s: very proprietary, very closed, and the alternatives are pretty primitive and have a hard time doing anything but throwing rocks at the feet of the giant (Google).

  • Cloud applications: enterprise. This would be things like Salesforce (a software-as-a-service CRM tool) and other SaaS applications. I suppose Google Apps for domains falls under this category, as does pretty much anything that uses the term SaaS.

    How open? Not very. SaaS is basically Proprietary Software: The Next Generation, as the business model is based on the exclusivity of rights over the source code. At the same time, in most sectors there are viable open source projects competing with the proprietary options: SugarCRM, Horde, SquirrelMail, etc.

  • Cloud services: enterprise. This is what act one covered, or alluded to, but generally this covers things like PBX systems, all the stuff that runs corporate intranets, groupware applications (some of which are open source), collaboration tools, internal issue tracking systems, and shared storage systems.

    How open? Reasonably open. Certainly there's a lot of variance here, but for the most part the open source options hold their own: Asterisk covers the PBX stuff, and there are a number of open source groupware applications. Jira/Perforce/BitKeeper aren't open source, but Trac/SVN/git are. The Samba project kills in this area and is a drop-in replacement for Microsoft's file-sharing systems.

The relationship between open source and "the cloud," then, depends a lot on which cloud you're talking about. I guess this means there needs to be an "act three" to cover specific user strategies. Because, regardless of which cloud you use, your freedom has more to do with practice than it does with some inherent capability of the software stack.

technology as infrastructure, act one

Act One

This post is inspired by three converging observations:

1. Matt posted a comment on a previous post that read:

"Cloud" computing. Seriously. Do we really want to give up that much control over our computing? In the dystopian future celebrated by many tech bloggers, computers will be locked down appliances, and we will rely on big companies to deliver services to us.

2. A number of podcasts that I listened to while I drove to New Jersey, produced/hosted/etc. by Michael Cote for RedMonk, which discussed current events and trends in "enterprise-grade information technology"--a world that I'm only beginning to scratch the surface of.

3. Because my Internet connection at home is somewhat spotty, and because it makes sense to have an always-on (and mobile) connection to IRC for work, I've started running my chat clients (mcabber and irssi) inside of a GNU screen session on my server.


My specific responses:

1. Matt's right, from a certain perspective. There are a lot of buzzword-heavy, venture-capital-driven, consumer-targeted "cloud computing" tools, which seem to be all about getting people to use web-based "applications" and give up autonomy in exchange for data that may be more available to us because it's stored on someone else's network.

Really, however, I think this isn't so much a problem with "networked computing" as it is with existing business models for information technology, and an example of the worst kind of cloud computing. And I'm using Matt's statement as a bit of a straw man, as a lot of the things that I'm including under the general heading of "cloud computing" aren't really what Matt's talking about above.

At the same time, I think there is the cloud that Matt refers to--the Google/Microsoft/startup/Ubuntu One/etc. cloud--and then there's all the rest of distributed/networked/infrastructural computing, which isn't new or sexy but which I think is really part of the same cloud.

2. The "enterprise" world thinks about computers in a much different way than I ever do. Sometimes this is frustrating: the tendrils of proprietary software are strongest here, and enterprise folks care way too much about Java. In other aspects it's really fascinating, because technology becomes an infrastructural resource, rather than a concrete tool which accomplishes a specific task.

Enterprise hardware and software exist to provide large corporate institutions the tools to manage large amounts of data/projects/communications/etc.

This is, I think on some level, the real cloud. This "technology-as-infrastructure" thing.

3. In an elaboration of the above, I outsourced a chunk of my computing to "the cloud." I could run those applications locally, and I haven't given up that possibility, but one needs a network connection to use a chat client anyway, so the situation where I'd want to connect to a chat server but wouldn't be able to reach my own server is next to impossible (particularly because some of the chat servers run on my hardware).


I guess the point I'm driving at is: maybe this "cloud thing" isn't about functionality, or websites, or software, or business models, but rather about the evolution of our computing needs from providing a set of tools and localized resources to providing infrastructure.

And that the shift isn't so much about the technology--in point of fact, running a terminal application in a screen session over SSH isn't cutting-edge technology by any means--but about how we use the technology to support what it is we do.

Or am I totally off my rocker here?

on package management

I was writing my post on distribution habits and change, and I realized that some elaboration on the concept of package management was probably in order. This is that elaboration.

Most Linux--and indeed UNIX, at this point--systems have some kind of package management:

Rather than provide an operating system as one monolithic and unchanging set of files, distributions with package management provide systems with some sort of database, and a common binary file format, that allows users to install (and uninstall) all software in a clear/standardized/common manner. All software in a Linux system (generally) is thus covered by these package managers, which also do things like tracking the way that some packages depend on other packages, and making sure that the latest versions of a package are installed.
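
On a Debian-style system, for instance, the whole cycle looks something like this (the package name is just an example):

    # refresh the package database and bring the whole system up to date
    apt-get update && apt-get upgrade

    # install a package; dependencies are resolved and pulled in automatically
    apt-get install irssi

    # remove it again, cleanly
    apt-get remove irssi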

The issue is that there are lots of different ways to address the above "problem space," and a lot of different goals that operating system designers have when designing package management and selecting packages. For instance: How do we integrate programs into the rest of our system? Should we err on the side of the cutting edge, or err on the side of stability? Do we edit software to tailor it to our system/users, or provide more faithful copies of "upstream sources"? These are all questions that operating system/distribution/package managers must address in some way, and figuring out how a given Linux distribution deals with them is, I think, key to figuring out which system is the best for you--though, to be fair, it's an incredibly hard set of questions to answer.

The thing about package management is that whatever ideology you choose with regard to which tools you use, which packages to include, and how to maintain them, the following is true: all software should be managed by the package management tools, without exception. Otherwise it becomes frighteningly easy for a new version of a piece of software to "break" an old, unmanaged version with overlapping file names: by overwriting or deleting old files, by loading one version of a program when you mean to load another, by making it nearly impossible to remove all remnants of the old version, or just by making it hard to know when a piece of software needs to be updated for security fixes or some such.
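
This is also why the package database is so handy for answering the "where did this file come from?" question. A couple of illustrative queries (the file path is arbitrary):

    # Debian and friends: which package owns this file?
    dpkg -S /usr/bin/screen

    # the equivalent query on Arch
    pacman -Qo /usr/bin/screen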

I hope that helps.

why arch linux rocks

So, long story short, I've been working a lot with ArchLinux in the last few days, getting it set up and starting to use this peculiar little distribution. While I will surely be blogging more about Arch in the coming days, I think a brief list of first impressions is in order.

  1. I share values with the Arch Developers.

    This is, I think, a factor in "choosing a Linux (or BSD) distribution" that is really hard to understand or explain, in part because the values that distinguish distributions are sometimes hard to suss out, particularly if you're on the outside looking in. This explains the phenomenon of "distro hopping."

    My sense of the "Arch" philosophy/approach is largerly what this post is about, but in summary: arch is lightweight and minimal, Arch expects users to be sophisticate and intelligent (Arch would rather tell you how something works, so you can do it "right," than try and save you from yourself and do it in a way that might be wrong.) Arch is a community project, and isn't reliant on commercial interests, and arch is firmly dedicated to free software ideas.

    How does this compare to other distributions you've heard of? Arch is community-oriented/originated like Slackware and Debian; Arch is lightweight like debian-netinst and Gentoo; Arch is minimal like Damn Small Linux (though not quite that minimal) and the other tiny Linuxes; Arch is based on binary packages like Debian and Fedora/RedHat/CentOS; Arch uses Linux but takes inspiration from the BSDs in terms of system architecture; Arch uses a rolling release cycle like Debian's testing branch and Gentoo.

  2. It doesn't dumb anything down, and doesn't expect users to be either experts *or* total beginners.

    I think the term they use is "intermediate" or "advanced beginner," but in any case I think the approach is good: provide configuration in its most basic and straightforward form, and rather than try to make the system easier to configure, document it well and trust that a straightforward configuration will be easier to manage in the long run than a more convoluted but "easy" setup.

    Basically, Arch assumes that complexity and difficulty go hand in hand, and that simplicity and ease of use are similarly connected.

  3. Arch values and promotes minimalism.

    This comes from a few different aspects of Arch, but in general the avoidance of complexity in the configuration and the "blank slate" aspect of the installation process combine to create a system that is minimal and almost entirely agnostic with regard to what you might want to do with it.

    Whereas many Linux-based systems are designed for specific tasks (eg. Mythbuntu, Medibuntu, Linux Mint, CrunchBang Linux, etc.) and include software by default that supports this goal, Arch, in contrast, installs no (or very little) software by default and can function well for a wide range of potential uses, from the fully featured desktop to the minimalistic headless server install.

  4. The Arch Wiki Rocks.

    I've been thinking about wikis and what makes a wiki "work" rather than "not work," and I'm beginning to think that the ArchLinux Wiki is another example of a wiki that works.

    I used to think that wikis powered by the MediaWiki engine were always bad: they look too much like Wikipedia (and are reasonably hard to customize), and as a result people tend to treat them like Wikipedia, which carries all sorts of baggage from the tradition of 19th-century encyclopedic projects and colonialism, and fails to capture some of the brilliance and effectiveness of wikis outside of the world of Wikipedia (and the MediaWiki engine by association).

    Despite this, the ArchLinux wiki is actually really good and provides helpful instructions for nearly everything to do with Arch. It looks good, and the more I read it, the more I find that the cool discursive/opinion-based modality I enjoy most about wikis is present on the Arch Wiki.

  5. Archies are really geeky and great, and their interests and tendencies are reflected in the packages provided by the system:

Allow me to justify this with a few anecdotes:

  • Arch includes a "snapshot package" of emacs-23 in the main repository (you have to add another repository to get this in Debian).
  • There is a great crossover between Awesome--my window manager of choice--and Arch, so there are good, up-to-date packages of Awesome.
  • Uzbl (i.e. "usable"), a super-minimalistic, WebKit-based browser, is developed on/for Arch.
  • As I was getting my first virtual machine set up, I did a bit of distro hopping to see what would work best. I decided to use VirtualBox (because it's nearly free software, and reasonably full featured), and I had a hell of a time getting other OSes to work right inside of it, but it appears that other Archies have had the same thought: there were pretty good explanations on the wiki, and it just worked.

How cool is that? I don't think Arch is for everyone, but if any of what I've talked about today sounds interesting or appealing, give it a shot. Also, my experiences with running it under VirtualBox have been generally favorable, so if that's more your speed, try it there.

Onward and Upward!

distribution habits and change

Here's another one for the "new workstation series."

Until now my Linux usage has been very Debian-based. It's good, it's stable, and the package management system is really intensely wonderful. I was happy. And then I was somewhat less happy with Ubuntu--the distribution that I'd been using on my desktop.

Don't get me wrong, Ubuntu is great and I'd gladly recommend it to other people, but... with time, I've found that the system feels clunky. This is hard to describe, but a growing portion of the software I run isn't included in the normal Ubuntu repositories, some services are hard to turn off or manage correctly (the display manager, various superfluous GNOME parts), and I've had some Ubuntu-related kernel instability.

My server, of course, runs Debian Lenny without incident. There's something really beautiful about that whole stability thing that Debian does. Because of the wonderful experience I had with Lenny, I considered running Debian (lenny/testing) on my desktops, and so I tried various flavors of Debian, but I found that for the software I wanted to run things were either too stale or too unstable. This is totally to be expected, as Debian's singular goal is stability, and getting a fresher, less stable operating system that's based on Debian is always going to be a difficult proposition.

In the past I've dragged my feet with regard to upgrading operating systems because I take a "don't fix it if it ain't broke" approach to maintaining computers, and all my systems worked. So until I was faced with this work computer--despite my dissatisfaction--I'd never really seriously considered the mechanics of changing distributions, much less the prospect of having to interact with a Linux distribution without the pleasures and joys of apt-get.

But then I got this new work station and...

I switched. At least for one system. So far. ArchLinux uses a system called "pacman" for package management. Pacman is really nifty, but it's different from apt: its output is a bit clearer, it's just as fast and "smart," and the packages are fresh.

And then there's the whole business of the way Arch approaches management. Arch uses a "rolling release" system: rather than releasing a version of the operating system that has a given set of packages at a given moment, Arch releases packages when they're "ready," on a package-by-package basis (with an awareness of the interactions between packages), and pacman has a system for easily installing software that isn't in the main repositories as packages (which makes upgrading and removing said packages later much easier).
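
For a rough sense of the translation between the two worlds, here's a sketch of the everyday commands (the package name is just an example, and both tools have plenty of flags I'm not showing):

    # refresh the package database and upgrade the whole system
    pacman -Syu          # roughly: apt-get update && apt-get upgrade

    # install, search for, and remove a package
    pacman -S mutt       # apt-get install mutt
    pacman -Ss mutt      # apt-cache search mutt
    pacman -R mutt       # apt-get remove mutt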

This sounds complex, maybe too complex, but somehow it's not. When I started thinking about writing this post, I thought, "How do I convey how totally strange and different this is from the Debian way?" By the time I got around to actually writing it, I'd settled into a place of stability, and I must confess that I don't notice the difference very much. It's wild, and it just works.

I was going to go on this whole spiel about how, even though the functional differences between one "flavor" of GNU/Linux and another are pretty minimal, it's interesting to see how different the systems can "feel" in practice. Here's a brief list of what I've noticed:

  • I find the flags that control operations with pacman to be non-intuitive, and frankly I'm a bit annoyed that they're case sensitive, so that pacman -S is different from pacman -s, which leads to a lot of typos.

  • I've yet to set up a machine in Arch that uses wireless. I'm just wary, mostly, of having to set up the network stuff "by hand" in Arch, given how finicky these things can be in general.

  • The ABS (Arch Build System, for installing packages that aren't in the main Arch repositories) took some getting used to, but I think this is more about learning how to use a new tool, and the fact that the commands are a bit weird.

    Having said that, I really like the way the package-building scripts just work and pull from upstream sources, and even, say, use git to download the source (see the sketch after this list).

  • I'm impressed with how complete the system is. Debian advertises a huge number of packages and prides itself on its completeness (and it is complete; I'm not quibbling), but I run into programs where I have a hard time getting the right version, or where the software plain old isn't in the repository. On Arch, I've yet to find something that isn't available. (Well, getting a version of mutt with the sidebar patch was a bit rough, and I haven't installed the urlview package yet, but that's minor.)

    I think this is roughly analogous to the discussions that Python/Ruby people have with Perl people about the actual vs. the advertised worth of CPAN (eg. CPAN is great and has a lot of stuff, but it suffers from being unedited, and its huge number of modules is more an artifact of time than of actual value). So while Debian (and CPAN) have more "stuff" than their competitors, in many cases the competitors can still succeed with less "stuff," because their "stuff" is more carefully edited: they can choose the 80% of stuff that satisfies 94% of need, rather than carrying 90% of the stuff to satisfy 98% of need. Diminishing returns and all that.

  • It's lightweight and smooth. While the hardware of my work computer is indeed impressive, particularly by my own standards, the hardware of the "virtual machine" isn't particularly impressive. And it's still incredibly peppy. Lightweight for the win.
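
About those build scripts: the basic ABS workflow goes roughly like this (the package name and repository path are illustrative; makepkg's -s flag pulls in build dependencies and -i installs the finished package with pacman):

    # copy a build script out of the ABS tree and build from it
    cp -r /var/abs/extra/urlview ~/builds/urlview
    cd ~/builds/urlview

    # fetch the upstream source, build, and install the resulting package
    makepkg -si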

Next up? More reasons Arch Rocks.

See you later!

Multiple Computers and Singular Systems

Here's another episode in my "work workstation" series of posts about setting up my new computer for work, and related thoughts on evolving computer setups. First, some history:

My tendency and leading desire is to make my working environment as consistent as possible. For a long time I was a one-laptop kind of guy. I had a PowerBook, it did everything just the way I wanted it to, and whenever I needed to do something digitally, I had my computer with me and I didn't have to worry about other people's computers being configured wrong. It meant that I worked better/smarter/more effectively, and I was happy.

When the PowerBook died, particularly as "my work" became intertwined with my computer, it became clear that I needed a bit more: a computer I could stretch out on, both in terms of things like media (music/video/etc) and in terms of screen space. Concurrently, I also discovered/became addicted to the Awesome window manager, and this has been a great thing for how I use computers, but the end result of this transition was that I had to manage (and needed to use) a couple of machines on a fairly regular basis.

Basically, I have a set of applications and tools that all of my systems have installed on them; either their configurations are all standard, or I store a copy of the configuration file in a git repository that all of the machines link to. My work is all stored in git repositories that I sync between machines as needed. It works pretty well, and it means that, aside from hardware constraints, it's not so much that I have multiple machines as that I have different instances of the same machine.
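
The mechanics are nothing fancy. Roughly (the repository location and file names here are made up):

    # clone the shared configuration repository onto a new machine
    git clone user@example.net:git/dotfiles.git ~/dotfiles

    # link the configuration files into place
    ln -s ~/dotfiles/screenrc ~/.screenrc
    ln -s ~/dotfiles/muttrc ~/.muttrc

    # when something changes on any machine, commit and sync it back up
    cd ~/dotfiles && git commit -a -m "tweak screen config" && git push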

Next: the implications...

I think, above all, I'm a Unix guy. UNIX is built around a certain kind of modularity, and I've worked out practices for myself that allow me to keep my application configurations synced between machines. Most of the time configurations don't change, but sometimes they do, and when that happens all I have to do is sync up a git repository.

The second implication is that I set up and work with my systems with some notion of stability. While I must confess that I'm not entirely pleased with the way Ubuntu has my desktop and laptop running, it is stable and reliable, and I'm wary of changing things around for a setup that would be functionally more or less the same, just a bit more parsimonious on the back end. I may be a huge geek and a hacker type, but I'm a writer and reader first, and although while I'm blathering on about my setup it might seem like all I do is tweak my systems, the writing and reading are really more "my thing."