Documentation isn't Content

See "Why The World Is Ready For Dexy" and "Dexy and Literate Documentation" as well as the technical writing section of the tychoish wiki for some background to this post.

Let's establish some basics. Content, as I think of it, in the context of new/web/digital media, is all of the stuff we read and write on the web. Documentation is the set of texts that support the use and creation of technical tools and explain technical concepts. Read literally, documentation is obviously content, but I think the way we've come to understand other kinds of digital content provides an incomplete basis for understanding how technical texts are published and consumed online.

Consider the following assumptions that we can make about most forms of content on the web:

  • The basic unit of content is pretty short. 1000-1500 words is the upper boundary for most blog posts and articles, and while some kinds of content can sneak by with slightly longer units--particularly in well structured contexts--these are exceptions.
  • Most content on the web is time-sensitive. While everything gets archived, the focus of publication is often on volume, which increases the chance of producing something that "goes viral" and gets a lot of attention. All other things being equal, the most successful on-line publishers are the ones with the shortest publication processes and the most regular publication cadences. After a short period of time, what's in the site archives is probably largely irrelevant.
  • Given the way that some content competes with other content, success is often determined by specialization, and the tightness of focus. It's easier to be the loudest voice in a very small room than it is to be the loudest voice in a large room or a lot of small rooms. Content, thus, needs to be very focused and address very niche interests.

In contrast:

  • Documentation texts tend to be pretty long, and while there are some "quick reference"-scoped texts, and some very complete texts that are quite long, on average documentation is substantially longer than "content." This means it needs to be produced differently, and we can expect different usage patterns.

  • In general people don't read documentation. This isn't just that people don't "rtfm," but that people's interaction with a piece of documentation generally begins with a specific question or problem. They don't say "oh, that manual for $xvz product looks interesting, I'll read it," and I think the "I should read the documentation for $uvw before I begin doing $task" approach is much less common than we'd like to think.

    People read documentation very tactically. So it's important that documentation exist and be complete, but we should have no illusions that people read any documentation from beginning to end as "clean slates."

  • Documentation is always already very tightly focused, unlike content. While some technical publishers may publish "second hand documentation," and thus be able to focus on documenting different aspects of the user experience, most documentation producers must aim to cover as much as possible, and allow users to find and take advantage of whatever information is most useful for them.

As a result, it's absolutely crucial that we don't think of and produce documentation as if it were content. I think treating documentation as something that needs to be compiled is probably the first step in "doing right" by documentation. Build tools like dexy, similarly, are great because they let writers and developers produce documentation in ways that make sense.

If you write or produce documentation--or, better yet, content as well--I'm interested in hearing what you think about these issues. Onward and Upward!

Dexy and Literate Documentation

See "Why The World Is Ready For Dexy" for the lead into this post. The short version: most tools for building documentation are substandard, and most attempts at "fixing documentation processes" are flawed. But there's this new project called "Dexy" that is doing something that is pretty exciting.

Basically, Dexy is a text filtering framework: you write documentation, code samples, and code, and then you tell Dexy how to stitch everything together, and bingo. Its success, or potential success, is built around its simplicity and flexibility.

The model Dexy proposes is something called "Literate Documentation," a cool concept that expands upon the notion of "literate programming." Both concepts require a little bit of unpacking.

Literate programming is the idea that code, documentation, and all specifications should be contained in one file, with blocks of machine readable code and human readable text interleaved with each other. Literate programming tools then take this "mega source" and build programs that do cool things. There are a number of literate programming tools, and some notable programs that are written in this manner, but it's not particularly popular: code and text tend to flow in different ways, and a manageable literate programming text is often not particularly maintainable software.

Literate documentation, on the other hand, as implemented by Dexy, is documentation compiled from an amalgamation of text and code which can be run and tested at build time. You write code snippets and documentation snippets, and a tool like Dexy takes all of it, runs the code, and stitches together a document out of all the pieces. Then, anytime you need to make a change to the code or the text, you rerun Dexy and the documentation mostly tests itself. Good deal.
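
To make the "run the code and stitch the pieces together" idea concrete, here is a minimal sketch in plain shell. To be clear, this is not Dexy and it doesn't use Dexy's configuration or filters: the build_doc function and its three file arguments are hypothetical names I made up for illustration. It only shows the shape of the idea, which is that a broken snippet breaks the build instead of quietly ending up in the published text.

```sh
#!/bin/sh
# Hypothetical sketch of "literate documentation"; NOT how Dexy is implemented.
# build_doc PROSE SNIPPET OUT
#   PROSE   - a text file with the human-written explanation
#   SNIPPET - a shell script used as the example code
#   OUT     - the compiled document to write
build_doc() {
    prose=$1
    snippet=$2
    out=$3

    # Run the example first. If it fails, stop before writing any output,
    # so a broken example can never reach the finished document.
    result=$(sh "$snippet") || {
        echo "snippet failed: $snippet" >&2
        return 1
    }

    # Stitch prose, the example source, and its real output into one file.
    {
        cat "$prose"
        echo
        echo "Example:"
        sed 's/^/    /' "$snippet"
        echo
        echo "Output:"
        printf '%s\n' "$result" | sed 's/^/    /'
    } > "$out"
}
```

Rerunning something like "build_doc intro.txt add.sh doc.txt" after every change is the whole trick: the output shown in the document is always whatever the example actually printed on the last build.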

I think it's not yet obvious if Literate Documentation will actually be a "thing." It's a great idea but, like literate programming, it's unclear how this kind of practice will actually catch on, and how useful or feasible writing documentation in this manner will be. "Dexy the method" may or may not find greater acceptance, because "Literate Documentation" may depend on developers writing documentation. At the same time I think that "Dexy the tool" is certainly a valuable contribution to the field of technical writing. Nevertheless, I think there are some important things about the way Dexy works that are worth extrapolating.

(Links to `tychoish wiki <http://tychoish.com/readers-guide/>`_ pages concerning `technical writing <http://tychoish.com/technical-writing/>`_ in some state of existence.)

  • Atomic Documentation. Dexy reinforces the idea that documentation should be written in very small units that are self sufficient, and address very small and specific topics, questions, and features. The system which builds and displays documentation should then be able to either usefully present these "atomic" units or stitch more complete documentation together from these units. This makes documentation easier to maintain, and arguably makes documentation more valuable for users.
  • Compiled Documentation. The "end-product" documentation should be statically compiled, unlike (most) web-based content that is dynamically generated. This allows writers and teams to verify the quality of the text prior to publication, and allows the "build system" to automate various quality control tests. Documentation is particularly suited to this kind of display generation because it changes very irregularly (no more than a few times a day, and often much, much less often).
  • Pipes and Filters. The process of publication, like the process of building code, is basically one of passing text (and examples) through various levels of processing until arriving at a "final product." Dexy is very explicit about this and provides writers/developers a framework to manage a complex filtering process in a sane manner.
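
The three ideas above fit together naturally in a Unix pipeline, so here is a rough sketch of them in plain shell. None of this is Dexy's own filter system: the function names, the @VERSION@ placeholder convention, and the file names in the usage note are all assumptions of mine. Small "atomic" files go in, each filter does one small thing to the text, a static page comes out, and a check can fail the build before anything is published.

```sh
#!/bin/sh
# Hypothetical filters (not Dexy's). Each reads stdin and writes stdout.

# Remove whitespace at the ends of lines.
strip_trailing_space() { sed 's/[[:space:]]*$//'; }

# Replace the made-up @VERSION@ placeholder with the first argument.
# Assumes the version string contains no "/" or "&" characters.
expand_version() { sed "s/@VERSION@/$1/g"; }

# Collapse runs of blank lines into a single blank line.
squeeze_blank_lines() {
    awk 'NF { blank = 0; print; next } !blank { print; blank = 1 }'
}

# Quality control: fail if any @PLACEHOLDER@ survived the build.
check_page() {
    if grep -q '@[[:upper:]_][[:upper:]_]*@'; then
        echo "unexpanded placeholder in compiled page" >&2
        return 1
    fi
}

# build_page VERSION UNIT...
# Concatenate the atomic units in order and pass them through the filters.
build_page() {
    version=$1
    shift
    cat "$@" |
        strip_trailing_space |
        expand_version "$version" |
        squeeze_blank_lines
}
```

A build might then look like "build_page 1.2 install.txt usage.txt > manual.txt" followed by "check_page < manual.txt". Adding a new processing step means adding one more small function to the pipeline, which is the appeal of the model.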

I look forward to thinking about these aspects of documentation and documentation systems, and about how writing texts with Dexy, or in the "Literate Documentation" mode, affects the writing process and the shape that texts take. I look forward to hearing your thoughts in the comments or on the wiki pages!

Why The World is Ready for Dexy

At one time or another, I suspect that most programmers and technical writers have attempted to "fix" technical writing in one way or another. It's a big problem space:

  • Everything, or at least many things, need to be documented, because undocumented features and behaviors cause problems, and one really ought not need to review the source code and understand the engineering to fix (potentially) trivial problems every time they occur.
  • The people who write code are often not suited to the task of writing documentation, because writing code and writing documentation are in fact different skills. Also, I think the division of labor makes some sense here.
  • Documentation, like code, requires maintenance, review, and ongoing quality control, as the technology and practice change. That's a lot of work, and particularly for large projects it can be a rather intensive task.
  • Lots of different kinds of documentation are needed, and depending on the specific needs of the user, a basic "unit of documentation," may need to be presented in a number of different ways. There are a number of ways to implement these various versions and iterations, but they all come with various levels of complexity and maintenance requirements.

The obvious thing to do, if you're a programmer, is to write some system that "solves technical writing." This can take the form of a tool for programmers that encourages them to write the documentation as they write the code, or it can take the form of a tool that enforces a great deal of structure for technical writing, to make it "easier" for writers and programmers to get good documentation. Basically "code your way out" of needing technical writers.

You can probably guess how I feel about this kind of approach.

There is definitely a space for tooling that can make the work of technical writing easier, as well as space for tools that make the presentation of documentation clearer and more valuable for users. Tools won't be able to make developers write, at least not without a serious productivity hit, nor will tools decrease the need for useful documentation.

It's a difficult problem domain. While there is a lot of room for building programs that make it easier to write better documentation, the problem is that the temptation to write too much software is great. Often the problems in the technical writing process, including high barriers to entry, complicated build/publication systems, and difficult to master organizational methods, look easy to address in programs. Meanwhile, most of these issues can be traced to overly complex build tools and human-centered problems, which are harder to address in code.

And since documentation takes the form of simple text, which seems easy to deal with, developers frustrated by documentation requirements, or technical writing teams, are prone to trying to write something to fix the apparent problem.

Which brings us to the present, where, if you want to write and publish documentation, your choices are:

  • Use a wiki. A wiki isn't documentation, but the software generally does a good job of publishing content, and wiki engines mostly don't have arcane structures of their own that might get in the technical writer's way. Downside: it's the wrong tool for the job and it forces writers and editors to maintain style themselves across an entire corpus, which is difficult and eventually counterproductive.
  • Use some other existing content management system. Typically these aren't meant for documentation: they have difficult to use interfaces, because they're meant to power websites and blogs, and they almost always impose some sort of structure (like a blog), which isn't ideal for conveying documentation.
  • Use an XML-based documentation tool-set. This is probably the best option around, at the moment, as these tools were built for the purpose of creating documentation. The main problems are: they're not particularly well suited for generating content for the web (which I think is essential these days) and, as near as I can tell, they make humans edit XML by hand, which I think is always a bad idea.
  • Build your own system from the ground up. Remember text is easy to munge and most of the other options are undesirable. Downside: homegrown projects take a lot of time, they're always a bit more complex than anyone (except the technical writers?) expects, and it's easy to almost finish, which is bad because half-baked documentation systems are most of what got us into this problem in the first place.

So it's a thorny problem and one that lots of people have tried (and are trying!) to solve. I've been watching a tool called dexy for the last few weeks (months?) and I've been very interested in its development and the impact that it, and similar tools, might have on my day-to-day work. This post seems to be the first in a series of thoughts about the tools that support technical writing and documentation.

Wikis are not Documentation

It seems I'm writing a minor series on the current status (and possible future direction?) of technical writing and documentation efforts, both to establish a foundation for my own professional relevancy and for its own sake, because I think documentation has the potential to shape the way that people are able to use technology. I started out with "Technical Writing Appreciation," and this post will address a few sore points regarding the use of wikis as a tool for constructing documentation.

At the broadest level, I think there's a persistent myth regarding the nature of the wiki and the creation of content in a wiki, one that exists apart from their potential use in documentation projects. Wikis are easy to install and create. It is easy to say "I'm making a wiki, please contribute!" It is incredibly difficult to take a project idea and wiki software and turn that into a useful and vibrant community and resource. Perhaps these challenges arise from the fact that wikis require intense stewardship and attention, and this job usually falls to a very dedicated leader or a small core of lead editors. Also, since authorship on wikis is diffuse and not often credited, getting this kind of leadership, and therefore successfully starting communities around wiki projects, can be very difficult.

All wikis are like this. At the same time, I think the specific needs of technical documentation make these issues even more prevalent. This isn't to say that wiki software can't power documentation teams, but the "wiki process," as we might think of it, is particularly unsuited to documentation.

One thing that I think is a nearly universal truth of technical writing is that the crafting of texts is the smallest portion of the effort of making documentation. Gathering information, background and experience in a particular tool or technology is incredibly time consuming. Narrowing all this information down into something that is useful to someone is a considerable task. The wiki process is really great for the evolutionary process of creating a text, but it's not particularly conducive to facilitating the kind of process that documentation must go through.

Wikis basically say "here's a simple editing interface without any unnecessary structure: go and edit, we don't care about the structure or organization, you can take care of that as a personal/social problem." Fundamentally, documentation requires the opposite approach: once a project is underway and some decisions have been made, organization isn't the kind of thing that you want to have to manually wrestle, and structure is very necessary. Wikis might be useful content generation and publication tools, but they are probably not suited to supporting the work flow of a documentation project.

What then?

I think the idea of a structured wiki, as presented by TWiki, has potential, but I don't have a lot of experience with it. My day-job project uses an internally developed tool, and a lot of internal procedures to enforce certain conventions. I suspect there are publication, collaboration, and project management tools that are designed to solve this problem, but I'm not particularly familiar with anything specific. In any case, it's not a wiki.

Do you have thoughts? Have I missed something? I look forward to hearing from you in comments!

Creating Useful Archives

I've done a little tweaking to the archives for dialectical futurism recently, including creating a new archive for science fiction and writing. Being who I am, this has inspired a little thought regarding the state and use of archives of blogs.

The latest iteration of this blog has avoided the now common practice of having large endless lists of posts organized by publication month or by haphazardly assigned category and tag systems. While these succeed at providing a complete archive of every post written, they don't add any real value to a blog or website. I'm convinced that one feature of successful blogs moving forward will be archives that are curated and convey additional value beyond the content of the site.

Perhaps blogs as containers for a number of posts will end up being more ephemeral than I'm inclined to think, and will therefore not require very much in the way of archives. Perhaps Google's index will be sufficient for most people's uses. Maybe. I remain unconvinced.

Heretofore, I have made archives for tychoish as quasi-boutique pieces: collections of the best posts that address a given topic. This is great from the perspective of thinking about blog posts as a collection of essays, but I've started to think that this may be less useful if we think of blogs as a collection of resources that people might want to have access to beyond their initial ephemeral form.

Right now my archives say "see stuff from the past few months, and several choice topics on which I wrote vaguely connected sequences of posts." The problem with the list of posts from the last few months is that there's not a lot of useful information beyond the title and the date. The problem with the topical archives is that they're not up to date, they're not comprehensive even for recent posts, and there's little "preview" of a given post beyond its title. In the end I think the possibility of visiting a topical archive looking for a specific post and not finding it is pretty large.

In addition to editorial collecting, I think archives, guides, or indexes of a given body of information ought to provide some sort of comprehensive method for accessing information. There has to be some middle ground.

I think the solution involves a lot of hand mangling of content, templates, and posts. I'm fairly certain that my current publication system is probably not up for the task without a fair amount of mangling and beating. As much as I want to think that this is a problem in search of the right kind of automation, I'm not sure that's really the case. I'm not opposed to editing things by hand, but it would significantly increase the amount of work involved in making any given post.

There is, I suspect, no easy solution here.

Strategies for Organizing Wiki Content

I've been trying to figure out wikis for a long time. It always strikes me that the wiki is probably the first truly unique (and successful) textual form of the Internet age. And there's a lot to figure out. The technological innovation of the wiki is actually remarkably straightforward, [1] and, while difficult, the community building aspects of wikis are straightforward. [2] The piece of the wiki puzzle that I can't nail down in a pithy sentence or two is how to organize information effectively on a wiki.

That's not entirely true.

The issue, I think, is that there are a number of different ways to organize content for a wiki, and no one organizational strategy seems to be absolutely perfect, and I've never been able to settle on a way of organizing wiki pages that I am truly happy with. The goals of a good wiki "information architecture" (if I may be so bold) are as follows:

  • Clarity: It should be immediately clear to the readers and writers of a wiki where a page should be located in the wiki. If there's hierarchy, it needs to fit your subject area perfectly and require minimal effort to grok. You want people to focus on the content rather than the organization, and we don't tend to focus on organizational systems when they're clear.
  • Simplicity: Wikis have a great number of internal links and can be (and are) indexed manually as needed, so as the proprietor of a wiki you probably need to do a lot less "infrastructural work" than you think you need to. Less is probably more in this situation.
  • Intuitiveness: Flowing from the above, wikis ought to strive to be intuitive in their organization. Pages should answer questions that people have, and then provide additional information out from there. One shouldn't have to dig in a wiki for pages. If there are categories or some sort of hierarchy of pages, there shouldn't be overlap at the tips of various trees.

Strategies that flow from this are:

  • In general, write content on a very small number of pages, and expand outward as you have content for those pages (by chopping up existing pages as it makes sense and using this content to spur the creation of new pages).
  • Use one style of links/hierarchy (wikish and ciwiki fail at this.) You don't want people to think: Should this be a camel case link? Should this be a regular one word link? Should this be a multiple word link with dash separated words or underscore separated words? One convention to rule them all.
  • Realize that separate hierarchies of content within a single wiki effectively create separate wikis and sites within a single wiki, and that depending on your software, it can be non-intuitive to link between different hierarchies.
  • As a result: use as little hierarchy and structure as possible. Hierarchy creates possibilities where things can go wrong and where confusion can happen. At some point you'll probably need infrastructure to help make the navigation among pages more intuitive, but that point is always later than you think it's going to be.
  • Avoid reflexivity. This is probably generalizable to the entire Internet, but in general people aren't very interested in how things work and the way you're thinking about your content organization. They're visiting your wiki to learn something or share some information, not to think through the meta crap with you. Focus on that.
  • Have content on all pages, and have relatively few pages which only serve to point visitors at other pages. Your main index page is probably well suited as a traffic intersection without additional content, but in most cases you probably only need a very small number of these pass through pages. In general, make it so your wikis have content everywhere.
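
The "one convention to rule them all" point is easy to enforce mechanically, so here is a small sketch of what that could look like. The page_slug function is a hypothetical helper of my own and not part of any wiki engine I've mentioned. It assumes the convention you've picked is lowercase words separated by dashes, and it folds camel case, underscores, and spaces into that one form.

```sh
#!/bin/sh
# Hypothetical helper: normalize any page title to one link convention
# (lowercase, dash-separated). Not taken from any particular wiki engine.
# The steps, in order:
#   1. put a dash between a lowercase letter or digit and a capital
#      letter, which splits CamelCase words
#   2. lowercase everything
#   3. turn every run of other characters (spaces, underscores,
#      punctuation) into a single dash, then trim dashes from the ends
page_slug() {
    printf '%s\n' "$1" |
        sed 's/\([[:lower:][:digit:]]\)\([[:upper:]]\)/\1-\2/g' |
        tr '[:upper:]' '[:lower:]' |
        sed 's/[^[:lower:][:digit:]]\{1,\}/-/g; s/^-//; s/-$//'
}
```

With something like this in the link-creation path, "TechnicalWriting", "technical_writing", and "Technical Writing" all land on the same page name, and nobody has to think about which style to use.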

... and other helpful suggestions which I have yet to figure out. Any suggestions from wiki maintainers?

[1] There are a number of very simple and lightweight wiki engines, including some that run in only a few lines of Perl. Once we had the tools to build dynamic websites (CGI, circa 1993/1994), the wiki became a trivial implementation.
[2] The general principle of building a successful community edited wiki is basically to pay attention to the community in the early stages. Your first few contributors are very important, and contributions have to be invited and nurtured, and communities don't just happen. In the context of wikis, in addition to supporting the first few contributors, the founders also need to construct a substantive seed of content.

Installing Mcabber .10-rc3 on Debian Lenny

mcabber is a console-based XMPP or Jabber client. It runs happily within a screen session, it's lightweight, and it does all of the basic things that you want from an IM client without being annoying and distracting. For the first time since I started using this software a year or two ago, there's a major release that has some pretty exciting features. So I wanted to install it. Except, there aren't packages for it for Debian Lenny, and I have a standing policy that everything needs to be installed using package management tools so that things don't break down the line.

These instructions are written for Debian 5.0 (Lenny) systems. Your mileage may vary for other systems, including other versions of Debian or Ubuntu. Begin by installing some dependencies:

apt-get install libncurses5-dev libncursesw5 libncursesw5-dev pkg-config libglib2.0-dev libloudmouth1-dev

The following optional dependencies provide additional features, and may already be installed on your system:

apt-get install libenchant-dev libaspell-dev libgpgme-dev libotr-dev

When the dependencies are installed, issue the following commands to download the latest release into the /opt/ directory, unarchive the tarball, and run the configure script to install mcabber into the /opt/mcabber/ folder so that it is easy to remove later if something stops working.

cd /opt/
wget http://mcabber.com/files/mcabber-0.10.0-rc3.tar.gz
tar -zxvf mcabber-0.10.0-rc3.tar.gz
cd mcabber-0.10.0-rc3
./configure --prefix=/opt/mcabber

When that process finishes, run the following:

make
make install

Now copy the /opt/mcabber-0.10.0-rc3/mcabberrc.example file into your home directory. If you don't already have mcabber configured, you can use the following command to copy the file into place:

cp /opt/mcabber-0.10.0-rc3/mcabberrc.example ~/.mcabberrc

If you do have an existing mcabber setup, then use the following command to copy the example configuration to a separate file in your home directory, so that it doesn't overwrite your current configuration:

cp /opt/mcabber-0.10.0-rc3/mcabberrc.example ~/mcabber-config

Edit the ~/.mcabberrc or ~/mcabber-config as described in the config file. Then start mcabber with the following command, if your config file is located at ~/.mcabberrc:

/opt/mcabber/bin/mcabber

If you have your mcabber config located at ~/mcabber-config start mcabber with the following command:

/opt/mcabber/bin/mcabber -f ~/mcabber-config
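
If you'd rather not remember which of the two commands applies, the choice can be wrapped in a small function. This is only a sketch under my own assumptions: start_mcabber and the MCABBER_BIN override are names I invented for this post and are not something mcabber provides. It prefers ~/mcabber-config when that file exists, because in the setup above that is the file written for the new release, and otherwise falls back to ~/.mcabberrc.

```sh
#!/bin/sh
# Hypothetical wrapper; not part of mcabber.
# MCABBER_BIN is an invented override, handy for trying the logic out.
start_mcabber() {
    bin=${MCABBER_BIN:-/opt/mcabber/bin/mcabber}

    if [ -f "$HOME/mcabber-config" ]; then
        # Separate config created alongside an older setup.
        "$bin" -f "$HOME/mcabber-config"
    elif [ -f "$HOME/.mcabberrc" ]; then
        # Default location; no -f needed.
        "$bin"
    else
        echo "no mcabber config found; copy mcabberrc.example first" >&2
        return 1
    fi
}
```

Dropping this into your shell's startup file would let you type start_mcabber in either situation.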

And you're ready to go. Important things to note:

  1. If something gets, as we say in the biz, "fuxed," simply "rm -rf /opt/mcabber/" and reinstall.
  2. Check mcabber for new releases and release candidates. These instructions should work well once there's a final release, at least for Debian Lenny. The release files are located here.
  3. Make sure to stay up to date with new releases to avoid bugs and potential security issues. If you come across bugs, report them to the developers. There is also a MUC for the mcabber community here: xmpp:mcabber@conf.lilotux.net.
  4. If you have an additional dependency that I missed in this installation do be in touch and I'll get it added here.
  5. Debian Lenny ships with version 0.9.7 of mcabber. If you don't want to play with the new features and the magic in 0.10 and just want a regular client, install the stable mcabber with the "apt-get install mcabber" command and ignore the rest of this post.

Pragmatic Library Science

Before I got started down my current career path--that would be the information management/work flow/web strategy/technology and cultural analyst path--I worked in a library.

I suppose I should clarify somewhat as the image you have in your mind is almost certainly not accurate, both of what my library was like and of the kind of work I did.

I worked in a research library at the big local (private) university, and I worked not in the part of the library where students went to get their books, but in the "overflow area" where the special collections, book preservation unit, and the catalogers all worked. What's more, the unit I worked with had an archival collection of film/media resources from a few documentary film makers/companies, so we didn't really have books either.

Nevertheless it was probably one of the most instructive experiences I've had. There are things about the way archives work, particularly archives with difficult collections, that no one teaches you in those "how to use the library" and "welcome to Library of Congress/Dewey Decimal classification systems" lessons you get in grade school/college. The highlights?

  • Physical and Intellectual Organization - While archives keep track of, and organize, all sorts of information about their collections, the organization of this material "on the shelf" doesn't always reflect this.

    Space is a huge issue in archives, and as long as you have a record of "where" things are, there's a lot of incentive to store things in the way that will take up the least amount of space physically. Store photographs separately from oversized maps, separately from file boxes, separately from video cassettes, separately from CDs (and so forth).

  • "Series" and intellectual cataloging - This took me a long time to get my head around, but Archivists have a really great way of taking a step back and looking at the largest possible whole, and then creating an ad-hoc organization and categorization of this whole, so as to describe in maximum detail, and make finding particular things easier. Letters from a specific time period. Pictures from another era.

  • An acceptance that perfection can't be had. Perhaps this is a symptom of working with a collection that had only been archived for several years, or working with a collection that had been established with one large gift, rather than as a depository for a working collection. In any case, our goal--it seemed--was to take what we had and make it better: more accessible, more clearly described, easier to process later, rather than to make the whole thing absolutely perfect. It's a good way to think about organizational projects.

In fact, a lot of what I did was to take files that the film producers had on their computers and make them useful. I copied disks off of old media, I took copies of files and (in many cases, manually) converted them to usable file formats, I created indexes of digital holdings. Stuff like that. No books were harmed or affected in these projects, and yet, I think I was able to make a productive contribution to the project as a whole.

The interesting thing, I think, is that when I'm looking through my own files, and helping other people figure out how to manage all the information--data, really--they have, I find that it all boils down to the same sorts of problems that I worked with in the library: How to balance "work-spaces" with storage spaces. How to separate intellectual and physical organizations. How to create usable catalogs and indices of a collection. How to lay everything down so that you can, without "hunting around" for a piece of paper, lay your hands on everything in your collection in a few moments, and ultimately how to do this without spending very much energy on "upkeep."

Does it make me a dork that I find this all incredibly interesting and exciting?