Data, or information in the digital context, is really important. Perhaps
the most important thing. It’s a shame then, I think, that we are, on
the whole, so bad at managing data and organizing information so that
it stays useful to us in the future. I keep starting to write posts on the
topic with clever lead-ins, and within a hundred words I realize that
I’ve bitten off more than I can chew. So I’ll spare the introduction
and get on with the story.
A couple of weeks ago, I copied all of my music off my backups (from
iTunes and my days as a Mac user) and onto my Linux machine. I hadn’t
really looked at the files in years because, of course, iTunes abstracts
all the files away, so when you play “digital music,” you just play
“tracks” rather than having to interact with the realities of the
files themselves. This is incredibly user friendly, and I think there’s
something in the iTunes model that is pretty useful. That is, creating
user interfaces that let users interact with intelligibly bounded data
units rather than with file units makes a lot of sense.
Having said that, what ends up happening is that the abstraction of the
data often means that we’re less in touch with what’s being stored,
and we rely on (often proprietary) tools to keep track of the meta-data
associated with our libraries.
As I was going through my music library, which I’m using with
mocp and Rhythmbox (minimally, for syncing, eventually), I realized
that my music was organized in an incredibly ass-backwards way. Many
“artists” had several folders under various alternate spellings of
their names (with and without “the,” or with various ampersand forms),
which is a trifle frustrating. And as I was looking over the files, I
realized that there were things I thought I had deleted that had in
fact hung around in my directory (this is a specific flaw with the
“are you sure” dialogue in iTunes, but it’s still an issue).
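To make the duplicate-folder problem concrete, here is a minimal Python sketch that only reports artist folders that look like variants of each other, leaving the actual merging to be done by hand. The function names and the normalization rules (ignore case, treat “&” and “+” as “and”, drop a leading or trailing “the”) are assumptions made for illustration; they are not something iTunes, mocp, or Rhythmbox provide.

```python
import re
from collections import defaultdict


def normalize_artist(name):
    """Reduce an artist folder name to a rough comparison key (hypothetical rules)."""
    key = name.casefold()
    # Treat the common ampersand forms as the word "and".
    key = re.sub(r"\s*[&+]\s*", " and ", key)
    # Collapse punctuation, underscores, and runs of whitespace to single spaces.
    key = re.sub(r"[\W_]+", " ", key).strip()
    # Drop a leading "the" ("The Beatles") or a trailing one ("Beatles, The").
    key = re.sub(r"^the\s+", "", key)
    key = re.sub(r"\s+the$", "", key)
    return key


def group_variants(folder_names):
    """Return only the groups where several folder names share one key."""
    groups = defaultdict(list)
    for name in folder_names:
        groups[normalize_artist(name)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    # Example folder names, invented for this sketch.
    folders = ["The Beatles", "Beatles", "Beatles, The",
               "Simon & Garfunkel", "Simon and Garfunkel", "Radiohead"]
    for key, names in group_variants(folders).items():
        print(key, "->", names)
```

On a real library, the list of names could come from something like `os.listdir()` on the music directory. The sketch deliberately stops at listing candidates, since which spelling to keep is still a judgment call.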
I’m not done, but I know that the next step, going through the files by
hand, will leave my music files much better organized.
Problems like this arise, largely, when we just rely on the computer to
organize the files itself without input from us. While I like the
“iTunes” way of accessing my music, I expect that my collection of
music files is the kind of thing that I’m going to have around for the
rest of my computer-using/music-listening life, and after only 5 years
iTunes has stopped being a part of my life. For sure.
I guess the lesson from this is that interfaces for accessing your files
aren’t always the best for organizing the files, so don’t entrust
your organizing responsibility to a script.
Another story: PDF files.
When I’m doing research, I have a way of collecting PDF files
of articles. When I was in school I would make a folder for each class I
took, throw the PDFs into that folder, and title them productively
(author[s] - title.pdf). This worked until I wanted to start reusing
material, or drawing connections between various projects and classes.
And then, being a geek, I had projects that weren’t quite class
related. Where did they go? Never mind the fact that the file names were
absurdly long.
So I switched to a new system where I keep a BibTeX database of all my
files and name PDFs with their cite keys (authorlastYEAR.pdf). If there
is more than one paper by an author in a year, I append alphabetical
characters (e.g. a, b, c) to the end in the order that they come into
the database. If there is more than one author, I take the first
author/PI.
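The naming rule is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, last names are lower-cased with punctuation stripped, and the first paper by an author in a given year keeps the bare key while later ones get a, b, c in order of arrival. The post doesn’t say whether the first paper gets a suffix, so that detail is a guess.

```python
import string


def make_cite_key(author_last_names, year, existing_keys):
    """Build an authorlastYEAR key that does not clash with existing_keys.

    author_last_names: last names in author order; only the first is used.
    existing_keys: the set of keys already in the database.
    """
    # Only the first author (or PI) contributes to the key.
    first = author_last_names[0]
    # Assumption: lower-case the name and keep letters and digits only.
    last = "".join(ch for ch in first.lower() if ch.isalnum())
    base = f"{last}{year}"
    if base not in existing_keys:
        return base
    # Assumption: later papers by the same author and year get a, b, c, ...
    for suffix in string.ascii_lowercase:
        candidate = base + suffix
        if candidate not in existing_keys:
            return candidate
    raise ValueError(f"ran out of suffixes for {base}")


if __name__ == "__main__":
    keys = set()
    # Example authors and years, invented for this sketch.
    for authors, year in [(["Smith", "Jones"], 2008),
                          (["Smith"], 2008),
                          (["O'Brien"], 2007)]:
        key = make_cite_key(authors, year, keys)
        keys.add(key)
        print(key + ".pdf")
```

The PDF file name is then just the key plus `.pdf`, so the file on disk and the BibTeX entry always point at each other.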
It took a few weeks of sporadic work to get the files into shape, but the
end result of that transformation is that my PDFs are
incredibly useful to me, and I never have to look very hard for any
piece of data.
The lesson is to use your data, whatever the system is, and make
sure it’s still working; then, when needed, don’t be afraid to
change strategies. On this level, organization really ought to be
empirical.
In light of these two experiences I have come to the conclusion that
it’s important to really get your hands dirty in the files. While the
abstractions are nice, they allow us to be complacent. Touching your
data, looking at the files, and deploying a system that is simple and
both useful in the present and relevant looking forward is incredibly
important. The particulars beyond that are still vague, but we’ll
get there in future posts.
Thanks for reading.