Critical Practice

Being a critic is not simply looking for the points of failure, shortcomings, and breaking points in cultural artifacts (e.g., music, art, literature, software, technology, and so forth). Criticism is a practice of comparison and rich analysis, and a way of understanding cultural production. One might even call criticism a methodology, though “methodologizing” criticism does not give us anything particularly useful, nor does it make any practices or skills more concrete.

Criticism is really the only way that we can understand culture and cultural products. In short, criticism renders culture meaningful.

I wrote the above in response to this “On Being a Critic” post that a long-time reader of this site wrote a while ago. Most of the differences between our approaches to criticism derive from technical versus non-technical understandings of critical practice. With that in mind, and in an effort to consolidate some thoughts about methodology, criticism, and theoretical practice, I’d like to provide two theses that define good critical practice and provide some starting points for “getting it right”:

  • Criticism is comparative. If you analyze a single thing in isolation, this analysis is not criticism. By contrast, one of the best ways to make poor criticism more powerful is to include more information (data) to strengthen the comparison. Comparison should highlight or help explain the phenomena or objects you are critiquing, but should always serve the agenda and goals of the criticism to avoid overloading readers with too much information.
  • Criticism ought to have its own agenda. It is impossible to avoid bias entirely, and from this impossibility springs criticism’s greatest strength: the power to productively examine and contribute to cultural discourses. While critical essays are perhaps the most identifiable form of criticism, there are others: novels, lectures, films, art, and perhaps even technology itself, can all be (and often are) critical practices in themselves.

Everything else is up for grabs.

Big Data Impact

I’ve been mulling over this post about big data in the IT world for quite a while. It basically says that given large (and growing) data sets, companies that didn’t previously need data researchers suddenly need people to help them use “big data.” Every company is a data company. In effect, we have an ironic counterexample to the idea that automation reduces the need for labor.

These would-be “data managers” have their work cut out for them. Data has value, sure, but unlike software, which has value as long as someone knows how to use it,1 poorly utilized data is no better than no data. Data researchers need to be able to talk to developers and help figure out what data is worth collecting and what data isn’t. Organizations need someone to determine what data has real value, if only to solve a storage-related problem. Perhaps more importantly, data managers would need to guide data usage both technically (in terms of algorithms and software design) and in terms of being able to understand the abilities and shortfalls of data sets.

There’s a long history of IT specialist positions: database developers, systems administrators, quality assurance engineers, release engineers, and software testers. Typically we divide this between developers and operations folks, but even the development/operations division is something of a misnomer. There are merits to generalism and specialization, but as projects grow, specialization makes sense, and data may just be another specialty in a long tradition of software development and IT organization.

Specialization also makes a lot of sense in the context of data, where having a lot of unusable data adds no value and can potentially subtract value from an organization.

A Step Back

There are two very fundamental points that I’ve left undefined: what “data” am I talking about and what kinds of skills differentiate “data specialists” from other kinds of technicians.

What are big data?

Big data sets are, to my mind, large collections of data: GIS/map-based information, “crowd sourced” information, and data that is automatically collected through the course of normal internet activity. Big data is enabled by increasingly powerful databases and the ubiquity of computing power, which lets developers process data on large scales. For example: the aggregate data from foursquare and other similar services, comprehensive records of user activity within websites and applications, service monitoring data and records, audit trails of activity on shared file systems, transaction data from credit cards and customers, and tracking data from marketing campaigns.

With so much activity online, it’s easier for software developers and users (which is basically everyone, directly or otherwise) to create and collect a really large collection of data regarding otherwise trivial events. Mobile devices and linkable accounts (OpenID and other single sign-on systems) simplify this process. The thought and hope is that all this data equals value, and in many circumstances it does. Sometimes, it probably just makes things more complicated.
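
To make that concrete, here’s a minimal sketch of the kind of aggregation that turns otherwise trivial events into usable data. The event format and field names are hypothetical, and the snippet is just an illustration of the idea, not any particular service’s pipeline.

```python
# A toy aggregation of "trivial events" into per-user activity counts.
# The event fields ("user", "action") are hypothetical, not from any
# particular service.
import json
from collections import Counter

events = [
    '{"user": "alice", "action": "checkin"}',
    '{"user": "bob", "action": "page_view"}',
    '{"user": "alice", "action": "page_view"}',
]

activity = Counter()
for line in events:
    event = json.loads(line)
    activity[event["user"]] += 1

print(activity.most_common())  # [('alice', 2), ('bob', 1)]
```

The same pattern scales up badly in exactly the way the post suggests: collecting the events is easy, but deciding which counts mean anything is the data manager’s job.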

Data Specialists

Obviously every programmer is a kind of “data specialist,” and the last seven or eight years of the Internet have done everything to make every programmer a data specialist. What the Internet hasn’t done is give programmers a sense of basic human factors knowledge, or a background in fundamental quantitative psychology and sociology. Software development groups need people who know what kinds of questions data can and cannot answer, regardless of what kind or how much data is present.

Data managers would thus be one of those positions that sits between/with technical staff and business staff, and perhaps I’m partial to work in this kind of space because this is very much my chance. But there’s a lot of work in bridging this divide, and a great deal of value to be realized in this space. And it’s not like there’s a shortage of really bright people who know a lot about data and social science who would be a great asset to pretty much any development team.

Big Data Beyond Software Development

The part of this post that I’ve been struggling over for a long time is the mirror of what I’ve been talking about thus far. In short, do recent advancements in data processing and storage (NoSQL, MapReduce, etc.) that have primarily transpired among startups, technology incubators, and other “Industry” sources have the potential to help academic research? Are there examples of academics using data collected from the usage habits of websites to draw conclusions about media interaction, reading habits, and cultural participation/formation? If nothing else, are sociologists keeping up with “new/big data” developments? And perhaps most importantly, does the prospect of being able to access and process large and expansive datasets have any effect on the way social scientists work? Hopefully someone who knows more about this than I do will offer answers!


  1. Thankfully there are a number of conventions that make it pretty easy for software designers to be able to write programs that people can use without needing to write extensive documentation. ↩︎

The Structured and Unstructured Data Challenge

The Debate

Computer programmers want data to be as structured as possible. If you don’t give users a lot of room to do unpredictable things, it’s easier to write software that does cool things. Users, on the other hand, want (or think that they want) total control over data and the ability to do whatever they want.

The problem is that they don’t. Most digital collateral, even the content stored in unstructured formats, is pretty structured. While people may want freedom, they don’t use it, and in many cases users go through a lot of effort to recreate structure within unstructured forms.

Definitions

Structured data are data that are stored and represented in tabular form, or as some sort of hierarchical tree that is easily parsed by computers. By contrast, unstructured data are things like files, where all of the content is organized in the file and written to durable storage manually.

The astute among you will recognize that there’s an intermediate category, where largely unstructured data is stored in a database. This happens a lot in content management systems, in mobile device applications, and in a lot of note taking and project management applications. There’s also a parallel semi-structured form, where people organize their writing, notes, and content in a regular and structured manner even though the tools they’re using don’t require it. They’d probably argue that this was “best practice,” rather than “semi-structured” data, but it probably counts.
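
To make the distinction concrete, here’s a small sketch of the same note stored as unstructured text and as a structured record. The note and its fields are hypothetical; the contrast is the point: a program can answer questions about the structured form directly.

```python
# The same note, unstructured and structured. The fields are hypothetical.
unstructured = """Meeting with Sam, Tuesday.
Discussed the Q3 roadmap; follow up next week."""

structured = {
    "type": "meeting",
    "with": "Sam",
    "day": "Tuesday",
    "topics": ["Q3 roadmap"],
    "follow_up": "next week",
}

# A program can answer questions about the structured record directly...
print(structured["with"])      # "Sam"

# ...while the unstructured form needs parsing, guessing, or a human.
print("Sam" in unstructured)   # True, but only because we knew what to ask
```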

The Impact

The less structured content or data is, the less computer programs are able to do with it, and the more people have to work to make it useful for them. So while we as users want freedom, that freedom doesn’t get us very far, and we don’t really use it even when we have it. Relatedly, I think we could read the crux of the technological shift in Web 2.0 as a move toward more structured forms, and the “mash up” as the celebration of a new “structured data.”

The lines around “semi-structured” data are fuzzy. The trick is probably to figure out how to give people just enough freedom so that they don’t feel encumbered by the requirements of the form, but not so much freedom that software developers are unable to do really smart things behind the scenes. That’s going to be difficult to figure out how to implement, and I think the general theme of this progress is that people can handle more structure than we tend to assume, and developers should err on the side of stricture.

Current Directions

Software like org-mode and twiki are attempts to leverage structure within unstructured forms, and although the buzz around enterprise content management (ECM) has started to die down, there is a huge collection of software that attempts to impose some sort of order on the chaos of unstructured documents and information. ECM falls short probably because it’s not structured enough: it mandates a small amount of structure (categories, some metadata, perhaps validation and workflow), which doesn’t provide significant benefit relative to the amount of time it takes to add content to these repositories.
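
As a rough illustration of what “leveraging structure within unstructured forms” looks like, here’s a toy sketch that pulls an outline out of an org-mode-style plain text document. Real org-mode does far more than this; the snippet only shows the principle of finding latent structure in “unstructured” text.

```python
# Pull a simple outline out of org-style plain text: lines beginning with
# one or more '*' characters are treated as headings. A toy illustration
# of finding structure inside an "unstructured" document.
text = """* Projects
** Write blog post
** Review patches
* Someday
** Learn Guile
"""

outline = []
for line in text.splitlines():
    if line.startswith("*"):
        stars, _, title = line.partition(" ")
        outline.append((len(stars), title))

for depth, title in outline:
    print("  " * (depth - 1) + title)
```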

There will be more applications that bridge the structure boundary, and begin to allow users to work with more structured data in a productive sort of way.

On a potentially orthogonal note, I’m working on cooking up a proposal for a LaTeX-based build system for non-technical document production that might demonstrate--at least hypothetically--how much structure can help people do awesome things with technology. I’m calling it “A LaTeX Build System.”

I’d love to hear what you think, either about this “structure question,” or about the LaTeX build system!

Are We Breaking Google?

Most of the sites I visit these days are: Wikipedia, Facebook, sites written by people I’ve known online since the late 1990s, people who I met online around 2004, and a few sites that I’ve learned about through real life connections, open source, and science fiction writing. That’s about it. It sounds like a lot, and it is, but the collection is pretty static.

As I was writing about my nascent list of technical writing links, I realized that while I’ve been harping on the idea of manually curated links and digital resources for a single archive for a couple of years now, I’ve not really thought about the use or merits of manually curated links to the internet writ large.

After all you can find anything you need with Google. Right?

I mostly assumed that if I could get people to curate their own content, “browsing” would become more effective, and maybe Google technology could adapt to the evolving social practice?

Though the inner workings of Google are opaque, we know that Google understands the Web by following and indexing the links that we create between pages. If we don’t link, Google doesn’t learn. Worse, if we let software create all the links between pages, then Google starts to break.

Put another way: the real intelligence of Google’s index isn’t the speed and optimization of a huge amount of data--that’s a cumbersome engineering problem--but rather our intelligence derived from the links we make. As our linking patterns change, as all roads begin to lead back to Wikipedia, as everyone tries to “game” Google (at least a little) the pages that inevitably float to the top are pages that are built to be indexed the best.
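
For readers who want to see the idea rather than take my word for it, here’s a toy, textbook-style link-ranking sketch in the spirit of PageRank. It is emphatically not Google’s actual algorithm, and the link graph is made up; it just illustrates that the ranking falls out of the links we make rather than out of the pages’ content.

```python
# A toy, textbook-style PageRank over a tiny, made-up link graph: the
# ranking comes entirely from who links to whom, not from page content.
links = {
    "blog": ["wikipedia"],
    "forum": ["wikipedia", "blog"],
    "wikipedia": ["blog"],
}

damping = 0.85
rank = {page: 1.0 / len(links) for page in links}

for _ in range(50):  # iterate toward a fixed point
    new_rank = {}
    for page in links:
        incoming = sum(
            rank[src] / len(targets)
            for src, targets in links.items()
            if page in targets
        )
        new_rank[page] = (1 - damping) / len(links) + damping * incoming
    rank = new_rank

print(sorted(rank.items(), key=lambda kv: -kv[1]))
```

If everyone stops making links by hand, or makes them only to game the ranking, a system like this has less and less genuine signal to work with, which is the worry in the paragraphs above.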

And Google becomes a self-fulfilling prophecy, because the pages we do find out about and create links to are the ones we’ve found using Google. We’ve spent a lot of time thinking about what happens if Google becomes evil, and considerably less considering what happens to us when Google stops providing new and useful information: what happens when Google becomes useless.

Make Emacs Better

I love emacs. I’m also aware that emacs is a really complex piece of software with a staggering list of features and functionality. I’d love to see more people use emacs, but the start-up and switching cost is nearly prohibitive. I do understand that getting through the “emacs learning curve” is part of what makes the emacs experience so good.

That said, there really ought to be a way to make it easier for people to start using emacs. Think of how much more productive some developers and writers would be if the initial experience of emacs were less overwhelming. And if emacs were easier to use, developers could use emacs as a core (embedded, even) component of text-editing applications, for instance, some sort of specific IDE built with emacs tools, or a documentation creation and editing toolkit built with emacs. I’d go for it, at least.

To my mind there are three major challenges for greater emacs usability. Some of these may be pretty easy to change non-intrusively, others less so. Feedback is, of course, welcome:

1. The biggest problem is that there’s no default configuration. While I appreciate that this provides a neutral substrate for people to customize emacs for themselves, you have to write lisp in order to do pretty much anything in emacs other than write lisp. And customize-mode is well intentioned, but not particularly usable.

Perhaps one solution to this problem would be to create a facility within emacs to build “distributions,” that come configured for specific kinds of work. That way, emacs can continue to be the way it is, and specialized emacs can be provided and distributed with ease.

2. Improve the customize interface. I like the idea of customize, but I find it incredibly difficult to use and navigate, and I end up setting all configuration values manually because that’s easier to keep track of and manage. I’d prefer an option where you configure your emacs instance the way you want (through some sort of conventional menu system), and then have the option of “dumping state” to an arbitrary file that makes a little more sense than the lisp structure that customize produces. Then, as needed, you could load these “state files.” But then, I’ve never used the menu bar at all, so perhaps I’m not the best person to design such a system.

This strikes me as a more medium term project, and would make it easier for people who want to modify various basic behaviors and settings. I don’t think that it would need to totally supplant customize, but it might make more sense.

3. Improve and add the ability to extend emacs beyond emacs-lisp. I initially thought emacs-lisp was a liability for emacs adoption, and I don’t think this is uncommon, but I’ve since come to respect and understand the utility of emacs lisp. Having said that, I think offering some sort of interoperability between emacs and other languages and interpreters might be a good thing. Ideas like ParrotEmacs, and using the Guile VM to run existing emacs-lisp in addition to other new code, would be great.

This is a longer term project, of course, but definitely opens emacs up to more people with a much more moderate learning curve.

I’ve been working (slowly) on getting my base configuration into a presentable state so that I can push it to a git repository for everyone to see and use, which (at least for me) might start to address problems one and two, but three is outside the scope of my time and expertise. The truth is that emacs is so great, and so close to being really usable for everyone, that a little bit of work on these, and potentially other, enhancements could go a long way toward making emacs better for everyone.

Who’s with me? Let’s talk!

The Death of Blogging

I think blogging died when two things happened:

1. A blog became a required component in constructing a digital identity, which happened around the time that largely static personal websites started to disappear. Blogs always dealt in the construction of identities, but until 2004, or so, they were just one tool among many.

2. Having a blog became the best, most efficient way for people to sell things. Blogging became a tool for selling goods and services, often on the basis of the reputation of the writer.

As these shifts occurred, blogs stopped being things that individual people had, and started being things that companies created to post updates, do outreach, and “do marketing.” At about the same time, traditional media figured out, at least in part, what makes content work online. The general public has become accustomed to reading content online. The end result is that blogs are advertising and sales vectors, and this makes them much less fun to read.

When blogging was just a thing people did, mostly because it let them present and interact with a group of writers better than they could otherwise, there was vitality: people were interested in reading other people’s blogs and comment threads. This vitality made it more interesting to write blogs than pretty much any other kind of content. The excitement of direct interaction with readers, the vitality of blogging, transcends genre, from technical writing and documentation to fiction to news analysis and current events.

The vitality of blogging is what makes blogs so attractive to traditional media and to corporations for marketing purposes, so maybe you can’t have the good without the bad.

Everyone blogs. And perhaps that’s a bit of the problem: too much content means that it’s hard to have a two-way conversation between blogs and bloggers. Who has time to read all those words anyway? Blogging is great in part because it’s so democratic: anyone can publish a blog. This isn’t without a dark side: the sheer volume of content means we run the risk of blogging without an audience, or without significant interaction with the audience, which threatens the impact of that democracy. But it makes sense: new forms and media don’t solve the problem of cultural participation and engagement, they just shift the focus a little bit.

Never Ending

I’m still not totally settled into my new routine, and I think that’s apparent in the blog. These things happen, and I just realized that this is the third summer in a row with some sort of major life change. Maybe I’ve forgotten how to exist in a summer routine. While I should probably give myself a break, I think it’s more realistic to accept a certain level of disruption as “the new normal,” and figure out how to develop a routine around that. That’s the hope at any rate. So, I’m getting there, slowly.

I’ve posted a number of new rhizomes in the last week. They are:

  • Security Isn’t a Technological Problem, a post in my series about addressing problems in IT as human issues that need documentation and training rather than more software.

  • These Shoes Were Made for Cyborgs, which attempts to limit the potential for overly expansive theorizing of “the cyborg,” in a common but not overly productive manner.

  • Little Goals and Big Projects, a list of projects that I want to work on. Think mid-year resolutions meet five-year plan, meet time management review.

I’ve also done some maintenance (gardening?) on the wiki and added or edited the following pages:

  • I imported some comments from Facebook regarding my intellectual-practice post onto the discussion page. These comments are pretty valuable and I’ve found the conversation useful, hopefully you will too. Feel free to add your own comments there.

  • Similarly, I imported some comments onto the discussion for the Career Pathways post.

  • In response to one of the comments on the Intellectual Practice post, I put together a pedagogy page, including some very rough descriptions of “writing classes I wish I’d taken and would love to teach.”

  • Not strictly tychoish related, but I revised my personal profile at tychogaren.com to be a bit more up to date and generally less weird/awkward.

Security Isn't a Technological Problem

Security of technological resources isn’t a technological problem. The security of technological resources and information is a problem with people.

There.

That’s not a very groundbreaking conclusion, but I think that the effects of what this might mean for people doing security1 may be more startling.

Beyond a basic standard of “writing and using quality software” and following sane administration practices, the way to resolve security issues is to fix the way people use technology and understand the implications of that use.

There are tools that help control user behavior to greater or lesser degrees: things like permissions control, management, auditing, and encryption. But they’re just tools: they don’t solve the human problems and the policy/practice issues that are the core of best security practice. Teaching people how their technology works, what’s possible and what’s not possible, and finally how to control their own data and resources is the key to increasing and providing security services to everyone.

I think of this as the “free software solution,” because it draws on the strengths and methods of free software to shape and enhance people’s user experience and to improve the possible security of the public network as a whole. One of the things that has always drawn me to free software, and one of its least understood properties, is the power of source code to create an environment that facilitates education and inquiry. People who regularly use free software, I’d bet, have a better understanding of how technology works than people who don’t, and it’s not because free software users have to deal with less polished software (that isn’t terribly true), but because of a different relationship between the creators and users of software. I think it would be interesting to take this model and apply it to the “security problem.”

With luck, teaching more people to think about security processes will mean that users will generally understand:

  • how encryption works, and be more amenable to managing their own cryptography identities and infrastructure. (PGP and SSH)
  • how networking works on a basic level to be able to configure, set, and review network security. (LAN Administration, NetFilter)
  • how passwords are stored and used, and what makes strong passwords that are easy to remember and difficult to break. (A minimal sketch follows this list.)
  • how to control and consolidate identity systems to minimize social engineering vulnerabilities. (OpenID, OAuth, etc.)
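
Since the password bullet above is the most concrete of these, here’s a minimal sketch of the usual approach: store a salted, slow hash of the password rather than the password itself. The parameters and helper names here are illustrative; anything real should lean on a vetted library and current recommendations.

```python
# Store a salted, slow hash of the password, never the password itself.
# Parameters are illustrative; use a vetted library and current guidance
# for anything real.
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("Tr0ub4dor&3", salt, digest))                   # False
```

The point of the exercise is the one above: none of this is exotic engineering; it’s knowledge that users and administrators mostly lack.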

There’s a lot of pretty basic knowledge that I think most people don’t have. At the same time, I think it’s safe to say that most of the difficult engineering questions regarding security have been solved; what remains is a bunch of tooling and infrastructure on the part of various services that would make better security practices easier to maintain (i.e. better PGP support in mail clients). In the meantime….

Stay smart.


  1. Security, being a process, rather than a product. Cite. ↩︎