Cyborg Analysis and Technology Policy

I want to put together a series of posts about cyborg perspectives on industrial IT practices. This post introduces that series.

The "cyborg analysis," makes difficult to ignore that most problems with technology, even seemingly outright "technological problems" are better as understood as problems at the intersection of humans and technology. Ignoring either people or technology, leads to imperfect analysis. While, false divides between "problems with users," and "problems with technology" aren't helpful either, this might be a conceptually useful exercise.

I've been thinking that many of the seemingly technological problems of IT policy are really not technological problems at all, and are better thought of as "people problems." I'm thinking of "problems" like file permissions, access control, group and user management, desktop management, and data organization. Obviously, some problems, like workflow and tool optimization, are seen as human problems, while others are often targeted as technological problems.

I think of human solutions as those responses to technology issues that address misunderstanding by providing training and education to users. There's also a class of automations that make these human solutions easier to provide: for example, test harnesses of various sorts, code/text validation, and some template systems. Technological solutions, by contrast, are those that, through the use of code, enforce best practices and business objectives: web filters, access management systems, and anything designed to "break" if business policy is not satisfied.

The divides between these kinds of solutions are not natural or clear, just as problem domains in IT are rarely neatly or easily "siloed." There is often a fine line between writing code to automate a process to make users' work easier and writing code to control users and save them from themselves.

This post, and the series that follows it, are meant to explore this tension, and what I suspect are under-realized opportunities for human solutions. I would like to think that problem domains in the larger IT world, particularly thorny issues that require lots of code, may be better addressed with more human-centric solutions. Current thoughts revolve around:

  • The practice and method of software design.
  • The development and use of software built for "internal use," rather than as a software product.
  • Reframing the discussion about digital security to address it from a human angle rather than from a purely technological angle.

As always, I'd like to invite comments and discussion.

Better Task Lists

Just about everyone keeps a task list of some sort, or has at some point. To the casual observer, task list management might seem like a simple problem that could be augmented with a little bit of automation for great effect. Fire up your nearest "app store" and I would bet money that you'll find at least a few developers who have had the same thought.

For such a seemingly simple engineering problem, there is an inordinate amount of really bad software. While this might tempt us to reassess the complexity of the task management problem, I don't think the problem itself is that complex. What happens, I'm convinced, is that people (i.e. cyborgs) make lists of tasks to solve different problems in their realities, and these different lists often require different automation. So while there are 10-20 basic task management applications, the number of distinct usage profiles exceeds that by several times. That's the theory at any rate.

Archetypes

Allowing for a large amount of diversity, there are still a few generally useful task list "archetypes" that we can use to characterize how people use task lists. I just want to enumerate them here for now. I might move them out to another page, and you should feel free to edit the page (it's a wiki!) if you think I've missed anything!

  • Collaboration Facilitation: Teams need systems to keep track of and work on shared issue queues. These are "bug trackers," and they're totally essential when items or "issues" take significant time to complete, require the effort/input/awareness of multiple people, and need to be created or added by a number of people.
  • Memory Enhancement: People create todo lists when they're working on more things than they can comfortably remember at any one time. The lists tend to be ephemeral and the items can be quickly resolved, but we make these lists so we don't have to hold a long list of things in mind while working. Think post-it notes.
  • Obligation Management: These lists bring us close to calendars, but we keep them to make sure we remember to do required tasks. Often these lists help people make sure that they're "caught up," so that they can enjoy and use free time without interruption or nagging.
  • Task Prioritization: When time is limited, a list of tasks is useful for imposing an order and making use of available time: it makes it possible to keep track of all open tasks while allocating effort and time in a smart way that accounts for available time, importance, and deadlines. The goal is to get the most crucial things done while never wondering what one could be doing with a few free moments.
  • Progress Tracking: These lists are less about tracking things to do and more about tracking what's been done. When working on a number of long term and short term projects, a list of what's open and what's been finished, along with a status of where things stand, is useful to avoid losing track of projects and tasks.

I think we can distill a number of overriding qualities of task lists from these five archetypal use cases; a brief sketch in code follows the list. That working list is:

  • item granularity,
  • project-level organization,
  • scheduling and deadlines, and
  • use of priority markers.
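
As a concrete, and entirely hypothetical, illustration of how these qualities might combine, here is a minimal task model in Python. The field names and the next_up helper are my own invention for this sketch, not a reference to any particular application.

    from dataclasses import dataclass
    from datetime import date
    from enum import IntEnum
    from typing import Optional

    class Priority(IntEnum):
        LOW = 1
        NORMAL = 2
        HIGH = 3

    @dataclass
    class Task:
        title: str                            # item granularity: one actionable unit
        project: Optional[str] = None         # project-level organization
        due: Optional[date] = None            # scheduling and deadlines
        priority: Priority = Priority.NORMAL  # priority markers
        done: bool = False

    def next_up(tasks: list[Task]) -> list[Task]:
        """Order open tasks by deadline pressure, then by priority."""
        open_tasks = [t for t in tasks if not t.done]
        return sorted(open_tasks, key=lambda t: (t.due or date.max, -t.priority))

Different archetypes weight these fields differently: a memory-enhancement list barely needs anything beyond the title, while a prioritization list leans heavily on the due date and priority.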

Personal Case Studies

When I had an epic commute, my issue was that while I had free time, it all came in 20 or 40 minute segments on trains. The challenge was to be organized and focused enough to really use this time. It was early, and while I was awake enough to get work done, I wasn't always awake enough to figure out what needed my attention. And I didn't have enough routine minutia, or enough other free time, to be able to spend these blocks of time on minutia. I needed a task list that told me what to do and when to do it: task prioritization with a little obligation management, or something close to that. I chopped every task into the smallest actionable items, which is annoying in creative projects and often not very useful, and then scheduled items out so that I could do 6 or so things a day; anytime I opened the laptop there was a list of things I could jump into. It worked, more or less.

Now, I have free time. I even have a few hours strung together. The issue is less that I need help filling every little moment with something to do, and more that I have too much that I could be working on: I need help figuring out the status of ongoing projects and what needs my attention the most. Progress tracking, more or less, with a little task prioritization, but in a very different way than I'd been doing it. I've got a lot of things going, and I need to be able to see where projects stand and what needs attention now. I've not figured out the best solution, but I think less scheduling and bigger conceptual task objects are more the way to go.

Does this way of thinking about things make sense to other people?

Big Data Impact

I've been mulling over this post about big data in the IT world for quite a while. It basically says that given large (and growing) data sets, companies that didn't previously need data researchers suddenly need people to help them use "big data." Every company is a data company. In effect, we have an ironic counterexample to the usual effect of automation on the need for labor.

These would be "data managers" have their work cut out for them. Data has value, sure, but unlike software which has value as long as someone knows how to use it, [1] poorly utilized data is just as good as no data. Data researchers need to be able to talk to developers and help figure out what data is worth collecting and what data isn't. Organizations need someone to determine what data has real value, if only to solve a storage-related problem. Perhaps more importantly data managers would need to guide data usage both technically (in terms of algorithms, and software design) and in terms of being able to understand the abilities and shortfalls of data sets.

There's a long history of IT specialist positions: database developers, systems administrators, quality assurance engineers, release engineers, and software testers. Typically we divide this between developers and operations folks, but even the development/operations division is something of a misnomer. There are merits to both generalism and specialization, but as projects grow, specialization makes sense, and data may just be another specialty in a long tradition of software development and IT organization.

Specialization also makes a lot of sense in the context of data, where having a lot of unusable data adds no value and can potentially subtract value from an organization.

A Step Back

There are two very fundamental points that I've left undefined: what "data" am I talking about and what kinds of skills differentiate "data specialists" from other kinds of technicians.

What are big data?

Big data sets are, to my mind: large collections of user and activity data, GIS/map-based information, "crowdsourced" information, and data that is automatically collected through the course of normal internet activity. Big data is enabled by increasingly powerful databases and the ubiquity of computing power, which lets developers process data on large scales. For example: the aggregate data from foursquare and other similar services, comprehensive records of user activity within websites and applications, service monitoring data and records, audit trails of activity on shared file systems, transaction data from credit cards and customers, and tracking data from marketing campaigns.

With so much activity online, it's easy for software developers and users (which is basically everyone, directly or otherwise) to create and collect a really large collection of data regarding otherwise trivial events. Mobile devices and linkable accounts (OpenID and other single sign-on systems) simplify this process. The thought, and the hope, is that all this data equals value, and in many circumstances it does. Sometimes, it probably just makes things more complicated.

Data Specialists

Obviously every programmer is a kind of "data specialist," and the last seven or eight years of the Internet have done everything to make every programmer a data specialist. What the Internet hasn't done is give programmers a sense of basic human factors knowledge, or a background in fundamental quantitative psychology and sociology. Software development groups need people who know what kinds of questions data can and cannot answer, regardless of what kind or how much data is present.

Data managers, then, would be one of those positions that sits between/with technical staff and business staff, and perhaps I'm partial to work in this kind of space because this is very much my chance. But there's a lot of work in bridging this divide, and a great deal of value to be realized in this space. And it's not like there's a shortage of really bright people who know a lot about data and social science who would be a great asset to pretty much any development team.

Big Data Beyond Software Development

The part of this post that I've been struggling over for a long time is the mirror of what I've been talking about thus far. In short, do recent advancements in data processing and storage (NoSQL, MapReduce, etc.) that have primarily transpired amongst startups, technology incubators, and other "industry" sources have the potential to help academic research? Are there examples of academics using data collected from the usage habits of websites to draw conclusions about media interaction, reading habits, or cultural participation/formation? If nothing else, are sociologists keeping up with "new/big data" developments? And perhaps most importantly, does the prospect of being able to access and process large and expansive datasets have any effect on the way social scientists work? Hopefully someone who knows more about this than I do will offer answers!

[1] Thankfully, there are a number of conventions that make it pretty easy for software designers to write programs that people can use without needing to write extensive documentation.

The Structured and Unstructured Data Challenge

The Debate

Computer programmers want data to be as structured as possible. If you don't give users a lot of room to do unpredictable things, it's easier to write software that does cool things. Users, on the other hand, want (or think that they want) total control over data and the ability to do whatever they want.

The problem is that they don't. Most digital collateral, even content stored in unstructured formats, is pretty structured. While people may want freedom, they don't use it, and in many cases users go through a lot of effort to recreate structure within unstructured forms.

Definitions

Structured data are data stored and represented in tabular form, or as some sort of hierarchical tree, that computers can easily parse. By contrast, unstructured data are things like files where all of the content is organized manually within the file and written to durable storage by hand.

The astute among you will recognize that there's an intermediate category, where largely unstructured data is stored in a database. This happens a lot in content management systems, in mobile device applications, and in a lot of note taking and project management applications. There's also a parallel semi-structured form, where people organize their writing, notes, and content in a regular and structured manner even though the tools they're using don't require it. They'd probably argue that this is "best practice" rather than "semi-structured" data, but it probably counts.
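
To make the spectrum concrete, here's a small, hypothetical illustration in Python. The note content and the "Key: value" convention are invented for the example, but light parsing of human-enforced conventions is exactly what semi-structured tools rely on.

    import json

    # Structured: hierarchical and trivially machine-parseable.
    structured = json.dumps({
        "title": "Call the vet",
        "due": "2012-03-01",
        "project": "house",
    })

    # Unstructured: a free-form blob; a program can only treat it as text.
    unstructured = "call the vet sometime before march, about the dog"

    # Semi-structured: free text with a human-enforced convention
    # ("Key: value" header lines) that a program can recover cheaply.
    semi_structured = """\
    Title: Call the vet
    Due: 2012-03-01

    Remember to ask about the dog's limp."""

    def parse_headers(text: str) -> dict:
        """Recover the conventional header lines from semi-structured text."""
        headers = {}
        for line in text.splitlines():
            if not line.strip():
                break  # headers end at the first blank line
            key, _, value = line.partition(":")
            headers[key.strip().lower()] = value.strip()
        return headers

    print(parse_headers(semi_structured))
    # -> {'title': 'Call the vet', 'due': '2012-03-01'}

The structured version is boring but fully automatable; the unstructured version is flexible but opaque to software; the semi-structured version splits the difference, at the cost of users maintaining the convention themselves.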

The Impact

The less structured content or data is, the less computer programs are able to do with it, and the harder people have to work to make the data useful to them. So while we as users want freedom, that freedom doesn't get us very far, and we don't really use it even when we have it. Relatedly, I think we could read the crux of the technological shift in Web 2.0 as a move toward more structured forms, and the "mash up" as the celebration of a new "structured data."

The lines around "semi-structured" data are fuzzy. The trick is probably to figure out how to give people just enough freedom so that they don't feel encumbered by the requirements of the form, but so much freedom that the software developers are unable to do really smart things behind the scene. That's going to be difficult to figure out how to implement, and I think the general theme of this progress is "people can handle and developers should err on the side of stricture."

Current Directions

Software like org-mode and twiki are attempts to leverage structure within unstructured forms, and although the buzz around enterprise content management (ECM) has started to die down, there is a huge collection of software that attempts to impose some sort of order on the chaos of unstructured documents and information. ECM falls short probably because it's not structured enough: it mandates a small amount of structure (categories, some metadata, perhaps validation and workflow) which doesn't provide significant benefit relative to the amount of time it takes to add content to these repositories.

There will be more applications that bridge the structure boundary, and begin to allow users to work with more structured data in a productive sort of way.

On a potentially orthogonal note, I'm working on cooking up a proposal for a LaTeX-based build system for non-technical document production that might demonstrate--at least hypothetically--how much structure can help people do awesome things with technology. I'm calling it "A LaTeX Build System."
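
As a hypothetical taste of that proposal: because the structure of a LaTeX document lives in its markup, the build can be completely hands-off. The sketch below is illustrative only; the file layout and function name are mine, and pdflatex is run twice so cross-references resolve.

    import subprocess
    from pathlib import Path

    def build_pdf(source: Path) -> Path:
        """Compile a structured LaTeX source into a PDF, non-interactively."""
        for _ in range(2):  # second pass resolves cross-references and the ToC
            subprocess.run(
                ["pdflatex", "-interaction=batchmode", source.name],
                cwd=source.parent,
                check=True,
            )
        return source.with_suffix(".pdf")

    # Hypothetical usage: build_pdf(Path("report/main.tex"))

The point isn't this particular script: it's that this kind of hands-off automation only works because the document's structure is explicit in the source rather than living in the author's head.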

I'd love to hear what you think, either about this "structure question," or about the LaTeX build system!

Are We Breaking Google?

Most of the sites I visit these days are: Wikipedia, Facebook, sites written by people I've known online since the late 1990s, people who I met online around 2004, and a few sites that I've learned about through real life connections, open source, and science fiction writing. That's about it. It sounds like a lot, and it is, but the collection is pretty static.

As I was writing about my nascent list of technical writing links, I realized that while I've been harping on the idea of manually curated links and digital resources for a single archive for a couple of years now, I've not really thought about the use or merits of manually curated links for the internet writ large.

After all you can find anything you need with Google. Right?

I mostly assumed that if I could get people to curate their own content, "browsing" would become more effective, and maybe Google technology could adapt to the evolving social practice?

Though the inner workings of Google are opaque, we know that Google understands the Web by following and indexing the links that we create between pages. If we don't link, Google doesn't learn. Worse, if we let software create all the links between pages, then Google starts to break.

Put another way: the real intelligence of Google's index isn't the speed and optimization of a huge amount of data--that's a cumbersome engineering problem--but rather our intelligence derived from the links we make. As our linking patterns change, as all roads begin to lead back to Wikipedia, as everyone tries to "game" Google (at least a little) the pages that inevitably float to the top are pages that are built to be indexed the best.
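
To see why the links matter so much, consider a toy, PageRank-style ranking sketch. This is the textbook idea, not Google's actual system: a page's importance is computed almost entirely from who links to it.

    def rank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
        """Toy link-based ranking: rank flows along the links people make."""
        pages = list(links)
        score = {p: 1.0 / len(pages) for p in pages}
        for _ in range(iters):
            new = {p: (1.0 - damping) / len(pages) for p in pages}
            for page, outgoing in links.items():
                targets = outgoing or pages  # dangling pages spread rank evenly
                for target in targets:
                    new[target] += damping * score[page] / len(targets)
            score = new
        return score

    # A tiny, made-up web: when all roads lead to one page, it absorbs the rank.
    print(rank({"blog": ["wikipedia"], "forum": ["wikipedia", "blog"], "wikipedia": []}))

If the input links are generated by software gaming the system rather than by people expressing judgments, the output stops meaning much of anything.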

And Google becomes a self-fulfilling prophecy, because whatever pages we do find out about and create links to are the pages that we found using Google. We've spent a lot of time thinking about what happens if Google becomes evil, but too little thinking about what happens to us as Google stops providing new and useful information. We've spent considerably less effort considering what happens when Google becomes useless.

Security Isn't a Technological Problem

Security, of technological resources, isn't a technological problem. The security of technological resources and information is a problem with people.

There.

That's not a very groundbreaking conclusion, but I think that the effects of what this might mean for people doing security [1] may be more startling.

Beyond a basic standard of "writing and using quality software" and following sane administration practices, the way to resolve security issues is to fix the way people use and understand the implications of their use.

There are tools that help control user behavior to greater or lesser degrees: things like permissions control, user management, auditing, and encryption. But they're just tools: they don't solve the human problems and the policy/practice issues that are the core of best security practice. Teaching people how their technology works, what's possible and what's not possible, and finally how to control their own data and resources is the key to increasing and providing security services for everyone.

I think of this as the "free software solution," because it draws on the strengths and methods of free software to shape and enhance people's user experience and to improve the possible security of the public network as a whole. One of the things that has always drawn me to free software, and one of its least understood properties, deals with the power of source code to create an environment that facilitates education and inquiry. People who regularly use free software, I'd bet, have a better understanding of how technology works than people who don't, and it's not because free software users have to deal with less polished software (not terribly true), but has something to do with a different relationship between creators and users of software. I think it would be interesting to take this model and apply it to the "security problem."

With luck, teaching more people to think about security processes will mean that users will generally understand:

  • how encryption works, and be more amenable to managing their own cryptographic identities and infrastructure. (PGP and SSH)
  • how networking works at a basic level, so they can configure, set, and review network security. (LAN administration, NetFilter)
  • how passwords are stored and used, and what makes strong passwords that are easy to remember and difficult to break. (A sketch of the storage side follows this list.)
  • how to control and consolidate identity systems to minimize social engineering vulnerabilities. (OpenID, OAuth, etc.)
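
For the password item above, here's a hedged sketch of what "how passwords are stored" means in practice, using only Python's standard library. The function names are mine, and a real system should lean on a vetted authentication library rather than rolling its own.

    import hashlib
    import hmac
    import os

    ITERATIONS = 100_000  # deliberately slow, to frustrate brute-force attacks

    def hash_password(password: str) -> tuple[bytes, bytes]:
        """Store a per-user salt and a slow hash -- never the password itself."""
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return salt, digest

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        """Recompute the hash and compare in constant time."""
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(candidate, digest)

A user who understands this much also understands why a service can't email them their old password, why long passphrases beat short "complex" strings, and why reusing passwords across sites is the real danger.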

There's a lot of pretty basic knowledge that I think most people don't have. At the same time, I think it's safe to say that most of the difficult engineering questions regarding security have been solved; what remains is a bunch of tooling and infrastructure work on the part of various services that would make better security practices easier to maintain (i.e. better PGP support in mail clients). In the mean time....

Stay smart.

[1] Security being a process, rather than a product. Cite.

These Shoes Were Made for Cyborgs

"Do eye glasses make us all cyborgs?" Someone asked me a few days ago.

I was annoyed more than anything.

Of course they do. Corrective lenses are a non-biological technology that shapes our experience of the world and of our bodies. By this logic, pretty much every tool developed as a product of "technology" (applied science; otherwise known as tinkering with stuff) renders us cyborgs.

I like the notion that cyborgism is the rule and not the exception in the course of human history, but it makes the conversation about the cyborg moment more banal. A more banal cyborg moment makes it harder to think about the parts that I think are most interesting: the internet, distributed collaboration, free software and open source, and the impact of technology on literature and reading/writing.

As a retort, I said something like, "Perhaps, but if you accept that eye glasses create cyborg beings, then you'd have to accept that shoes also create cyborgs. And the effects of shoes are much more interesting."

Shoes affect how far people can walk, the speed of independent locomotion, they prevent all sorts of awful injuries, and probably lengthen the lifespan as a result. Shoes probably also change our feet and make us dependent upon wearing shoes, and more prone to certain kinds of injuries when barefoot. Fascinating stuff.

All other things being equal, I'm going to stick to the internet and the cyborgs resulting from the encounter of humans with that technology.

Packaging Technology Creates Value

By eliminating the artificial scarcity of software, open source software forces businesses and technology developers to think differently about their business models. There are a few ways that people have traditionally built businesses around free and open source software. There are pros and cons to every business model, but to review, the basic ideas are:

  • Using open source software as a core and building a thin layer of proprietary technology on top of it. Sometimes this works well enough (e.g. SugarCRM, OS X) and sometimes it doesn't seem to work as well (e.g. MySQL).
  • Selling services around open source software. This includes support contracts, training services, and infrastructure provisioning. Enterprises and other organizations and projects need expertise to make technology work, and the fact that open source doesn't bundle licensing fees with support contracts doesn't make the support (and other services) any less useful or needed.
  • Custom development services. Often open source projects provide a pretty good framework for a technology but require some level of customization to fit the needs and requirements of the "business case." The work can be a bit uneven, as with all consulting, but the need and the service are both quite real. While the custom code may end up back in the upstream, sometimes this doesn't quite happen, for a number of reasons. Custom development obviously overlaps with services and thin-proprietarization, but it is distinct: it doesn't revolve around selling proprietary software, and it doesn't involve user support or systems administration. These distinctions can get blurry in some cases.

In truth, when you consider how proprietary software actually conveys value, it's really the same basic idea as the three models above. There's just a minor mystification around software licenses, but other than that, the business of selling software and services around software doesn't vary that much.

James Governor of RedMonk suggests a fourth option: packaging technology.

The packaging model is likely just an extension of the "services" model, but it draws attention to the ways that companies can create real value not just by providing services and not just by providing a layer of customization, but by spending time attending to the whole experience, rather than the base technology. It also draws some attention to the notion that reputation matters.

I suppose it makes sense: when businesses (and end users) pay for proprietary software, the exchange is nominally "money" for "license" usage rights, but in reality there are services and other sources of value in the mix. Thus it is incumbent upon open source developers and users to find all of the real sources of value that can be conveyed in the exchange of money for software, and to find ways to support themselves and the software. How hard can it be?