Ikiwiki Tasklist Update

I added a few lines to a script that I use to build my task list, and, for the first time ever, I opened a file with code in it, added a feature, tested it, and it worked. Here's the code, with enough context so it makes sense (explained later, if you don't want to spend the time parsing it):

# strip the option flags (-c, -p, -s) to leave just the positional arguments
ARG=`echo "$@" | sed -r 's/\s*-[cps]\s*//g'`
WIKI_DIR="`echo $ARG | cut -d " " -f 1`"
# if the second argument contains a slash, treat it as a full path and use it as-is
if [ "`echo $ARG | cut -d " " -f 2 | grep -c /`" = 1 ]; then
   TODO_PAGE="`echo $ARG | cut -d " " -f 2`"
# if it already carries the extension, just prepend the wiki directory
elif [ "`echo $ARG | cut -d " " -f 2 | grep -c "$EXT"`" = 1 ]; then
   TODO_PAGE="$WIKI_DIR/`echo $ARG | cut -d " " -f 2`"
# otherwise, prepend the wiki directory and append the extension
else
   TODO_PAGE="$WIKI_DIR/`echo $ARG | cut -d " " -f 2`.$EXT"
fi

This is from the section of the script that processes the arguments and options on the command line. Previously, the command was invoked like this:

ikiwiki-tasklist [-c -p -s] [DIR_TO_CRAWL] [OUTPUT TODO FILE NAME]

My goal with the options was to have something that "felt like" a normal command with option switches and had a lot of flexibility. I didn't provide as much initial flexibility in the two fields that followed, however. The directory to crawl for tasks (i.e. "[DIR_TO_CRAWL]") was specified the way it is now, but the output file was 1) assumed to have an extension specified in a variable at the top of the script, and 2) automatically placed in the top level of the destination directory.

It worked pretty well, but with the advent of a new job I realized that I needed some compartmentalization: I needed to use the tasklist system for personal and professional tasks without getting one set of items mixed in with the other. Better control of where the output ends up is the key to that.

The modification detects whether the output file looks like a path rather than a bare file name. If it senses a path, it creates the task list at the path specified, with no added extension. If a file name already specifies the extension, you won't get ".ext.ext" files. And the original behavior is preserved.
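
To make the three cases concrete, here are some hypothetical invocations (assuming the extension variable at the top of the script is set to "mdwn"):

ikiwiki-tasklist ~/wiki ~/org/work-todo.txt    # contains a slash: written to ~/org/work-todo.txt as-is
ikiwiki-tasklist ~/wiki personal.mdwn          # already has the extension: written to ~/wiki/personal.mdwn
ikiwiki-tasklist ~/wiki personal               # bare name: written to ~/wiki/personal.mdwn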


I'm a hacker by inclination: I take code that I find and figure out how to use it. Sometimes I end up writing or rewriting code, but I'm not really a programmer. My own code, at least until recently, has tended to be somewhat haphazard, and until now (more or less) I've not felt like I could write code from scratch that was worth maintaining and enhancing in any meaningful way.

Apparently some of that's changed.

I've made a few additional changes to the scripts, but most of these are more trivial and can be summarized as "I learned how to write slightly tighter shell scripts." If you're using it you might want to update: the ikiwiki tasklist page is up to date.

Issue Tracking and the Health of Open Source Software

I read something recently that suggested that the health of an open source project and its community can be largely assessed by reviewing the status of the bug tracker. (I'm still trying to track down the citation for this remark.) The gist is that vital, active projects have bugs that are regularly updated and clearly described, and trackers that are easy to search and easy to submit to.

I'm not sure that free software communities and projects can be so easily assessed, or that conventional project management practices are the only meaningful way to judge a project's health. While we're at it, I don't know that it's terribly useful to focus too much attention on project management in the first place. Having said that, the emergence of organizational structure is incredibly fascinating, and could probably tolerate more investigation.

As a starting point, I'd like to offer two conjectures:

  • First, that transparent issue tracking is a reasonably effective means of "customer service," or user support. If the bug tracker contains answers to the questions that people encounter during use, it provides a productive way to resolve issues with the software and supports self-service. Obviously some users and groups of users are better at this than others.
  • Second, issue tracking is perhaps the best way to do bottom-up project/product management and planning in the open, particularly since these kinds of projects lack formal procedures and designated roles for this kind of organizational work.

While the overriding goal of personal task management is to break things into the smallest manageable work units, the overriding goal of issue tracking systems is to track the most intellectually discrete issues within a single project through the development process. Thus, issue tracking systems have requirements that are either much less important in personal systems or actively counter-intuitive for other uses. They are:

  • Task assignment, so that specific issues can be assigned to different team members. Ideally this means a specific developer can "own" a specific portion of the project and actually be able to work and coordinate efforts on it.
  • Task prioritization, so that crucial issues get attention before "nice to have" items are addressed.
  • Issue comments and additional attached information, to track progress and support information sharing among teams, particularly over long periods of time with asynchronous elements.

While it's nice to be able to integrate tasks and notes (this is really the core of org-mode's strength), issue tracking systems need to be able to accommodate error output and other attached information, as well as discussion from the team about the ideal solution.

The truth is that a lot of projects don't do a very good job of using issue tracking systems, despite how necessary and important bug trackers are. The prefabricated systems can be frustrating and difficult to use, and most of the minimalist systems [1] are hard to use in groups. [2] The first person to write a fully featured, lightweight, and easy to use issue tracking system will be incredibly successful. Feel free to submit a patch to this post if you're aware of a viable system along these lines.

[1] I'm thinking of things like using ikiwiki or org-mode to track issues; ditz suffers from the same core problem.
[2] Basically, they either sacrifice structure or concurrency features, or both. Less structured systems rely on a group of people to capture the same sort of information in a regular way (unlikely), or they capture less information; neither option is tenable. Without concurrency (because they store things in single flat files), people can't use them to manage collaboration, which leaves them as awkward personal task tracking systems.

Caring about Java

I often find it difficult to feign interest in the discussion of Java in the post-Sun Microsystems era. Don't get me wrong, I get that there's a lot of Java out there, and that there are a number of technological strengths and advantages that Java has in contrast to some other programming platforms. Consider my post about Whorfism and computer programming for some background on my interest in programming languages and their use.

I apologize if this post is more in the vein of "a number of raw thoughts," rather than an actual organized essay.

In Favor of Java

Java has a lot of things going for it: it's very fast, and it runs code in a VM that executes it in a mostly isolated environment, which increases the reliability and security of the applications that run on the Java Platform. I think of these as "hard features," or technological realities that are presently implemented and available for users.

There are also a number of "soft features" that Java has that inspire people to use it: an extensive and reliable standard library, a large expanse of additional library support for most things, a huge developer community, and inclusion in computer science curricula, which means people are familiar with it. While each of these aspects is relatively minor, and could theoretically apply to a number of different languages and development platforms, together they represent a major rationale for Java's continued use.

One of the core selling points of Java has long been the fact that, because Java runs on a virtual machine that abstracts the differences between operating systems and architectures, it's possible to write and compile code once and then run that "binary" on a number of different machines. The buzzword/slogan for this is "write once, run anywhere." This doesn't fit easily into the hard/soft feature dichotomy I set up above, but it is nevertheless an important factor.

Against Java

Teasing out the history of programming language development is probably a better project for another post (or career?), but while Java might once have had a greater set of support for many common programming tasks, I'm not sure that its sizable standard library and common tooling continue to overwhelm its peers. At best this is a draw with languages like Perl and Python; more likely, the fact that the JDK is so huge and varied increases the potential for incompatibility, and it means downloading the whole JDK to run even minimalist Java programs. Other languages have addressed tooling and library support in different ways, and I think the real answer to this problem is to write with an eye towards minimalism and make sure that there are really good build systems.

Most of the arguments in favor of Java revolve around the strengths of the Java Virtual Machine, which is the substrate where Java programs run. It is undeniable that the JVM is an incredibly valuable platform: every report that I've seen concludes that the JVM is really fast, and the VM model does provide a number of persuasive features (e.g. sandboxing, increased portability, performance gains). That's cool, but I'm not sure that any of these "hard" features matter much these days:

Most programming languages use a VM architecture these days. Raw speed, of the sort that Java has, is less useful than powerful concurrent programming abilities, and is offset by the fact that computers themselves are absurdly fast. That's not to say that Java fails because others have been able to replicate the strengths of the Java platform, but it does fail to inspire excitement.

The worth of Java's "cross platform" capabilities is probably negated by service-based computing (the "cloud"), and by the fact that cross platform applications, GUI or otherwise, were probably a misbegotten dream anyway.

The more I construct these arguments, the more I circle around the idea that Java pushed a lot of programmers and language designers to think about what kinds of features programming languages needed, but that the world of computing and programming has changed in a number of significant ways since then, and we've learned a lot about the art of designing programming languages in the meantime. I wonder if my lack of enthusiasm (and yours as well, if I may be so bold) has more to do with a set of assumptions about the way programming languages should be that haven't aged particularly well. Which isn't to say that Java isn't useful, or that it is no longer important, merely that it's become uninteresting.

Thoughts?

Objective Whatsis

Ok, confession time: I don't really get object orientation. So, in an effort to increase my understanding, I'm going to write some overview and discussion to work things out a bit better. Hopefully some of you will find this helpful. I've tinkered with programming for a long time. I read a huge chunk of a popular Python introductory text, and I've read a chunk of Practical Common Lisp, but people start talking about objects and I lose track of everything. I think there's something slightly unconventional--at least initially--about object orientation.

We understand procedural programming pretty easily. There's a set of steps that you need to perform: you tell the computer what they are, data gets handed off to the program, it runs, and the steps are carried out. Or, conversely, someone calls your program and says "I want data" (perhaps implicitly), and the program says "ok, to go get data, I need to do these things," and then it runs, and at the end you see data.

Object orientation turns this sort of sideways and says "let's build a representation of our data (objects), then write code that says what happens to those objects." Ok, that almost makes sense: data happens to your program, and you write code to provide behaviors and responses to all of the things that will happen when your program runs. So you feed objects (data) into your program, it does its thing in response to those objects, and different data (probably) exists on the other end.

I hope I haven't lied yet! To continue...

The thing that always confused me, given my utter lack of background, is this whole "methods" and "classes" thing that programmer types launch into almost immediately. To overview:

Objects, as I said above, are just another way to think about data. An object is a "thing" that the program has to deal with. Classes really just represent the structure of a program. We hear "objects are instances of classes," but this feels sort of backwards; it seems more intuitive to say that classes provide a framework for interacting with objects: they describe the loose "shape" of their objects and then create a place for behaviors to exist. Methods, then, are those behaviors. Often methods "belong" to classes (either literally in the structure of the code, or just conceptually), and they define what happens to objects as the program runs.

Thus the role of an object oriented runtime (I think "runtime" is the right word for the program that executes the program) is to take data that comes in, figure out what class (or classes) the object "fits into," and then apply the methods that belong to that class.
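
To make that concrete, here's a minimal sketch in Python (the class and names are invented for illustration, not drawn from any particular program):

# A class describes the "shape" of its objects and collects their behaviors.
class Task:
    def __init__(self, title):
        # Each object (instance) carries its own data.
        self.title = title
        self.done = False

    # A method is a behavior that belongs to the class; it defines
    # what happens to an object as the program runs.
    def complete(self):
        self.done = True

    def describe(self):
        status = "done" if self.done else "pending"
        return self.title + " (" + status + ")"

# Objects are instances of the class: the data that the methods act on.
chore = Task("write blog post")
chore.complete()
print(chore.describe())   # prints: write blog post (done)

When chore.complete() runs, the runtime looks up the complete method on the Task class and applies it to that particular object, which is the dispatch step I was gesturing at in the paragraph above.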

Whew! So, how'd I do?

Assuming my understanding is correct, allow me to offer the following analysis:

  • By using multiple methods in a given sequence, you can reuse code within a class, rather than needing to define and redefine a set of increasingly complex procedures.

  • At the same time, there's a much higher start-up cost for object oriented code. Because we--or I--tend to think about getting things done in terms of procedures rather than objects, it takes a bit of extra brainpower to do it the object oriented way. And for most tasks--which are pretty small--creating classes and creating methods seems like a lot of stuff to have to hold in your head when you're figuring out what needs to be done.

    It sort of seems like, in order to do object orientation right, you have to already know what has to happen in the program. Otherwise, classes fail to properly describe the data/methods that you need.

  • Ok, so now that I, more or less, understand what's going on here, might we be better off calling it class-oriented programming? Or "class-centered" programming?

Thoughts?

Analyzing the Work of Open Source

This post covers the role and purpose (and utility!) of analysts and spectators in the software development world, particularly in the open source subset of that. My inspiration for this post comes from a video of Coté's.

In the video, Coté says (basically) that open source projects need to be able to justify the "business case" for their project, to explain what innovation the project seeks to provide the world. This is undoubtedly a good thing, and I think we should probably all be able to explore and clearly explain, and even justify, the projects we care about and work on in terms of their external worth.

Project leaders and developers should be able to explain and justify the greater utility of their software clearly. Without question. At the same time, problems arise when all we focus on is the worth: people become oblivious to how things work, and become unable to participate in informed decisions about the technology that they use. Users who lack an understanding of how a piece of technology functions are less able to take full advantage of that technology.

As an aside: one of the things that took me forever to get used to about working with developers is the terms in which they describe their future projects. They use the future tense with much more ease than I would ever consider: "the product will have this feature" and "it will be architected in such a way." From the outside this kind of talk seems unrealistic and grandiose, but I've learned that programmers tend to see their projects evolving in real time, and so this kind of language is really more representative of their current state of mind than of their intentions or lack of communication skills.

Returning for a moment to the importance of being able to communicate the business case of the projects and technology that we create: as we force the developers of technology to focus on the business cases for the technology they develop, we also make it so that the only people who are capable of understanding how software works, or how software is created, are the people who develop software. And while I'm all in favor of specialization, I do think that the returns diminish quickly.

And beyond the fact that this leads to technology that simply isn't as good or as useful in the long run, it also strongly limits the ability of observers and outsiders ("analysts") to provide a service for the developers of the technology beyond simply communicating their business case to the outside world. It restricts all understanding of technology to journalism, rather than the sort of "rich and chewy" (anthropological?) understanding that might be possible if we worked to understand the technology itself.

I clearly need to work a bit more to develop this idea, but I think it connects with a couple of previous arguments that I've put forth in these pages: one regarding Whorfism in Programming, and another about constructing rich arguments.

I look forward to your input as I develop this project. Onward and Upward!

Why Bother With Lisp?

I'm quite fond of saying "I'm not a programmer or software developer" on this blog, and while I don't think that there's a great chance that I'll be employed as a developer, it's becoming more apparent that the real difference between me and a "real developer" is vanishingly small. Stealth Developer, or something. In any case, my ongoing tooling around with Common Lisp, and more recently the tumble manager project, has given me opportunities to think about Lisp and about why I enjoy it.

This post started when a friend asked me "so, should I learn Common Lisp?" My first response was something to the effect of "no, are you crazy?" or, alternately, "well, if you really want to." And then I came to my senses and offered a more reasonable answer that I think some of you might find useful.

Let us start by asking "Should You Study Common Lisp?"

Yes! There are a number of great reasons to use Common Lisp:

  • There are a number of good open source implementations of the Common Lisp language, including a couple of very interesting and viable options. They're also stable: SBCL, which is among the more recent entrants to the field, is more than a decade old.
  • There are sophisticated development tools, notably SLIME (for emacs), which connects and integrates emacs with the lisp process, as well as advanced REPLs (i.e. interactive mode). So getting started isn't difficult.
  • Common Lisp supports many different approaches to programming. Indeed, contemporary "advanced" languages like Ruby and Python borrow a lot from Lisp: dynamic typing, garbage collection, macros, and so forth. So it's not an "archaic" language by any means.
  • CL is capable of very high performance, so you're unlikely to find yourself saying "damn, I wish I'd written this in a faster language" down the road. Most implementations run on most platforms of any consequence, which is nice.
  • You're probably tired of hearing that "learning Lisp will make you a better programmer in any language," but it's probably true on some level.

The reasons to not learn Lisp or to avoid using it are also multiple:

  • "Compiled" Lisp binaries are large compared to similarly functional programs in other languages. While most CL implementations will compile native binaries, they also have to compile in most of themselves.
  • Lisp is totally a small niche language, and we'd be dumb to assume that it's ever going to take off. It's "real" by most measurements, but it's never really going to be popular or widely deployed in the way that other contemporary languages are.
  • Other programmers will think you're weird.

Having said all of that, I think we should still start projects in CL, and expand the amount of software that's written in the language. Here's why my next programming project is going to be written in Lisp:

  • I enjoy it. I suspect this project, like many projects you may be considering, is something of an undertaking, and I don't want to have to work in an environment that I don't enjoy simply because another is popular or ubiquitous.
  • Although Lisp isn't very popular, it's popular enough that all of the things that you might want to do in your project have library support. So it's not exactly a wasteland.
  • The Common Lisp community is small, but it's dedicated and fairly close knit. Which means you may be able to get some exposure for your application in the CL community, simply because your app is written in CL. This is a question of scale, but it's easier to stand out in a smaller niche.

Of course there are some advantages to "sticking with the crowd" and choosing a different platform to develop your application in:

  • If you want other people to contribute to your project, it's probably best to pick a language that the people who might be contributing to your application already know.
  • While there are libraries for most common things that you might want to do with Common Lisp, there might not be libraries for very new or very esoteric tasks or interfaces. Which isn't always a problem, but can be depending on your domain.
  • The binary size problem will be an issue if you plan to deploy in limited conditions (we're talking like a 15 meg base size for SBCL, which is a non issue in most cases, but might become an issue.)
  • If you run into a problem, you might have a hard time finding an answer. This is often not the case, but it's a risk.

Onward and Upward!

Where is Java Today?

A few weeks ago a coworker walked into my office to talk about the architecture of a project, complete with diagrams, numbers I didn't grasp (nor really need to), and examples of potential off-the-shelf components that would make up the stack of the application at hand. I asked scores of questions, and I think it was a productive encounter. Normal day, really. I seem to be the guy developers come to when they want to pitch ideas and get feedback. I'm not sure why, but I think the experience of talking through a programming or design problem tends to be a productive learning experience for everyone. In any case, the details aren't terribly important.

What stuck in my head is that an off-the-shelf, but non-trivial, part of the system was written in Java.

We all inhaled sharply.


I don't know what it is about Java, but the moment I find out that an application is written in it, I have a nearly visceral reaction. And I don't think it's just me.

Java earned a terrible reputation in the 90s because, although it was trumpeted as the next big thing, every user-facing application in Java sucked: first you had to download a lot of software (and hope that you got the right version of the dependency), and then when you ran the app it took a while to start up and looked like crap. And then your system ground to a halt and the app crashed. But these problems have been fixed: the dependency issue is clearer with the GPLing of Java, GUI bindings for common platforms are a bit stronger, computers have gotten a lot faster, and, perhaps most importantly, the hopes of using Java as the cross platform application development environment have been dashed. I think it's probably fair to say that most Java these days runs on the server side, so we don't have to interact with it in the same sort of hands-on way.

This isn't to say that administering Java components in server operations is without problems: Java apps tend to run a bit hot (in terms of RAM,) and can be a bit finicky, but Java applications seem to fit in a bit better in these contexts, and certainly have been widely deployed here. Additionally, I want to be very clear, I don't want to blame the language for the poor programs that happen to be written in it.

Here are the (hopefully not too leading) questions:

1. Is the "write once, run everywhere" thing that Java did in the beginning still relevant for server-based applications? It's a server application, after all; you wouldn't be losing much by targeting a more concrete native platform.

2. Is the fact that Java is statically typed more of a hindrance in terms of programmer time? And will the comparative worth of Java's efficiency wear off as computers continue to get more powerful?

Conventional wisdom holds that statically typed apps "run faster" but take longer to develop. This is the argument used by Python/Perl/Ruby/etc. proponents, and I don't know how the dynamics of these arguments shift in response to the action of Moore's Law.

3. One of the great selling points of Java is that it executes code in a "managed" environment, which provides some security and safety to the operator of the system. Does the emergence of system-level virtualization tools make the sandboxing features of the JVM less valuable?

4. I don't think my experiences are particularly typical, but all of the Java applications I've done any sort of administrative work with have been incredibly resource intensive. This might be a product of the problem domains. Using Java is often like slinging a sledge hammer around, and so many problems these days don't really require a sledge hammer.

5. At this point, the amount of "legacy" Java code in use is vast. I sometimes have trouble understanding whether Java's current state is the result of all of the tools that have already been invested in the platform, or the result of actually interesting and exciting developments in the platform, like Clojure. Is Clojure (as an example) popular because Lisp is cool again and people have finally come to their senses (heh, unlikely), or because it's been bootstrapped by Java and provides a more pain-free coding experience for Java developers?

Anyone have ideas on these points? Questions that you think I'm missing?

Bash Loops

I was talking with bear, probably two years ago, about programming and how I'm not really a programmer, but I understand what's going on when programmers talk, and how any time I got close to code I sort of kludged things together until they worked. This was probably long enough ago that I was just on the cusp of getting into using Linux full time and becoming a command line guru.

Of shell scripting, he said something that left quite an impression on me. Something like: "The great thing about the shell is that once you figure out how to do something, you never have to figure it out again, because you just make it into a script and run it again when you need it."

Which now seems incredibly straightforward, but it blew my mind at the time. The best thing, I think, about using computers in the way that I now tend to is that any time I run across a task that is in any way repetitive, I can save it as a macro (in a non-technical sense of the word) and call it back in the future. Less typing, less reading over help text, more doing things that matter.

One thing that got me for a while was the "loop" in bash. I had a hell of a time making loops work. And then a few weekends ago I had a task that required a loop, I wrote one on the command line, and it worked the first time through. Maybe I've learned something after all. For those of you who want to learn how to build a loop in shell scripting, let's take the following form:

for [variable] in `[command]`; do

   [command using] $[variable];

done

Typically these are all mashed up onto one line, which can be confusing. Conventionally [variable] is just the letter i, for "item." Note that the semicolons are crucial. The backticks aren't part of the loop syntax itself, but when you're looping over the output of a command, they are what substitute that output into the list, so don't leave them out.

So the loop I wrote. I noticed that there were a number of attempted SSH logins against my server, and while these sorts of SSH probes aren't a huge risk... better to not risk it. So I wanted to add rules to the firewall to block these IP addresses. Here's what I came up with:

for i in
  `sudo egrep 'Invalid user.*([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' /var/log/auth.log -o | \
  egrep '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' -o | sort | uniq`;

do

   sudo iptables -I INPUT -s $i -j DROP;

done

Basically, search the /var/log/auth.log for invalid login attempts, and return only the string captured by the regex. Send this to another egrep command which strips this down to the IP address. Then put the IP addresses in order, and throw out duplicates. Every item in the resulting list is then added to an iptables rule that blocks access. Done. QED.

It's inefficient, sure, but not that inefficient. And it works. Mostly this just cleans up logs, and I suppose using something like fail2ban would work just as well, but I'm not sure what kind of added security benefit that would offer, and besides it wouldn't make me feel quite so smart.

I hope this is helpful for you all.