Non-Trad Software Engineer

It happened gradually, and it wasn't entirely an intentional thing, but at some point I became a software engineer. While a lot of people become software engineers, many of them have formal backgrounds in engineering, or have taken classes or done programs to support this retooling (e.g. bootcamps or programming institutes.)

I skipped that part.

I wrote scripts from time to time for myself, because there were things I wanted to automate. Then I was working as a technical writer and had to read code that other people had written for my job. Somewhere in there I was responsible for managing the publication workflow, and wrote a couple of build systems.

And then it happened.

I don't think it's the kind of thing that is right for everyone, but I was your typical, nerdy/bookish kid who wasn't great in math class, and I suspect that making software is the kind of thing that a lot of people could do. I don't think that my experience is particularly replicable, but I have learned a number of useful (and important) things, and as I've started writing more about what I'm working on now, I realize that I've missed some of the fundamentals. [1]

Formal education in programming, from what I've been able to gather, strikes me as really weird. There are sort of two main ways of teaching people about software and computer science. Option one is that you start with a very theoretical background that focuses on data structures, the performance of algorithms, or the internals of how core technologies function (operating systems, compilers, databases, etc.). Option two is that you spend a lot of time learning about a programming language and about how to solve problems using programming.

The first is difficult, because the theory [2] is not particularly applicable except in very rare cases and only at the highest level, which is easy to back-fill as needed. The second is also challenging, as idioms change between languages and most generic programming tasks are easily delegated to libraries. The crucial skill for programming is the ability to learn new languages and solve problems in the context of existing systems, and developing a curriculum to build those skills is hard.

The topics that I'd like to write about include:

  • Queue behavior, particularly in the context of distributed systems.
  • Observability/Monitoring and Logging, particularly for reasonable operations at scale.
  • Build systems and build automation.
  • Unit-testing, test automation, and continuous integration.
  • Interface design for users and other programmers.
  • Maintaining and improving legacy systems.

These are, of course, primarily focused on the project of making software rather than computer science or computing in the abstract. I'm particularly interested (practically) in figuring out what kinds of experiences and patterns are important for new programmers to learn, regardless of background. [3] I hope you all find it interesting as well!

[1]This is, at least in part, because I mostly didn't blog very much during this process. Time being finite and all.
[2]In practice, theoretical insights come up pretty infrequently and are mostly useful for providing shorthand for characterizing a problem in more abstract terms. Most of the time, you're better off intuiting things anyway because programming is predominantly a pragmatic exercise. For the exceptions, there are a lot of nerds around (both at most companies and on the internet) who can figure out what the proper name is for a phenomenon, and then you can look it up on Wikipedia.
[3]A significant portion of my day-to-day work recently has involved mentoring new programmers. Some have traditional backgrounds or formal technical education and many don't. While everyone has something to learn, I often find that because my own background is so atypical it can be hard for me to outline the things that I think are important, and to identify the high level concepts that are important from more specific sets of experiences.

In Favor of an Application Infrastructure Framework

The byproduct of a lot of my work on Evergreen over the past few years has been that I've amassed a small collection of reusable components in the form of libraries that address important but not particularly core functionality. While I think the actual features and scale that we've achieved are notable, the infrastructure that we built has been particularly exciting.

It turns out that I've written about a number of these components here already. My initial posts were about these components in their more proof-of-concept stage, though; now (finally!) we're using them all in production, so they're a bit more hardened.

The first, grip, is a logging framework. Initially, I thought a high-level logging framework with pluggable backends was going to be really compelling. While configurable backends have been good for using grip as the primary toolkit for writing messaging and user-facing alerting, the most compelling feature has been structured logging.

Most of the logging that we do now (thanks to grip) passes structures (e.g. maps) of key/value data to the logger. In combination with log aggregation services/tools (like ELK, Splunk, or Sumo Logic), we can take care of nearly all of our application observability (monitoring) use cases in one stop. It includes easy-to-use system and golang runtime metrics collection, all using an easy push-based collection, and can also power alert escalation. After having maintained an application using this kind of event-driven structured logging system, I have a hard time thinking about running applications without it.
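
To make "passing structures to the logger" concrete, here's a minimal sketch that uses only the Go standard library. It is not grip's API: the Fields type and the encode and Info functions are hypothetical names, and the idea that a log shipper reads standard output is an assumption of the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Fields is a hypothetical name for the key/value payload of one log
// event; it is not a type from grip.
type Fields map[string]interface{}

// encode renders one event as a single line of JSON and adds a "level"
// key. encoding/json sorts map keys, so the output order is stable.
func encode(level string, f Fields) (string, error) {
	event := Fields{"level": level}
	for k, v := range f {
		event[k] = v
	}
	line, err := json.Marshal(event)
	if err != nil {
		return "", err
	}
	return string(line), nil
}

// Info writes the event to standard output, where a log shipper (an
// assumption of this sketch) could collect it for aggregation.
func Info(f Fields) {
	line, err := encode("info", f)
	if err != nil {
		fmt.Fprintln(os.Stderr, "logging error:", err)
		return
	}
	fmt.Println(line)
}

func main() {
	Info(Fields{"op": "job-complete", "job": "build-1234", "duration_ms": 42})
}
```

Because every event is a flat set of keys, the aggregation service can filter and chart on fields like op or duration_ms without anyone having to parse message strings.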

Next we have amboy, which is a queue system. Like grip, all of the components are pluggable, so it supports in-memory (ephemeral) queues, distributed queues, dependency graph systems, and priority queue implementations, as well as a number of different execution models. The most powerful thing that amboy affords us is a single and clear abstraction for defining "background" execution and workloads.

In Go it's easy to spin up a goroutine to do some work in the background, and it's super easy to implement worker pools to parallelize the processing of simple tasks. The problem is that as systems grow, it becomes pretty hard to track this complexity in your own code, and we discovered that our application was essentially bifurcated between offline (e.g. background) and online (e.g. request-driven) work. To address this problem, we defined all of the background work as small, independent units of work, which can be easily tested, and as a result there is essentially no ad hoc concurrency in the application except what runs in the queues.

The end result of having a unified way to characterize background work is that scaling the application became much less complicated. We can build new queue implementations without needing to think about the business logic of the background work itself, and we can add capacity by increasing the resources of worker machines without needing to think about the architecture of the system. Delightfully, the queue metaphor is independent of external services, so we can run the queue in memory, backed by a heap or hash map with executors running in dedicated goroutines if we want, and also scale it out to use databases or dedicated queue services with additional process-based workers, as needed.
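
To make the queue abstraction concrete, here's a minimal sketch of an in-memory queue with workers running in dedicated goroutines. It uses only the standard library, and all of the names (Job, Queue, NewLocalQueue, funcJob, sumInBackground) are hypothetical illustrations rather than amboy's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical interface for one small, independent unit of
// background work.
type Job interface {
	ID() string
	Run() error
}

// Queue is the only thing application code depends on, so the
// implementation behind it can change.
type Queue interface {
	Put(Job)
	// Close stops accepting jobs, waits for the workers to finish,
	// and returns any errors keyed by job ID.
	Close() map[string]error
}

type localQueue struct {
	jobs chan Job
	wg   sync.WaitGroup
	mu   sync.Mutex
	errs map[string]error
}

// NewLocalQueue starts a fixed pool of workers that read from a
// buffered channel.
func NewLocalQueue(workers, capacity int) Queue {
	q := &localQueue{jobs: make(chan Job, capacity), errs: map[string]error{}}
	for i := 0; i < workers; i++ {
		q.wg.Add(1)
		go func() {
			defer q.wg.Done()
			for j := range q.jobs {
				if err := j.Run(); err != nil {
					q.mu.Lock()
					q.errs[j.ID()] = err
					q.mu.Unlock()
				}
			}
		}()
	}
	return q
}

func (q *localQueue) Put(j Job) { q.jobs <- j }

func (q *localQueue) Close() map[string]error {
	close(q.jobs)
	q.wg.Wait()
	return q.errs
}

// funcJob adapts a plain function to the Job interface.
type funcJob struct {
	id string
	fn func() error
}

func (j funcJob) ID() string { return j.id }
func (j funcJob) Run() error { return j.fn() }

// sumInBackground enqueues one job per number and adds them up in the
// workers; the mutex guards the shared total.
func sumInBackground(workers int, nums []int) int {
	var mu sync.Mutex
	total := 0
	q := NewLocalQueue(workers, len(nums))
	for i, n := range nums {
		n := n
		q.Put(funcJob{id: fmt.Sprintf("add-%d", i), fn: func() error {
			mu.Lock()
			defer mu.Unlock()
			total += n
			return nil
		}})
	}
	q.Close()
	return total
}

func main() {
	fmt.Println(sumInBackground(4, []int{1, 2, 3, 4, 5})) // prints 15
}
```

Because callers only see the Queue interface, a database-backed or service-backed implementation could, in principle, replace the local one without changes to the jobs themselves.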

The last component, gimlet, addresses building HTTP interfaces, and provides tools for registering routes, writing responses, managing middleware and authentication, and defining routes in a way that's easy to test. Gimlet is just a wrapper around some established tools like negroni and gorilla/mux, all built on established standard-library foundations. Gimlet has allowed us to unify a bunch of different approaches to these problems, and has lowered the barrier to entry for most of our interfaces.
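
As a rough illustration of what "defining routes in a way that's easy to test" can mean, here's a sketch built directly on net/http and net/http/httptest. It does not use gimlet, negroni, or gorilla/mux, and the route type and the buildHandler, writeJSON, newApp, and exercise helpers are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// route is a hypothetical declarative description of one endpoint.
type route struct {
	method  string
	path    string
	handler http.HandlerFunc
}

// buildHandler registers every route on a standard-library mux and
// rejects requests that use the wrong method.
func buildHandler(routes []route) http.Handler {
	mux := http.NewServeMux()
	for _, r := range routes {
		r := r
		mux.HandleFunc(r.path, func(w http.ResponseWriter, req *http.Request) {
			if req.Method != r.method {
				http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
				return
			}
			r.handler(w, req)
		})
	}
	return mux
}

// writeJSON is a small response helper shared by handlers.
func writeJSON(w http.ResponseWriter, status int, payload interface{}) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(payload)
}

func statusHandler(w http.ResponseWriter, _ *http.Request) {
	writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}

// newApp assembles the application's routes in one place.
func newApp() http.Handler {
	return buildHandler([]route{
		{method: http.MethodGet, path: "/status", handler: statusHandler},
	})
}

// exercise sends one in-memory request through the handler, with no
// network involved, and returns the status code and body.
func exercise(h http.Handler, method, path string) (int, string) {
	req := httptest.NewRequest(method, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := exercise(newApp(), http.MethodGet, "/status")
	fmt.Print(code, " ", body)
}
```

Since the whole application is just an http.Handler, tests can drive it with a recorder instead of starting a server.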

There are other infrastructural problems still on the table: tools for building inter-system communication and RPC when you can't communicate via a queue or a shared database (I've been thinking a lot about gRPC and protocol buffers for this,) and also about object-mapping and database access patterns, which I don't really have an answer for. [1]

Nevertheless, with the observability, background tasks, and HTTP interface problems well understood and supported, it definitely frees developers to spend more of their time focused on core problems of importance to users and the goals of the project. Which is a great place to be.

[1]I built a database migration tool called anser which is mostly focused on integrating migration workflows into production systems so that migrations are part of the core code and can run without affecting production traffic, and while these tools have been useful, I haven't seen a clear path between this project and meaningfully simplifying the way we manage access to data.

Combating Legacy Code

I wrote some notes for a post about a software project I worked on a year and a half ago that I think is pretty cool, but I was on a writing hiatus. Even better, the specific code in question is no longer in use. Still, I think it serves as a useful parable, so I will attempt to reflect.

Go's logging [2] support in the standard library works, and it successfully achieves its goals on its own terms. The problem is that it's incredibly simple and lacks a number of features that are standard in most logging systems. [3] As a result, I'm not surprised that most applications of consequence either use a couple of more fully featured logging packages or end up writing a large number of logging wrappers.

The fact that my project at work was using a special logging library is not particularly surprising, particularly because the project is old for a Go project. The logging library in question is a log4j-inspired package, that had been developed by a different group internally, but was no longer being used by that group. It worked, but there were a host of problems. [4]

I'd also written a logging package myself which was a definite improvement on the state of the art. I had two chief problems:

  • how to convince teammates to make the change,
  • how to make the change without disrupting ongoing work or the functioning of the system which had to be always deploy-able.

Here's what I did...

First, I learned as much as I could about the existing system, its history, and how we used it. I read a lot of code, documentation (such as it was), and also related bug reports, feature requests, and history.

Second, I implemented wrappers for my system that (mostly) cloned the interfaces for the existing library in my own package. It's called slogger, and it's still there, though I hope to delete it soon. I wanted to make it possible to make the switch [1] in the project initialization without needing to change every last logging statement. [5]

Then, we actually made the change so that logging used the new code internally but wrapped by the old interfaces. I think there were a couple of very obvious bugs early on, but frankly none of them are so memorable that I could describe them any more.

Finally, we went through and updated all of the logging statements. It was a big change, and impacted all of the code, but it happened quite late in the process and there were no bugs, because it was the least interesting or radical part of the project.

And then we had a new logger. It's been great. With the new tool we've been able to easily add support for more structured approaches to logging and collecting log output in a variety of third party services.
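
The wrapper approach from the second step might look something like the following sketch. The interfaces and names here (legacyLogger, newLogger, shim, levelName) are invented for illustration and don't reflect the real slogger or grip APIs; the point is only that call sites keep using the old method set while the new code does the work.

```go
package main

import "fmt"

// legacyLogger mirrors the method set of a hypothetical old logging
// library that existing call sites already use.
type legacyLogger interface {
	Logf(level int, format string, args ...interface{})
}

// event and newLogger stand in for a hypothetical replacement backend
// that records structured events.
type event struct {
	Level   string
	Message string
}

type newLogger struct {
	events []event
}

func (l *newLogger) Send(level, msg string) {
	l.events = append(l.events, event{Level: level, Message: msg})
}

var levelNames = []string{"debug", "info", "warn", "error"}

// levelName translates the old numeric levels; unknown values fall
// back to "info".
func levelName(level int) string {
	if level < 0 || level >= len(levelNames) {
		return "info"
	}
	return levelNames[level]
}

// shim satisfies legacyLogger but delegates to the new backend, so the
// switch happens at initialization rather than at every call site.
type shim struct {
	backend *newLogger
}

func (s shim) Logf(level int, format string, args ...interface{}) {
	s.backend.Send(levelName(level), fmt.Sprintf(format, args...))
}

func main() {
	backend := &newLogger{}
	var logger legacyLogger = shim{backend: backend}

	// An unchanged "old style" logging statement.
	logger.Logf(2, "started %d workers", 4)

	fmt.Println(backend.events[0].Level, backend.events[0].Message)
}
```

Once every call site has been updated to talk to the new logger directly, the shim and the legacy interface can be deleted.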

In summary:

  • replacing legacy subsystems can be a good way to improve the functionality of your project.
  • change is hard, but there are ways to make changes easier and less disruptive. They often involve doing even more work.
  • write code to facilitate transitions, and then delete it later.
  • the larger a change is, the less risky it should be. While there are lots of small-and-low risk changes you can make, the inverse should be true as rarely as possible.
[1]Potentially this should have been behind a feature flag, though I think I didn't actually use a feature flag.
[2]This is to say, application logging facility.
[3]This includes filtering by log level, different formatting options, (semi) structured logging, conditional logging, buffering, and other options.
[4]Hilariously something in the way we were using the logger was tripping up the race detector. While the logger did a decent job of providing the file name and line number of the logging statement, it was pretty focused on printing content to a file/standard output.
[5]The short version here is, "interfaces are great."

Going Forward

I wrote a post about moving on from being a technical writer, and I've definitely written some since then about programming and various side projects, but I haven't really done the kind of public reflection on this topic that I've done historically about many other things.

When I switched to a programming team, I knew some things about computers, and I was a decent Python programmer. The goal, then, was to teach myself a second programming language (Go) and learn how to make "real" software with other people, or on teams with other people. Both of those projects are going well: I think I've become pretty solid as a Go programmer, and although it's hard to say what "real" software is, or if I'm good at making it, all indications are positive.

This weekend, for various reasons, I've been reviving a project that I did some work on this fall and winter, that I've abandoned for about 6 months. It's been both troubling (there are parts that are truly terrible,) and kind of rewarding to see how much I've grown as a programmer just from looking at the code.

Cue, then, I guess, the self-reflective interlude.

My reason for wanting to learn--really learn--a second programming language was to make sure that all the things I knew about system design, algorithms, and data structures were generalizable, and not rooted in the semantics of a specific language or even an implementation of that language. I was also interested in learning more about the process of learning new programming languages so that I had some experience with the learning process, which may come in handy in the future.

Learning Go, I think, helped me achieve or realize these goals. While I haven't really set out to learn a third language yet, it feels tractable. I've also noticed some changes and differences in some other aspects of my interests.

I used to be really interested in programming qua programming, and I thought a lot about programming languages. While I still can evaluate programming languages, and have my own share of opinions about "the way things work," I'm less concerned with the specific syntax or implementation. I think a lot about build tools, platform support, deployment models, and distribution methods and stories, rather than what a language can do or how you have to write it. In other words: how you make it, ship it, and run it.

I've also gotten less interested in UNIX-esque systems administration and operations, which is historically a thing I've been quite interested in. These days, I find myself thinking more about the following kinds of problems:

  • build systems, the tools for building software from source files (and sometimes testing it!), and the ways to do this super efficiently and sensibly. Build systems are quite hard because in a lot of ways they're the point through which your software (as software) interacts with all of the platforms it runs on. Efficient build systems have a huge impact on developer productivity, which is a big interest.
  • developer productivity. This is a big catch-all category, but it's almost always true that people are more expensive than computers, so working on tools and features (like better build systems, or automating various aspects of the development process) tends to be worthwhile.
  • continuous integration and deployment, again connected to developer productivity, but taking the "automate building and testing," story to its logical conclusion. CD environments mean you deploy changes much more often, but you also require and force yourself to trust the automated systems and make sure that project leadership and management is just as automated as the development experience.
  • internal infrastructure, as in "internal services and tools that all applications need," like logging, queuing systems, abstractions for persistence, deployment systems, testing, and exposed interfaces (e.g. RPC systems, REST/HTTP, or command line option parsing). Having good tools for these generic aspects of the application makes writing actual features for users easier. I'm also increasingly convinced that the way to improve applications and systems is to improve these lower level components and their interfaces.

Free Software and open source are still important, as is UNIX, but these kinds of developer productivity and automation issues are a level above that. I've changed in the last five years, software has changed in the last five years, and the way we run software on systems has changed in the last five years. I'm super excited to see what kinds of things I can do in this space, and where I end up in five years.

I'm also interested in thinking about ways to write about this. I'd written drafts of a number of posts that were about learning how to program, about systems administration, and now that I'm finding and making more time for writing, one of the things I don't really know about is what kind of writing on these topics I'm interested in doing, or how to do it in a way that anyone would be interested in reading.

We shall see. Regardless, I hope that I'm back, now.

Novel Automation

This post is a follow up to the interlude in the /posts/programming-tutorials post, which is part of an ongoing series of posts on programmer training and related issues in technological literacy and education.

In short, creating novel automations is hard. The process would have to look something like:

  1. Realize that you have an unfulfilled software need.
  2. Decide what the proper solution to that need is. Make sure the solution is sufficiently flexible to be able to support all required complexity.
  3. Then sit down, open an empty buffer and begin writing code.

Not easy. [1]

Something I've learned in the past few years is that the above process is relatively uncommon for actual working programmers: most of the time you're adding a few lines here and there, testing various changes or adding small features built upon other existing systems and features.

If this is how programming work is actually done, then the kinds of methods we use to teach programmers how to program should bear some resemblance to the actual work that programmers do. As an attempt at a case study, my own recent experience:

I've been playing with Buildbot for a few weeks now for personal curiosity, and it may be useful to automate some stuff for the Cyborg Institute. Buildbot has its merits and frustrations, but this post isn't really about buildbot. Rather, the experience of doing buildbot work has taught me something about programming and about "building things," including:

  • When you set up buildbot, it generates a python configuration file where all buildbot configuration and "programming" goes.

    As a bit of a sidebar, I've been using a base configuration derived from the buildbot configuration for buildbot itself. The default configuration is less clean, and I'd assumed that I was configuring a buildbot in the "normal way."

    Turns out I haven't, and this hurts my (larger) argument slightly.

    I like the idea of having a very programmatic interface for systems that must integrate with other components, and I really like the idea of a system that produces a good starting template. I'm not sure what this does for overall maintainability in the long term, but it makes getting started and using the software in a meaningful way, much more possible.

  • Organizing my buildbot configuration as I have, modeled on the "metabuildbot," has nicely illustrated the idea that software is just a collection of modules that interact with each other in a defined way. Nothing more, nothing less.

  • Distributed systems are incredibly difficult for anyone to conceptualize properly, and I think most of the frustration with buildbot stems from this.

  • Buildbot provides an immediate object lesson on the trade-offs between simplicity and terseness on the one hand and maintainability and complexity on the other.

    This point relates to the previous one. Because distributed systems are hard, it's easy to configure something that's too complex and that isn't what you want at all in your Buildbot before you realize that what you actually need is something else entirely.

    This doesn't mean that there aren't nightmarish Buildbot configs, and there are, but the lesson is quite valuable.

  • There's something interesting and instructive in the way that Buildbot's user experience lies somewhere between "an application," that you install and use, and a program that you write using a toolkit.

    It's clearly not exactly either, and both at the same time.

I suspect some web-programming systems may be similar, but I have relatively little experience with systems like these. And frankly, I have little need for these kinds of systems in any of my current projects.

Thoughts?

[1]Indeed, this may explain the incidence of people writing code, getting it working, and then rewriting it from the ground up: writing things from scratch is an objectively hard thing, whereas rewriting and iterating is considerably easier. And the end result is often, but not always, better.

Programming Tutorials

This post is a follow up to my /posts/coding-pedagogy post. This "series" addresses how people learn how to program, the state of the technical materials that support this education process, and the role of programming in technology development.

I've wanted to learn how to program for a while and I've been perpetually frustrated by pretty much every lesson or document I've ever encountered in this search. This is hyperbolic, but it's pretty close to the truth. Teaching people how to program is hard, and the materials tend to be written by either:

  • people who don't really remember how they learned to program.

    Many programming tutorials were written by these kinds of programmers, and the resulting materials tend to be decent in and of themselves, but they fail to actually teach people how to program if they don't know how to program already.

    If you already know how to program, or have learned to program in a few different languages, it's easy to substitute "learning how to program in a new language" for "learning how to program," because that experience is more fresh, and easier to understand.

    These kinds of materials will teach the novice programmer a lot about programming languages and fundamental computer science topics, but not anything that you really need to learn how to write code.

  • people who don't really know how to program.

    People who don't know how to program tend to assume that you can teach by example, using guided tutorials. You can't really. Examples are good for demonstrating syntax and procedure, and answering tactical questions, but aren't sufficient for teaching the required higher order problem solving skills. Focusing on the concrete aspects of programming syntax, the standard library, and the process for executing code isn't enough.

    These kinds of documents can be very instructive, and outsider perspectives are quite useful, but if the document can't convey how to solve real problems with code, you'll be hard pressed to learn how to write useful programs from these guides.

In essence, we have a chicken and egg problem.


Interlude:

Even six months ago, when people asked me "are you a programmer?" (or engineer,) I'd often object strenuously. Now, I wave my hand back and forth and say "sorta, I program a bit, but I'm the technical writer." I don't write code on a daily basis and I'm not very nimble at starting to write programs from scratch, but sometimes when the need arises, I know enough to write code that works, to figure out the best solution to fix at least some of the problems I run into.

I still ask other people to write programs or fix problems I'm having, but it's usually more because I don't have time to figure out an existing system that I know they're familiar with and less because I'm incapable of making the change myself.

Despite these advances, I still find it hard to sit down with a blank buffer and write code from scratch, even if I have a pretty clear idea of what it needs to do. Increasingly, I've begun to believe that this is the case for most people who write code, even very skilled engineers.

This will be the subject of an upcoming post.


The solution(s):

1. Teach people how to code by forcing people to debug programs and make trivial modifications to code.

People pick up syntax pretty easily, but struggle more with the problem solving aspects of code. While there are some subtle aspects of syntax, the compiler or interpreter does enough to teach people syntax. The larger challenge is getting people to understand the relationship between their changes and the program's behavior, and between any single change and the rest of a piece of code.

2. Teach people how to program by getting them to solve actual problems using actual tools, libraries, and packages.

Too often, programming tutorials and examples attempt to be self-contained or unrealistically simple. While this makes sense from a number of perspectives (easier to create, easier to explain, fewer dependency problems for users,) it's incredibly uncommon and probably leads to people thinking that a lot of programming revolves around re-implementing solutions to solved problems.

I'm not making a real argument about computer science education, or formal engineering training, with which I have very little experience or interest. As contemporary, technically literate actors in digital systems, most people have reason to care about programming.

I'm convinced that many people do a great deal of work that is effectively programming: manipulating tools, identifying and recording procedures, collecting information about the environment, performing analysis, and taking action based on collected data. Editing macros, mail filtering systems, and spreadsheets are obvious examples though there are others.

Would teaching these people how programming worked and how they could use programming tools improve their digital existences? Possibly.

Would general productivity improve if more people knew how to think about automation and were able to do some of their own programming? Almost certainly.

Would having more casual programmers create additional problems and challenges in technology? Yes. These would be interesting problems to solve as well.

Coding Pedagogy

There are two parts to this post: first, the relationship or non-relationship between the ability to write code and technical literacy; and second, the pedagogical methods for teaching people how to program/code.

In some ways, I've been writing about this and related topics for quite a while: see /posts/objective-whatsis for an earlier iteration in this train of thought.

Programming and Technical Literacy

Programmers and other technical folks talk a lot about teaching young people to code as the central part of any young technical person's education and basic computer literacy. Often this grows out of nostalgia for their own experience learning to program, but there are other factors at play. [1]

In some cases, they even start or point to projects like Codecademy. Which are, in truth, really cool ideas, but I think that effectively equating the ability to write code with technical literacy is fraught:

  • There are many different kinds of technical literacy and writing code is really such a small part. Sure code gives us a reasonable way to talk about things like design and architecture, but actually writing code is such a small part of developing technology.

  • Writing code isn't that important, really. In a lot of ways, code is just an implementation detail. Important as a way of describing some concepts pretty quickly, important because it's impossible to iterate on ideas without something concrete to point to, but the implementation isn't nearly as important as the behavior or the interface.

  • For the last ~40 years, code has been the way that people design behavior and specify interfaces for software. While there are a lot of reasons why this predominantly takes the form of code, there's no particular reason that we can't express logic and describe interfaces using other modalities.

    There are many people who are very technically literate and productive who don't write code, and I think that defining literacy as being able to write code, is somewhat short sighted. Also, there is another group of people who are actually programmers who don't think of the things they do as "programming," like people who do crazy things with spreadsheets, most librarians, among others. These non-coding programmers may shy away from programming or are mostly interested in the output of the program they write and less interested in the programming itself.

This is a huge problem. I hope that this /posts/computer-literacy-project that I've been planning will start to address some of these issues, but there's even more work to do.

How to Teach People to Code

(This section of the post derives from and summarizes the "How to Teach People to Program" wiki page.)

Most of the ways that programming books and courses teach programming are frustrating and somewhat dire, for a few reasons:

  • Most examples in programming books are dumb.
  • Basic computer science/engineering knowledge is fundamental to the way that accomplished programmers think about programming but isn't always required to teach people how to program.
  • Syntax isn't that important, but you can't ignore it either.
  • Slow reveals are really frustrating.
  • The kinds of code that you write when learning to program bear little resemblance to the actual work that programmers do.

The solutions to these problems are complex and there are many possible solutions. As a starting point:

  • Separate the way you present core concepts (i.e. data structures, typing, functions, classes, etc.) from actual code examples and from actual explanations of the syntax.

    Interlink/cross reference everything, but if you give people the tools to answer their own questions they'll learn what they actually need to know, and you can then do a better job of explaining the syntax, basic concepts, and practical examples.

  • Provide longer examples that aren't contrived.

    Examples don't need to start from first principles, and don't need to be entirely self-contained. Programming work rarely starts from first principles (relatively speaking), and is rarely actually self-contained. It's foolish, then, to use these sorts of pedagogical tools.

Thoughts?

[1]In addition there's a related fear that many people who don't have experience with the technology of the 1980s and 1990s won't have the required technological skills to innovate in another 10 or 20 years.

Emacs Thoughts + Some Lisp

In no particular order:

Org Mode Guilt and a Lisp Function

I have some guilt about having mostly forsaken org-mode, [1] in particular because I was watching Sacha Chua's chat with John Wiegley, and I think both are such nifty hackers, and have done so many things that are pretty darn nifty.

I liked what I heard about johnw's org mode setup so much that I might give it a try again. But in the mean time, I wanted to make my "recompile my tasklist" function a bit cleaner. The result is as follows:

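(defun tychoish-todo-compile ()
  "Rebuild the task list, reusing the *todo-compile* buffer if it exists."
  (interactive)
  (if (get-buffer "*todo-compile*")
      ;; A previous run left a buffer behind: switch to it and re-run.
      (progn
        (switch-to-buffer-other-window (get-buffer "*todo-compile*"))
        (recompile))
    ;; First run: start the build, then claim the default compilation
    ;; buffer by renaming it.
    (progn
      (compile "make -j -k -C ~/wiki")
      (switch-to-buffer-other-window "*compilation*")
      (rename-buffer "*todo-compile*")))
  ;; Either way, revert buffers whose files changed on disk.
  (revbufs))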

Notables:

  • This is the first time I've used progn, which is somewhat embarrassing, but it's a great thing to have in the toolkit now.
  • I hadn't realized until now that there wasn't an else-if form in emacs lisp. Weird, but it makes sense.
  • Compilation Mode is pretty much my current favorite thing in emacs.
  • revbufs is this amazing thing that reverts buffers if there aren't local modifications, and also reports to you if a buffer has changed outside of emacs and there are local modifications. So basically "does everything you want without destroying anything and then tells you what you need to do manually." Smart. Simple. Perfect.

I might need to "macro-ize" this, as I have a lot of little compile processes for which I'd like to be able to trigger/maintain unique compile buffers. That's a project for another day.

Emacs Thoughts

I'm even thinking about putting together a post about how, although I'm a diehard emacs user, and I've spent a fair bit of time learning how to do really great things with emacs, there are a lot of vim-ish things in my workflow:

  • I read email with mutt and I've tried to get into GNUS, and I try it again every now and then, but I always find it so unbelievably gnarly. At least the transition. Same with Notmuch, which I like a lot more (in theory,) but I find the fact that Notmuch and mutt have this fundamental misunderstanding about what constitutes a "read" email, to be tragic.

  • I use a crazy ikiwiki + deft + makefile setup for task tracking. As (obliquely) referenced above.

    I might give org another shot, and I've been looking at Taskwarrior, but the sad truth is that what I have works incredibly well in most cases, and switching is hard.

  • I tend to jump to a shell window to do version control and other things. Even though I'm familiar with magit and dired, my use of these tools is somewhat spotty.

[1]It's not that I think org-mode sucks, or anything. Far from it, but how I was using org-mode was fundamentally not working for me. I'm thinking about giving it a try again, but we'll see.