How to Become a Self-Taught Programmer

i am a self taught programmer. i'm not sure i'd recommend it to anyone else: there are so many different curricula and training programs that are well tested and very efficacious. for lots of historical reasons, many programmers end up being all or mostly self taught: in the early days because programming was vocational and people learned on the job, then because people learned programming on their own before entering cs programs, and more recently because the demand for programmers (and the rate of change) has outstripped formal training for the kinds of programming work that are most in demand these days. and knowing that it's possible (and potentially cheaper) to teach yourself, it seems like a tempting option.

this post, then, is a collection of suggestions, guidelines, and pointers for anyone attempting to teach themselves to program:

  • focus on learning one thing (programming language and problem domain) at a time. there are so many different things you could learn, and people who know how to program seem to have an endless knowledge of different things. knowing one set of tools and one area (e.g. "web development in javascript," or "system administration in python,") gives you the framework to expand later, and the truth is that you'll be able to learn additional things more easily once you have a framework to build upon.

  • when learning something in programming, always start with a goal. have some piece of data that you want to explore or visualize, have a set of files that you want to organize, or something that you want to accomplish. learning how to program without a goal means that you don't end up asking the kinds of questions you need to form the right kinds of associations.
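    for example, a goal might be as small as tidying a downloads folder. a minimal sketch in python (the directory path and layout here are hypothetical):

      from pathlib import Path

      def organize_by_extension(directory):
          """move every file in `directory` into a subfolder named for its extension."""
          root = Path(directory).expanduser()
          for item in list(root.iterdir()):
              if item.is_file():
                  # files without an extension go into a "misc" folder
                  dest = root / (item.suffix.lstrip(".") or "misc")
                  dest.mkdir(exist_ok=True)
                  item.rename(dest / item.name)

      # hypothetical usage: organize_by_extension("~/Downloads")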

  • programming is all about learning: if you program for long enough you'll end up learning many different languages and tools, and being able to pick up new things is the real skill.

  • being able to clearly capture what you were thinking when you write code is basically a programming superpower.

  • programming is about understanding problems [1] abstractly and building abstractions around various kinds of problems. being able to break these problems apart into smaller core issues, and to think abstractly enough about the problem that you can solve both the version in front of you and related versions you'll meet in the future, are crucial skills.

  • collect questions and curiosities as you encounter them, and use them to guide your background reading, but don't feel like you have to understand everything immediately, or hunt down the answer to every unfamiliar term the moment you hear or see it. if you're reasonably rigorous about going back and looking things up, you'll build a good base of knowledge over time.

  • always think about the users of your software as you build, at every level. even if you're building software for your own use, think about the future version of yourself that will use it, imagine that other people might use the interfaces and functions that you write, and think about the assumptions they might bring to the table. think about the output that your program, script, or function produces, and how someone would interact with that output.

  • think about the function as the fundamental building block of your software. lower level forms (i.e. statements) are required, but functions are the unit where meaning is created in the context of a program. functions (or methods) take input (arguments, usually, but sometimes also an object in the case of methods) and produce some output, sometimes with some kind of side-effect. the best functions:

    • clearly indicate side-effects when possible.
    • have a mechanism for reporting on error conditions (exceptions, return values,)
    • avoid dependencies on external state, beyond what is passed as arguments.
    • are as short as possible.
    • use names that clearly describe the behavior and operations of the function.

    if programming were human language (english,) you'd strive to construct functions that were simple sentences, not paragraphs, but also more than a couple of words or phrases, and you would want these sentences to be easy to understand with limited context. if you have good functions, interfaces become clearer and easier to use, and code becomes easier to read, debug, and test.
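    to make this concrete, a small sketch in python of what these properties look like in practice (the function and its types are hypothetical):

      def parse_port(value: str) -> int:
          """parse a tcp port number from a string.

          no external state: everything the function needs arrives as an
          argument. errors are reported with an exception rather than a
          sentinel value, and the name describes the behavior.
          """
          port = int(value)  # raises ValueError for non-numeric input
          if not 0 < port <= 65535:
              raise ValueError(f"port out of range: {port}")
          return port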

  • avoid being too weird. many programmers are total nerds, and you may be too, and that's ok, but it's easier to learn how to do something if there's prior art that you can learn from and copy. on a day-to-day basis, a lot of programming work is just doing something until you get stuck and then googling for the answer. if you're doing something weird (using a programming language that's less widely used, or working in a problem space that's a bit outside the mainstream,) it can be harder to find answers to your problems.

Notes

[1]I use the term "problem" to cover both things like "connecting two components internally" and also more broadly "satisfying a requirement for users," and programming often includes both of these kinds of work.

Does Anyone Actually Want Serverless?

Cloud computing, and with it most of tech, has been really hot on the idea of "serverless" computing, which is to say, services and applications that are deployed, provisioned, and priced separately from conventional "server" resources (memory, storage, bandwidth.) The idea is that we can build and expose ways of deploying and running applications and services, even low-level components like "databases" and "function execution", in ways that mean that developers and operators can avoid thinking about computers qua computers.

Serverless is the logical extension of "platform as a service" offerings, which have been an oft-missed goal for a long time. You write high-level applications and code that is designed to run in some kind of sandbox, with external services provided in some kind of à la carte model via integrations with other products or services. The PaaS, then, can take care of everything else: load balancing incoming requests, redundancy to support higher availability, and any kind of maintenance on the lower-level infrastructure. Serverless is often just PaaS but more: provide a complete stack of services to satisfy needs (databases, queues, background work, authentication, caching, on top of the runtime,) and then change the pricing model to be based on requests/utilization rather than time or resources.
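To make the "function execution" model concrete, a handler in such a system often looks something like this sketch (modeled loosely on the Python handler convention of AWS Lambda behind an HTTP gateway; the event fields here are assumptions):

    import json

    def handler(event, context):
        """Entry point invoked by the platform; no server code in sight.

        The platform handles routing, scaling, and the runtime; this function
        only sees a request-shaped `event` and returns a response-shaped dict.
        """
        params = event.get("queryStringParameters") or {}
        name = params.get("name", "world")
        return {
            "statusCode": 200,
            "body": json.dumps({"greeting": f"hello, {name}"}),
        }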

Fundamentally, this allows a separation of concerns between "writing software" and "running software," and allows much if not all of the running of software to be delegated to service providers. This kind of separation is useful for developers, and in general runtime environments seem like the kind of thing that most technical organizations shouldn't need to focus on: outsourcing may actually be good, right?

Well maybe.

Let's be clear, serverless platforms primarily benefit the provider of the services for two reasons:

  • serverless models allow providers to build offerings that are multi-tenant, and give providers the ability to reap the benefits of managing request load dynamically and sharing resources between services/clients.
  • utilization pricing for services is always going to be higher than commodity pricing for the underlying components. Running your own servers ("metal") is cheaper than using cloud infrastructure, over time, but capacity planning, redundancy, and management overhead make that difficult in practice. The proposition is that while serverless may cost more per unit, it has lower management costs for users (fewer people in "ops" roles,) and is more flexible if request patterns change. A back-of-envelope comparison follows below.
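To illustrate the shape of that tradeoff, a quick sketch (every number here is hypothetical, purely for illustration):

    # hypothetical monthly costs, purely illustrative
    server_cost = 500.0          # fixed: provisioned servers + ops overhead
    per_million_requests = 25.0  # utilization-priced serverless rate

    def monthly_cost(requests_millions):
        return {
            "serverless": per_million_requests * requests_millions,
            "servers": server_cost,  # flat until you need more capacity
        }

    # below the break-even volume serverless wins; above it, servers do,
    # if (and only if) you can staff the management overhead
    break_even = server_cost / per_million_requests  # 20M requests/month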

So we know why the industry seems to want serverless to be a thing, but does it actually make sense?

Maybe?

Makers of software strive (or ought to strive) to make their software easy to run, and having very explicit expectations about the runtime environment makes software easier to run. Similarly, being able to write code without needing to manage the runtime, monitoring, and logging, while using packaged services for caching, storage, and databases, seems like a great boon.

The downsides to software producers, however, are plentiful:

  • vendor lock-in is real: it places your application at the mercy of an external provider, both as they do maintenance and as their API and platform evolve on their timeframe.
  • hosted systems mean that it's difficult to do local development and testing: either every developer needs their own sandbox (at some expense and management overhead), or you have to maintain a separate runtime environment for development.
  • applications cannot have service levels which exceed the service level agreements of their underlying providers. If your serverless platform has an SLA which is less robust than the SLA of your application, you're in trouble.
  • when something breaks, there are few operational remedies available. Upstream timeouts are often not changeable and most forms of manual intervention aren't available.
  • pricing probably only makes sense for organizations operating at small scale (most organizations, particularly for greenfield projects,) and is rarely viable at any kind of scale.
  • some problems and kinds of software just don't work in a serverless model: big data sets that exceed reasonable RAM requirements, data processing problems which aren't easily parallelizable, workloads with long running operations, or workloads that require lower level network or hardware access.
  • most serverless systems will incur some overhead over dedicated/serverful alternatives and therefore have worse performance/efficiency, and potentially less predictable performance, especially in very high-volume situations.

Where does that leave us?

  • Many applications and bespoke tooling should probably use serverless tools. Particularly if your organization is already committed to a specific cloud ecosystem, this can make a lot of sense.
  • Prototypes should unequivocally rely on off-the-shelf, potentially serverless tooling, particularly for things like runtimes.
  • If and when you begin to productionize applications, find ways to provide useful abstractions between the deployment system and the application. These kinds of architectural choices help address concerns about lock-in and make it easier to do development work without external dependencies.
  • Think seriously about your budget for doing operational work, holistically, if possible, and how you plan to manage serverless components (access, cost control, monitoring and alerting, etc.) in connection with existing infrastructure.

Serverless is interesting, and it's worth asking "what if application development happened in a very controlled environment with a very high-level set of APIs?" There are clearly a lot of cases where it makes a lot of sense, and then a bunch of situations where it's clearly a suspect call. It's early days, so we'll see in a few years how things work out. In any case, thinking critically about infrastructure is always a good plan.

The Kubernetes Cloud Mainframe

I made a tongue-in-cheek comment on twitter a while back that k8s is just the contemporary API for mainframe computing, but as someone who is both very skeptical of and very excited about the possibilities of kube, this feels like something I want to expand upon.

A lot of my day-to-day work has some theoretical overlap with kube, including batch processing, service orchestration, and cloud resource allocation. Lots of people I encounter are also really excited by kube, and it's interesting to balance that excitement with my understanding of the system, and to watch how Kubernetes (as a platform) impacts the way that we develop applications.

I also want to be clear that my comparison to mainframes is not a disparagement: not only do I think there's a lot of benefit to be gained by thinking about the historical precedents of our paradigms, I would also assert that the trends in infrastructure over the last 10 or 15 years (e.g. virtualization, containers, cloud platforms) have focused on bringing mainframe paradigms to a commodity environment.

Observations

  • clusters are usually functionally static. I know that the public clouds have autoscaling abilities, but having really elastic infrastructure requires substantial additional work, and there are some reasonable upper limits in terms of numbers of nodes, which makes it hard to actually operate elastically. It's probably also the case that elastic infrastructure has always been (mostly) a pipe dream at most organizations.
  • some things remain quite hard, chiefly in my mind:
    • autoscaling, both of the cluster itself and of the components running within the cluster. Usage doesn't always follow easy-to-detect patterns, so figuring out ways to make infrastructure elastic may take a while to converge or become common. Indeed, VMs and clouds were originally thought to be able to provide some kind of elastic/autoscaling capability, and by and large, most cloud deployments do not autoscale.
    • multi-tenancy, where multiple different kinds of workloads and use-cases run on the same cluster, is very difficult to schedule for reliably or predictably, which leads to a need to overprovision more for mixed workloads.
  • kubernetes does not eliminate the need for an operations team or vendor support for infrastructure or platforms.
  • decentralization has costs, and putting all of the cluster configuration in etcd imposes some limitations, mostly around performance. While I think decentralization is correct in many ways for Kubernetes, application developers may need systems that have lower latency and tighter scheduling abilities.
  • The fact that you can add applications to an existing cluster, or host a collection of small applications, is mostly a symptom of clusters being overprovisioned. This probably isn't bad, and it's almost certainly the case that you can reduce the overprovisioning bias with kube, to some degree.

Impact and Predictions

  • applications developed for Kubernetes will eventually become difficult or impossible to run without Kubernetes. This has huge impacts on developer experience and test experience. I'm not sure that this is a problem, but I think it's a hell of a dependency to pick up. This was true of applications that targeted mainframes as well.
  • Kubernetes will eventually replace vendor specific APIs for cloud infrastructure for most higher level use cases.
  • Kubernetes will primarily be deployed by Cloud providers (RedHat/IBM, Google, AWS, Azure, etc.) rather than by infrastructure teams.
  • Right now, vendors are figuring out what kinds of additional services users and applications need to run in Kubernetes, but eventually there will be another layer of tooling on top of Kubernetes:
    • logging and metrics collection.
    • deployment operations and configuration, particularly around coordinating dependencies.
    • authentication and credential management.
    • low-latency offline task orchestration.
  • At some point, we'll see a move toward multi-cluster orchestration, or more powerful approaches to workload isolation within a single cluster.

Conclusion

Kubernetes is great, and it's been cool to see how, particularly in the last couple of years, it's emerged to bring together things like cloud infrastructure and container orchestration. At the same time, it (of course!) doesn't solve all of the problems that developers have with their infrastructure, and I'm really excited to see how people build upon Kubernetes to address some of those higher level concerns, and make it easier to build software on top of the resulting platforms.

Continuous Integration is Harder Than You Think

I've been working on continuous integration systems for a few years, and while the basic principle of CI is straightforward, it seems that most CI deployments are not. This makes sense: project infrastructure is an easy place to defer maintenance during the development cycle, and projects often prioritize feature development and bug fixing over tweaking the buildsystem or test infrastructure, but I think there's something more to it. This post is a consideration of what makes CI hard, and perhaps provides a bit of unsolicited advice.

The Case for CI

I suppose I don't really have to sell anyone on the utility or power of CI: running a set of tests on your software regularly allows developers and teams to catch bugs early, and saves a bucket of developer time, and that is usually enough. Really, though, CI ends up giving you the leverage to solve a number of really gnarly engineering problems:

  • how to release software consistently and regularly.
  • how to support multiple platforms.
  • how to manage larger codebases.
  • how to do anything with distributed systems.
  • how to develop software with larger numbers of contributors.

Doing any of these things without CI isn't really viable, particularly at scale. This isn't to say that they "come free" with CI, but CI is often the right place to build the kind of infrastructure required to manage distributed systems problems or release complexity.

Buildsystems are Crucial

One thing that I see teams doing sometimes is approaching their local development processes and tooling with a different set of expectations than they have for CI, and you can totally see and understand how this happens: CI processes always start from a clean environment, and you often want to handle failures in CI differently than you might handle a failure locally. It's really easy to write a shell script that only runs in CI, and then things sort of accumulate, and eventually there emerges a class of features and phenomena that only exist for and because of CI.

The solution is simple: invest in your buildsystem, [1] and ensure that there is minimal (or no!) indirection between your buildsystem and your CI configuration. But buildsystems are hard, and in a lot of cases test harnesses aren't easily integrated into build systems, which complicates the problem. Having a good build system isn't particularly about picking a good tool (though there are definitely tradeoffs between tools); the problem is mostly in capturing logic in a consistent way, providing a good interface, and ensuring that builds happen as efficiently as possible.

Regardless, I'm a strong believer in centralizing as much functionality in the buildsystem as possible and making sure that CI just calls into build systems (a minimal sketch of this follows below). Good build systems:

  • allow you to build or rebuild (or test/subtest) only subsets of work, to allow quick iteration during development and debugging.
  • center around a model of artifacts (things produced) and dependencies (requires-type relationships between artifacts).
  • have clear defaults, automatically detect dependencies and information from the environment, and perform any required set up and teardown for the build and/or test.
  • provide a unified interface for the developer workflow, including building, testing, and packaging.

The upside is that the effort you put into the development of a buildsystem pays dividends not just in managing the complexity of CI deployments, but also in making local development stable and approachable for new developers.
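As promised, a minimal sketch of the "CI just calls the buildsystem" idea, assuming a hypothetical Python project (the task names and underlying commands are assumptions): developers and CI both run the same entry point, so there is no CI-only logic to drift out of sync.

    #!/usr/bin/env python3
    """tasks.py: a single, hypothetical entry point for build/test/lint.

    Developers run `python tasks.py test` locally; CI runs exactly the
    same command, keeping the CI configuration to a one-liner.
    """
    import subprocess
    import sys

    TASKS = {
        "build": ["python", "-m", "build"],
        "test": ["python", "-m", "pytest", "-q"],
        "lint": ["python", "-m", "flake8"],
    }

    def main():
        task = sys.argv[1] if len(sys.argv) > 1 else "test"
        if task not in TASKS:
            sys.exit(f"unknown task: {task} (choose from {sorted(TASKS)})")
        sys.exit(subprocess.run(TASKS[task]).returncode)

    if __name__ == "__main__":
        main()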

[1]Canonically buildsystems are things like makefiles (or cmake, scons, waf, rake, npm, maven, ant, gradle, etc.) that are responsible for converting your source files into executables, but the lines get blurry in a lot of languages/projects. For Golang, the go tool plays the part of the buildsystem and test harness without much extra configuration, and many environments have a pretty robust separation between building and testing.

T-Shaped Matrices

There's a temptation with CI systems to exercise your entire test suite with a comprehensive and complete range of platforms, modes, and operations. While this works great for some smaller projects, "completism" is not the best way to model the problem. When designing and selecting your tests and test dimensions, consider the following goals and approaches (a small selection sketch follows the list):

  • on one, and only one, platform run your entire test suite. This platform should probably be very close to the primary runtime of your environment (e.g. when developing a service that runs on Linux, your tests should run in a system that resembles the production environment,) or possibly your primary development environment.
  • for all platforms other than your primary platform, run only the tests that are either directly related to that runtime/platform (e.g. anything that might be OS or processor specific,) plus some small subset of "verification" or acceptance tests. I would expect that these tests should easily be able to complete in 10% of the time of a "full build,"
  • consider operational variants (e.g. if your product has multiple major runtime modes, or some kind of pluggable sub-system) and select the set of tests which verify these modes of operation.
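A sketch of how that selection logic might look, with hypothetical platform names and suite labels:

    # hypothetical platform/suite selection implementing the "T" shape
    PRIMARY = "linux-x86_64"

    def suites_for(platform):
        """Full suite on the primary platform; platform-specific tests plus
        a small smoke subset everywhere else."""
        if platform == PRIMARY:
            return ["unit", "integration", "acceptance", "platform"]
        return ["platform", "smoke"]  # aim for ~10% of a full build's runtime

    for p in (PRIMARY, "macos-arm64", "windows-x86_64"):
        print(p, suites_for(p))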

In general the shape of the matrix should be T-shaped: "wide across" the platforms, with one long "narrow down" column for the full suite. The danger more than anything is in running too many tests, which is a problem because:

  • more tests increase the chance of a false negative (caused by the underlying systems infrastructure, service dependencies, or even flakey tests,) which means you risk spending more time chasing down problems; see the back-of-envelope calculation after this list. Running tests that provide signal is good, but the chance of false negatives is a liability.
  • responsiveness of CI frameworks is important but incredibly difficult, and running fewer things can improve responsiveness. While parallelism might help with some kinds of runtime limitations when running larger numbers of tests, it incurs overhead and expense.
  • actual failures become redundant, and difficult to attribute, in "complete matrices." A test of certain high level systems may pass or fail consistently along all dimensions, creating more noise when something fails. With any degree of non-determinism or chance of a false negative, running tests more than once just makes it more difficult to attribute failures to a specific change or an intermittent bug.
  • some testing dimensions don't make sense, leading to wasted time addressing test failures. For example when testing an RPC protocol library that supports both encryption and authentication, it's not meaningful to test the combination of "no-encryption" and "authentication," although the other three axes might be interesting.
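To make the false-negative point concrete, a quick back-of-envelope calculation (the flake rate is a made-up illustration):

    # if each test independently "flakes" with probability p, the chance that
    # a run of n tests reports at least one spurious failure is 1 - (1 - p)^n
    p = 0.001  # hypothetical 0.1% flake rate per test

    for n in (100, 1_000, 10_000):
        print(n, round(1 - (1 - p) ** n, 3))
    # 100 tests  -> ~0.095
    # 1,000      -> ~0.632
    # 10,000     -> ~1.0 (a spurious failure on nearly every run)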

The ultimate goal, of course, is to have a test matrix that you are confident will catch bugs when they occur, is easy to maintain, and helps you build confidence in the software that you ship.

Conclusion

Many organizations have teams dedicated to maintaining buildsystems and CI, and that's often appropriate: keeping CI alive is of huge value. It's also very possible for CI and related tools to accrue complexity and debt in ways that are difficult to maintain, even with dedicated teams: taking a step back and thinking about CI, buildsystems, and overall architecture strategically can be very powerful, and really improve the value provided by the system.

Get More Done

It's really easy to overthink the way that we approach our work and manage our own time and projects. There is no shortage of tools, services, books, and methods for organizing your days and work, and while there are a lot of good ideas out there, it's easy to get stuck fiddling with how you work at the expense of actually getting work done. While I've definitely thought about this a lot over time, for a long time I've mostly just done things and not really worried much about the things on my to-do list. [1]

I think about the way I work alone much as I think about the way I work with other people. Working alone is different from collaboration, but a lot of the principles of thinking about big goals and smaller actionable items are pretty transferable.

My suggestions here are centered around the idea that you have a todo list and that you spend a few moments a day looking at that list, but actually I think the way I think about my work is orthogonal to any specific tool. For years, most of my personal planning has revolved around making a few lists in a steno pad once or twice a day, [2] though I've been trying to do more digital things recently. I'm not sure I like it. Again, tools don't matter.

[1]Though, to be clear, I've had the pleasure and benefit of working in an organization that lives-and-dies by a bug tracking system, with a great team of folks doing project management. So there are other people who manage sprints, keep an eye on velocity, and make sure that issues don't get stuck.
[2]My general approach is typically to have a "big projects" or "things to think about" list and a "do this next" list, with occasional lists about all the things in a specific big project. In retrospect these map reasonably well to SCRUM/Agile concepts, which also makes sense.

Smaller Tasks are Always Better

It's easy to plan projects from the "top down," identifying the major components and planning your work around those components, and the times that I run into trouble are always the times when my "actionable pieces" are too big. Smaller pieces help you build momentum, allow you to move around to different areas as your attention and focus change, and help you use available time effectively (when you want.)

It's easy to find time in-between meetings, or while the pasta water is boiling, to do something small and quick. It's also very easy to avoid starting something big until you have a big block of unfettered time. The combination of these factors makes bigger tasks liabilities, and more likely to take even longer to complete.

Multi-Task Different Kinds of Work

I read a bunch of articles that suggest that the way to be really productive is to figure out ways of focusing and avoiding context switches. I've even watched a lot of coworkers organize their schedules and work around these principles, and it's always been something of a mystery to me. It's true that too much multi-tasking and context switching can lead to a fragmented experience and make some longer/complicated tasks harder to really dig into, but it's possible to manage the costs of context switching by breaking bigger projects apart into smaller projects and leaving notes for your (future) self as you work.

Even if you don't do a lot of actual multitasking within a given hour or day of time, it's hard to avoid really working on different kinds of projects on the scale of days or weeks, and I've found that having multiple projects in flight at once actually helps me get more done. In general I think of this as the idea that more projects in flight means that you finish things more often, even if the total number of projects completed is the same in the macro context.

Regardless, different stages of a project require different kinds of attention and energy, and having a few things in flight increases the chance that when you're in the mood to do some research, or editing, or planning, you have a project with that kind of work all queued up. I prefer to be able to switch to different kinds of work depending on my attention and mood. In general my work falls into the following kinds of activities:

  • planning (e.g. splitting up big tasks, outlining, design work,)
  • generative work (e.g. writing, coding, etc.)
  • organizational (email, collaboration coordination, user support, public issue tracking, mentoring, integration, etc.)
  • polishing (editing, writing/running tests, publication prepping,)
  • reviewing (code review, editing, etc.)

Do the Right Things

My general approach is "do lots of things and hope something sticks," which makes the small assumption that all of the things you do are important. It's fine if not everything is the most important, and it's fine to do things a bit out of order, but it's probably a problem if you do lots of things without getting important things done.

So I'm not saying establish a priority for all tasks and execute them in strictly that priority, at all. Part of the problem is just making sure that the things on your list are still relevant, and still make sense. As we do work and time passes, we have to rethink or rechart how we're going to complete a project, and that reevaluation is useful.

Prioritization and task selection are incredibly hard, and it's easy to cast "prioritization" in oversimplified terms. I've been thinking about prioritization, for my own work, as a decision based on the following factors (a toy scoring sketch follows the list):

  • deadline (when does this have to be done: work on things that have hard deadlines or expected completion times, ordered by expected completion date, to avoid needing to cram at the last moment.)
  • potential impact (do things that will have the greatest impact before lesser impact, this is super subjective, but can help build momentum, and give you a chance to decide if lower-impact items are worth doing.)
  • time availability fit (do the biggest thing you can manage with the time you have at hand, as smaller things are easier to fit in later,)
  • level of understanding (work on the things that you understand the best, and give yourself the opportunity to plan things that you don't understand later. I sometimes think about this as "do easy things first," but that might be too simple.)
  • time outstanding (how long ago was this task created: do older things first to prevent them from becoming stale.)
  • number of things (or people) that depend on this being done (work on things that will unblock other tasks or collaborators before things that don't have any dependencies, to help increase overall throughput.)
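None of this reduces cleanly to a formula, but as a thought experiment, here's a sketch of scoring tasks on these factors (the fields and weights are entirely made up; the point is the factors, not the numbers):

    from dataclasses import dataclass

    @dataclass
    class Task:
        days_to_deadline: float  # smaller -> more urgent
        impact: int              # 1-5, admittedly subjective
        fits_available_time: bool
        understanding: int       # 1-5, how well I understand the work
        age_days: int            # time outstanding
        blocks: int              # tasks/people waiting on this

    def priority(t: Task) -> float:
        # made-up weights, purely illustrative
        return (
            10.0 / max(t.days_to_deadline, 0.5)
            + 2.0 * t.impact
            + (1.0 if t.fits_available_time else 0.0)
            + 0.5 * t.understanding
            + 0.05 * t.age_days
            + 1.5 * t.blocks
        )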

Maintain a Pipeline of Work

Productivity, for me, has always been about getting momentum on projects and being able to add more things. For work projects, there's (almost) always a backlog of tasks, and the next thing is usually pretty obvious, but sometimes this is harder for personal projects. I've noticed a tendency in myself to prefer "getting everything done" on my personal todo list, which I don't think is particularly useful. Having a pipeline or backlog of work is great:

  • there's always something next to do, and there isn't a moment when you've finished and have to think about new things.
  • keeping a list of things that you are going to do in the more distant future lets you start thinking about how bigger pieces fit together without needing to start working on them.
  • you can add big things to your list(s) and then break them into smaller pieces as you make progress.

As an experiment, think about your todo list not as a set of items you'd like to finish, but as a list that shouldn't drop below a certain length (say 20 or 30? items) with a target rate of completion (10 a week?), though you should choose your own numbers, and set goals based on what you see yourself getting done over time.

Staff Engineering

In August of 2019 I became a Staff Engineer, which is what a lot of companies are calling their "level above Senior Engineer" role these days. Engineering leveling is a weird beast, which is probably a post unto itself. Despite my odd entry into a career in tech, my path in the last 4 or 5 years has been pretty conventional; however, somehow, despite having an increasingly normal career trajectory, explaining what I do on a day to day basis has not gotten easier.

Staff Engineers are important for scaling engineering teams, but lots of teams get by without them, and unlike more junior engineers, who have broadly similar job roles, there are a lot of different ways to be a Staff Engineer, which only muddies things. This post is a reflection on some key aspects of my experience, organized into topics that I hope will be useful for people who may be interested in becoming staff engineers or managing such a person. If you're also a Staff Engineer and your experience is different, I wouldn't be particularly surprised.

Staff Engineers Help Teams Build Great Software

Lots of teams function just fine without Staff Engineers, and teams can build great products without having contributors in Staff-type roles. Indeed, because Staff Engineers vary a lot, the utility of having more senior individual contributors on a team depends a lot on the specific engineer and the team in question: finding a good fit is even harder than usual. In general, having senior technical leadership can help teams by:

  • giving people managers more space and time to focus on the team organization, processes, and people. Particularly in small organizations, team managers often pick up technical leadership.
  • providing connections and collaborations between groups and efforts. While almost all senior engineers have a "home team" and are directly involved in a few specific projects, they also tend to have broader scope, and so can help coordinate efforts between different projects and groups.
  • increasing the parallelism of teams, by providing the kind of infrastructure that allows a team to pursue multiple streams of development at one time.
  • supporting the career path and growth of more junior engineers, both as a result of direct mentoring and because having more technical leadership capacity creates opportunities for growth for everyone on the team.

Staff Promotions Reflect Organizational Capacity

Like other promotions, getting promoted to Staff Engineer requires experience and a history of effectiveness, but it's less straightforward than other promotions. This is in part because the ways we think about leveling and job roles (i.e. describing the professional activities and capabilities along several dimensions for each level,) become complicated when there are lots of different ways to be a Staff Engineer. Pragmatically, these kinds of promotions often depend on other factors:

  • the existence of other Staff Engineers in the organization makes it more likely that there's an easy comparison for a candidate.
  • past experience of managers getting Staff+ promotions for engineers. Engineering Managers without this kind of experience may have difficulty creating the right kinds of opportunities within their organizations and advocating for these kinds of promotions.
  • organizational maturity and breadth to support the workload of a Staff Engineer: there are ways to partition teams and organizations that preclude some of the kinds of higher level concerns that justify having Staff Engineers, and while having senior technical leadership is often useful, if the organization can't support it, it won't happen.
  • teams with a sizable population of more junior engineers, particularly where the team is growing, will have more opportunity and need for Staff Engineers. Teams that are, on balance, more senior, or that are small and relatively static, tend to have less opportunity for the kind of broadly synthetic work that tends to lead to Staff promotions.

There are also, of course, some kinds of technical achievements and professional characteristics that Staff Engineers often have, and I'm not saying that anyone in the right organizational context can be promoted, exactly. However, without the right kind of organizational support and context, even the most exceptional engineers will never be promoted.

Staff Promotions are Harder to Get Than Equivalent Management Promotions

In many organizations it's true that Staff promotions are often much harder to get than equivalent promotions to peer-level management positions: the organizational contexts required to support the promotion of engineers into management roles are much easier to create, particularly as organizations grow. As you hire more engineers you need more Engineering Managers. There are other factors:

  • managers control promotions, and it's easier for them to recapitulate their own career paths in their reports than to think about the Staff role, and so more engineers tend to be pushed towards management than senior IC roles. It's also probable that meta-managers benefit organizationally from having more front-line managers in their organizations than more senior ICs, which exacerbates this bias.
  • from an output perspective, Senior Engineers can write the code that Staff Engineers would otherwise write, whereas Engineering Management is difficult to avoid or do without. In other terms, management promotions are often more critical from the organization's perspective and are therefore prioritized over Staff promotions, particularly during growth.
  • cost. Staff Engineers are expensive, often more expensive than managers, particularly at the bottom of the brackets, and it's difficult to imagine that the timing of Staff promotions is not impacted by budgetary requirements.

Promoting a Staff Engineer is Easier than Hiring One

Because there are many valid ways to do the Staff job, and so much of the job is about leveraging context and building broader connections between different projects, people with more organizational experience and history often have an advantage over fresh industry hires. In general:

  • Success as a Staff Engineer in one organization does not necessarily translate to success at another.
  • The conventions within the industry hiring process are good at selecting junior engineers, and there are fewer conventions for more senior roles, which means that candidates are not assessed for the skills and experiences that are relevant to their day-to-day work, while also being penalized for (often) being unexceptional at the kinds of problems that junior engineering interviews focus on. While interview processes are imperfect assessment tools in all cases, they're particularly bad at more senior levels.
  • Senior engineering contributors have the potential to have huge impact on product development and engineering outcomes, all of which requires a bunch of trust on the part of the organization, and that kind of trust is often easier to build with someone who already has organizational experience.

This isn't to say that it's impossible to hire Staff engineers, I'm just deeply dubious of the hiring process for these kinds of roles having both interviewed for these kinds of roles and also interviewed candidates for them. I've also watched more than one senior contributor not really get along well with a team or other leadership after being hired externally, and for reasons that end up making sense in retrospect. It's really hard.

Staff Engineers Don't Not Manage

Most companies have a clear distinction between the career trajectories of people involved in "management" and senior "individual contributor" roles (like Staff Engineers,) with managers involved in leadership for teams and humans, and ICs involved in technical aspects. This seems really clear on paper but is incredibly messy in practice. The decisions that managers make about team organization and prioritization have necessary technical implications, while it's difficult to organize larger scale technical initiatives without awareness of the people and teams. Sometimes Staff Engineers end up doing actual management on a temporary basis in order to fill gaps as organizations change, or to cover parental leave.

It's also the case that a huge part of the job for many Staff Engineers involves direct mentorship of junior engineers, which can involve leading specific projects, conversations about career trajectories and growth, as well as conversations about specific technical topics. This has a lot of overlap with management, and that's fine. The major difference is that senior contributors share responsibility for the people they mentor with their actual managers, and tend to focus mentoring on smaller groups of contributors.

Staff Engineers aren't (or shouldn't be!) managers, even when they are involved in broader leadership work, and even if the specific engineer is capable of doing management work: putting ICs in management roles takes time away from their (likely more valuable) technical projects.

Staff Engineers Write Boring and Tedious But Easy Code

While this is perhaps not a universal view, I feel pretty safe in suggesting that Staff Engineers should be directly involved in development projects. While there are lots of ways to be involved in development (technical design, architecture, reviewing code and documents, project planning and development, and so forth,) I think it's really important that Staff Engineers be involved with code-writing and similar activities. This makes it easy to stay grounded and relevant, and also makes it possible to do a better job at all of the other kinds of engineering work.

Having said that, it's almost inevitable that the kinds of contribution to the code that you make as a Staff Engineer are not the same kinds of contributions that you make at other points in your career. Your attention is probably pulled in different directions. Where a junior engineer can spend most of their day focusing on a few projects and writing code, Staff Engineers:

  • consult with other teams.
  • mentor other engineers.
  • build long and medium term plans for teams and products.
  • break larger projects apart and design APIs between components.

All of this "other engineering work" takes time, and the broader portfolio of concerns means that more junior engineers often have more time and attention to focus on specific programming tasks. The result is that the kind of code you end up writing tends to be different:

  • fixing problems and bugs in systems that require a lot of context. The bugs are often not very complicated themselves, but require understanding the implications of one component with regard to other components, which can make them difficult.
  • projects to enable future development work, including building infrastructure or designing an interface and connecting an existing implementation to that interface ahead of some larger effort: the kind of "refactor things to make it possible to write a new implementation" work.
  • writing small isolated components to support broader initiatives, such as exposing existing information via new APIs, and building libraries to facilitate connections between different projects or components.
  • projects that support the work of the team as a whole: tools, build and deployment systems, code clean up, performance tuning, test infrastructure, and so forth.

These kinds of projects can amount to rather a lot of development work, but they definitely have their own flavor. As I approached Staff and certainly since, the kind of projects I had attention for definitely shifted. I actually like this kind of work rather a lot, so that's been quite good for me, but the change is real.

There's definitely a temptation to give Staff Engineers big projects that they can go off and work on alone, and I've seen lots of teams and engineers attempt this: sometimes these projects work out, though more often the successes feel like an exception. There's no "right kind" of way to write software as a Staff Engineer, and sometimes senior engineers get to work on bigger "core projects." Having said that, if a Staff Engineer is working on the "other engineering" aspects of the job, there's just limited time to do big development projects in a reasonable time frame.

A Common Failure

I've been intermittently working on a common lisp library to produce a binary encoding of arbitrary objects, and I think I'm going to be abandoning the project. This is an explanation of that decision and a reflection on my experience.

Why Common Lisp?

First, some background. I've always thought that Common Lisp is a language with a bunch of cool features and selling points, but I've never really had the experience of writing more than some one-off bits of code in CL, which isn't surprising. This project was a good experience for really digging into writing and managing a conceptually larger project, and a good kick in the pants to learn more.

The things I like:

  • the implementations of the core runtime are really robust and high quality, and make it possible to imagine running your code in a bunch of different contexts. Even though it's a language with relatively few users, it feels solid. The most common implementations also have ways of producing fully self-contained static binaries (like Go, say), which makes the thought of distributing software seem reasonable.
  • quicklisp, a package/library management tool that is relatively new (from the last decade or so,) has really raised the level of the ecosystem. It's not as complete as I'd expect in many ways, but quicklisp changed CL from something quaint to something that you could actually imagine using.
  • the object system is really nice. There isn't quite compile-time type checking on the values of slots (attributes) of objects, though you can opt in. My general feeling is that I can pretty easily get the feel of writing statically typed code with all of the freedom of writing dynamic code.
  • multiple dispatch, and the conceptual approach to genericism, is amazing and really simplifies flow control in a lot of cases. You implement the methods you need, for the appropriate types/objects and then just write the logic you need, and the function call machinery just does the right thing. There's surprisingly little conditional logic, as a result.

Things I don't like:

  • there are all sorts of things that don't quite have existing libraries, and so I find myself wanting to do things that require more effort than necessary. This project to write a binary encoding tool would have been a component in service of a much larger project. It'd be nice if you could skip some of the lower level components, or didn't have your design choices so constrained by gaps in infrastructure.
  • at the same time, the library ecosystem is pretty fractured, and there are common tools around which there isn't really consensus. Why are there so many half-finished YAML and JSON libraries? There are a bunch of HTTP server (!) implementations, but really, you need 2 and not 5.
  • looping/iteration isn't intuitive, and it's difficult to get common patterns to work. The answer in most cases is to use (map) with lambdas rather than loops, but there's this pitfall where you try to use a (loop) and, really, that's rarely the right answer.
  • implicit returns seem like an oversight; hilariously, Rust also makes this error. Implicit returns also make it quite hard to reason about what type a function or method returns.

Writing an Encoder

So the project I wrote was an attempt to write really object oriented code as a way of writing an object encoder for a JSON-like format. Simple enough: I had a good mental model of the format, and my general approach to doing any kind of binary format processing is to just write a crap ton of unit tests and work somewhat iteratively.
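The testing approach translates to any language; here's a sketch of the kind of round-trip test I mean, in Python for illustration (the inline codec is a stand-in for the library under test):

    import json
    import unittest

    # stand-in codec, purely for illustration; the real library's binary
    # encode/decode functions would replace these
    def encode(value) -> bytes:
        return json.dumps(value).encode("utf-8")

    def decode(data: bytes):
        return json.loads(data.decode("utf-8"))

    class RoundTripTest(unittest.TestCase):
        def test_round_trip(self):
            # the core invariant: decode(encode(x)) == x for supported types
            for value in [None, True, 42, 3.14, "text", [1, 2, 3], {"k": "v"}]:
                with self.subTest(value=value):
                    self.assertEqual(decode(encode(value)), value)

    if __name__ == "__main__":
        unittest.main()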

I had a lot of fun with the project, and it gave me a bunch of key experiences which make me feel comfortable saying that I'm able to write common lisp even if it's not a language that I feel maximally comfortable in (yet?). The experiences that really helped included:

  • producing enough code to really have to think about how packaging and code organization worked. I'd written a function here and there before, but never something where I needed to really understand and use the packaging infrastructure (e.g. systems and libraries.)
  • writing and running lots of tests. I don't always follow test-driven development closely, but writing lots of tests is part of my process, and being able to watch the layers of this code come together was a lot of fun and very instructive.
  • this project, for me, was mostly about API design, and it was nice to have a project that didn't require much design in terms of the actual functionality, as object encoding is pretty straightforward.

From an educational perspective all of my goals were achieved.

Failure Mode

The problem is that it didn't work out, in the final analysis. While the library I constructed was able to encode and decode objects and was internally correct, I never got it to produce encoding that other implementations of the same specification could reliably read, and the ability to read data encoded by other libraries only worked in trivial cases. In the end:

  • this was mostly a project designed to help me gain competence in a programming language I don't really know, and in that I was successful.
  • adding this encoding format isn't particularly useful to any project I'm thinking of working on in the short term, and doesn't directly enable me to do anything in particular.
  • the architecture of the library would not be particularly performant in practice, as the encoding process didn't deploy a buffer pool of any kind; it would have been hard to backfill that in, and I wasn't particularly interested in doing so.
  • it didn't work, and the effort to debug the issue would be more substantial than I'm really interested in undertaking at this point, particularly given the limited utility.

So here we are. Onto a different project!

There's No Full Stack

Software engineers use terms like "backend" and "frontend" to describe areas of expertise and focus, and the thought is that these terms map roughly onto "underlying systems and business logic" and "user interfaces." The assumption is that these are different kinds of work, and that no person can really specialize in "everything."

But it's all about perspective. Software is built in layers: there are frontends and backends at almost every level, so the classification easily breaks down if you look at it too hard. It's also the case that logical features, from the perspective of the product and user, require the efforts of both disciplines. Often development organizations struggle to hand projects off between groups of front-end and back-end teams. [1]

[1]In truth, the problem of coordination between frontend and backend teams is really that it forces a waterfall-like coordination between teams, which is always awkward. The problem isn't that backend engineers can't write frontend code, but that having different teams requires a handoff that is difficult to manage correctly, and processes and management grow up around that handoff.

Backend/frontend is also a poor way to organize work, as it often forces a needless boundary between people and teams working on related projects. Backend work (usually) has to be completed first, and if that slips (or estimation is off) then the frontend work has to happen in a crunch. Even if timing goes well, it's difficult to maintain engineering continuity through the handoff, and context is often lost in the process.

In response to splitting projects and teams into front and backend, engineers have developed this idea of "full stack" engineering. This typically means "integrated front end and backend development." A noble approach: keep the same engineer on the project from start to finish, and avoid an awkward handoff or resetting context halfway through a project. Historic concerns about "front end and backend being in different languages" are reduced both by the advent of back-end javascript, and a realization that programmers often work in multiple languages.

While full stack sounds great, it's a total lie. First, engineers by and large cannot maintain context on all aspects of a system, so boundaries end up appearing in different places. A full stack engineer might end up writing the front end and the APIs on the backend that the front end depends on, but not the application logic that supports the feature. Or an engineer might focus on only a very specific set of features, and not be able to branch out very broadly. Second, specialization is important for allowing engineers to focus and be productive: while context switching projects between engineers has costs, having engineers who must context switch regularly between different disciplines is bad for those engineers. In short, you can't just declare that engineers will be able to do it all.

Some teams and products, particularly larger ones, can get around the issue entirely by dividing ownership and specialization along functional boundaries rather than by engineering discipline, but there can be real technical limitations, and getting a team to move to this kind of ownership model is super difficult. Therefore, I'd propose a different organization, or a way of dividing projects and engineering that avoids both "frontend/backend" and the idea of "full stack":

  • feature or product engineers, that focus on core functionality delivered to users. This includes UI, supporting backend APIs, and core functionality. The users of these teams are the users of the product. These jobs have the best parts of "full stack" type orientation, but draw an effective "lower" boundary of responsibility and allow feature-based specialization.
  • infrastructure or product platform engineers, that focus on deployment, operations and supporting internal APIs. These teams and engineers should see their users as feature and product engineers. These engineers should fall somewhere between "backend engineers," and the "devops" and "sre" -type roles of the last decade, and cover the area "above" systems (e.g. not inclusive of machine management and access provisioning,) and below features.

This framework helps teams scale up as needs and requirements change: feature teams can be divided and parallelized and focused on slices of functionality, while infrastructure teams divide easily into specialties (e.g. networking, storage, databases, internal libraries, queues, etc.) and along service boundaries. Teams are in a better position to handle continuity of projects, and engineers can maintain context and operate using more agile methods. I suspect that, if we look carefully, many organizations and teams have this kind of de facto organization, even if they use different kinds of terminology.

Thoughts?