How to Choose a Programming Language

I talk to lots of people about software and programming: people who are trying to make a technical decision for a new project or who are interested in learning something new, and some form of the question "what programming language should I learn?" or "what's the best language for this new project?" comes up a lot.

These are awful questions, because there is no singular right answer, and in some senses all answers are wrong. This post will be an exploration of some decent answers to this question, and some useful ways to think about the differences between programming languages.

  • If you already build and maintain software in one programming language, build new components in the same language you already use. Adding new tools and technologies increases maintenance burden for all engineers, and software tends to stick around for a long time, so this cost can stick around for a long time.

  • Sometimes the software you want to write must target a specific runtime or environment, there's really only one reasonable choice. The prototypical examples of these are things like: iOS apps (Swift,) Android apps (Kotlin), or things that run in the browser (JavaScript,) although:

  • Given things like React Native and Electron, it's reasonable to just write JavaScript for all GUI code, although often this might actually mean TypeScript in practice. While it used to be the case that it made sense to write GUI code in various native tool kits, at this point it seems like it makes sense to just figure out ways of doing it all in JS.

  • If you already know how to program in one language, and want to learn something new, but don't have a specific project in mind attempt to learn something that's quite different from you already know: if you're comfortable in something like Python, try and learn something like Go or Rust. If you're primarily a Java programmer, something like JavaScript or Python might be an interesting change of pace.

    The same basic ideas applies to selecting languages that will be used by teams: choose a tool that's complementary to what you're already doing, and that could provide value.

  • If you're more familiar with a few programming languages or don't feel you need to learn a new language for professional reasons pick something fun and off the wall: Ocaml! Common Lisp! Rust! Haskell! Scheme! Elixir! It doesn't matter and in these cases you probably can probably learn new languages when you need, the point is to learn something that's radically different and to help you think about computers and programming in radically different ways.

  • Choose the language that people working on similar projects are already using. For instance, if you're doing a lot of data science, using Python makes a lot of sense; if you're writing tools that you expect banks (say) to use, something that runs on the JVM is a good bet. The idea here is you may be able to find more well developed tools and resources relevant to the kinds of problems you encounter.

  • When starting a new project and there isn't a lot of prior art in the area that you're working, or you want to avoid recapitulating some flaw in the existing tools, you end up having a lot of freedom. In general:

    • Think about concurrency and workload characteristics. Is the workload CPU, Network, or IO bound? Is the application multithreaded, or could take advantage of parallelism within processes? There are different kinds of concurrency, and different execution models, so this isn't always super cut-and-dry: theoretically languages that have "real threads" (C/C++, Java, Rust, Common Lisp, etc.) or a close enough approximation (Go,) are better, but for workloads that are network bound, event-driven systems (e.g. Python's Tornado and Node.J's) work admirably.
    • How will you distribute and run the application? There are some languages that can provide static binaries that include all of their dependencies for distribution, which can simplify some aspects of distribution and execution process, but for software that you control the runtime (e.g. services deployed on some kind of container based-platform,) it might matter less.
    • Are there strong real-time requirements? If so, and you're considering a garbage collected language, make sure that the GC pauses aren't going to be a problem. It's also the case that all GCsare not the same, so having a clear idea of what the tolerances
    • Is this software going to be maintained by a team, and if so, what kind of tools will they need in order to succeed and be productive. Would static typing help? What's the developer tooling and experience like? Are there libraries that you'd expect to need that are conspicuously missing?

Have fun! Build cool things!

How to Become a Self-Taught Programmer

i am a self taught programmer. i don't know that i'd recommend it to anyone else there are so many different curricula and training programs that are well tested and very efficacious. for lots of historical reasons, many programmers end up being all or mostly self taught: in the early days because programming was vocational and people learned on the job, then because people learned programming on their own before entering cs programs, and more recently because the demand for programmers (and rate of change) for the kinds of programming work that are in the most demand these days. and knowing that it's possible (and potentially cheaper) to teach yourself, it seems like a tempting option.

this post, then, is a collection of suggestions, guidelines, and pointers for anyone attempting to teach themselves to program:

  • focus on learning one thing (programming language and problem domain) at a time. there are so many different things you could learn, and people who know how to program seem to have an endless knowledge of different things. knowing one set of tools and one area (e.g. "web development in javascript," or "system administration in python,") gives you the framework to expand later, and the truth is that you'll be able to learn additional things more easily once you have a framework to build upon.

  • when learning something in programming, always start with a goal. have some piece of data that you want to explore or visualize, have a set of files that you want to organize, or something that you want to accomplish. learning how to program without a goal, means that you don't end up asking the kinds of questions that you need to form the right kinds of associations.

  • programming is all about learning different things: if you end up programming for long enough you'll end up learning different languages, and being able to pick up new things is the real skill.

  • being able to clearly capture what you were thinking when you write code is basically a programming super power.

  • programming is about understanding problems [1] abstractly and building abstractions around various kinds of problems. being able break apart these problems into smaller core issues, and thinking abstractly about the problem so that you can solve both the problem in front of you and also solve it in the future are crucial skills.

  • collect questions or curiosities as you encounter them, but don't feel like you have to understand everything, and use this to guide your background reading, but don't feel like you have to hunt down the answer to every term you hear or see that you don't already know immediately. if you're pretty rigorous about going back and looking things up, you'll get a pretty good base knowledge over time.

  • always think about the users of your software as you build, at every level. even if you're building software for your own use, think about the future version of yourself that will use that software, imagine that other people might use the interfaces and functions that you write and think about what assumptions they might bring to the table. think about the output that your program, script, or function produces, and how someone would interact with that output.

  • think about the function as the fundamental building block of your software. lower level forms (i.e. statements) are required, but functions are the unit where meaning is created in the context of a program. functions, or methods depending, take input (arguments, ususally, but sometimes also an object in the case of methods) and produce some output, sometimes with some kind of side-effect. the best functions:

    • clearly indicate side-effects when possible.
    • have a mechanism for reporting on error conditions (exceptions, return values,)
    • avoid dependencies on external state, beyond what is passed as arguments.
    • are as short as possible.
    • use names that clearly describe the behavior and operations of the function.

    if programming were human language (english,) you'd strive to construct functions that were simple sentences and not paragraph's, but also more than a couple of words/phrases, and you would want these sentences to be clear to understand with limited context. if you have good functions, interfaces are more clear and easier to use, code becomes easier to read and debug, and easier to test.

  • avoid being too weird. many programmers are total nerds, and you may be too, and that's ok, but it's easier to learn how to do something if there's prior art that you can learn from and copy. on a day-to-day basis, a lot of programming work is just doing something until you get stuck and then you google for the answer. If you're doing something weird--using a programming language that's less widely used, or in a problem space that is a bit out of mainstream, it can be harder to find answers to your problems.

Notes

[1]I use the term "problem" to cover both things like "connecting two components internally" and also more broadly "satisfying a requirement for users," and programming often includes both of these kinds of work.

Principles of Test Oriented Software Development

I want to like test-driven-development (TDD), but realistically it's not something that I ever actually do. Part of this is because TDD, as canonically described is really hard to actually pratice: TDD involves writing tests before writing code, writing tests which must fail before the implementation is complete or correct, and then using the tests to refactor the code. It's a nice idea, and it definitely leads to better test coverage, but the methodology forces you to iterate inefficiently on aspects of a design, and is rarely viable when extending existing code bases. Therefore, I'd like to propose a less-dogmatic alternative: test-oriented-development. [1]

I think, in practice, this largely aligns with the way that people write software, and so test oriented development does not describe a new way of writing code or of writing tests, but rather describes the strategies we use to ensure that the code we write is well tested and testable. Also, I think providing these strategies in a single concrete form will present a reasonable and pragmatic alternative to TDD that will make the aim of "developing more well tested software" more achievable.

  1. Make state explicit. This is good practice for all kinds of development, but generally, don't put data in global variables, and pass as much state (configuration, services, etc.) into functions and classes rather than "magicing" them.
  2. Methods and functions should be functional. I don't tend to think of myself as a functional programmer, as my tendencies are not very ideological, in this regard, but generally having a functional approach simplifies a lot of decisions and makes it easy to test systems at multiple layers.
  3. Most code should be internal and encapsulated. Packages and types with large numbers of exported or public methods should be a warning sign. The best kinds of tests can provide all desired coverage, by testing interfaces themselves,
  4. Write few simple tests and varry the data passed to those tests. This is essentially a form of "table driven testing," where you write a small sequence of simple cases, and run those tests with a variety of tests. Having test infrastructure that allows this kind of flexibility is a great technique.
  5. Begin writing tests as soon as possible. Orthodox TDD suggests that you should start writing tests first, and I think that this is probably one of the reasons that TDD is so hard to adopt. It's also probably the case that orthodox TDD emerged when prototyping was considerably harder than it is today, and as a result TDD just feels like friction, because it's difficult to plan implementations in a test-first sort of way. Having said that, start writing tests as soon as possible.
  6. Experiment in tests. Somehow, I've managed to write a lot of code without working an interactive debugger into my day-to-day routine, which means I do a lot of debugging by reading code, and also by writing tests to try and replicate production phenomena in more isolated phenomena. Writing and running tests in systems is a great way to learn about them.
[1]Sorry that this doesn't lead to a better acronym.

How to Write Performant Software and Why You Shouldn't

I said a thing on twitter that I like, and I realized that I hadn't really written (or ranted) much about performance engineering, and it seemed like a good thing to do. Let's get to it.

Making software fast is pretty easy:

  • Measure the performance of your software at two distinct levels:

    • figure out how to isolate specific operations, as in unit test, and get the code to run many times, and measure how long the operations take.
    • Run meaningful units of work, as in integration tests, to understand how the components of your system come together.

    If you're running a service, sometimes tracking the actual timing of actual operations over time, can also be useful, but you need a lot of traffic for this to be really useful. Run these measurements regularly, and track the timing of operations over time so you know when things actually chair.

  • When you notice something is slow, identify the slow thing and make it faster. This sounds silly, but the things that are slow usually fall into one of a few common cases:

    • an operation that you expected to be quick and in memory, actually does something that does I/O (either to a disk or to the network,)
    • an operation allocates more memory than you expect, or allocates memory more often than you expect.
    • there's a loop that takes more time than you expect, because you expected the number of iterations to be small (10?) and instead there are hundreds or thousands of operations.

    Combine these and you can get some really weird effects, particularly over time. An operation that used to be quick gets slower over time, because the items iterated over grows, or a function is called in a loop that used to be an in-memory only operation, now accesses the database, or something like that. The memory based ones can be trickier (but also end up being less common, at least with more recent programming runtimes.)

Collect data, and when something gets slower you should fix it.

Well no.

Most of the time slow software doesn't really matter. The appearance of slowness or fastness is rarely material to user's experience or the bottom line. If software gets slower, most of the time you should just let it get slower:

  • Computers get faster and cheaper over time, so most of the time, as long as your rate of slow down is slow and steady over time, its usually fine to just ride it out. Obviously big slow downs are a problem, but a few percent year-over-year is so rarely worth fixing.

    It's also the case that runtimes and compilers are almost always getting faster, (because compiler devlopers are, in the best way possible, total nerds,) so upgrading the compiler/runtime regularly often offsets regular slow down over time.

  • In the vast majority of cases, the thing that makes software slow is doing I/O (disks or network,) and so your code probably doesn't matter and so what your code does is unlikely to matter much and you can solve the problem by changing how traffic flows through your system.

    For IX (e.g. web/front-end) code, the logic is a bit different, because slow code actually impacts user experience, and humans notice things. The solution here, though, is often not about making the code faster, but in increasingly pushing a lot of code to "the backend," (e.g. avoid prossing data on the front end, and just make sure the backend can always hand you exactly the data you need and want.)

  • Code that's fast is often harder to read and maintain: to make code faster, you often have to be careful and avoid using certain features of your programming language or runtime (e.g. avoiding ususing heap allocations, or reducing the size of allocations by encoding data in more terse ways, etc,) or by avoiding libraries that are "slower," or that use certain abstractions, all of which makes your code less conventional more difficult to read, and harder to debug. Programmer time is almost always more expensive than compute time, so unless it's big or causing problems, its rarely worth making code harder to read.

    Sometimes, making things actually faster is actually required. Maybe you have a lot of data that you need to get through pretty quickly and there's no way around it, or you have some classically difficult algorithm problem (graph search, say,), but in the course of generally building software this happens pretty rarely, and again most of the time pushing the problem "up" (from the front end to the backend and from the backend to the database, similar,) solves whatever problems you might have.

  • While there are obviously pathological counter-examples, ususally related to things that happen in loops, but a lot of operations never have to be fast because they sit right next to another operation that's much slower:

    • Lots of people analyze logging tools for speed, and this is almost always silly because all log messages either have to be written somewhere (I/O) and generally there has to be something to serialize messages (a mutex or functional equivalent,) somewhere because you want to write only one message at a time to the output, so even if you have a really "fast logger" on its own terms, you're going to hit the I/O or the serializing nature of the problem. Use a logger that has the features you have and is easy to use, speed doesn't matter.
    • anything in HTTP request routing and processing. Because request processing sits next to network operations, often between a database as well as to the client, any sort of gain by using a "a faster web framework," is probably immeasurable. Use the ones with the clearest feature set.

Common Lisp Grip, Project Updates, and Progress

Last week, I did a release, I guess, of cl-grip which is a logging library that I wrote after reflecting on common lisp logging earlier. I wanted to write up some notes about it that aren't covered in the read me, and also talk a little bit4 about what else I'm working on.

cl-grip

This is a really fun and useful project and it was really the right size for a project for me to really get into, and practice a bunch of different areas (packages! threads! testing!) and I think it's useful to boot. The read me is pretty comprehensive, but I thought I'd collect some additional color here:

Really at it's core cl-grip isn't a logging library, it's just a collection of interfaces that make it easy to write logging and messaging tools, which is a really cool basis for an idea, (I've been working on and with a similar system in Go for years.)

As result, there's interfaces and plumbing for doing most logging related things, but no actual implementations. I was very excited to leave out the "log rotation handling feature," which feels like an anachronism at this point, though it'd be easy enough to add that kind of handler in if needed. Although I'm going to let it stew for a little while, I'm excited to expand upon it in the future:

  • additional message types, including capturing stack frames for debugging, or system information for monitoring.
  • being able to connect and send messages directly to likely targets, including systemd's journal and splunk collectors.
  • a collection of more absurd output targets to cover "alerting" type workloads, like desktop notifications, SUMP, and Slack targets.

I'm also excited to see if other people are interested in using it. I've submitted it to Quicklisp and Ultralisp, so give it a whirl!

See the cl-grip repo on github.

Eggqulibrium

At the behest of a friend I've been working on an "egg equilibrium" solver, the idea being to provide a tool that can given a bunch of recipes that use partial eggs (yolks and whites) can provide optimal solutions that use a fixed set of eggs.

So far I've implemented some prototypes that given a number of egg parts, attempt collects recipes until there are no partial eggs in use, so that there are no leftovers. I've also implemented the "if I have these partial eggs, what can I make to use them all." I've also implemented a rudimentary CLI interface (that was a trip!) and a really simple interface to recipe data (both parsing from a CSV format and an in memory format that makes solving the equilibrium problem easier.)

I'm using it as an opportunity to learn different things, and find a way to learn more about things I've not yet touched in lisp (or anywhere really,) so I'm thinking about:

  • building a web-based interface using some combination of caveman, parenscript, and related tools. This could include features like "user submitted databases," as well as links to the sources the recpies, in addition to the basic "web forms, APIs, and table rendering."
  • storing the data in a database (probably SQLite, mostly) both to support persistence and other more advanced features, but also because I've not done database things from Lisp at all.

See the eggquilibrium repo on github it's still pretty rough, but perhaps it'll be interesting!'

Other Projects

  • Writing more! I'm trying to be less obsessive about blogging, as I think it's useful (and perhaps interesting for you all too.) I've been writing a bunch and not posting very much of it. My goal is to mix sort of grandiose musing on technology and engineering, with discussions of Lisp, Emacs, and programming projects.
  • Working on producing texinfo output from cl-docutils! I've been toying around with the idea of writing a publication system targeted at producing books--long-form non-fiction, collections of essays, and fiction--rather than the blogs or technical resources that most such tools are focused on. This is sort of part 0 of this process.
  • Hacking on some Common Lisp projects, I'm particularly interested in the Nyxt and StumpWM.

A Common Failure

I've been intermittently working on a common lisp library to produce a binary encoding of arbitrary objects, and I think I'm going to be abandoning the project. This is an explanation of that decision and an reflection on my experience.

Why Common Lisp?

First, some background. I've always thought that Common Lisp is a language with a bunch of cool features and selling points, but unsurprisingly, I've never really had the experience of writing more than some one-off bits of code in CL, which isn't surprising. This project was a good experience for really digging into writing and managing a conceptually larger project which was a good kick in the pants to learn more.

The things I like:

  • the implementations of the core runtime are really robust and high quality, and make it possible to imagine running your code in a bunch of different contexts. Even though it's a language with relatively few users, it feels robust in a way. The most common implementations also have ways of producing fully self contained static binaries (like Go, say), which makes the thought of distributing software seem reasonable.
  • quicklisp, a package/library management tool is new (in the last decade or so,) has really raised the level of the ecosystem. It's not as complete as I'd expect in many ways, but quicklisp changed CL from something quaint to something that you could actually imagine using.
  • the object system is really nice. There isn't quite compile time-type checking on the values of slots (attributes) of objects, though you can opt in. My general feeling is that I can pretty easily get the feeling of writing statically typed code with all of the freedom of writing dynamic code.
  • multiple dispatch, and the conceptual approach to genericism, is amazing and really simplifies flow control in a lot of cases. You implement the methods you need, for the appropriate types/objects and then just write the logic you need, and the function call machinery just does the right thing. There's surprisingly little conditional logic, as a result.

Things I don't like:

  • there are all sorts of things that don't quite have existing libraries, and so I find myself wanting to do things that require more effort than necessary. This project to write a binary encoding tool would have been a component in service of a much larger project. It'd be nice if you could skip some of the lower level components, or didn't have your design choices so constrained by gaps in infrastructure.
  • at the same time, the library ecosystem is pretty fractured and there are common tools around which there aren't really consensus. Why are there so many half-finished YAML and JSON libraries? There are a bunch of HTTP server (!) implementations, but really you need 2 and not 5?
  • looping/iteration isn't intuitive and difficult to get common patterns to work. The answer, in most cases is to use (map) with lambdas rather than loops, in most cases, but there's this pitfall where you try and use a (loop) and really, that's rarely the right answer.
  • implicit returns seem like an over sight, hilariously, Rust also makes this error. Implicit returns also make knowing what type a function or method returns quite hard to reason about.

Writing an Encoder

So the project I wrote was an attempt to write really object oriented code as a way to writing an object encoder to a JSON-like format. Simple enough, I had a good mental model of the format, and my general approach to doing any kind of binary format processing is to just write a crap ton of unit tests and work somewhat iteratively.

I had a lot of fun with the project, and it gave me a bunch of key experiences which make me feel comfortable saying that I'm able to write common lisp even if it's not a language that I feel maximally comfortable in (yet?). The experiences that really helped included:

  • producing enough code to really have to think about how packaging and code organization worked. I'd written a function here and there, before, but never something where I needed to really understand and use the library/module/packaging (e.g. systems and libraries.) infrastructure.
  • writing and running lots of tests. I don't always follow test-driven development closely, but writing lots of tests is part of my process, and being able to watch the layers of this code come together was a lot of fun and very instructive.
  • this project for me, was mostly about API design and it was nice to have a project that didn't require much design in terms of the actual functionality, as object encoding is pretty straight forward.

From an educational perspective all of my goals were achieved.

Failure Mode

The problem is that it didn't work out, in the final analysis. While the library I constructed was able to encode and decode objects and was internally correct, I never got it to produce encoding that other implementations of the same specification could reliably read, and the ability to read data encoded by other libraries only worked in trivial cases. In the end:

  • this was mostly a project designed to help me gain competence in a programming language I don't really know, and in that I was successful.
  • adding this encoding format isn't particularly useful to any project I'm thinking of working on in the short term, and doesn't directly enable me to do anything in particular.
  • the architecture of the library would not be particularly performant in practice, as the encoding process didn't deploy a buffer pool of any kind, and it would have been harder than not to back fill that in, and I wasn't particularly interested in that.
  • it didn't work, and the effort to debug the issue would be more substantive than I'm really interested in doing at this point, particularly given the limited utility.

So here we are. Onto a different project!

The Case for Better Build Systems

A lot of my work, these days, focuses on figuring out how to improve how people develop software in ways that reduces the amount of time developers have to spend doing work outside of development and that improves the quality of their work. This post, has been sitting in my drafts folder for the last year, and does a good job of explaining how I locate my work **and* makes a case for high quality generic build system tooling that I continue to feel is compelling.*


Incidentally, it turns out that I wrote an introductory post about buildsystems 6 years ago. Go past me.

Canonically, build systems described the steps required to produce artifacts, as system (graph) of dependencies [1] and these dependencies are between source files (code) and artifacts (programs and packages) with intermediate artifacts all in terms of the files they are or create. Though different development environments, programming languages, and kinds of software have different.

While the canonical "build systems are about producing files," the truth is that the challenge of contemporary _software_ development isn't really just about producing files. Everything from test automation to deployment is something that we can think about as a kind of build system problem.

Let's unwind for a moment. The work of "making software," breaks down into a collection of--reasonably disparate--tasks, which include:

  • collecting requirements (figuring out what people want,)
  • project planning (figuring out how to break larger collections of functionality into more reasonable units.)
  • writing new code in existing code bases.
  • exploring unfamiliar code and making changes.
  • writing tests for code you've recently written, or areas of the code base that have recently chaining.
  • rewriting existing code with functionally equivalent code (refactoring,)
  • fixing bugs discovered by users.
  • fixing bugs discovered by an automated test suite.
  • releasing software (deploying code.)

Within these tasks developers do a lot of little experiments and tests. Make a change, see what it's impact is by doing something like compiling the code, running the program or running a test program. The goal, therefore, of the project of developer productivity projects is to automate these processes and shorten the time it takes to do any of these tasks. In particular the feedback loop between "making a change" and seeing if that change had an impact. The more complex the system that you're developing, with regards to distinct components, dependencies, target platforms, compilation model, and integration's, them the more time you spend in any one of these loops and the less productive you can be.

Build systems are uniquely positioned to suport the development process: they're typically purpose built per-project (sharing common infrastructure,) most projects already have one, and they provide an ideal environment to provide the kind of incremental development of additional functionality and tooling. The model of build systems: the description of processes in terms of dependency graphs and the optimization for local environments means.

The problem, I think, is that build systems tend to be pretty terrible, or at least many suffer a number of classic flaws:

  • implicit assumptions about the build or development environment which make it difficult to start using.
  • unexpressed dependencies on services or systems that the build requires to be running in a certain configuration.
  • improperly configured dependency graphs which end up requiring repeated work.
  • incomplete expression of dependencies which require users to manually complete operations in particular orders.
  • poorly configured defaults which make for overly complex invocations for common tasks.
  • operations distributed among a collection of tools with little integration so that compilation, test automation, release automation, and other functions.

By improving the quality, correctness, and usability of build systems, we:

  • improve the experience for developers every day,
  • make it really easy to optimize basically every aspect of the development process,
  • reduce the friction for including new developers in a project's development process.

I'm not saying "we need to spend more time writing build automation tools" (like make, ninja, cmake, and friends,) or that the existing ones are bad and hard to use (they, by and large are,) but that they're the first and best hook we have into developer workflows. A high quality, trustable, tested, and easy to use build system for a project make development easier, continuous integration easy and maintainable, and ultimately improve the ability of developers to spend more of their time focusing on important problems.

[1]ideally build systems describe directed acylcic graph, though many projects have buildsystems with cyclic dependency graphs that they ignore in some way.

Things I Learned About Queues

I think the queue is a really powerful metaphor for organizing and orchestrating the internal architecture of an application. Once you have a queue, and tasks that are running in that queue, making that system run well requires some attention. This post is very much a sequel to the application framework post.

Factors of your queue implementation and system may impact the applicability of any of these suggestions for your particular application. Additionally, there is lots of work on queue theory so there are formally described properties of queues, and this is really just a collection of informal knowledge that I've collected on this subject. I hope you find it useful!


As the operator of a queue there are two properties: latency, or time to completion, for work in the queue and throughput, or total amount of work completed. These properties are generally trade-offs with each other, and often work to improve throughput will impact latency, and vice versa. It turns out, however, that the theoretical limits of your system's capacity for either latency or throughput are below the actual requirements of your application, so you can generally just focus on improving one area or the other without really feeling like you're trading latency for throughput.

All tasks in the queue should, generally, of similar size in terms of execution time and resource usage. When there are tasks that run slowly or take a long time and tasks that run quickly, you can easily end up in situations where long running tasks group together. Indeed, this isn't just a possibility, but a near certainty. If you can't break work into similar sized units, then you main recourse is to either separate the work into different queues and proportion resources as needed to ensure that both queues are making progress. You generally want to run longer tasks before shorter tasks, but the impact on overall performance depends on other characteristics and the way that your application expects certain kinds of latency and throughput.

Always monitor task runtime (by type,) as well as overall queue depth, and if possible currently running operations. When something goes wrong, or there's an external event that impacts queue performance, you'll need these data to understand the state of your world and debug the underlying issue. Don't wait for something to go wrong to set this up.

Idempotentcy, or the ability of a task to run more than once without chaining the final outcome is a great property in any distributed system, but the more idempotent your operations are the less you have to worry about edge cases where you might run them more than once, particularly around process restarts and deployments. While you generally only want to run things once for efficiency sake, it's important to be able to know that you can run things more than once without causing a crisis.

While it's easy to think about the time a task spends waiting in a queue when tasks are ordered in the queue in a first-in-first-out model, other ordering mechanisms can easily lead to items getting stuck in the queue. Consider the following behaviors:

  • if there are dependencies between tasks, or cases where doing work leads to the creation of more tasks, always run these tasks earlier before other tasks.
  • consider boosting the relative priority of tasks that have been waiting longer relative to other tasks in the queue. If tasks have a priority, and new tasks come in that have higher priority than older tasks, then some lower priority tasks may never get done. While priority of tasks is important, if its important that all tasks get done, balance wait time with priority.
  • alternatively, consider elimiting tasks that have been waiting in the queue for longer than a specified period. These "TTL" for queue items can avoid wasting resources doing work that is not useful.
  • separate desperate priority or types of work into seperate queues to reduce latency. Having additional queues often incurs some per-queue/per-worker resource overhead. When worker infrastructure canbe shared between queues, and both queues are not consistently running at capacity (e.g. have backlogs).
  • quantify the job dispatching overhead. While generally breaking apart larger tasks into smaller execution units improves efficiency, if the overhead of creating, dispatching, and running jobs is a significant portion of a job's runtime, then your system is probably spending too many resources on overhead and you can increase throughput by increasing the overall task size.

There's more, but this definitely covers the basics.