I think the queue is a really powerful metaphor for organizing and orchestrating the internal architecture of an application. But once you have a queue, and tasks running in it, making that system run well requires some attention. This post is very much a sequel to the application framework post.

The details of your queue implementation and system may affect how applicable any of these suggestions are to your particular application. Additionally, queueing theory is a well-developed field with formally described properties of queues; this is really just a collection of informal knowledge that I've gathered on the subject. I hope you find it useful!


As the operator of a queue, you care about two properties: latency, or time to completion for work in the queue, and throughput, or the total amount of work completed per unit of time. These properties generally trade off against each other: work to improve throughput will often impact latency, and vice versa. It turns out, however, that your application's actual requirements for latency and throughput are usually well below the theoretical limits of your system, so you can generally focus on improving one area or the other without really feeling like you're trading latency for throughput.
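One piece of that formal work worth knowing is Little's Law, which ties these properties together: the long-run average number of items in the system (L) equals the average arrival rate (λ) multiplied by the average time an item spends in the system (W), or L = λW. Among other things, this gives you a quick sanity check: if you know your arrival rate and your target latency, you can estimate how deep the queue should be when the system is healthy.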

All tasks in the queue should, generally, be of similar size in terms of execution time and resource usage. When some tasks run quickly and others take a long time, you can easily end up in situations where long-running tasks group together; indeed, this isn't just a possibility, but a near certainty. If you can't break work into similarly sized units, then your main recourse is to separate the work into different queues and proportion resources between them so that both queues keep making progress, as in the sketch below. You generally want to run longer tasks before shorter tasks, but the impact on overall performance depends on other characteristics of the workload and the kinds of latency and throughput your application expects.
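As a concrete (if minimal) sketch of that separation, here's what routing work into "fast" and "slow" queues with independently sized worker pools might look like in Go; the Task type, the Slow flag, and the worker counts are all invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Task struct {
	Name string
	Slow bool // set by the producer based on expected runtime
}

// startWorkers launches n workers that drain the given queue.
func startWorkers(name string, queue <-chan Task, n int, wg *sync.WaitGroup) {
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for task := range queue {
				start := time.Now()
				// ... do the actual work here ...
				fmt.Printf("%s finished %s in %s\n", name, task.Name, time.Since(start))
			}
		}()
	}
}

func main() {
	fast := make(chan Task, 128)
	slow := make(chan Task, 128)

	var wg sync.WaitGroup
	// Proportion resources so both queues keep making progress.
	startWorkers("fast", fast, 8, &wg)
	startWorkers("slow", slow, 2, &wg)

	for _, t := range []Task{{"resize-thumbnail", false}, {"rebuild-index", true}} {
		if t.Slow {
			slow <- t
		} else {
			fast <- t
		}
	}
	close(fast)
	close(slow)
	wg.Wait()
}
```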

Always monitor task runtime (by type) as well as overall queue depth, and, if possible, currently running operations. When something goes wrong, or there's an external event that impacts queue performance, you'll need this data to understand the state of your world and debug the underlying issue. Don't wait for something to go wrong to set this up.
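The shape of this instrumentation varies with your metrics stack, but the core is small enough to sketch: track per-type runtimes and current queue depth, and export them somewhere you can see them. Everything here is hypothetical and in-memory; in a real application you'd report these values to Prometheus, statsd, or whatever you already run:

```go
package queuestats

import (
	"sync"
	"time"
)

type Stats struct {
	mu       sync.Mutex
	depth    int
	count    map[string]int
	totalDur map[string]time.Duration
}

func New() *Stats {
	return &Stats{count: map[string]int{}, totalDur: map[string]time.Duration{}}
}

// Enqueued increments the queue depth gauge when a task is added.
func (s *Stats) Enqueued() {
	s.mu.Lock()
	s.depth++
	s.mu.Unlock()
}

// ObserveTask records the runtime of a completed task by type and
// decrements the depth gauge.
func (s *Stats) ObserveTask(taskType string, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.depth--
	s.count[taskType]++
	s.totalDur[taskType] += d
}

// Depth reports the current number of queued or running tasks.
func (s *Stats) Depth() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.depth
}

// MeanRuntime reports the average runtime observed for a task type.
func (s *Stats) MeanRuntime(taskType string) time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.count[taskType] == 0 {
		return 0
	}
	return s.totalDur[taskType] / time.Duration(s.count[taskType])
}
```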

Idempotency, or the ability of a task to run more than once without changing the final outcome, is a great property in any distributed system: the more idempotent your operations are, the less you have to worry about edge cases where they might run more than once, particularly around process restarts and deployments. While you generally only want to run things once for efficiency's sake, it's important to know that you can run things more than once without causing a crisis.
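Here's a minimal sketch of one common route to idempotency: give each logical unit of work a stable ID and record completion, so a redelivered or replayed task becomes a no-op. The in-memory store and task names are stand-ins; in practice the completion record needs durable storage, and concurrent workers need an atomic check-and-set rather than the separate check and mark shown here:

```go
package main

import (
	"fmt"
	"sync"
)

type CompletionStore struct {
	mu   sync.Mutex
	done map[string]bool
}

// AlreadyDone reports whether a task with this ID has already completed.
func (s *CompletionStore) AlreadyDone(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.done[id]
}

// MarkDone records that the task completed successfully.
func (s *CompletionStore) MarkDone(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.done[id] = true
}

func runTask(store *CompletionStore, id string, work func()) {
	if store.AlreadyDone(id) {
		return // the second run is a no-op
	}
	work()
	store.MarkDone(id)
}

func main() {
	store := &CompletionStore{done: map[string]bool{}}
	send := func() { fmt.Println("sending welcome email for user 42") }

	runTask(store, "welcome-email:42", send)
	runTask(store, "welcome-email:42", send) // duplicate delivery: nothing happens
}
```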

While it's easy to think about the time a task spends waiting when the queue is first-in-first-out, other ordering mechanisms can easily lead to items getting stuck in the queue. Consider the following behaviors:

  • if there are dependencies between tasks, or cases where doing work leads to the creation of more tasks, always run these tasks ahead of other work.
  • consider boosting the priority of tasks that have been waiting longer relative to other tasks in the queue. If tasks have priorities, and new tasks keep arriving with higher priorities than the older tasks, some lower-priority tasks may never get done. While priority is important, if it's also important that all tasks eventually get done, balance wait time with priority (see the aging sketch after this list).
  • alternatively, consider eliminating tasks that have been waiting in the queue for longer than a specified period. This kind of "TTL" for queue items avoids wasting resources on work that's no longer useful; the sketch after this list shows a TTL check alongside priority aging.
  • separate disparate priorities or types of work into separate queues to reduce latency. Having additional queues often incurs some per-queue/per-worker resource overhead, but that overhead is easier to absorb when worker infrastructure can be shared between queues and the queues aren't all consistently running at capacity (e.g. accumulating backlogs).
  • quantify the job dispatching overhead; a rough sketch of this follows the list. While breaking larger tasks into smaller execution units generally improves efficiency, if the overhead of creating, dispatching, and running jobs is a significant portion of a job's runtime, then your system is probably spending too many resources on overhead, and you can increase throughput by increasing the overall task size.
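Here's the aging and TTL sketch referenced above. The constants (one priority point per minute of waiting, a five-minute TTL) are arbitrary and would need tuning against your actual latency requirements:

```go
package main

import (
	"fmt"
	"time"
)

type Item struct {
	Name     string
	Priority int
	Enqueued time.Time
}

// EffectivePriority boosts an item's priority by one point per minute of
// waiting, so old low-priority work eventually outranks new high-priority work.
func EffectivePriority(it Item, now time.Time) int {
	return it.Priority + int(now.Sub(it.Enqueued)/time.Minute)
}

// Expired reports whether an item has waited past its useful lifetime and
// should be dropped rather than run.
func Expired(it Item, now time.Time, ttl time.Duration) bool {
	return now.Sub(it.Enqueued) > ttl
}

func main() {
	now := time.Now()
	old := Item{"low-but-old", 1, now.Add(-10 * time.Minute)}
	fresh := Item{"high-but-new", 5, now}

	fmt.Println(EffectivePriority(old, now) > EffectivePriority(fresh, now)) // true
	fmt.Println(Expired(old, now, 5*time.Minute))                            // true: drop it
}
```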
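And a rough sketch of quantifying dispatch overhead, assuming an in-process channel-based queue: time the whole run, time the handlers, and treat the difference as overhead. The handler body here is empty, so overhead dominates by construction; with real work in place, the ratio tells you whether your task size is too small:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	jobs := make(chan func(), 1024)
	const n = 1000

	// Each job records how long its handler body actually runs.
	var handlerTime time.Duration
	for i := 0; i < n; i++ {
		jobs <- func() {
			start := time.Now()
			// ... the actual work ...
			handlerTime += time.Since(start)
		}
	}
	close(jobs)

	start := time.Now()
	for job := range jobs {
		job()
	}
	total := time.Since(start)

	// Whatever time wasn't spent in handlers went to dispatch overhead.
	overhead := total - handlerTime
	fmt.Printf("overhead is %.1f%% of total runtime\n",
		100*float64(overhead)/float64(total))
}
```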

There's more, but this definitely covers the basics.