Learning Common Lisp Again

In a recent post I wrote about abandoning a previous project that had gone off the rails. Since then I've been doing more work in Common Lisp, and I wanted to report back with some recent developments. There's a lot of writing about learning to program for the first time, and a fair amount of writing about Lisp itself, but neither is particularly relevant to my situation: an experienced programmer picking up the language. I suspect there may be others who find themselves in a similar position in the future.

My Starting Point

I already know how to program, and have a decent understanding of how to build and connect software components. I've been writing a lot of Go (Lang) for the last 4 years, and wrote rather a lot of Python before that. I'm an emacs user, and I use a Common Lisp window manager, so I've always found myself writing little bits of lisp here and there, but it never quite felt like I could do anything of consequence in Lisp, despite thinking that Lisp is really cool and that I wanted to write more.

My goals and rationale are reasonably simple:

  • I'm always building little tools to support the way that I use computers. Nothing is particularly complex, but I'd enjoy being able to do this in CL rather than in other languages, mostly because I think it'd be nice to not do that in the same languages that I work in professionally. [1]
  • Common Lisp is really cool, and I think it'd be good if it were more widely used, and writing more of it and writing posts like this is probably the best way I can help make that happen.
  • Learning new things is always good, and I think having a personal project to learn something new will be a good way of stretching myself as a developer.
  • Common Lisp has a bunch of features that I really like in a programming language: real threads, easy to run/produce static binaries, (almost) reasonable encapsulation/isolation features.

On Learning

Knowing how to program makes learning how to program easier: broadly speaking, programming languages are similar to each other, and if you have a good model for the kinds of constructs and abstractions that are common in software, then learning a new language is mostly about picking up the new syntax, learning new idioms, and figuring out how different language features can make it easier to solve problems that have been difficult in other languages.

In a lot of ways, if you already feel confident and fluent in a programming language, learning a second language is really about teaching yourself how to learn a new language, which you can then apply to all future languages as needed.

Except realistically, "third languages" aren't super common: it's hard to get to the same level of fluency that you have with earlier languages, and often third-and-later languages are learned in the context of some existing code base or project, so it's hard to generalize our familiarity outside of that context.

It's also the case that it's often pretty easy to learn a language well enough to perform common or familiar tasks, but fluency is hard, particularly across different idioms. I'm using CL as an excuse to do kinds of programming that I have more limited experience with: web programming, GUI programming, and using different kinds of databases.

My usual method for learning a new programming language is to write a program of moderate complexity and size but in a problem space that I know pretty well. This makes it possible to gain familiarity, and map concepts that I understand to new concepts, while working on a well understood project. In short, I'm left to focus exclusively on "how do I do this?" type-problems and not "is this possible," or "what should I do?" type-problems.

Conclusion

The more I think about it, the more I realize that "knowing a programming language" is inevitably linked to a specific kind of programming: the kind of Lisp that I've been writing has skewed toward the object oriented end of the lisp spectrum, with fewer functional bits than perhaps average. I'm also still a bit green when it comes to macros.

There are kinds of programs that I don't really have much experience writing:

  • GUI things,
  • the front-half of the web stack, [2]
  • processing/working with ASTs, (lint tools, etc.)
  • lower-level kind of runtime implementation.

There's lots of new things to learn, and new areas to explore!

Notes

[1]There are a few reasons for this. Mostly, I think in a lot of cases, it's right to choose programming languages that are well known (Python, Java+JVM friends, and JavaScript), easy to learn (Go), and fit in with existing ecosystems (which vary a bit by domain,) so while it might be the right choice it's a bit limiting. It's also the case that putting some boundaries/context switching between personal projects and work projects could be helpful in improving quality of life.
[2]Because it's 2020, I've done a lot of work on "web apps," but most of my work has been focused on areas of applications including the data layer, application architecture, core business logic, and reliability/observability, and less on anything material to rendering web-pages. Most projects have a lot of work to be done, and I have no real regrets, but it does mean there's plenty to learn. I wrote an earlier post about the problems of the concept of "full-stack engineering" which feels relevant.

Pattern Fragment 0

I was doing some knitting pattern math, [1] and I thought I'd share it without a lot of context:

Cast on 228 stitches using the "German Twisted" method, [2] placing a marker half way, after 114 stitches. Knit 2 inches of knit 1 purl 1 ribbing.

After two inches, switch to stocking stitch: knit 21 stitches, increase 1 stitch, place a marker, knit 72 stitches, place a marker, increase 1 stitch, knit 21 more stitches. You should have arrived at the "half way" marker from before.

Over the next half (114 stitches), space out 14 increases. This doesn't divide evenly, so try: knit 5, increase 1 stitch, then knit 8 stitches and increase 1 stitch 13 times, and knit the last 5 stitches (5 + (13 × 8) + 5 = 114), or in short hand: K5, M1, * K8, M1, repeat from * 13 times, K5.

The "first half" is the back of the sweater and the "second half" is the front. Increase one stitch before and after the markers on the back of the sweater 7 times, every 1.5 or 2 inches (somewhere between 10 and 20 rows,) depending on how you'd like the taper.

Meanwhile [3] insert 3 sets of short rows across the back of the sweater, each one wider than the last: for the first short row stop 3 inches from the edges, for the second 2 inches, and for the last 1 inch. I'd put an inch or two between each short row, maybe half way between the first three increases.

Notes

[1]I've not, to be clear, actually knit this yet, though I plan to soon.
[2]As in this video, though there are many videos that may be more clear for you. I'm pretty sure that I learned this method from Meg Swansen and/or Amy Detjin.
[3]I have to say, that the "meanwhile" part of knitting patterns is always my favorite.

Reknitting Projects

I'm presently in the middle of re-knitting a sweater that I designed and first knit years and years ago, with only minor modifications, and I have a number of projects that I'm thinking about that involve "reknitting" past projects. While I don't think that I've peaked, or am out of ideas for knitting, it's very clear to me that novelty isn't exactly my guiding principle as a knitter: I enjoy the process and the act above all else, and the pleasure of wearing handknits is (for me) mostly about custom fit and less about novelty or fashion, exactly.

The chance to re-knit things removes a lot of the design questions from the process, and lets me not only fix mistakes, but also polish and iterate on a garment with less guess work. It's also the case that these projects often feel like returning to an old friend, which is incredibly comforting. Some of the projects on my backlog include:

  • The basic two-color sweater (colorblock, I suppose,) that I'm presently knitting and have/will knit again, where the lower part of the body is in black--or a similarly very dark color--except for the top 3-4 inches of the body, which are in a contrasting color, matched by the sleeves and the collar, which I try to push into the black section.
  • Alice Starmore's Faroe Sweater, from Fishermen's Sweaters, but scaled to actually fit and maybe with a more fitted shoulder. I've also, apparently, knit a very heavy weight version of the Norway sweater that I never wore. They're such great classic designs, and so fun to knit, that knitting them again to modernize them sounds like a fun project.
  • A round pi shawl in a dark color, with no lace work (including using raised bar increases rather than yarn overs), and a contrasting set of stripes along the outer edge. There's this stripe pattern that I think of as "Calvin Klein" stripes, though I don't know what the origin of that association is. The basic plan is three stripes: two wide stripes in the contrasting color, and a thin stripe of the original color in between, with the wide stripes being 3 times the width of the interior stripe.
  • I've knit two sweaters from Joyce Williams' Latvian Dreams book, the sweater on the cover and one that I knit from several charts, using yarn that ended up pilling a lot. They were delightful to knit: the patterns were originally weaving charts rather than knitting patterns, and thus had a 4-way radial symmetry that was just fun to knit. I'd like to try some of these again with better yarn and perhaps use this as a space to explore color work again, but in ways that might be more subtle and also well suited to cardigans and the like.
  • I've knit a handful of sweaters with all-over mitten or stocking patterns from various extant knitting traditions, mostly Scandinavian and Turkish, and I think it would be fun to revisit these patterns.

For and Against Garter Stitch

I never used to like garter stitch [1] very much, and hadn't really knit things with a lot of garter stitch. Sure, a scarf here or there in the beginning, and I think I used it for the hem of an early sweater that didn't turn out particularly well. There are so many clever patterns that use a lot of garter stitch, and I'd never really felt the appeal. While I don't know that I'm rushing to knit or design patterns out of a lot of garter stitch, I've definitely discovered that I've softened on it over my hiatus.

My earlier discontent with garter stitch came from the combination of:

  • garter stitch is quite dense, because the fabric pulls in so much vertically, so it takes a lot of yarn and a lot of time, and results in a warmer fabric that I often don't like.
  • the vertical pull-in of the fabric can get pulled out by blocking or by the weight of the fabric, and that stretching can be rather uneven.
  • normal tension irregularity is super apparent.
  • I've never much liked the way that knitting in rows requires you to flip the work, and I don't like the way that this can break up the rhythm of the knitting.
  • the strong horizontal line of the garter ridges always feels awkward to work with.
  • I always struggled to get a selvage edge that I really liked that wasn't totally sloppy.

These, however, turn out to be tractable problems, and I've always used a few garter stitches for the selvage on the edge of sock heel flaps. The things that I've realized:

  • garter stitch often works best with very fine yarn, which helps ameliorate the additional bulk, and at least for me, helps provide for more even tension.
  • the look of garter stitch sideways is quite compelling, for me, and in most cases it won't stretch out in the same way.
  • a little bit goes a long way, particularly when embedded in another piece of knitting.
  • I've settled down and find that knitting, rather than slipping, the first stitch and giving the yarn a slight tug when knitting the second stitch leads to a pretty clean edge.
  • designing with garter stitch is quite compelling: because of the way the ridges pull in, two rows (one ridge) are about as tall as one stitch is wide, so you can sort of approach the fabric as "square." Picking up one stitch for every garter ridge lays very flat, so the math is never very complicated.

I'm working on a hat where I knit a ~2 inch wide garter stitch strip to fit around my head and then picked up stitches along one side of the strip to knit the crown of the hat, and along the other side to knit a lining. I could have used a provisional cast on, of course, but the strip allowed me to be more confident about sizing, and it ends up being pretty sharp.

I'm not sure I'm going to plan to knit things out of primarily garter stitch, but I've definitely softened rather a lot.

[1]The fabric that results from knitting every stitch on both sides of the work. The fabric is dense, and it grows slowly, because the "ridges" account for two rows of knitting and it pulls in rather a lot.

Sweater Measurements

Hand knitting provides the opportunity to customize sizing and shaping to fit your body (or that of whomever you're knitting for,) and it's possible to produce garments that really fit, but even though it's possible it's not always easy.

First, measuring a body directly is complicated:

  • posture impacts the measurements, and it's difficult to get measurements of the body in the kinds of shapes and positions that you're likely to hold while wearing the garment.
  • ease, or the difference between the actual measurement of your body and the actual measurement of the garment, is both subjective and a matter of preference.

For this reason, I normally recommend measuring another sweater that has a fit that you enjoy as a starting point, but there are challenges:

  • measurements for different styles of sweaters can have different internal proportions: the length of the sleeve depends on the width of the shoulders and the depth of the armhole.
  • most machine produced garments and conventional knitting patterns are based on typical measurements and proportions which are good as starting points but typically leave something to be desired.

While people's measurements are broadly similar, and proportional, they're not the same, so if you have slightly longer arms or shoulders that are a bit more broad or angular, the "average" might be off by an inch or two, which might be enough to care about.

I'd still recommend starting from a garment that you know fits well, and recording the garment's measurements as clearly as possible, while noting modifications separately. The basic idea is to lay the garment out as flat as possible and measure the garment, which is less likely to move than a person. There are three or four measurements that are really critical:

  • width of the body across the chest below the arms.
  • width of the body at the bottom hem/edge.
  • distance from the middle of the back of the neck to the cuff.
  • length of the sweater from the top of the shoulder to the bottom hem.

Sleeve length is pretty stable when measured from the bottom of the sleeve (where it joins the body at the underarm) to the cuff, as this avoids the impact of shoulder shape on the sleeve. Measuring arm length from a common point, the middle of the back of the neck, to the cuff is also a stable way to take this measurement. You may also require additional measurements if you want the body of the garment to have contours.

While it's true that you can deduce other measurements from the four basic measurements, there are other fit considerations that are worth noting: width of the sleeve at/above the cuff and at the shoulder; depth, height, and aperture of the collar; as well as "true" shoulder width. Many of these details I've figured out empirically and iteratively for myself: it's sometimes difficult to take these measurements correctly from a model garment.

Better Company

I've been using company mode, a great completion framework, for years now, and in general, it's phenomenal. For a while, however, I've had the feeling that I'm not getting completion options at exactly the right frequency that I'd like. And a completion framework that's a bit sluggish or that won't offer completions that you'd expect is a drag. I dug in a bit, and got much better behavior: it turned out I'd made some poor configuration choices, and I thought I'd write up my configuration.

Backend Configuration

Company allows for configurable backends, which are just functions that provide completions; many are provided in the main company package, but many more come from third (fourth?) party packages. These backends are kept in a list, stored in the company-backends variable, that company uses to try and find completions. When you get to a moment when you might want to complete things, emacs+company iterate through this list and build a list of expansions. This is pretty straightforward, at least in principle.

Now company is pretty good at making these backends fast, or at least trying, particularly when a backend is irrelevant to whatever you're currently editing--except in some special cases--but it means that the order of things in the list matters sometimes. The convention for configuring company backends is to load the module that provides the backend and then push the new backend onto the list. This mostly works fine, but there are some backends that either aren't very fast or have the effect of blocking backends that come later (because they're theoretically applicable to all modes.) The backends to be careful of are: company-yasnippet, company-ispell, and company-dabbrev.

I've never really gotten company-ispell to work (you have to configure a wordlist,) and I've never been a dabbrev user, but I've definitely made the mistake of putting the snippet expansion near the front of the list rather than the end. I've been tweaking things recently, and have settled on the following value for company-backends:

(setq company-backends '(company-capf
                         company-keywords
                         company-semantic
                         company-files
                         company-etags
                         company-elisp
                         company-clang
                         company-irony-c-headers
                         company-irony
                         company-jedi
                         company-cmake
                         company-ispell
                         company-yasnippet))

The main caveat is that everything has to be loaded or have autoloads registered appropriately, particularly for things like jedi (python,) clang, and irony. The "capf" backend is the integration with emacs' default completion-at-point facility, and is the main mechanism by which lsp-mode interacts with company, so it's good to keep that at the top.

Make it Fast

I think there's some fear that a completion framework like company could impact the perceived responsiveness of emacs as a whole, and as a result there are a couple of knobs for tweaking things. Having said that, I've always run things more aggressively, because I like seeing possible completions fast, and I've never seen any real impact on apparent performance or battery utilization. I use these settings:

(setq company-tooltip-limit 20)
(setq company-show-numbers t)
(setq company-idle-delay 0)
(setq company-echo-delay 0)

Configure Prompts

To be honest, I mostly like the default popup, but it's nice to be able to look at more completions and spill over to helm when needed. It's a sometimes thing, but it's quite nice:

(use-package helm-company
   :ensure t
   :after (helm company)
   :bind (("C-c C-;" . helm-company))
   :commands (helm-company)
   :init
   (define-key company-mode-map (kbd "C-;") 'helm-company)
   (define-key company-active-map (kbd "C-;") 'helm-company))

Full Configuration

Some of the following is duplicated above, but here's the full configuration that I run with:

(use-package company
  :ensure t
  :delight
  :bind (("C-c ." . company-complete)
         ("C-c C-." . company-complete)
         ("C-c s s" . company-yasnippet)
         :map company-active-map
         ("C-n" . company-select-next)
         ("C-p" . company-select-previous)
         ("C-d" . company-show-doc-buffer)
         ("M-." . company-show-location))
  :init
  (add-hook 'c-mode-common-hook 'company-mode)
  (add-hook 'sgml-mode-hook 'company-mode)
  (add-hook 'emacs-lisp-mode-hook 'company-mode)
  (add-hook 'text-mode-hook 'company-mode)
  (add-hook 'lisp-mode-hook 'company-mode)
  :config
  ;; c-mode is defined in cc-mode, so wait for that feature; "<tab>" is the
  ;; key kbd understands for the tab key.
  (eval-after-load 'cc-mode
    '(define-key c-mode-map (kbd "<tab>") 'company-complete))

  (setq company-tooltip-limit 20)
  (setq company-show-numbers t)
  (setq company-dabbrev-downcase nil)
  (setq company-idle-delay 0)
  (setq company-echo-delay 0)
  (setq company-ispell-dictionary (f-join tychoish-config-path "aspell-pws"))

  (setq company-backends '(company-capf
                           company-keywords
                           company-semantic
                           company-files
                           company-etags
                           company-elisp
                           company-clang
                           company-irony-c-headers
                           company-irony
                           company-jedi
                           company-cmake
                           company-ispell
                           company-yasnippet))

  (global-company-mode))

(use-package company-quickhelp
  :after company
  :config
  (setq company-quickhelp-idle-delay 0.1)
  (company-quickhelp-mode 1))

(use-package company-irony
  :ensure t
  :after (company irony)
  :commands (company-irony)
  :config
  (add-hook 'irony-mode-hook 'company-irony-setup-begin-commands))

(use-package company-irony-c-headers
  :ensure t
  :commands (company-irony-c-headers)
  :after company-irony)

(use-package company-jedi
  :ensure t
  :commands (company-jedi)
  :after (company python-mode))

(use-package company-statistics
  :ensure t
  :after company
  :config
  (company-statistics-mode))

Distributed Systems Problems and Strategies

At a certain scale, most applications end up having to contend with a class of "distributed systems" problems: when a single computer or a single copy of an application can't support the required throughput, there's not much to do except to distribute it, and therein lies the problem. Taking one of a thing and making many of the thing operate similarly can be really fascinating, and frankly empowering. At some point, all systems become distributed in some way, to a greater or lesser extent. While the underlying problems and strategies are simple enough, distributed systems-type bugs can be gnarly, and having some framework for thinking about these kinds of systems and architectures can be useful, or even essential, when writing any kind of software.

Concerns

Application State

Applications all have some kind of internal state: configuration, runtime settings, in addition to whatever happens in memory as a result of running the application. When you have more than one copy of a single logical application, you have to put state somewhere. That somewhere is usually a database, but it can be another service or in some kind of shared file resource (e.g. NFS or blob storage like S3.)

The challenge is not "where to put the state," because it probably doesn't matter much, but rather in organizing the application to remove the assumption that state can be stored in the application. This often means avoiding caching data in global variables and avoiding storing data locally on the filesystem, but there are a host of ways in which application state can get stuck or captured, and the fix is generally "ensure this data is always read out of some centralized and authoritative service," and ensure that any locally cached data is refreshed regularly and saved centrally when needed.

In general, better state management within applications makes code better regardless of how distributed the system is, and when we use the "turn it off and turn it back on" fix, we're really just clearing out some bit of application state that's gotten stuck during the runtime of a piece of software.
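As a sketch of that pattern in Go--a minimal example under my own assumptions, not a prescription--here's a configuration reader that always goes through an authoritative store, with a short TTL cache so values refresh rather than getting stuck for the life of the process. The Store interface (and the appconfig package name) is a hypothetical stand-in for whatever centralized service you actually use: a database, etcd, an internal API, and so on.

package appconfig

import (
    "context"
    "sync"
    "time"
)

// Store abstracts the centralized, authoritative service that owns the
// configuration.
type Store interface {
    Get(ctx context.Context, key string) (string, error)
}

// CachedConfig wraps a Store with a TTL cache: reads are cheap, but values
// age out and are re-fetched instead of living in a global for the life of
// the process.
type CachedConfig struct {
    store Store
    ttl   time.Duration

    mu      sync.Mutex
    values  map[string]string
    fetched map[string]time.Time
}

func NewCachedConfig(store Store, ttl time.Duration) *CachedConfig {
    return &CachedConfig{
        store:   store,
        ttl:     ttl,
        values:  map[string]string{},
        fetched: map[string]time.Time{},
    }
}

// Get returns the locally cached value if it is fresh enough, and
// otherwise reads from the authoritative store and refreshes the cache.
func (c *CachedConfig) Get(ctx context.Context, key string) (string, error) {
    c.mu.Lock()
    defer c.mu.Unlock()

    if at, ok := c.fetched[key]; ok && time.Since(at) < c.ttl {
        return c.values[key], nil
    }

    val, err := c.store.Get(ctx, key)
    if err != nil {
        return "", err
    }
    c.values[key] = val
    c.fetched[key] = time.Now()
    return val, nil
}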

Startup and Shutdown

Process creation and initialization, as well as shutdown, is difficult in distributed systems. While most configuration and state is probably stored in some remote service (like a database,) there's a bootstrapping process where each process needs enough local configuration to reach the central service and fetch the rest of its configuration at startup, which can be a bit delicate.

Shutdown has its own set of problems, as specific processes need to be able to complete or safely abort in-progress operations.

For request driven work (i.e. HTTP or RPC APIs) without stateful or long-running requests (e.g. many websockets and most streaming connections), applications have to stop accepting new connections and let all in-progress requests complete before terminating. For other kinds of work, the process has to either complete in-progress work or provide some kind of "checkpointing" approach so that another process can pick up the work later.
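For the request-driven case, Go's standard library covers most of this; here's a minimal sketch of a server that, on SIGINT/SIGTERM, stops accepting new connections and gives in-flight requests a grace period to finish. The port and the 30-second window are arbitrary choices for the example, not recommendations.

package main

import (
    "context"
    "fmt"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintln(w, "hello")
    })

    srv := &http.Server{Addr: ":8080"}

    // Serve in the background so main can wait for a shutdown signal.
    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server error: %v", err)
        }
    }()

    // On SIGINT/SIGTERM, stop accepting new connections and let in-flight
    // requests complete, for up to 30 seconds, before exiting.
    stop := make(chan os.Signal, 1)
    signal.Notify(stop, syscall.SIGINT, syscall.SIGTERM)
    <-stop

    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("forced shutdown: %v", err)
    }
}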

Horizontal Scalability

Horizontal scalability, being able to increase the capability of an application by adding more instances of the application rather than increasing the resources allotted to the application itself, is one of the reasons that we build distributed systems in the first place, [1] but simply being able to run multiple copies of the application at once isn't always enough: the application needs to be able to effectively distribute its workloads. For request driven work this is generally some kind of load balancing layer or strategy, and for other kinds of workloads you need some way to distribute work across the application.

There are lots of different ways to provide load balancing, and a lot depends on your application and clients. There is specialized software (and even hardware) that provides load balancing by sitting "in front of" the application and routing requests to a backend, but there are also a collection of client-side solutions that work quite well. The complexity of load balancing solutions varies a lot: there are some approaches that just distribute requests "evenly" (by count) to a single backend one-by-one ("round-robin") and some approaches that attempt to distribute requests more "fairly" based on some reporting from each backend or an analysis of the incoming requests, and the strategy here depends a lot on the requirements of the application or service.
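As a sketch of the simplest client-side strategy, here's a minimal round-robin picker in Go; the backend addresses are placeholders, and a real deployment would layer health checking and service discovery on top of something like this.

package loadbalance

import (
    "errors"
    "sync/atomic"
)

// RoundRobin spreads requests across a static list of backends evenly by
// count, with no awareness of backend load or health.
type RoundRobin struct {
    backends []string
    next     atomic.Uint64
}

func New(backends []string) *RoundRobin {
    return &RoundRobin{backends: backends}
}

// Pick returns the next backend in order, wrapping around at the end of
// the list; it is safe to call from many goroutines at once.
func (rr *RoundRobin) Pick() (string, error) {
    if len(rr.backends) == 0 {
        return "", errors.New("no backends configured")
    }
    n := rr.next.Add(1)
    idx := int((n - 1) % uint64(len(rr.backends)))
    return rr.backends[idx], nil
}

A client would construct this with something like New([]string{"10.0.0.1:8080", "10.0.0.2:8080"}) and call Pick() before each request; those addresses are, of course, made up.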

For workloads that aren't request driven, systems require some mechanism of distributing work to workers, usually with some kind of messaging system, though it's possible to get pretty far using just a normal general purpose database to store pending work. The options for managing, ordering, and distributing the work are the meat of the problem.

[1]In most cases, some increase in reliability, by adding redundancy, is a strong secondary motivation.

Challenges

When thinking about system design or architecture, I tend to start with the following questions.

  • how does the system handle intermittent failures of particular components?
  • what kind of downtime is acceptable for any given component? for the system as a whole?
  • how do operations time out and get terminated, and how do clients handle these kinds of failures?
  • what are the tolerances for the application in terms of latency of various kinds of operations, and also the tolerances for "missing" or "duplicating" an operation?
  • when (any single) node or component of the system aborts or restarts abruptly, how does the application/service respond? Does work resume or abort safely?
  • what level of manual intervention is acceptable? Does the system need to handle node failure autonomously? If so, how many node failures must it tolerate?

Concepts like "node," "component," or "operation" can mean different things in different systems, and I use the terms somewhat vaguely as a result. These general factors and questions apply to systems that have monolithic architectures (i.e. many copies of a single type of process which performs many functions,) and to service-based architectures (i.e. many different processes performing specialized functions.)

Solutions

Ignore the Problem, For Now

Many applications run in a distributed fashion while only really addressing parts of their relevant distributed systems problems, and in practice it works out ok. Applications may store most of their data in a database, but have some configuration files that are stored locally: this is annoying, and sometimes an out-of-sync file can lead to some unexpected behavior. Applications may have distributed application servers for all request-driven workloads, but may still have a separate single process that does some kind of coordinated background work, or run cron jobs.

Ignoring the problem isn't always the best solution in the long term, but making sure that everything is distributed (or able to be distributed,) isn't always the best use of time either, and depending on the specific application it works out fine. The important part isn't to distribute things in all cases, but to make it possible to distribute functions in response to needs: in some ways I think about this as the "just in time" approach.

Federation

Federated architectures manage distributed systems protocols at a higher level: rather than assembling a large distributed system, build very small systems that can communicate at a high level using some kind of established protocol. The best example of a federated system is probably email, though there are others. [2]

Federated systems have more complex protocols that have to be specification based, which can be complicated/difficult to build. Also, federated services have to maintain the ability to interoperate with previous versions and even sometimes non-compliant services, which can be difficult to maintain. Federated systems also end up pushing a lot of the user experience into the clients, which can make it hard to control this aspect of the system.

On the upside, specific implementations and instances of a federated service can be quite simple and have straightforward and lightweight implementations. Supporting email for a few users (or even a few hundred) is a much more tractable problem than supporting email for many millions of users.

[2]XMPP, the protocol behind jabber, which powered/powers many IM systems, is another federated example, and the fediverse points to others. I also suspect that some federation-like features will be used at the infrastructure layer to coordinate between constrained elements (e.g. multiple k8s clusters will use federation for coordination, and maybe multi-cloud/multi-region orchestration as well...)

Distributed Locks

Needing some kind of lock (for mutual exclusion, or mutex) is common enough in programming, and locks provide an easy way to ensure that only a single actor has access to a specific resource. Doing this within a single process involves kernel primitives (futexes) or programming language runtime implementations, and is simple to conceptualize, and while the concept in a distributed system is functionally the same, the implementation of distributed locks is more complicated and necessarily slower (both the locks themselves, and their impact on the system as a whole).

All locks, local or distributed, can be difficult to use correctly: the lock must be acquired before using the resource, and it must fully protect the resource, without protecting too much and having a large portion of functionality require the lock. So while locks are required sometimes, and conceptually simple, using them correctly is hard. With that disclaimer, to work, distributed locks require the following (sketched in code after this list): [3]

  • some concept of an owner, which must be sufficiently specific (hostname, process identifier,) but that should be sufficiently unique to protect against process restarts, host renaming and collision.
  • lock status (locked/unlocked) and, if the lock has different modes, such as a multi-reader/single-writer lock, then that status as well.
  • a timeout or similar mechanism to prevent deadlocks: if the actor holding a lock halts or becomes inaccessible, the lock is eventually released.
  • versioning, to prevent stale actors from modifying the same lock. In the case that actor-1 has a lock and stalls for longer than the timeout period, such that actor-2 gains the lock, when actor-1 runs again it must know that it's been usurped.
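To make that list concrete, here's a minimal Go sketch of the lock record and an acquire operation built on a compare-and-swap; the Store interface and ErrConflict are hypothetical stand-ins for wherever the record actually lives (a database row, a key in a system like etcd or redis) and are not from any particular library.

package dlock

import (
    "context"
    "errors"
    "time"
)

// Lock is the record described above: an owner, a status, an expiration
// that acts as the timeout, and a version that detects usurped holders.
type Lock struct {
    Name    string
    Owner   string    // e.g. "hostname:pid:uuid", unique per process run
    Locked  bool
    Expires time.Time // after this point another actor may take the lock
    Version int64     // incremented on every change
}

// Store is assumed to apply updates atomically, returning ErrConflict when
// the stored version no longer matches the expected one.
type Store interface {
    Get(ctx context.Context, name string) (Lock, error)
    CompareAndSwap(ctx context.Context, expectedVersion int64, updated Lock) error
}

var ErrConflict = errors.New("lock modified by another actor")

// Acquire takes the lock for owner if it is free or its timeout has
// lapsed; the version check means a stalled previous holder cannot later
// overwrite our ownership without noticing it has been usurped.
func Acquire(ctx context.Context, s Store, name, owner string, ttl time.Duration) (Lock, error) {
    cur, err := s.Get(ctx, name)
    if err != nil {
        return Lock{}, err
    }
    if cur.Locked && time.Now().Before(cur.Expires) {
        return Lock{}, errors.New("lock held by " + cur.Owner)
    }
    next := Lock{
        Name:    name,
        Owner:   owner,
        Locked:  true,
        Expires: time.Now().Add(ttl),
        Version: cur.Version + 1,
    }
    if err := s.CompareAndSwap(ctx, cur.Version, next); err != nil {
        return Lock{}, err
    }
    return next, nil
}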

Not all distributed systems require distributed locks, and in most cases, transactions in the data layer provide most of the isolation that you might need from a distributed lock, but it's a useful concept to have.

[3]This article about distributed locks in redis was helpful in summarizing the principles for me.

Duplicate Work (Idempotency)

For a lot of operations in big systems, duplicating some work is easier and ultimately faster than coordinating and isolating that work in a single location. For this, having idempotent operations [4] is useful. Some kinds of operations and systems make idempotency easier to implement, and in cases where the work is not naturally idempotent (e.g. as in data processing or transformation,) the operation can be made so by attaching some kind of clock to the data or operation. [5]

Using clocks and idempotency makes it possible to maintain data consistency without locks. At the same time, some of the same considerations apply: having all operations duplicated is difficult to scale, so having ways for operations to abort early can be useful.
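Here's a minimal Go sketch of that clock idea, under my own assumptions: a worker records the version it read and only writes if that version is still current, so a duplicated or stale operation aborts instead of double-applying. The Records interface and ErrStale are hypothetical; in practice this is a conditional update in whatever database you already use.

package idempotent

import (
    "context"
    "errors"
)

// Record carries a version number that acts as the "clock": it is
// incremented on every successful write.
type Record struct {
    ID      string
    Value   int64
    Version int64
}

// Records is assumed to apply UpdateIfVersion atomically, writing rec only
// when the stored version equals expected and returning ErrStale otherwise.
type Records interface {
    Get(ctx context.Context, id string) (Record, error)
    UpdateIfVersion(ctx context.Context, expected int64, rec Record) error
}

var ErrStale = errors.New("record changed since it was read")

// AddTen ("increment by 10") is not idempotent on its own, but gating the
// write on the version read earlier means two workers that start from the
// same state will apply the change only once; the loser aborts early.
func AddTen(ctx context.Context, db Records, id string) error {
    cur, err := db.Get(ctx, id)
    if err != nil {
        return err
    }
    next := Record{ID: id, Value: cur.Value + 10, Version: cur.Version + 1}
    if err := db.UpdateIfVersion(ctx, cur.Version, next); err != nil {
        if errors.Is(err, ErrStale) {
            return nil // our data is out of date; abort rather than double-apply
        }
        return err
    }
    return nil
}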

[4]An operation is idempotent if it can be performed more than once without changing the outcome. For instance, the operation "increment the value by 10" is not idempotent because it increments a value every time it runs, so running the operation once is different than running it twice. At the same time the operation "set the value to 10" is idempotent, because the value is always 10 at the end of the operation.
[5]Clocks can take the form of a "last modified timestamp," or some kind of versioning integer associated with a record. Operations can check their local state against a canonical record, and abort if their data is out of date.

Consensus Protocols

Some operations can't be effectively distributed, but are also not safe to duplicate. Applications can use consensus protocols to do "leader election," to ensure that there's only one node "in charge" at a time, with the protocol providing a way to elect a new leader when the current one fails or steps down. This is common in database systems, where "single leader" systems are useful for balancing write performance in a distributed context. Consensus protocols have some amount of overhead, and are good for systems of a small to moderate size, because all elements of the system must communicate with all other nodes in the system.

The two prevailing consensus protocols are Paxos and Raft--pardoning the oversimplification here--with Raft being a simpler and easier-to-implement realization of the same underlying principles. I've characterized consensus as being about leader election, though you can use these protocols to allow a distributed system to reach agreement on all manner of operations or shared state.

Queues

Building a fully generalized distributed application with consensus is a very lofty proposition, and commonly beyond the scope of most applications. If you can characterize the work of your system as discrete units of work (tasks or jobs,) and can build or access a queue mechanism within your application that supports workers on multiple processes, this might be enough to support a great deal of your distributed requirements for the application.

Once you have reliable mechanisms and abstractions for distributing work to a queue, scaling the system can be managed outside of the application by using different backing systems, or changing the dispatching layer, and queue optimization is pretty well understood. There are lots of different ways to schedule and distribute queued work, but perhaps this is beyond the scope of this article.

I wrote one of these, amboy, but things like gearman and celery do this as well, and many of these tools are built on messaging systems like Kafka or AMQP, or just use general purpose databases as a backend. Keeping a solid abstraction between the application's queue and the messaging system seems good, but a lot depends on your application's workload.
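As a toy illustration of the queue-and-workers shape--emphatically not amboy's API--here's a minimal Go sketch using an in-process channel and a pool of goroutines. A real system would back the queue with a database or messaging system so that workers in other processes could participate.

package main

import (
    "fmt"
    "sync"
)

// Job is a discrete, self-contained unit of work.
type Job interface {
    ID() string
    Run() error
}

type printJob struct{ id string }

func (j printJob) ID() string { return j.id }
func (j printJob) Run() error { fmt.Println("ran", j.id); return nil }

// runWorkers drains the jobs channel with n concurrent workers and blocks
// until every queued job has been attempted.
func runWorkers(n int, jobs <-chan Job) {
    var wg sync.WaitGroup
    for i := 0; i < n; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for job := range jobs {
                if err := job.Run(); err != nil {
                    fmt.Println("job failed:", job.ID(), err)
                }
            }
        }()
    }
    wg.Wait()
}

func main() {
    jobs := make(chan Job, 16)
    for i := 0; i < 8; i++ {
        jobs <- printJob{id: fmt.Sprintf("job-%d", i)}
    }
    close(jobs)

    runWorkers(4, jobs)
}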

Delegate to Upstream Services

While there are distributed system problems that applications must solve for themselves, in most cases no bespoke solution is required! In practice many applications centralize a lot of their concerns in trusted systems like databases, messaging systems, or lock servers. This is probably correct! While distributed systems are required in most senses, distributed systems themselves are rarely the core feature of an application, and it makes sense to delegate these problems to services that are focused on solving them.

While multiple external services can increase the overall operational complexity of the application, implementing your own distributed system fundamentals can be quite expensive (in terms of developer time), and error prone, so it's generally a reasonable trade off.

Conclusion

I hope this was as useful for you all as it has been fun for me to write!

Yarn Thoughts: HD Shetland

I've been knitting a sweater out of HD (Harrisville Designs) Shetland yarn for the past week or so and it's been great. There's not a lot to look at, because it's just a plain sweater in black yarn, but I thought I'd write a bit about the experience.

I've knit a lot out of this yarn, mostly in stranded color work, and it's probably the yarn that I have the most of in my possession, but I've never really used it alone until recently, and hadn't really knit anything with it in years. I'm a bit more than half way through a plain sweater in this yarn, and I find myself entranced.

It's a simple 2-ply yarn, woolen spun, dyed before spinning, and it comes in hanks (which I've never used,) and on half pound cones. In color work, I tend to get 8 or 8.5 stitches to the inch (US 2.5/3mm), against a plain 7 stitches to the inch (US 0/2mm), and the fabric is light but solid. There are a bunch of colors, which is why I started using it for color work, including a number of heathers as well as natural colors. I would buy a pound (2 cones) of each color to make a stranded sweater, but I always ended up with a lot of left overs. A plain sweater (for me) is under a pound, though I expect fewer left overs.

The name "Shetland" describes the weight, not the fiber content: the wool is a blend of unspecified breeds (probably some collection of Corriedale, another Merino cross, and/or Merino), but the effect is quite similar to actual Shetland Wool. While the wool is imported, the mill is in New England, and the yarn is stocked by many yarn stores that supply weavers (though you can buy directly from the mill as well.) There's something classic about the yarn: it smells like wool (probably the spinning oil, but still,) and the way that the fibers cling to each other makes it a joy to knit with.

HD Shetland isn't exactly soft, but it isn't rough either. I think part of this is about expectation management: because we know that this isn't going to be yarn to wear against more sensitive skin (wrists, etc.), the fact that it's actually pretty soft is a pleasant surprise. I also think that because the yarn is lofty and woolen spun the ends of the individual fibers end up less likely to be irritating or trigger reactions in the same way that smoother yarns can.

Conclusion: heartily recommend!