Programming in the Common Lisp Ecosystem

I’ve been writing more and more Common Lisp recently; I reflected a bunch on the experience in a recent post, which I recently followed up on.

Why Ecosystems Matter

Most of my thinking and analysis of CL comes down to the ecosystem: the language has some really compelling (and fun!) features, so the question of its viability really comes down to the ecosystem. There are two main reasons to care about ecosystems in programming languages:

  • a vibrant ecosystem cuts down the time that an individual developer or team has to spend on infrastructural work just to get started. Ecosystems provide everything from libraries for common tasks to conventions and established patterns for the big fundamental application choices, not to mention easily discoverable answers to common problems.

    The time between “I have an idea” and “I have running (proof-of-concept quality) code” matters a lot. Nearly everything is possible given enough time, but friction between “idea” and “working prototype” can be a big problem.

  • a bigger and more vibrant ecosystem makes it more tenable for companies/sponsors (of all sizes) to choose Common Lisp for various projects, and there’s a little bit of a chicken-and-egg problem here, admittedly. Companies and sponsors want to be confident that they’ll be able to efficiently replace engineers if needed, integrate lisp components into larger ecosystems, and get support when problems arise. These concerns are all kind of intangible (and reasonable!), and the larger and more vibrant the ecosystem, the less risk there is.

    In many ways, recent developments in technology more broadly make lisp slightly more viable, as a result of making it easier to build applications that use multiple languages and tools. Things like microservices, better generic deployment orchestration tools, and greater adoption of IDLs (including Swagger, Thrift, and gRPC,) all make language choice less monolithic at the organization level.

Great Things

I’ve really enjoyed working with a few projects and tools. I’ll probably write more about these individually in the near future, but in brief:

  • chanl provides Go-style channels for Common Lisp. As a current/recovering Go programmer, this library is very familiar and great to have. In some ways, the API provides a bit more introspection, and some of the flexibility, that I’ve always wanted in Go. (There’s a small sketch just after this list.)
  • lake is a buildsystem tool, in the tradition of make, but with a few additional great features, like target namespacing, a clear distinction between “file targets” and “task targets,” as well as support for SSH operations, which makes it a reasonable replacement for things like fabric and other basic deployment tools.
  • cl-docutils provides the basis for a document processing system. I’m particularly partial because I’ve been using the Python (reference) implementation for years, but the CL implementation is really quite good and quite easy to extend.
  • roswell is really great for getting started with CL, and also for making it possible to test library code against different implementations and versions of the language. I’m a touch iffy on using it to install packages into its own directory, but it’s pretty great.
  • ASDF is the “buildsystem” component of CL, comparable to setuptools in Python, and it (particularly in the latest versions,) is really great. I like the ability to produce binaries directly from asdf, and the “package-inferred” system is a great addition (basically giving Python-style automatic package discovery.)
  • There’s a full Apache Thrift implementation. While I’m not presently working on anything that would require a legit RPC protocol, being able to integrate CL components into a larger ecosystem is useful, and it’s good to have the option.
  • Hunchensocket adds websockets! Web sockets are a weird little corner of any stack, but it’s nice to have the option of doing this kind of programming, and CL seems like a really good platform for it.
  • make-hash makes constructing hash tables easier, which is sort of needlessly gawky otherwise.
  • ceramic provides bridges between CL and Electron for delivering desktop applications based on web technologies in CL.
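
Here’s a small taste of what chanl code looks like, as a minimal sketch: I’m assuming chanl’s channel class and its send, recv, and pexec operations, and the details may vary a bit between versions:

;; assumes (ql:quickload "chanl") has already run. PEXEC spawns a
;; background task that sends a value over an unbuffered channel;
;; RECV blocks in the calling thread until the value arrives.
(defvar *channel* (make-instance 'chanl:channel))

(chanl:pexec ()
  (chanl:send *channel* "hello from another thread"))

(print (chanl:recv *channel*)) ; => "hello from another thread"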

I kept thinking that there wouldn’t be good examples of various things (but there’s a Kafka driver! there’s support for various other Apache ecosystem components!), and there are, and that’s great. There are gaps, of course, but fewer, I think, than you’d expect.

The Dark Underbelly

The biggest problem in CL is probably discoverability: lots of folks are building great tools, but it’s hard to find out about the projects.

I thought about phrasing this as a kind of list of things that would be good candidates for bounties or something like that. Also, if I’ve missed something, please let me know! I’ve tried to look for a lot of things, but discovery is hard.

Quibbles

  • rove doesn’t seem to handle multi-threaded tests effectively. The feature is listed in the readme, but I was able to write really trivial tests that crashed the test harness.
  • Chanl would be super lovely with some kind of concept of cancellation (like contexts in Go,) and while it’s nice to have a bit more thread introspection, given that the threads are somewhat heavier weight, being able to avoid resource leaks seems like a good plan.
  • There doesn’t seem to be any library capable of producing YAML-formatted data. I don’t have a specific need, but it’d be nice.
  • it would be nice to have some way of configuring the quicklisp client to prefer quicklisp (stable) but also use ultralisp (or another source) if that’s available.
  • Putting the capacity in asdf to produce binaries easily is great (a sketch of what that looks like follows this list), and the only thing missing from buildapp/cl-launch is multi-entry binaries. That’d be swell. It might also be easier, as an alternative, to have support for git-style sub-commands in a command-line parser (which doesn’t easily exist at the moment), because one-command-per-binary seems difficult to manage.
  • there are no available implementations of a multi-reader single-writer mutex, which seems like an oversight, and yet, here we are.
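
On the binary-production point: for reference, asdf’s support looks roughly like the following. The system and function names here are hypothetical placeholders, but program-op and these defsystem options are the documented mechanism in recent versions of ASDF 3:

;; mytool.asd -- "mytool" and "mytool:main" are hypothetical names.
(defsystem "mytool"
  :components ((:file "main"))
  :build-operation "program-op"   ; dump an executable image
  :build-pathname "mytool"        ; file name of the produced binary
  :entry-point "mytool:main")     ; function invoked at startup

With that in place, evaluating (asdf:make :mytool) produces the binary.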

Bigger Projects

  • There are no encoders/decoders for data formats like Apache Parquet, and the protocol buffers implementation doesn’t support proto3. Neither of these is a particular deal breaker, but having good tools for dealing with common developments lowers the cost and risk of using CL in more applications.
  • No support for http/2, and therefore no gRPC. Having the ability to write software in CL with the knowledge that it’ll be able to integrate with other components is good for the ecosystem.
  • There is no great modern MongoDB driver. There were a couple of early implementations, but there have since been important changes to the MongoDB protocol. A clearer interface for producing BSON might be useful too.
  • I’ve looked for libraries and tools to integrate and manage aspects of things like systemd, docker, and k8s. The k8s gap seems easiest to close, as things like cube can be generated from updated swagger definitions, but there’s less for the others.
  • Application delivery remains a bit of an open problem. I’m particularly interested in being able to produce binaries that target other platforms/systems (cross compilation,) but there’s also a class of problems related to being able to ship tools once built.
  • I’m eagerly awaiting, and a bit concerned about, the plight of the current implementations around the move of Darwin to ARM in the intermediate term. My sense is that the transition won’t be super difficult, but it seems like a thing.

How to Write Performant Software and Why You Shouldn't

I said a thing on twitter that I like, and I realized that I hadn’t really written (or ranted) much about performance engineering, and it seemed like a good thing to do. Let’s get to it.

Making software fast is pretty easy:

  • Measure the performance of your software at two distinct levels:

    • figure out how to isolate specific operations, as in a unit test, get the code to run many times, and measure how long the operations take (there’s a small sketch of this after this list).
    • Run meaningful units of work, as in integration tests, to understand how the components of your system come together.

    If you’re running a service, tracking the actual timing of actual operations over time can also be useful, but you need a lot of traffic for this to be really meaningful. Run these measurements regularly, and track the timing of operations over time so you know when things actually change.

  • When you notice something is slow, identify the slow thing and make it faster. This sounds silly, but the things that are slow usually fall into one of a few common cases:

    • an operation that you expected to be quick and in memory, actually does something that does I/O (either to a disk or to the network,)
    • an operation allocates more memory than you expect, or allocates memory more often than you expect.
    • there’s a loop that takes more time than you expect, because you expected the number of iterations to be small (10?) and instead there are hundreds or thousands of operations.

    Combine these and you can get some really weird effects, particularly over time. An operation that used to be quick gets slower over time because the collection it iterates over grows, or a function called in a loop that used to be an in-memory-only operation now accesses the database, or something like that. The memory-based ones can be trickier (but also end up being less common, at least with more recent programming runtimes.)
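
To make the unit-level measurement concrete, here’s a minimal sketch in Common Lisp (the function here is mine, and a real benchmark should also account for warmup and GC pauses):

(defun mean-runtime (op &key (iterations 100000))
  "Call OP repeatedly and return mean wall-clock seconds per call."
  (let ((start (get-internal-real-time)))
    (dotimes (i iterations)
      (funcall op))
    (/ (- (get-internal-real-time) start)
       (float (* iterations internal-time-units-per-second)))))

;; e.g. (mean-runtime (lambda () (parse-integer "12345")))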

Collect data, and when something gets slower you should fix it.

Well no.

Most of the time slow software doesn’t really matter. The appearance of slowness or fastness is rarely material to users’ experience or the bottom line. If software gets slower, most of the time you should just let it get slower:

  • Computers get faster and cheaper over time, so as long as your rate of slowdown is slow and steady, it’s usually fine to just ride it out. Obviously big slowdowns are a problem, but a few percent year-over-year is rarely worth fixing.

    It’s also the case that runtimes and compilers are almost always getting faster (because compiler developers are, in the best way possible, total nerds,) so upgrading the compiler/runtime regularly often offsets regular slowdown over time.

  • In the vast majority of cases, the thing that makes software slow is doing I/O (disks or network,) so what your code does is unlikely to matter much, and you can solve the problem by changing how traffic flows through your system.

    For UX (e.g. web/front-end) code, the logic is a bit different, because slow code actually impacts user experience, and humans notice things. The solution here, though, is often not about making the code faster, but about increasingly pushing work to “the backend” (e.g. avoid processing data on the front end, and just make sure the backend can always hand you exactly the data you need and want.)

  • Code that’s fast is often harder to read and maintain: to make code faster, you often have to be careful and avoid using certain features of your programming language or runtime (e.g. avoiding heap allocations, or reducing the size of allocations by encoding data in more terse ways, etc,) or avoid libraries that are “slower” or that use certain abstractions, all of which makes your code less conventional, more difficult to read, and harder to debug. Programmer time is almost always more expensive than compute time, so unless the slowness is big or causing problems, it’s rarely worth making code harder to read.

    Sometimes, making things faster actually is required. Maybe you have a lot of data that you need to get through pretty quickly and there’s no way around it, or you have some classically difficult algorithmic problem (graph search, say,) but in the course of generally building software this happens pretty rarely, and again, most of the time pushing the problem “up” (from the front end to the backend, and from the backend to the database, similarly,) solves whatever problems you might have.

  • There are obviously pathological counter-examples, usually related to things that happen in loops, but a lot of operations never have to be fast because they sit right next to another operation that’s much slower:

    • Lots of people analyze logging tools for speed, and this is almost always silly, because all log messages have to be written somewhere (I/O), and generally there has to be something that serializes messages (a mutex or functional equivalent,) because you want to write only one message at a time to the output. So even if you have a really “fast logger” on its own terms, you’re going to hit the I/O or the serializing nature of the problem. Use a logger that has the features you need and is easy to use; speed doesn’t matter.
    • anything in HTTP request routing and processing. Because request processing sits next to network operations, often between a database and the client, any sort of gain from using “a faster web framework” is probably immeasurable. Use the one with the clearest feature set.

Continuous Integration is Harder Than You Think

I’ve been working on continuous integration systems for a few years, and while the basic principle of CI is straightforward, it seems that most CI deployments are not. This makes sense: project infrastructure is an easy place to defer maintenance during the development cycle, and projects often prioritize feature development and bug fixing over tweaking the buildsystem or test infrastructure, but I think there’s something more. This post is a consideration of what makes CI hard, and perhaps a bit of unsolicited advice.

The Case for CI

I suppose I don’t really have to sell anyone on the utility or power of CI: running a set of tests on your software regularly allows developers and teams to catch bugs early and saves a bucket of developer time, and that is usually enough. Really, though, CI ends up giving you the leverage to solve a number of really gnarly engineering problems:

  • how to release software consistently and regularly.
  • how to support multiple platforms.
  • how to manage larger codebases.
  • anything with distributed systems.
  • how to develop software with larger numbers of contributors.

Doing any of these things without CI isn’t really viable, particularly at scale. This isn’t to say that they “come free” with CI, but CI is often the right place to build the kind of infrastructure required to manage distributed systems problems or release complexity.

Buildsystems are Crucial

One thing that I see teams doing sometimes is addressing their local development processes and tooling with a different set of expectations than they have for CI, and you can totally see and understand how this happens: the CI processes always start from a clean environment, and you often want to handle failures in CI differently than you might handle a failure locally. It’s really easy to write a shell script that only runs in CI, and then things sort of accumulate, and eventually there emerges a class of features and phenomena that only exist for and because of CI.

The solution is simple: invest in your buildsystem,1 and ensure that there is minimal (or no!) indirection between your buildsystem and your CI configuration. But buildsystems are hard, and in a lot of cases test harnesses aren’t easily integrated into build systems, which complicates the problem. Having a good build system isn’t particularly about picking a good tool (though there are definitely tradeoffs between tools); the problem is mostly in capturing logic in a consistent way, providing a good interface, and ensuring that builds happen as efficiently as possible.

Regardless, I’m a strong believer in centralizing as much functionality in the buildsystem as possible and making sure that CI just calls into build systems. Good build systems:

  • allow you to build or rebuild (or test/subtest) only subsets of work, to allow quick iteration during development and debugging.
  • center around a model of artifacts (things produced) and dependencies (requires-type relationships between artifacts).
  • have clear defaults, automatically detect dependencies and information from the environment, and perform any required set up and teardown for the build and/or test.
  • provide a unified interface for the developer workflow, including building, testing, and packaging.

The upside is that the effort you put into the development of a buildsystem pays dividends not just in managing the complexity of CI deployments, but also in making local development stable and approachable for new developers.

T-Shaped Matrices

There’s a temptation with CI systems to exercise your entire test suite with a comprehensive and complete range of platforms, modes, and operations. While this works great for some smaller projects, “completism” is not the best way to model the problem. When designing and selecting your tests and test dimensions, consider the following goals and approaches:

  • on one, and only one, platform run your entire test suite. This platform should probably be very close to the primary runtime of your environment (e.g. when developing a service that runs on Linux, your tests should run in a system that resembles the production environment,) or possibly your primary development environment.
  • for all platforms other than your primary platform, run only the tests that are either directly related to that runtime/platform (e.g. anything that might be OS or processor specific,) plus some small subset of “verification” or acceptance tests. I would expect that these tests should easily be able to complete in 10% of the time of a “full build,”
  • consider operational variants (e.g. if your product has multiple major-runtime modes, or some kind of pluggable sub-system) and select the set of tests which verifies these modes of operations.

In general the shape of the matrix should be t-shaped, or “wide across” with a long “narrow down.” The danger more than anything is in running too many tests, which is a problem because:

  • more tests increase the chance of a false negative (caused by the underlying systems infrastructure, service dependencies, or even flakey tests,) which means you risk spending more time chasing down problems. Running tests that provide signal is good, but the chance of false negatives is a liability.
  • responsiveness of CI frameworks is important but incredibly difficult, and running fewer things can improve responsiveness. While parallelism might help with some kinds of runtime limitations at larger numbers of tests, it incurs overhead and is expensive.
  • actual failures become redundant and difficult to attribute in “complete matrices.” A test of certain high-level systems may pass or fail consistently along all dimensions, creating more noise when something fails. With any degree of non-determinism or chance of a false negative, running tests more than once just makes it difficult to attribute failures to a specific change or an intermittent bug.
  • some testing dimensions don’t make sense, leading to wasted time addressing test failures. For example when testing an RPC protocol library that supports both encryption and authentication, it’s not meaningful to test the combination of “no-encryption” and “authentication,” although the other three axes might be interesting.

The ultimate goal, of course, is to have a test matrix that you are confident will catch bugs when they occur, is easy to maintain, and helps you build confidence in the software that you ship.

Conclusion

Many organizations have teams dedicated to maintaining buildsystems and CI, and that’s often appropriate: keeping CI alive is of huge value. It’s also very possible for CI and related tools to accrue complexity and debt in ways that are difficult to maintain, even with dedicated teams: taking a step back and thinking about CI, buildsystems, and overall architecture strategically can be very powerful, and really improve the value provided by the system.


  1. Canonically, buildsystems are things like makefiles (or cmake, scons, waf, rake, npm, maven, ant, gradle, etc.) that are responsible for converting your source files into executables, but the lines get blurry in a lot of languages/projects. For Golang, the go tool plays the part of the buildsystem and test harness without much extra configuration, and many environments have a pretty robust separation between building and testing. ↩︎

Get More Done

It’s really easy to overthink the way that we approach our work and manage our own time and projects. There is no shortage of tools, services, books, and methods for organizing your days and work, and while there are a lot of good ideas out there, it’s easy to get stuck fiddling with how you work at the expense of actually getting work done. While I’ve definitely thought about this a lot over time, for a long time I’ve mostly just done things and not really worried much about the things on my to-do list.1

I think about the way that I work alone much as I think about the way I work with other people. Working alone is different from collaboration, but a lot of the principles of thinking about big goals and smaller actionable items transfer pretty well.

My suggestions here are centered around the idea that you have a todo list, and that you spend a few moments a day looking at that list, but actually I think the way I think about my work is really orthogonal to any specific tools. For years, most of my personal planning has revolved around making a few lists in a steno pad once or twice a day,2 though I’ve been trying to do more digital things recently. I’m not sure I like it. Again, tools don’t matter.

Smaller Tasks are Always Better

It’s easy to plan projects from the “top down,” and identify the major components and plan your work around those components, and the times that I run into trouble are always the times when my “actionable pieces” are too big. Smaller pieces help you build momentum, allow you to move around to different areas as your attention and focus change, and help you use available time effectively (when you want.)

It’s easy to find time in-between meetings, or while the pasta water is boiling, to do something small and quick. It’s also very easy to avoid starting something big until you have a big block of unfettered time. The combination of these factors makes bigger tasks liabilities, and more likely to take even longer to complete.

Multi-Task Different Kinds of Work

I read a bunch of articles that suggest that the way to be really productive is to figure out ways of focusing and avoiding context switches. I’ve even watched a lot of coworkers organize their schedules and work around these principles, and it’s always been something of a mystery for me. It’s true that too much multi-tasking and context switching can lead to a fragmented experience and make some longer/complicated tasks harder to really dig into, but it’s possible to manage the costs of context switching, by breaking apart bigger projects into smaller projects and leaving notes for your (future) self as you work.

Even if you don’t do a lot of actual multitasking within a given hour or day of time, it’s hard to avoid really working on different kinds of projects on the scale of days or weeks, and I’ve found that having multiple projects in flight at once actually helps me get more done. In general I think of this as the idea that more projects in flight means that you finish things more often, even if the total number of projects completed is the same in the macro context.

Regardless, different stages of a project require different kinds of attention and energy, and having a few things in flight increases the chance that when you’re in the mood to do some research, or editing, or planning, you have a project with that kind of work all queued up. I prefer to be able to switch to different kinds of work depending on my attention and mood. In general my work falls into the following kinds of activities:

  • planning (e.g. splitting up big tasks, outlining, design work,)
  • generative work (e.g. writing, coding, etc.)
  • organizational (email, collaboration coordination, user support, public issue tracking, mentoring, integration, etc.)
  • polishing (editing, writing/running tests, publication prepping,)
  • reviewing (code review, editing, etc.)

Do the Right Things

My general approach is “do lots of things and hope something sticks,” which makes the small assumption that all of the things you do are important. It’s fine if not everything is the most important, and it’s fine to do things a bit out of order, but it’s probably a problem if you do lots of things without getting important things done.

So I’m not saying you should establish a priority for all tasks and execute them strictly in that order, at all. Part of the problem is just making sure that the things on your list are still relevant, and still make sense. As we do work and time passes, we have to rethink or rechart how we’re going to complete a project, and that reevaluation is useful.

Prioritization and task selection is incredibly hard, and it’s easy to cast “prioritization” in over simplified terms. I’ve been thinking about prioritization, for my own work, as being a decision based on the following factors:

  • deadline (when does this have to be done: work on things that have hard deadlines or expected completion times, ordered by expected completion date, to avoid needing to cram at the last moment.)
  • potential impact (do things that will have the greatest impact before lesser impact, this is super subjective, but can help build momentum, and give you a chance to decide if lower-impact items are worth doing.)
  • time availability fit (do the biggest thing you can manage with the time you have at hand, as smaller things are easier to fit in later,)
  • level of understanding (work on the things that you understand the best, and give yourself the opportunity to plan things that you don’t understand later. I sometimes think about this as “do easy things first,” but that might be too simple.)
  • time outstanding (how long ago was this task created: do older things first to prevent them from becoming stale.)
  • number of things (or people) that depend on this being done (work on things that will unblock other tasks or collaborators before things that don’t have any dependencies, to help increase overall throughput.)

Maintain a Pipeline of Work

Productivity, for me, has always been about getting momentum on projects and being able to add more things. For work projects, there’s (almost) always a backlog of tasks, and the next thing is usually pretty obvious, but sometimes this is harder for personal projects. I’ve noticed a tendency in myself to prefer “getting everything done” on my personal todo list, which I don’t think is particularly useful. Having a pipeline or backlog of work is great:

  • there’s always something next to do, and there isn’t a moment when you’ve finished and have to think about new things.
  • keeping a list of things that you are going to do in the more distant future lets you start thinking about how bigger pieces fit together without needing to start working on them.
  • you can add big things to your list(s) and then break them into smaller pieces as you make progress.

As an experiment, think about your todo list not as a set of items that you’d like to finish, but as a list that shouldn’t be shorter than a certain number of items (say 20 or 30?) with a target rate of completion (10 a week?), though you should choose your own numbers, and set goals based on what you see yourself getting done over time.


  1. Though, to be clear, I’ve had the pleasure and benefit of working in an organization that lives-and-dies by a bug tracking system, with a great team of folks doing project management. So there are other people who manage sprints, keep an eye on velocity, and make sure that issues don’t get stuck. ↩︎

  2. My general approach is typically to have a “big projects” or “things to think about” list and a “do this next” list, with occasional lists about all the things in a specific big project. In retrospect these map reasonably well to SCRUM/Agile concepts, but it also just makes sense. ↩︎

Open Source Emacs Configuration Improvements

In retrospect I’m not totally sure why I released my emacs configuration to the world. I find tweaking Emacs Lisp to be soothing, and in 2020 these kinds of projects are particularly welcome. I’ve always thought about making it public: I feel like I get a lot out of Emacs, and I’m super aware that it’s very hard for people who haven’t been using Emacs forever to get a comparable experience.1

I also really had no idea what to expect, and while it’s still really recent, I’ve noticed a few things which are worth remarking on:

  • Making your code usable for other people really does make it easy for people to find bugs. While it’s likely that there are bugs that people never noticed, I found a few things very quickly:

    • Someone reported higher than expected CPU use, and I discovered that there were a number of functions that ran regularly in timers, and I was able to quickly tune some knobs in order to reduce average CPU use by a lot. This is likely great for the user in question, and it should help battery life too.
    • The config includes a git submodule (!) with the contents of all third-party packages, mostly to reduce friction for people getting started. Downloading all of the packages fresh from the archive would take a few minutes, and the git clone is just faster. I realized, when someone ran into some problems when running with emacs 28 (e.g. the development/mainline build,) that the byte-compilation formats were different, which made the emacs27 files not work on emacs28. I pushed a second branch.

    More than anything the experience of getting bug reports and feedback has been great. It both makes it possible to focus time because the impact of the work is really clear, and it also makes it clear to me that I’ve accumulated some actually decent Emacs Lisp skills, without really noticing it.2

  • I was inspired to make a few structural improvements.

    • For a long time, including after the initial release, I had a “settings” file and a “local functions” file that held code I’d written or copied from one place or another. I finally divided them all into packages named tychoish-<thing>.el, which allowed me to put all or most of the configuration into use-package forms, which is more consistent, helps startup time a bit, and makes the directory structure a bit easier.
    • I also cleaned up a bunch of local snippets that I’d been carrying around, which wasn’t hurting anything but is a bit more clear in the present form.
  • I believe that I’ve hit the limit with regard to startup speed. I’d really like to get a GUI emacs instance to start (with no buffers) in less than a second, but it doesn’t seem super plausible. I got really close. At this point there are a few factors that constrain it:

    • Raw CPU speed. I have two computers, and the machine with the newer CPU is consistently 25% faster than the slow computer.
    • Font setup. While the default configuration doesn’t do this, my personal configuration sets a font (this is reasonable,) but it seems that the time to do this is sometimes observable, and proportional to the number of fonts you have installed on the system.3
    • Dependencies during the early load. I was able to save about 10% of the time by moving a function between packages to reduce the packages that startup code depended upon. There’s just a limit to how much you can clean up here.

    Having said that, these things can drift pretty easily. I’ve added some helper macros with-timer and with-slow-op-timer that I can use to report the timing of operations during startup to make sure that things don’t slow down.

    Interestingly, I’ve experimented with byte-compiling my local configuration and I haven’t really noticed much of a speedup at this scale, so for ease I’ve been leaving my own lisp directory unbytecompiled.

  • With everything in order, there’s not much to edit! I guess I’ll have other things to work on, but I have made a few improvements, generally:

    • Using the alert package for desktop notification, which allowed me to delete a legacy package I’ve been using. Deleting code is awesome.
    • I finally figured out how to really take advantage of projectile, which is now configured correctly, and has been a lot of help in my day-to-day work.
    • I’ve started using ERC more, and only really use my irssi (in screen) session as a fallback. My IRC/IM setup is a bit beyond the scope of this post, but ERC has been a bit fussy to use on machines with intermittent connections; I think I’ve been able to tweak that pretty well and now have an experience that’s quite good.

It’s been interesting! And I’m looking forward to continuing to do this!


  1. Sure, other editors also have long setup curves, but Emacs is particularly gnarly in this regard, and I think adoption by new programmers is definitely constrained by this fact. ↩︎

  2. I never really thought of myself as someone who wrote Emacs Lisp: I’ve never really written a piece of software in Emacs, it’s always been a function here or there, or modifying some snippet from somewhere. I don’t know if I have a project or a goal that would involve writing more emacs software, but it’s nice to recognize that I’ve accidentally acquired a skill. ↩︎

  3. On Windows and macOS systems this may not matter, but you may have more fonts installed than you need. I certainly did. Be aware that web browsers often download their own fonts separately from system fonts, so having fonts installed is really relevant to your GTK/QT/UI use and not actually to the place where you’re likely doing most of your font interaction (e.g. the browser.) ↩︎

Learning Common Lisp Again

In a recent post I spoke about abandoning a previous project that had gone off the rails, and since then I’ve been doing more work in Common Lisp, so I wanted to report a bit more on some recent developments. There’s a lot of writing about learning to program for the first time, and a fair amount of writing about lisp itself; neither is particularly relevant to me, but I suspect there may be others who find themselves in a similar position in the future.

My Starting Point

I already know how to program, and have a decent understanding of how to build and connect software components. I’ve been writing a lot of Go (Lang) for the last 4 years, and wrote rather a lot of Python before that. I’m an emacs user, and I use a Common Lisp window manager, so I’ve always found myself writing little bits of lisp here and there, but it never quite felt like I could do anything of consequence in Lisp, despite thinking that Lisp is really cool and that I wanted to write more.

My goals and rationale are reasonably simple:

  • I’m always building little tools to support the way that I use computers. Nothing is particularly complex, but I’d enjoy being able to do this in CL rather than in other languages, mostly because I think it’d be nice not to do it in the same languages that I work in professionally.1
  • Common Lisp is really cool, and I think it’d be good if it were more widely used, and writing more of it (and posts like this one) is probably the best way I can help make that happen.
  • Learning new things is always good, and I think having a personal project for learning something new is a good way of stretching myself as a developer.
  • Common Lisp has a bunch of features that I really like in a programming language: real threads, easy to run/produce static binaries, (almost) reasonable encapsulation/isolation features.

On Learning

Knowing how to program makes learning how to program easier: broadly speaking programming languages are similar to each other, and if you have a good model for the kinds of constructs and abstractions that are common in software, then learning a new language is just about learning the new syntax and learning a bit more about new idioms and figuring out how different language features can make it easier to solve problems that have been difficult in other languages.

In a lot of ways, if you already feel confident and fluent in a programming language, learning a second language is really about teaching yourself how to learn a new language, which you can then apply to all future languages as needed.

Except realistically, “third languages” aren’t super common: it’s hard to get to the same level of fluency that you have with earlier languages, and often “third-and-later” languages are learned in the context of some existing code base or project, so it’s hard to generalize our familiarity outside of that context.

It’s also the case that it’s often pretty easy to learn a language well enough to perform common or familiar tasks, but fluency is hard, particularly in different idioms. So I’m using CL as an excuse to do kinds of programming that I have more limited experience with: web programming, GUI programming, using different kinds of databases.

My usual method for learning a new programming language is to write a program of moderate complexity and size but in a problem space that I know pretty well. This makes it possible to gain familiarity, and map concepts that I understand to new concepts, while working on a well understood project. In short, I’m left to focus exclusively on “how do I do this?” type-problems and not “is this possible,” or “what should I do?” type-problems.

Conclusion

The more I think about it, the more I realize that what we mean when we talk about “knowing a programming language” is inevitably linked to a specific kind of programming: the kind of Lisp that I’ve been writing has skewed toward the object-oriented end of the lisp spectrum, with less functional bits than perhaps average. I’m also still a bit green when it comes to macros.

There are kinds of programs that I don’t really have much experience writing:

  • GUI things,
  • the front-half of the web stack,2
  • processing/working with ASTs, (lint tools, etc.)
  • lower-level kinds of runtime implementation.

There’s lots of new things to learn, and new areas to explore!

Notes


  1. There are a few reasons for this. Mostly, I think in a lot of cases it’s right to choose programming languages that are well known (Python, Java+JVM friends, and JavaScript), easy to learn (Go), and fit in with existing ecosystems (which vary a bit by domain,) so while CL might be the right choice, it’s a bit limiting. It’s also the case that putting some boundaries/context switching between personal projects and work projects could be helpful in improving quality of life. ↩︎

  2. Because it’s 2020, I’ve done a lot of work on “web apps,” but most of my work has been focused on areas of applications including the data layer, application architecture, core business logic, and reliability/observability, and less on anything material to rendering web pages. Most projects have a lot of work to be done, and I have no real regrets, but it does mean there’s plenty to learn. I wrote an earlier post about the problems of the concept of “full-stack engineering” which feels relevant. ↩︎

Better Company

I’ve been using company mode, a great completion framework, for years now, and in general it’s phenomenal. For a while, however, I’ve had the feeling that I’m not getting completion options at exactly the right frequency that I’d like, and a completion framework that’s a bit sluggish or that won’t offer completions that you’d expect is a drag. I dug in a bit, found some poor configuration choices I’d made, and got much better behavior, so I thought I’d write up my configuration.

Backend Configuration

Company allows for configurable backends, which are just functions that provide completions. Many are provided in the main company package, but many more come from third (fourth?) party packages. These backends are kept in a list, stored in the company-backends variable, that company uses to try to find completions. When you get to a moment when you might want to complete things, emacs+company iterates through this list and builds a list of expansions. This is pretty straightforward, at least in principle.

Now company is pretty good at making these backends fast, or at trying to, particularly when a backend might be irrelevant to whatever you’re currently editing--except in some special cases--but it means that the order of things in the list sometimes matters. The convention for configuring company backends is to load the module that provides the backend and then push the new backend onto the list. This mostly works fine, but there are some backends that either aren’t very fast or have the effect of blocking backends that come later (because they’re theoretically applicable to all modes.) The backends to be careful of are: company-yasnippet, company-ispell, and company-dabbrev.
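
Concretely, the conventional load-and-push pattern looks something like this, with company-jedi standing in for whatever backend a package provides:

(require 'company-jedi)
(push 'company-jedi company-backends)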

I’ve never really gotten company-ispell to work (you have to configure a wordlist,) and I’ve never been a dabbrev user, but I’ve definitely made the mistake of putting the snippet expansion near the front of the list rather than the end. I’ve been tweaking things recently, and have settled on the following value for company-backends:

(setq company-backends '(company-capf
                         company-keywords
                         company-semantic
                         company-files
                         company-etags
                         company-elisp
                         company-clang
                         company-irony-c-headers
                         company-irony
                         company-jedi
                         company-cmake
                         company-ispell
                         company-yasnippet))

The main caveat is that everything has to be loaded or have autoloads registered appropriately, particularly for things like jedi (python,) clang, and irony. The “capf” backend is the integration with emacs’ default completion-at-point facility, and is the main mechanism by which lsp-mode interacts with company, so it’s good to keep that at the top.

Make it Fast

I think there’s some fear that a completion framework like company could impact the perceived responsiveness of emacs as a whole, and as a result there are a couple of knobs for tweaking things. Having said that, I’ve always run things more aggressively, because I like seeing possible completions fast, and I’ve never seen any real impact on apparent performance or battery utilization. I use these settings:

(setq company-tooltip-limit 20)
(setq company-show-numbers t)
(setq company-idle-delay 0)
(setq company-echo-delay 0)

Configure Prompts

To be honest, I mostly like the default popup, but it’s nice to be able to look at more completions and spill over to helm when needed. It’s a sometimes thing, but it’s quite nice:

(use-package helm-company
   :ensure t
   :after (helm company)
   :bind (("C-c C-;" . helm-company))
   :commands (helm-company)
   :init
   (define-key company-mode-map (kbd "C-;") 'helm-company)
   (define-key company-active-map (kbd "C-;") 'helm-company))

Full Configuration

Some of the following is duplicated above, but here’s the full configuration that I run with:

(use-package company
  :ensure t
  :delight
  :bind (("C-c ." . company-complete)
         ("C-c C-." . company-complete)
         ("C-c s s" . company-yasnippet)
         :map company-active-map
         ("C-n" . company-select-next)
     ("C-p" . company-select-previous)
     ("C-d" . company-show-doc-buffer)
     ("M-." . company-show-location))
  :init
  (add-hook 'c-mode-common-hook 'company-mode)
  (add-hook 'sgml-mode-hook 'company-mode)
  (add-hook 'emacs-lisp-mode-hook 'company-mode)
  (add-hook 'text-mode-hook 'company-mode)
  (add-hook 'lisp-mode-hook 'company-mode)
  :config
  (eval-after-load 'c-mode
    '(define-key c-mode-map (kbd "[tab]") 'company-complete))

  (setq company-tooltip-limit 20)
  (setq company-show-numbers t)
  (setq company-dabbrev-downcase nil)
  (setq company-idle-delay 0)
  (setq company-echo-delay 0)
  (setq company-ispell-dictionary (f-join tychoish-config-path "aspell-pws"))

  (setq company-backends '(company-capf
                           company-keywords
                           company-semantic
                           company-files
                           company-etags
                           company-elisp
                           company-clang
                           company-irony-c-headers
                           company-irony
                           company-jedi
                           company-cmake
                           company-ispell
                           company-yasnippet))

  (global-company-mode))

(use-package company-quickhelp
  :after company
  :config
  (setq company-quickhelp-idle-delay 0.1)
  (company-quickhelp-mode 1))

(use-package company-irony
  :ensure t
  :after (company irony)
  :commands (company-irony)
  :config
  (add-hook 'irony-mode-hook 'company-irony-setup-begin-commands))

(use-package company-irony-c-headers
  :ensure t
  :commands (company-irony-c-headers)
  :after company-irony)

(use-package company-jedi
  :ensure t
  :commands (company-jedi)
  :after (company python-mode))

(use-package company-statistics
  :ensure t
  :after company
  :config
  (company-statistics-mode))

Distributed Systems Problems and Strategies

At a certain scale, most applications end up having to contend with a class of “distributed systems” problems: when a single computer or a single copy of an application can’t support the required throughput of an application there’s not much to do except to distribute it, and therein lies the problem. Taking one of a thing and making many of the thing operate similarly can be really fascinating, and frankly empowering. At some point, all systems become distributed in some way, to a greater or lesser extent. While the underlying problems and strategies are simple enough, distributed systems-type bugs can be gnarly and having some framework for thinking about these kinds of systems and architectures can be useful, or even essential, when writing any kind of software.

Concerns

Application State

Applications all have some kind of internal state: configuration, runtime settings, in addition to whatever happens in memory as a result of running the application. When you have more than one copy of a single logical application, you have to put state somewhere. That somewhere is usually a database, but it can be another service or in some kind of shared file resource (e.g. NFS or blob storage like S3.)

The challenge is not “where to put the state,” because it probably doesn’t matter much, but rather in organizing the application to remove the assumption that state can be stored in the application. This often means avoiding caching data in global variables and avoiding storing data locally on the filesystem, but there are a host of ways in which application state can get stuck or captured, and the fix is generally “ensure this data is always read out of some centralized and authoritative service,” and ensure that any locally cached data is refreshed regularly and saved centrally when needed.

In general, better state management within applications makes code better regardless of how distributed the system is, and when we use the “turn it off and turn it back on” approach, we’re really just clearing out some bit of application state that got stuck during the runtime of a piece of software.

Startup and Shutdown

Process creation and initialization, as well as shutdown, is difficult in distributed systems. While most configuration and state is probably stored in some remote service (like a database,) there’s a bootstrapping process where each process needs enough local configuration to reach the central service and fetch the rest of its configuration, which can be a bit delicate.

Shutdown has its own set of problems, as specific processes need to be able to complete or safely abort in-progress operations.

For request-driven work (i.e. HTTP or RPC APIs) without stateful or long-running requests (e.g. many websockets and most streaming connections), applications have to stop accepting new connections and let all in-progress requests complete before terminating. For other kinds of work, the process has to either complete in-progress work or provide some kind of “checkpointing” approach so that another process can pick up the work later.
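
As an illustration of the request-draining pattern, here’s a deliberately simplified sketch (the names are mine, and a real implementation would use atomic counters or locks rather than bare special variables):

(defvar *accepting-requests* t)
(defvar *in-flight-requests* 0)

(defun handle-request (thunk)
  "Reject new work during shutdown; otherwise track the request."
  (unless *accepting-requests*
    (error "server is draining"))
  (incf *in-flight-requests*)
  (unwind-protect (funcall thunk)
    (decf *in-flight-requests*)))

(defun graceful-shutdown ()
  "Stop accepting new requests, then wait for in-flight work to finish."
  (setf *accepting-requests* nil)
  (loop while (plusp *in-flight-requests*)
        do (sleep 0.1)))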

Horizontal Scalability

Horizontal scalability, being able to increase the capability of an application by adding more instances of the application rather than increasing the resources allotted to the application itself, is one of the reasons that we build distributed systems in the first place,1 but simply being able to run multiple copies of the application at once isn’t always enough: the application needs to be able to effectively distribute its workloads. For request-driven work this is generally some kind of load balancing layer or strategy, and for other kinds of workloads you need some way to distribute work across the application.

There are lots of different ways to provide load balancing, and a lot depends on your application and clients. There is specialized software (and even hardware) that provides load balancing by sitting “in front of” the application and routing requests to a backend, but there are also a collection of client-side solutions that work quite well. The complexity of load balancing solutions varies a lot: there are approaches that just distribute requests “evenly” (by number) to each backend one-by-one (“round-robin,” sketched below,) and approaches that attempt to distribute requests more “fairly” based on some reporting from each backend or an analysis of the incoming requests, and the strategy here depends a lot on the requirements of the application or service.

For workloads that aren’t request driven, systems require some mechanism of distributing work to workers, usually with some kind of messaging system, though it’s possible to get pretty far using just a normal general-purpose database to store pending work. The options for managing, ordering, and distributing the work are the meat of the problem.
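
As a sketch of the simplest of these strategies, round-robin selection is just cycling through the backends in order (a toy illustration; real load balancers also track backend health and remove dead nodes):

(defun make-round-robin (backends)
  "Return a function that yields one backend per call, cycling forever."
  (let ((ring (copy-list backends)))
    (setf (cdr (last ring)) ring) ; splice the copy into a circle
    (lambda () (pop ring))))

;; (defvar *next* (make-round-robin '("10.0.0.1" "10.0.0.2")))
;; (funcall *next*) => "10.0.0.1", then "10.0.0.2", then "10.0.0.1" ...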

Challenges

When thinking about system design or architecture, I tend to start with the following questions.

  • how does the system handle intermittent failures of particular components?
  • what kind of downtime is acceptable for any given component? for the system as a whole?
  • how do operations time out and get terminated, and how do clients handle these kinds of failures?
  • what are the tolerances for the application in terms of latency of various kinds of operations, and also the tolerances for “missing” or “duplicating” an operation?
  • when (any single) node or component of the system aborts or restarts abruptly, how does the application/service respond? Does work resume or abort safely?
  • what level of manual intervention is acceptable? Does the system need to handle node failure autonomously? If so, for how many nodes?

Concepts like “node” or “component” or “operation,” can mean different things in different systems, and I use the terms somewhat vaguely as a result. These general factors and questions apply to systems that have monolithic architectures (i.e. many copies of a single type of process which performs many functions,) and service-based architectures (i.e. many different processes performing specialized functions.)

Solutions

Ignore the Problem, For Now

Many applications run in a distributed fashion while only really addressing parts of their relevant distributed systems problems, and in practice it works out ok. Applications may store most of their data in a database, but have some configuration files that are stored locally: this is annoying, and sometimes an out-of-sync file can lead to some unexpected behavior. Applications may have distributed application servers for all request-driven workloads, but may still have a separate single process that does some kind of coordinated background work, or run cron jobs.

Ignoring the problem isn’t always the best solution in the long term, but making sure that everything is distributed (or able to be distributed,) isn’t always the best use of time either, and depending on the specific application it works out fine. The important part isn’t to distribute things in all cases, but to make it possible to distribute functions in response to needs: in some ways I think about this as the “just in time” approach.

Federation

Federated architectures manage distributed systems protocols at a higher level: rather than assembling a large distributed system, build very small systems that can communicate at a high level using some kind of established protocol. The best example of a federated system is probably email, though there are others.2

Federated systems have more complex protocols that have to be specification based, which can be complicated/difficult to build. Also, federated services have to maintain the ability to interoperate with previous versions and even sometimes non-compliant services, which can be difficult to maintain. Federated systems also end up pushing a lot of the user experience into the clients, which can make it hard to control this aspect of the system.

On the upside, specific implementations and instances of a federated service can be quite simple and have straightforward and lightweight implementations. Supporting email for a few users (or even a few hundred) is a much more tractable problem than supporting email for many millions of users.

Distributed Locks

Needing some kind of lock (for mutual exclusion, or mutex) is common enough in programming, and locks provide an easy way to ensure that only a single actor has access to a specific resource. Doing this within a single process involves kernel primitives (futexes) or programming language runtime implementations, and is simple to conceptualize. While the concept in a distributed system is functionally the same, the implementation of distributed locks is more complicated and necessarily slower (both the locks themselves, and their impact on the system as a whole).

All locks, local or distributed, can be difficult to use correctly: the lock must be acquired before using the resource, and it must fully protect the resource, without protecting too much and having a large portion of functionality require the lock. So while locks are sometimes required, and conceptually simple, using them correctly is hard. With that disclaimer, to work, distributed locks require (a sketch follows the list):3

  • some concept of an owner, which must be sufficiently specific (hostname, process identifier,) but that should be sufficiently unique to protect against process restarts, host renaming and collision.
  • lock status (locked/unlocked), and if the lock has different modes, such as a multi-reader/single-writer lock, then that status as well.
  • a timeout or similar mechanism to prevent deadlocks, so that if the actor holding a lock halts or becomes inaccessible, the lock is eventually released.
  • versioning, to prevent stale actors from modifying the same lock. In the case that actor-1 has a lock and stalls for longer than the timeout period, such that actor-2 gains the lock, when actor-1 runs again it must know that it’s been usurped.
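
To make these requirements concrete, here’s a sketch of what a lock record and acquire operation might look like. The names are mine, and in a real system the update would be a single atomic compare-and-swap against the backing store rather than an in-memory struct:

(defstruct dist-lock
  (owner nil)       ; unique actor identity, e.g. hostname + process id
  (mode :unlocked)  ; :locked or :unlocked (or reader/writer modes)
  (expires-at 0)    ; universal-time after which the lock can be stolen
  (version 0))      ; bumped on every change, to fence out stale actors

(defun try-acquire (lock owner ttl expected-version)
  "Take LOCK for OWNER if the caller's view (EXPECTED-VERSION) is
current and the lock is free or expired. Returns the new version on
success, NIL if another actor holds a live lock or has usurped us."
  (let ((now (get-universal-time)))
    (when (and (= (dist-lock-version lock) expected-version)
               (or (eq (dist-lock-mode lock) :unlocked)
                   (> now (dist-lock-expires-at lock))))
      (setf (dist-lock-owner lock) owner
            (dist-lock-mode lock) :locked
            (dist-lock-expires-at lock) (+ now ttl))
      (incf (dist-lock-version lock)))))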

Not all distributed systems require distributed locks, and in most cases, transactions in the data layer provide most of the isolation that you might need from a distributed lock, but it’s a useful concept to have.

Duplicate Work (Idempotency)

For a lot of operations in big systems, duplicating some work is easier and ultimately faster than coordinating and isolating that work in a single location. For this, having idempotent operations4 is useful. Some kinds of operations and systems make idempotency easier to implement, and in cases where the work is not inherently idempotent (e.g. in data processing or transformation,) the operation can be made idempotent by attaching some kind of clock to the data or operation.5

Using clocks and idempotency makes it possible to maintain data consistency without locks. At the same time, some of the same considerations apply. Having all operations duplicated is difficult to scale, so having ways for operations to abort early can be useful.
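
A sketch of that clock-checking pattern (the names are illustrative): the writer remembers the version it read, and a stale writer aborts instead of clobbering newer data:

(defstruct record
  (value nil)
  (version 0)) ; the "clock": incremented on every successful write

(defun checked-write (record new-value seen-version)
  "Apply NEW-VALUE only if the caller read the latest version.
Returns T on success, NIL if the record changed underneath us."
  (when (= (record-version record) seen-version)
    (setf (record-value record) new-value)
    (incf (record-version record))
    t))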

Consensus Protocols

Some operations can’t be effectively distributed, but are also not safe to duplicate. Applications can use consensus protocols to do “leader election,” ensuring that there’s only one node “in charge” at a time. This is common in database systems, where “single leader” systems are useful for balancing write performance in a distributed context. Consensus protocols have some amount of overhead, and are good for systems of small to moderate size, because all elements of the system must communicate with all other nodes in the system.

The two prevailing consensus protocols are Paxos and Raft--pardoning the oversimplification here--with Raft being a simpler and easier-to-implement expression of the same underlying principles. I’ve characterized consensus as being about leader election, though you can use these protocols to allow a distributed system to reach agreement on any manner of operations or shared state.

Queues

Building a fully generalized distributed application with consensus is a very lofty proposition, and commonly beyond the scope of most applications. If you can characterize the work of your system as discrete units of work (tasks or jobs,) and can build or access a queue mechanism within your application that supports workers on multiple processes, this might be enough to support a great deal of your distributed requirements for the application.

Once you have reliable mechanisms and abstractions for distributing work to a queue, scaling the system can be managed outside of the application by using different backing systems, or changing the dispatching layer, and queue optimization is pretty well understood. There are lots of different ways to schedule and distribute queued work, but perhaps this is beyond the scope of this article.

I wrote one of these, amboy, but things like gearman and celery do this as well, and many of these tools are built on messaging systems like Kafka or AMQP, or just use general-purpose databases as a backend. Keeping a solid abstraction between the application’s queue and the messaging system seems good, but a lot depends on your application’s workload.
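
For a sense of the shape of the abstraction, here’s a toy in-process version, assuming the bordeaux-threads API (bt:make-lock, bt:with-lock-held, bt:make-thread). The point is that enqueue/dequeue form a narrow interface that a database- or broker-backed implementation can later slot in behind:

(defvar *queue* '())
(defvar *queue-lock* (bt:make-lock "job-queue"))

(defun enqueue (job)
  "Add a zero-argument function to the back of the queue."
  (bt:with-lock-held (*queue-lock*)
    (setf *queue* (append *queue* (list job)))))

(defun dequeue ()
  "Pop the next job, or NIL if the queue is empty."
  (bt:with-lock-held (*queue-lock*)
    (pop *queue*)))

(defun start-worker ()
  "Run jobs in a background thread, polling while the queue is empty."
  (bt:make-thread
   (lambda ()
     (loop (let ((job (dequeue)))
             (if job (funcall job) (sleep 0.1)))))
   :name "queue-worker"))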

Delegate to Upstream Services

While there are distributed system problems that applications must solve for themselves, in most cases no bespoke solution is required! In practice many applications centralize a lot of their concerns in trusted systems like databases, messaging systems, or lock servers. This is probably correct! While distributed systems are required in most senses, distributed systems themselves are rarely the core feature of an application, and it makes sense to delegate these problems to services that are focused on solving them.

While multiple external services can increase the overall operational complexity of the application, implementing your own distributed system fundamentals can be quite expensive (in terms of developer time), and error prone, so it’s generally a reasonable trade off.

Conclusion

I hope this was as useful for you all as it has been fun for me to write!


  1. In most cases, some increase in reliability, by adding redundancy is a strong secondary motivation. ↩︎

  2. xmpp , the protocol behind jabber which powered/powers many IM systems is another federated example, and the fediverse points to others. I also suspect that some federation-like features will be used at the infrastructure layer to coordinate between constrained elements (e.g. multiple k8s clusters will use federation for coordination, and maybe multi-cloud/multi-region orchestration as well…) ↩︎

  3. This article about distributed locks in redis was helpful in summarizing the principles for me. ↩︎

  4. An operation is idempotent if it can be performed more than once without changing the outcome. For instance, the operation “increment the value by 10” is not idempotent because it increments a value every time it runs, so running the operation once is different than running it twice. At the same time the operation “set the value to 10” is idempotent, because the value is always 10 at the end of the operation. ↩︎

  5. Clocks can take the form of a “last modified timestamp,” or some kind of versioning integer associated with a record. Operations can check their local state against a canonical record, and abort if their data is out of date. ↩︎