Open Source Emacs Configuration Improvements

In retrospect I'm not totally sure why I released my emacs configuration to the world. I find tweaking Emacs Lisp to be soothing, and in 2020 these kinds of projects are particularly welcome. I've always thought about making it public: I feel like I get a lot out of Emacs, and I'm super aware that it's very hard for people who haven't been using Emacs forever to get a comparable experience. [1]

I also really had no idea of what to expect, and while it's still really recent, I've noticed a few things which are worth remarking:

  • Making your code usable for other people really does make it easy for people to find bugs. While it's likely that there are bugs that people never noticed, I found a few things very quickly:

    • Someone reported higher than expected CPU use, and I discovered that there were a number of functions that ran regularly in timers, and I was able to quickly tune some knobs in order to reduce average CPU use by a lot. This is likely to be great both for the user in question, but also because it'll help battery life.
    • The config includes a git submodule (!) with the contents of all third-party packages, mostly to reduce friction for people getting started. Downloading all of the packages fresh from the archive would take a few minutes, and the git clone is just faster. I realized, when someone ran into some problems when running with emacs 28 (e.g. the development/mainline build,) that the byte-compilation formats were different, which made the emacs27 files not work on emacs28. I pushed a second branch.

    More than anything the experience of getting bug reports and feedback has been great. It both makes it possible to focus time because the impact of the work is really clear, and it also makes it clear to me that I've accumulated some actually decent Emacs Lisp skills, without really noticing it. [2]

  • I was inspired to make a few structural improvements.

    • For a long time, including after the initial release, I had a "settings" file, and a "local functions" file that held code that I'd written or coppied from one place or another, and I finally divided them all into packages named tychoish-<thing>.el which allowed me to put all or most of the configuration into use-package forms, which is more consistent and also helps startup time a bit, and makes the directory structure a bit easier.
    • I also cleaned up a bunch of local snippets that I'd been carrying around, which wasn't hurting anything but is a bit more clear in the present form.
  • I believe that I've hit the limit, with regards to startup speed. I'd really like to get a GUI emacs instance to start (with no buffers) in less than a second, but it doesn't seem super plausible. I got really close. At this point there are two factors that constrain:

    • Raw CPU speed. I have two computers, and the machine with the newer CPU is consistently 25% faster than the slow computer.
    • While the default configuration doesn't do this, my personal configuration sets a font (this is reasonable,) but seems that the time to do this is sometimes observable, and proportional to the number of fonts you have installed on the system. [3]
    • Dependencies during the early load. I was able to save about 10% time by moving a function between package to reduce the packages that startup code depended upon. There's just a limit to how much you can clean up here.

    Having said that, these things can drift pretty easily. I've added some helper macros with-timer and with-slow-op-timer that I can use to report the timing of operations during startup to make sure that things don't slow down.

    Interestingly, I've experimented with byte-compiling my local configuration and I haven't really noticed much of a speedup at this scale, so for ease I've been leaving my own lisp directory unbytecompiled.

  • With everything in order, there's not much to edit! I guess I'll have other things to work on, but I have made a few improvements, generally:

    • Using the alert package for desktop notification, which allowed me to delete a legacy package I've been using. Deleting code is awesome.
    • I finally figured out how to really take advantage of projectile, which is now configured correctly, and has been a lot of help in my day-to-day work.
    • I've started using ERC more, and only really using my irssi (in screen) session as a fallback. My IRC/IM setup is a bit beyond the scope of this post but ERC has been a bit fussy to use on machines with intermittent connections, but I think I've been able to tweak that pretty well and have an experience that's quite good.

It's been interesting! And I'm looking forward to continuing to do this!

[1]Sure, other editors also have long setup curves, but Emacs is particularly gnarly in this regard, and I think adoption by new programmers is definitely constrained by this fact.
[2]I never really thought of myself as someone who wrote Emacs Lisp: I've never really written a piece of software in Emacs, it's always been a function here or there, or modifying some snippet from somewhere. I don't know if I have a project or a goal that would involve writing more emacs software, but it's nice to recognize that I've accidentally acquired a skill.
[3]On Windows and macOS systems this may not matter, but you may have more fonts installed than you need. I certianly did. Be aware that webbrowsers often downlaod their own fonts separately from system fonts, so having fonts installed is really relevant to your GTK/QT/UI use and not actually to the place where you're likely doing most of your font interaction (e.g. the browser.)

Learning Common Lisp Again

In a recent post I spoke about abandoning a previous project that had gone off the rails, and I've been doing more work in Common Lisp, and I wanted to report a bit more, with some recent developments. There's a lot of writing about learning to program for the first time, and a fair amount of writing about lisp itself, neither are particularly relevant to me, and I suspect there may be others who might find themselves in a similar position in the future.

My Starting Point

I already know how to program, and have a decent understanding of how to build and connect software components. I've been writing a lot of Go (Lang) for the last 4 years, and wrote rather a lot of Python before that. I'm an emacs user, and I use a Common Lisp window manager, so I've always found myself writing little bits of lisp here and there, but it never quite felt like I could do anything of consequence in Lisp, despite thinking that Lisp is really cool and that I wanted to write more.

My goals and rational are reasonably simple:

  • I'm always building little tools to support the way that I use computers, nothing is particularly complex, but it'd enjoy being able to do this in CL rather than in other languages, mostly because I think it'd be nice to not do that in the same languages that I work in professionally. [1]
  • Common Lisp is really cool, and I think it'd be good if it were more widely used, and I think by writing more of it and writing posts like this is probably the best way to make that happen.
  • Learning new things is always good, and I think having a personal project to learn something new will be a good way of stretching my self as a developer. Most of my development as a programmer has focused on
  • Common Lisp has a bunch of features that I really like in a programming language: real threads, easy to run/produce static binaries, (almost) reasonable encapsulation/isolation features.

On Learning

Knowing how to program makes learning how to program easier: broadly speaking programming languages are similar to each other, and if you have a good model for the kinds of constructs and abstractions that are common in software, then learning a new language is just about learning the new syntax and learning a bit more about new idioms and figuring out how different language features can make it easier to solve problems that have been difficult in other languages.

In a lot of ways, if you already feel confident and fluent in a programming language, learning a second language, is really about teaching yourself how to learn a new language, which you can then apply to all future languages as needed.

Except realistically, "third languages" aren't super common: it's hard to get to the same level of fluency that you have with earlier languages, and often we learn "third-and-later" languages are learned in the context of some existing code base or project4, so it's hard to generalize our familiarity outside of that context.

It's also the case that it's often pretty easy to learn a language enough to be able to perform common or familiar tasks, but fluency is hard, particularly in different idioms. Using CL as an excuse to do kinds of programming that I have more limited experience with: web programming, GUI programming, using different kinds of databases.

My usual method for learning a new programming language is to write a program of moderate complexity and size but in a problem space that I know pretty well. This makes it possible to gain familiarity, and map concepts that I understand to new concepts, while working on a well understood project. In short, I'm left to focus exclusively on "how do I do this?" type-problems and not "is this possible," or "what should I do?" type-problems.

Conclusion

The more I think about it, the more I realize that when we talk about "knowing a programming language," inevitably linked to a specific kind of programming: the kind of Lisp that I've been writing has skewed toward the object oriented end of the lisp spectrum with less functional bits than perhaps average. I'm also still a bit green when it comes to macros.

There are kinds of programs that I don't really have much experience writing:

  • GUI things,
  • the front-half of the web stack, [2]
  • processing/working with ASTs, (lint tools, etc.)
  • lower-level kind of runtime implementation.

There's lots of new things to learn, and new areas to explore!

Notes

[1]There are a few reasons for this. Mostly, I think in a lot of cases, it's right to choose programming languages that are well known (Python, Java+JVM friends, and JavaScript), easy to learn (Go), and fit in with existing ecosystems (which vary a bit by domain,) so while it might the be right choice it's a bit limiting. It's also the case that putting some boundaries/context switching between personal projects and work projects could be helpful in improving quality of life.
[2]Because it's 2020, I've done a lot of work on "web apps," but most of my work has been focused on areas of applications including including data layer, application architecture, and core business logic, and reliability/observability areas, and less with anything material to rendering web-pages. Most projects have a lot of work to be done, and I have no real regrets, but it does mean there's plenty to learn. I wrote an earlier post about the problems of the concept of "full-stack engineering" which feels relevant.

Better Company

I've been using company mode, which is a great completion framework for years now, and in genreal, it's phenomenal. For a while, however, I've had the feeling that I'm not getting completion options at exactly the right frequency that I'd like. And a completion framework that's a bit sluggish or that won't offer completions that you'd expect is a drag. I dug in a bit, and got a much better, because of some poor configuration choices I'd made, and I thought I write up my configuration.

Backend Configuration

Company allows for configurable backends, which are just functions that provide completions, many of which are provided in the main company package, but also provided by many third (fourth?) party packages. These backends, then, are in a list which is stored in the company-backends, that company uses to try and find completions. When you get to a moment when you might want to complete things, emacs+company iterate through this list and build a list of expansions. This is pretty straight forward, at least in principle.

Now company is pretty good at making these backends fast, or trying, particularly when the backend might be irrelevant to whatever you're currently editing--except in some special cases--but it means that the order of things in the list matters sometimes. The convention for configuring company backends is to load the module that provides the backend and then push the new backend onto the list. This mostly works fine, but there are some backends that either aren't very fast or have the effect of blocking backends that come later (because they're theoretically applicable to all modes.) These backends to be careful of are: company-yasnippet, company-ispell, and company-dabbrev.

I've never really gotten company-ispell to work (you have to configure a wordlist,) and I've never been a dabbrev user, but I've definitely made the mistake to putting the snippet expansion near the front of the list rather than the end. I've been tweaking things recently, and have settled on the following value for company-backends:

(setq company-backends '(company-capf
                         company-keywords
                         company-semantic
                         company-files
                         company-etags
                         company-elisp
                         company-clang
                         company-irony-c-headers
                         company-irony
                         company-jedi
                         company-cmake
                         company-ispell
                         company-yasnippet))

The main caveat is that everything has to be loaded or have autoloads registered appropriately, particularly for things like jedi (python,) clang, and irony. The "capf" backend is the integration with emacs' default completion-at-point facility, and is the main mechanism by which lap-mode interacts with company, so it's good to keep that at the top.

Make it Fast

I think there's some fear that a completion framework like company could impact the perceived responsiveness of emacs as a whole, and as a result there are a couple of knobs for how to tweak things. Having said that, I've always run things more aggressively, because I like seeing possible completions fast, and I've never seen any real impact on apparent performance or battery utilization. use these settings:

(setq company-tooltip-limit 20)
(setq company-show-numbers t)
(setq company-idle-delay 0)
(setq company-echo-delay 0)

Configure Prompts

To be honest, I mostly like the default popup, but it's nice to be able to look at more completions and spill over to helm when needed. It's a sometimes thing, but it's quite nice:

(use-package helm-company
   :ensure t
   :after (helm company)
   :bind (("C-c C-;" . helm-company))
   :commands (helm-company)
   :init
   (define-key company-mode-map (kbd "C-;") 'helm-company)
   (define-key company-active-map (kbd "C-;") 'helm-company))

Full Configuration

Some of the following is duplicated above, but here's the full configuration that I run with:

(use-package company
  :ensure t
  :delight
  :bind (("C-c ." . company-complete)
         ("C-c C-." . company-complete)
         ("C-c s s" . company-yasnippet)
         :map company-active-map
         ("C-n" . company-select-next)
         ("C-p" . company-select-previous)
         ("C-d" . company-show-doc-buffer)
         ("M-." . company-show-location))
  :init
  (add-hook 'c-mode-common-hook 'company-mode)
  (add-hook 'sgml-mode-hook 'company-mode)
  (add-hook 'emacs-lisp-mode-hook 'company-mode)
  (add-hook 'text-mode-hook 'company-mode)
  (add-hook 'lisp-mode-hook 'company-mode)
  :config
  (eval-after-load 'c-mode
    '(define-key c-mode-map (kbd "[tab]") 'company-complete))

  (setq company-tooltip-limit 20)
  (setq company-show-numbers t)
  (setq company-dabbrev-downcase nil)
  (setq company-idle-delay 0)
  (setq company-echo-delay 0)
  (setq company-ispell-dictionary (f-join tychoish-config-path "aspell-pws"))

  (setq company-backends '(company-capf
                           company-keywords
                           company-semantic
                           company-files
                           company-etags
                           company-elisp
                           company-clang
                           company-irony-c-headers
                           company-irony
                           company-jedi
                           company-cmake
                           company-ispell
                           company-yasnippet))

  (global-company-mode))

(use-package company-quickhelp
  :after company
  :config
  (setq company-quickhelp-idle-delay 0.1)
  (company-quickhelp-mode 1))

(use-package company-irony
  :ensure t
  :after (company irony)
  :commands (company-irony)
  :config
  (add-hook 'irony-mode-hook 'company-irony-setup-begin-commands))

(use-package company-irony-c-headers
  :ensure t
  :commands (company-irony-c-headers)
  :after company-irony)

(use-package company-jedi
  :ensure t
  :commands (company-jedi)
  :after (company python-mode))

(use-package company-statistics
  :ensure t
  :after company
  :config
  (company-statistics-mode))

Distributed Systems Problems and Strategies

At a certain scale, most applications end up having to contend with a class of "distributed systems" problems: when a single computer or a single copy of an application can't support the required throughput of an application there's not much to do except to distribute it, and therein lies the problem. Taking one of a thing and making many of the thing operate similarly can be really fascinating, and frankly empowering. At some point, all systems become distributed in some way, to a greater or lesser extent. While the underlying problems and strategies are simple enough, distributed systems-type bugs can be gnarly and having some framework for thinking about these kinds of systems and architectures can be useful, or even essential, when writing any kind of software.

Concerns

Application State

Applications all have some kind of internal state: configuration, runtime settings, in addition to whatever happens in memory as a result of running the application. When you have more than one copy of a single logical application, you have to put state somewhere. That somewhere is usually a database, but it can be another service or in some kind of shared file resource (e.g. NFS or blob storage like S3.)

The challenge is not "where to put the state," because it probably doesn't matter much, but rather in organizing the application to remove the assumption that state can be stored in the application. This often means avoiding caching data in global variables and avoiding storing data locally on the filesystem, but there are a host of ways in which application state can get stuck or captured, and the fix is generally "ensure this data is always read out of some centralized and authoritative service," and ensure that any locally cached data is refreshed regularly and saved centrally when needed.

In general, better state management within applications makes code better regardless of how distributed the system is, and when we use the "turn it off and turn it back on," we're really just clearing out some bit of application state that's gotten stuck during the runtime of a piece of software.

Startup and Shutdown

Process creation and initialization, as well as shutdown, is difficult in distributed systems. While most configuration and state is probably stored in some remote service (like a database,) there's a bootstrapping process where each process gets enough local configuration required to get that configuration and startup from the central service, which can be a bit delicate.

Shutdown has its own problems set of problems, as specific processes need to be able to complete or safely abort in progress operations.

For request driven work (i.e. HTTP or RPC APIs) without statefull or long-running requests (e.g. many websockets and most streaming connections), applications have to stop accepting new connections and let all in progress requests complete before terminating. For other kinds of work, the process has to either complete in progress work or provide some kind of "checkpointing" approach so that another process can pick up the work later.

Horizontal Scalability

Horizontal scalability, being able to increase the capability of an application by adding more instances of the application rather than creasing the resources allotted to the application itself, is one of the reasons that we build distributed systems in the first place, [1] but simply being able to run multiple copies of the application at once isn't always enough, the application needs to be able to effectively distribute it's workloads. For request driven work this is genreally some kind of load balancing layer or strategy, and for other kinds of workloads you need some way to distribute work across the application.

There are lots of different ways to provide loadbalancing, and a lot depends on your application and clients, there is specialized software (and even hardware) that provides loadbalancing by sitting "in front of" the application and routing requests to a backend, but there are also a collection of client-side solutions that work quite well. The complexity of load balancing solutions varies a lot: there are some approaches that just distribute responses "evenly" (by number) to a single backend one-by-one ("round-robin") and some approaches that attempt to distribute requests more "fairly" based on some reporting of each backend or an analysis of the incoming requests, and the strategy here depends a lot on the requirements of the application or service.

For workloads that aren't request driven, systems require some mechanism of distributing work to workers, ususally with some kind of messaging system, though it's possible to get pretty far using a just a normal general purpose database to store pending work. The options for managing, ordering, and distributing the work, is the meat of problem.

[1]In most cases, some increase in reliability, by adding redundancy is a strong secondary motivation.

Challenges

When thinking about system design or architecture, I tend to start with the following questions.

  • how does the system handle intermittent failures of particular components?
  • what kind of downtime is acceptable for any given component? for the system as a whole?
  • how do operations timeout and get terminated, and how to clients handle these kinds of failures?
  • what are the tolerances for the application in terms of latency of various kinds of operations, and also the tolerances for "missing" or "duplicating" an operation?
  • when (any single) node or component of the system aborts or restarts abruptly, how does the application/service respond? Does work resume or abort safely?
  • what level of manual intervention is acceptable? Does the system need to node failure autonomously? If so how many nodes?

Concepts like "node" or "component" or "operation," can mean different things in different systems, and I use the terms somewhat vaguely as a result. These general factors and questions apply to systems that have monolithic architectures (i.e. many copies of a single type of process which performs many functions,) and service-based architectures (i.e. many different processes performing specialized functions.)

Solutions

Ignore the Problem, For Now

Many applications run in a distributed fashion while only really addressing parts of their relevant distributed systems problems, and in practice it works out ok. Applications may store most of their data in a database, but have some configuration files that are stored locally: this is annoying, and sometimes an out-of-sync file can lead to some unexpected behavior. Applications may have distributed application servers for all request-driven workloads, but may still have a separate single process that does some kind of coordinated background work, or run cron jobs.

Ignoring the problem isn't always the best solution in the long term, but making sure that everything is distributed (or able to be distributed,) isn't always the best use of time, and depending the specific application it works out fine. The important part, isn't always to distribute things in all cases, but to make it possible to distribute functions in response to needs: in some ways I think about this as the "just in time" approach.

Federation

Federated architectures manage distributed systems protocols at a higher level: rather than assembling a large distributed system, build very small systems that can communicate at a high level using some kind of established protocol. The best example of a federated system is probably email, though there are others. [2]

Federated systems have more complex protocols that have to be specification based, which can be complicated/difficult to build. Also, federated services have to maintain the ability to interoperate with previous versions and even sometimes non-compliant services, which can be difficult to maintain. Federated systems also end up pushing a lot of the user experience into the clients, which can make it hard to control this aspect of the system.

On the upside, specific implementations and instances of a federated service can be quite simple and have straight forward and lightweight implementations. Supporting email for a few users (or even a few hundred) is a much more tractable problem than supporting email for many millions of users.

[2]xmpp , the protocol behind jabber which powered/powers many IM systems is another federated example, and the fediverse points to others. I also suspect that some federation-like features will be used at the infrastructure layer to coordinate between constrained elements (e.g. multiple k8s clusters will use federation for coordination, and maybe multi-cloud/multi-region orchestration as well...)

Distributed Locks

Needing some kind of lock (for mutual exclusion or mutex) is common enough in programming, and provide some kind of easy way to ensure that only a single actor has access to a specific resource. Doing this within a single process involves using kernel (futexes) or programming language runtime implementations, and is simple to conceptualize, and while the concept in a distributed system is functionally the same, the implementation of distributed locks are more complicated and necessarily slower (both the lock themselves, and their impact on the system as a whole).

All locks, local or distributed can be difficult to use correctly: the lock must be acquired before using the resource, and it must fully protect the resource, without protecting too much and having a large portion of functionality require the lock. So while locks are required sometimes, and conceptually simple, using them correctly is hard. With that disclaimer, to work, distributed locks require: [3]

  • some concept of an owner, which must be sufficiently specific (hostname, process identifier,) but that should be sufficiently unique to protect against process restarts, host renaming and collision.
  • lock status (locked/link) and if the lock has different modes, such as a multi-reader/single-writer lock, then that status.
  • a timeout or similar mechanism to prevent deadlocks if the actor holding a lock halts or becomes inaccessible, the lock is eventually released.
  • versioning, to prevent stale actors from modifying the same lock. In the case that actor-1 has a lock and stalls for longer than the timeout period, such that actor-2 gains the lock, when actor-1 runs again it must know that its been usurped.

Not all distributed systems require distributed locks, and in most cases, transactions in the data layer, provide most of the isolation that you might need from a distributed lock, but it's a useful concept to have.

[3]This article about distributed locks in redis was helpful in summarizing the principles for me.

Duplicate Work (Idempotency)

For a lot of operations, in big systems, duplicating some work is easier and ultimately faster than coordinating and isolating that work in a single location. For this, having idempotent operations [4] is useful. Some kinds of operations and systems make idempotency easier to implement, and in cases where the work is not idempotent (e.g. as in data processing or transformation,) the operation can be, by attaching some kind of clock to the data or operation. [5]

Using clocks and idempotency makes it possible to maintain data consistency without locks. At the same time, some of the same considerations apply. Having all operations duplicated is difficult to scale so having ways for operations to abort early can be useful.

[4]An operation is idempotent if it can be performed more than once without changing the outcome. For instance, the operation "increment the value by 10" is not idempotent because it increments a value every time it runs, so running the operation once is different than running it twice. At the same time the operation "set the value to 10" is idempotent, because the value is always 10 at the end of the operation.
[5]Clocks can take the form of a "last modified timestamp," or some kind of versioning integer associated with a record. Operations can check their local state against a canonical record, and abort if their data is out of date.

Consensus Protocols

Some operations can't be effectively distributed, but are also not safe to duplicate. Applications can use consensus protocols to do "leader election," to ensure that there's only one node "in charge" at a time, and the protocol. This is common in database systems, where "single leader" systems are useful for balancing write performance in distributed context. Consensus protocols have some amount of overhead, and are good for systems of a small to moderate size, because all elements of the system must communicate with all other nodes in the system.

The two prevailing consensus protocols are Paxos and Raft--pardoning the oversimplification here--with Raft being a simpler and easier to implement imagination of the same underlying principles. I've characterized consensus as being about leader election, though you can use these protocols to allow a distributed system to reach agreement on any manner of operations or shared state. solutions. However, if you can build or access a queue mechanism within your application that supports workers on multiple processes, this might be enough to support a great deal of your distributed requirements. Once you have reliable mechanisms and abstractions for distributing work to a queue, scaling the system can be managed outside of the application by using different backing systems, or changing the dispatching layer, and queue optimization is pretty well understood.

I wrote one of these, amboy, but things like gearman and celery do this as well, and many of these tools are built on messaging systems like Kafka or AMPQ, or just use general purpose databases as a backend. Keeping a solid abstraction between the applications queue and then messaging system seems good, but a lot depends on your application's workload.

Delegate to Upstream Services

While there are distributed system problems that applications have to solve for themselves, in most cases no solution is required! In practice many applications centralize a lot of their concerns in trusted systems like databases, messaging systems, or lock servers. This is probably correct! While distributed systems are required in most senses, distributed systems themselves are rarely the core feature of an application, and it makes sense to delegate these problem to services that that are focused on solving this problem.

While multiple external services can increase the overall operational complexity of the application, implementing your own distributed system fundamentals can be quite expensive (in terms of developer time), and error prone, so it's generally a reasonable trade off.

Conclusion

I hope this was as useful for you all as it has been fun for me to write!

Towards Faster Emacs Start Times

While I mostly run emacs as a daemon managed by systemd I like to keep my configuration nimble enough that I can just start emacs and have an ad-hoc session for editing some files. [1] Or that's the goal. This has been a long running project for me, but early on in quarantine times, I've gotten things tightened up to the point that I feel comfortable using EDITOR=emacs just about all the time. On my (very old) computers, emacs starts and in well under 1 or 2 seconds, and does better on faster machines. If your emacs setup takes a long time to start up, this article is for you! Let's see if we can get the program to start faster.

Measure!

The emacs-init-time function in emacs will tell you how long it took emacs to start up. This tends to be a bit under the actual start time, because some start-up activity gets pushed into various hooks, after startup. Just to keep an eye on things, I have a variant of the following as the first line in my config file:

(add-to-list 'after-init-hook
          (lambda ()
            (message (concat "emacs (" (number-to-string (emacs-pid)) ") started in " (emacs-init-time)))))

It's also possible to put additional information in this hook (and in reality I use something that pushes to a desktop notification tool, rather than just to message/logging.

You can also lower this number by putting more work into the after-init hook, but that sort of defeats the purpose: if you get the init time down to a second, but have 10 seconds of init-hook runtime, then that's probably not much of a win, particularly if the init-hook tasks are blocking.

The second thing to do is have a macro [2] available for use that allows to measure the runtime of a block of code, something like:

(defmacro with-timer (name &rest body)
  `(let ((time (current-time)))
     ,@body
     (message "%s: %.06f" ,name (float-time (time-since time)))))

You can then do something like:

(with-timer "mode-line-setup"
  (set-up-mode-line))

And then look in the *Message* buffer to see how long various things take so you know where focus your efforts.

Strategies

  • use-package makes it possible to do much less during applciation startup, and makes it possible to load big lisp packages only when you're going to use them, and registers everything so you never feel the difference. Audit your use-package forms and ensure that they all declare one of the following forms, which insures that things are loaded upon use, rather than at start time:
    • :bind registers specific keybindings, but as autoloads so that the pacakge is only loaded when you use the keybinding.
    • :command registers function names as autoloads, like keybinding, so the commands/functions are available but again the package isn't loaded until it's used.
    • :mode attaches the package to a specific file extension or extensions so again, the package doesn't get loaded until you open a file of that type.
    • :after for loading a package after another package it depends on or is only used after.
    • :hook this allows you to associate something from this package into another package's hook, with the same effect of deferring loading of one package until after another is used.
  • Avoid having the default-scratch buffer trigger loading as much as possible. It's very tempting to have *scratch* default to org-mode or something, but if you keep it in fundamental-mode you can avoid loading most of your packages until you actually use them, or can avoid pulling in too much too quickly.
  • Keep an eye on startup both for GUI, terminal, and if you use daemons, daemon instances of emacs. There are some subtleties that may be useful or important. It's also the case that terminal startup times are often much less than GUI times.
  • Think about strategies for managing state-management (e.g. recentf session-mode and desktop-mode.) I've definitley had times when I really wanted emacs to be able to restart and leave me exactly where I was when I shut down. These session features are good, but often it's just crufty, and causes a lot of expense at start up time. I still use desktop and session, but less so.
  • Fancy themes and modelines come at some cost. I recently was able to save a lot of start up time by omiting a call to spaceline-compile in my configuration. Theme setup also can take a bit of time in GUI mode (I think!), but I dropped automatic theme configuration so that my instances would play better with terminal mode. [3]

Helpful Settings

I think of these four settings as the "start up behavior" settings from my config:

(setq inhibit-startup-echo-area-message "tychoish")
(setq inhibit-startup-message 't)
(setq initial-major-mode 'fundamental-mode)
(setq initial-scratch-message 'nil)

I've had the following jit settings in my config for a long time, and they tend to make things more peppy, which can only help during startup. This may also just be cargo-cult, stuff:

(setq jit-lock-stealth-time nil)
(setq jit-lock-defer-time nil)
(setq jit-lock-defer-time 0.05)
(setq jit-lock-stealth-load 200)

If you use desktop-mode to save state and re-open buffers, excluding some modes from this behavior is good. In my case, avoiding reopening (and rereading) potentially large org-mode buffers was helpful for me. The desktop-modes-not-to-save value is useful here, as in the following snippet from my configuration:

(add-to-list 'desktop-modes-not-to-save 'dired-mode)
(add-to-list 'desktop-modes-not-to-save 'Info-mode)
(add-to-list 'desktop-modes-not-to-save 'org-mode)
(add-to-list 'desktop-modes-not-to-save 'info-lookup-mode)
(add-to-list 'desktop-modes-not-to-save 'fundamental-mode)

Notes

[1]Ok, ok, I live with a vim user and the "quick startup time," situation is very appealing.
[2]I tweaked this based on what I found this on stack overflow which stole it from this listserv post.
[3]Mixed GUI/Terminal mode with themes can be kind of finicky because theme settings are global and color backgrounds often won't play well with terminal frames. You can say "if this is a gui mode," but that doesn't play well with daemons, and hooking into frame-creation has never worked as seamlessly as I would expect, because the of the execution context the hook functions (and slowing down frame creation can defeat part of the delight of the daemon.)

Staff Engineering

In August of 2019 I became a Staff Engineer, which is what a lot of companies are calling their "level above Senior Engineer" role these days. Engineering leveling is a weird beast, which probably a post onto itself. Despite my odd entry into a career in tech, my path in the last 4 or 5 years has been pretty conventional; however, somehow, despite having an increasingly normal career trajectory, explaining what I do on a day to day basis has not gotten easier.

Staff Engineers are important for scaling engineering teams, but lots of teams get by with out them, and unlike more junior engineers who have broadly similar job roles, there are a lot of different ways to be a Staff Engineer, which only muddies things. This post is a reflection on some key aspects of my experience organized in to topics that I hope will be useful for people who may be interested in becoming staff engineers or managing such a person. If you're also a Staff Engineer and your experience is different, I wouldn't be particularly surprised.

Staff Engineers Help Teams Build Great Software

Lots of teams function just fine without Staff Engineers and teams can build great products without having contributors in Staff-type roles. Indeed, because Staff Engineers vary a lot, the utility of having more senior individual contributors on a team depends a lot of the specific engineer and the team in question: finding a good fit is even harder than usual. In general, having Senior Technical leadership can help teams by:

  • giving people managers more space and time to focus on the team organization, processes, and people. Particularly in small organizations, team managers often pick up technical leadership.
  • providing connections and collaborations between groups and efforts. While almost all senior engineers have a "home team" and are directly involved in a few specific projects, they also tend to have broader scope, and so can help coordinate efforts between different projects and groups.
  • increasing the parallelism of teams, and can provide the kind of infrastructure that allows a team to persue multiple streams of development at one time.
  • supporting the career path and growth of more junior engineers, both as a result of direct mentoring, but also by enabling the team to be more successful by having more technical leadership capacity creates opportunities for growth for everyone on the team.

Staff Promotions Reflect Organizational Capacity

In addition to experience and a history of effusiveness, like other promotions, getting promoted to Staff Engineer is less straight forward than other promotions. This is in part because the ways we think about leveling and job roles (i.e. to describe the professional activities and capabilities along several dimensions for each level,) become complicated when there are lots of different ways to be a Staff Engineer. Pragmatically, these kind of promotions often depend on other factors:

  • the existence of other Staff Engineers in the organization make it more likely that there's an easy comparison for a candidate.
  • past experience of managers getting Staff+ promotions for engineers. Enginering Managers without this kind of experience may have difficulty creating the kinds of opportunities within their organizations and for advocating these kinds of promotions.
  • organizational maturity and breadth to support the workload of a Staff Engineer: there are ways to partition teams and organizations that preclude some of the kinds of higher level concerns that justify having Staff Engineers, and while having senior technical leadership is often useful, if the organization can't support it, it won't happen.
  • teams with a sizable population of more junior engineers, particularly where the team is growing, will have more opportunity and need for Staff Engineers. Teams that are on the balance more senior, or are small and relatively static tend to have less opportunity for the kind of broadly synthetic work that tends to lead to Staff promotions.

There are also, of course, some kinds of technical achievements and professional characteristics that Staff Engineers often have, and I'm not saying that anyone in the right organizational context can be promoted, exactly. However, without the right kind of organizational support and context, even the most exceptional engineers will never be promoted.

Staff Promotions are Harder to Get Than Equivalent Management Promotions

In many organizations its true that Staff promotions are often much harder to get than equivalent promotions to peer-level management postions: the organizational contexts required to support the promotion of Engineers into management roles are much easier to create, particularly as organizations grow. As you hire more engineers you need more Engineering Managers. There are other factors:

  • managers control promotions, and it's easier for them to recapitulate their own career paths in their reports than to think about the Staff role, and so more Engineers tend to be pushed towards management than Senior IC roles. It's also probably that meta-managers benefit organizationally from having more front-line managers in their organizations than more senior ICs, which exacerbates this bias.
  • from an output perspective, Senior Engineers can write the code that Staff Engineers would otherwise write, in a way that Engineering Management tends to be difficult to avoid or do without. In other terms, management promotions are often more critical from the organization's perspective and therefore prioritized over Staff promotions, particularly during growth.
  • cost. Staff Engineers are expensive, often more expensive than managers particularly at the bottom of the brackets, and it's difficult to imagine that the timing of Staff promotions are not impacted by budgetary requirements.

Promoting a Staff Engineer is Easier than Hiring One

Because there are many valid ways to do the Staff job, and so much of the job is about leveraging context and building broader connections between different projects, people with more organizational experience and history often have an advantage over fresh industry hires. In general:

  • Success as a Staff Engineer in one organization does not necessarily translate to success at another.
  • The conventions within the process for industry hiring, are good at selecting junior engineers, and there are fewer conventions for more Senior roles, which means that candidates are not assessed for skills and experiences that are relevant to their day-to-day while also being penalized for (often) being unexceptional at the kind of problems that junior engineering interviews focus on. While interview processes are imperfect assessment tools in all cases, they're particularly bad at more senior levels.
  • Senior engineering contributors have a the potential to have huge impact on product development, engineering outcomes, all of which requires a bunch of trust on the part of the organization, and that kind of trust is often easier to build with someone who already has organizational experience

This isn't to say that it's impossible to hire Staff engineers, I'm just deeply dubious of the hiring process for these kinds of roles having both interviewed for these kinds of roles and also interviewed candidates for them. I've also watched more than one senior contributor not really get along well with a team or other leadership after being hired externally, and for reasons that end up making sense in retrospect. It's really hard.

Staff Engineers Don't Not Manage

Most companies have a clear distinction between the career trajectories of people involved in "management" and senior "individual contributor" roles (like Staff Engineers,) with managers involved in leadership for teams and humans, with ICs involved in technical aspects. This seems really clear on paper but incredibly messy in practice. The decisions that managers make about team organization and prioritization have necessary technical implications; while it's difficult to organize larger scale technical initiatives without awareness of the people and teams. Sometimes Staff Engineers end up doing actual management on a temporary basis in order to fill gaps as organizations change or cover parental leave

It's also the case that a huge part of the job for many Staff Engineer's involves direct mentorship of junior engineers, which can involve leading specific projects, conversations about career trajectories and growth, as well as conversations about specific technical topics. This has a lot of overlap with management, and that's fine. The major differences is that senior contributors share responsibility for the people they mentor with their actual managers, and tend to focus mentoring on smaller groups of contributors.

Staff Engineers aren't (or shouldn't be!) managers, even when they are involved in broader leadership work, even if the specific engineer is capable of doing management work: putting ICs in management roles, takes time away from their (likely more valuable) technical projects.

Staff Engineers Write Boring and Tedious But Easy Code

While this is perhaps not a universal view, I feel pretty safe in suggesting that Staff Engineers should be directly involved in development projects. While there are lots of ways to be involved in development: technical design, architecture, reviewing code and documents, project planning and development, and so fort, I think it's really important that Staff Engineers be involved with code-writing, and similar activies. This makes it easy to stay grounded and relevant, and also makes it possible to do a better job at all of the other kinds of engineering work.

Having said that, it's almost inevitable that the kinds of contribution to the code that you make as a Staff Engineer are not the same kinds of contributions that you make at other points in your career. Your attention is probably pulled in different directions. Where a junior engineer can spend most of their day focusing on a few projects and writing code, Staff Engineers:

  • consult with other teams.
  • mentor other engineers.
  • build long and medium term plans for teams and products.
  • breaking larger projects apart and designing APIs between components.

All of this "other engineering work" takes time, and the broader portfolio of concerns means that more junior engineers often have more time and attention to focus on specific programming tasks. The result is that the kind of code you end up writing tends to be different:

  • fixing problems and bugs in systems that require a lot of context. The bugs are often not very complicated themselves, but require understanding the implication of one component with regards to other components, which can make them difficult.
  • projects to enable future development work, including building infrastructure or designing an interface and connecting an existing implementation to that interface ahead of some larger effort. This kind of "refactor things to make it possible to write a new implementation."
  • writing small isolated components to support broader initiatives, such as exposing existing information via new APIs, and building libraries to facilitate connections between different projects or components.
  • projects that support the work of the team as a whole: tools, build and deployment systems, code clean up, performance tuning, test infrastructure, and so forth.

These kinds of projects can amount to rather a lot of development work, but they definitely have their own flavor. As I approached Staff and certainly since, the kind of projects I had attention for definitely shifted. I actually like this kind of work rather a lot, so that's been quite good for me, but the change is real.

There's definitely a temptation to give Staff Engineers big projects that they can go off and work on alone, and I've seen lots of teams and engineers attempt this: sometimes these projects work out, though more often the successes feel like an exception. There's no "right kind" of way to write software as a Staff Engineer, sometimes senior engineer's get to work on bigger "core projects." Having said that, if a Staff Engineer is working on the "other engineering" aspects of the job, there's just limited time to do big development projects in a reasonable time frame.

New Beginnings: Deciduous Platform

I left my job at MongoDB (8.5 years!) at the beginning of the summer, and started a new job at the beginning of the month. I'll be writing and posting more about my new gig, career paths in general, reflections on what I accomplished on my old team, the process of interviewing as a software engineer, as well as the profession and industry over time. For now, though, I want to write about one of the things I've been working on this summer: making a bunch of the open source libraries that I worked on more generally useable. I've been calling this the deciduous platform, [2] which now has its own github organization! So it must be real.

The main modification in these forks, aside from adding a few features that had been on my list for a while, has been to update the buildsystem to use go modules [3] and rewrite the history of the repository to remove all of the old vendoring. I expect to continue development on some aspects of these over time, though the truth is that these libraries were quite stable and were nearly in maintenance mode anyway.

Background

The team was responsible for a big monolith (or so) application: development had begun in 2013, which was early for Go, and while everything worked, it was a bit weird. My efforts when I joined in 2015 focused mostly on stabilization, architecture, and reliability. While the application worked, mostly, it was clear that it suffered from a few problem, which I believe were the result of originating early in the history of Go: First, because no one had tried to write big applications yet, the patterns weren't well established, and so the team ended up writing code that worked but that was difficult to maintain, and ended up with bespoke solutions to a number of generic problems like running workloads in the background or managing Apia. Second, Go's standard library tends to be really solid, but also tends towards being a little low level for most day-to-day tasks, so things like logging and process management end up requiring more code [4] than is reasonable.

I taught myself to write Go by working on a logging library, and worked on a distributed queue library. One of the things that I realized early, was that breaking the application into "microservices," would have been both difficult and offered minimal benefit, [5] so I went with the approach of creating a well factored monolith, which included a lot of application specific work, but also building a small collection of libraries and internal services to provide useful abstractions and separations for application developers and projects.

This allowed for a certain level of focus, both for the team creating the infrastructure, but also for the application itself: the developers working on the application mostly focused on the kind of high level core business logic that you'd expect, while the infrastructure/platform team really focused on these libraries and various integration problems. The focus wasn't just organizational: the codebases became easier to maintain and features became easier to develop.

This experience has lead me to think that architecture decisions may not be well captured by the monolith/microservice dichotomy, but rather there's' this third option that centers on internal architecture, platforms, and the possibility for developer focus and velocity.

Platform Overview

While there are 13 or so repositories in the platform, really there are 4 major libraries: grip, a logging library; jasper, a process management framework; amboy, a (possibly distributed) worker queue; and gimlet, a collection of tools for building HTTP/REST services.

The tools all work pretty well together, and combine to provide an environment where you can focus on writing the business logic for your HTTP services and background tasks, with minimal boilerplate to get it all running. It's pretty swell, and makes it possible to spin up (or spin out) well factored services with similar internal architectures, and robust internal infrastructure.

I wanted to write a bit about each of the major components, addressing why I think these libraries are compelling and the kinds of features that I'm excited to add in the future.

Grip

Grip is a structured-logging friendly library, and is broadly similar to other third-party logging systems. There are two main underlying interfaces, representing logging targets (Sender) and messages, as well as a higher level "journal" interface for use during programming. It's pretty easy to write new message or bakcends, which means you can use grip to capture all kinds of arbitrary messages in consistent manners, and also send those messages wherever they're needed.

Internally, it's quite nice to be able to just send messages to specific log targets, using configuration within an application rather than needing to operationally manage log output. Operations folks shouldn't be stuck dealing with just managing logs, after all, and it's quite nice to just send data directly to Splunk or Sumologic. We also used the same grip fundamentals to send notifications and alerts to Slack channels, email lists, or even to create Jira Issues, minimizing the amount of clunky integration code.

There are some pretty cool projects in and around grip:

  • support for additional logging targets. The decudous version of grip adds twitter as an output format as well as creating desktop notifications (e.g. growl/libnotify,) but I think it would also be interesting to add fluent/logstash connections that don't have to transit via standard error.'
  • While structured logging is great, I noticed that we ended up logging messages automatically in the background as a method of metrics collection. It would be cool to be able to add some kind of "intercepting sender" that handled some of these structured metrics, and was able to expose this data in a format that the conventional tools these days (prometheus, others,) can handle. Some of this code would clearly need to be in Grip, and other aspects clearly fall into other tools/libraries.

Amboy

Amboy is an interface for doing things with queues. The interfaces are simple, and you have:

  • a queue that has some way of storing and dispatching jobs.
  • implementations of jobs which are responsible for executing your business logic, and with a base implemention that you can easily compose, into your job types, all you need to implement, really is a Run() method.
  • a queue "group" which provides a higher level abstraction on top of queues to support segregating workflows/queues in a single system to improve quality of service. Group queues function like other queues but can be automatically managed by the processes.
  • a runner/pool implementation that provides the actual thread pool.

There's a type registry for job implementations and versioning in the schema for jobs so that you can safely round-trip a job between machines and update the implementation safely without ensuring the queue is empty.

This turns out to be incredibly powerful for managing background and asynchronous work in applications. The package includes a number of in-memory queues for managing workloads in ephemeral utilities, as well as a distributed MongoDB backed-queue for running multiple copies of an application with a shared queue(s). There's also a layer of management tools for introspecting, managing, the state of jobs.

While Amboy is quite stable, there is a small collection of work that I'm interested in:

  • a queue implementation that store jobs to a local Badger database on-disk to provide single-system restartabilty for jobs.
  • a queue implementation that stores jobs in a PostgreSQL, mirroring the MongoDB job functionality, to be able to meet job backends.
  • queue implementations that use messaging systems (Kafka, AMPQ) for backends. There exists an SQS implementation, but all of these systems have less strict semantics for process restarts than the database options, and database can easily handle on the order of a hundred of thousand of jobs an hour.
  • changes to the queue API to remove a few legacy methods that return channels instead of iterators.
  • improve the semantics for closing a queue.

While Amboy has provisions for building architectures with workers running on multiple processes, rather than having queues running multiple threads within the same process, it would be interesting to develop more fully-fledged examples of this.

Jasper

Jasper provides a high level set of tools for managing subprocesses in Go, adding a highly ergonomic API (in Go,) as well as exposing process management as a service to facilitate running processes on remote machines. Jasper also manages/tracks the state of running processes, and can reduce pressures on calling code to track the state of processes.

The package currently exposes Jasper services over REST, gRPC, and MongoDB's wire protocol, and there is also code to support using SSH as a transport so that you don't need to expose remote these services publically.

Jasper is, perhaps, the most stable of the libraries, but I am interested in thinking about a couple of extensions:

  • using jasper as PID 1 within a container to be able to orchestrate workloads running on containers, and contain (some) support for lower level container orchestration.
  • write configuration file-based tools for using jasper to orchestrate buildsystems and distributed test orchestration.

I'm also interested in cleaning up some of the MongoDB-specific code (i.e. the code that downloads MongoDB versions for use in test harnesses,) and perhaps reinvisioning that as client code that uses Jasper rather than as a part of Jasper.

Gimlet

I've written about gimlet here before when I started the project, and it remains a pretty useful and ergonomic way to define and regester HTTP APIs, in the past few years, its grown to add more authentication features, as well as a new "framework" for defining routes. This makes it possible to define routes by implementing an interface that:

  • makes it very easy to produce paginated routes, and provides some helpers for managing content
  • separates the parsing of inputs from executing the results, which can make route definitions easy to test without integration tests.
  • rehome functionality on top of chi router. The current implementation uses Negroni and gorilla mux (but neither are exposed in the interface), but I think it'd be nice to have this be optional, and chi looks pretty nice.

Other Great Tools

The following libraries are defiantly smaller, but I think they're really cool:

  • birch is a builder for programatically building BSON documents, and MongoDB's extended JSON format. It's built upon an earlier version of the BSON library. While it's unlikely to be as fast at scale, for many operations (like finding a key in a document), the interface is great for constructing payloads.
  • ftdc provides a way to generate (and read,) MongoDB's diagnostic data format, which is a highly compressed timeseries data format. While this implementation could drift from the internal implementation over time, the format and tool remain useful for arbitrary timeseries data.
  • certdepot provides a way to manage a certificate authority with the certificates stored in a centralized store. I'd like to add other storage backends over time.

And more...

Notes

[1]Though, given my usual publication lag, I'm writing this a couple days before starting.
[2]My old team built a continuous integration tool called evergreen which is itself a pun (using "green" to indicate passing builds, most CI systems are not ever-green.) Many of the tools and libraries that we built had got names with tree puns, and somehow "deciduous" seems like the right plan.
[3]For an arcane reason, all of these tools had to build with an old version of Go (1.10) that didn't support modules, so we had an arcane and annoying vendoring solution that wasn't compatible with modules.
[4]Go tends to be a pretty verbose language, and I think most of the time this creates clarity; however, for common tasks it has the feeling of offering a poor abstraction, or forcing you to write duplicated code. While I don't believe that more-terse-code is better, I think there's a point where the extra verbosity for route operations just creates the possibility for more errors.
[5]The team was small, and as an internal tools team, unlikely to grow to the size where microservices offered any kind of engineering efficiency (at some cost,) and there weren't significant technical gains that we could take advantage of: the services of the application didn't need to be globally distributed and the boundaries between components didn't need to scale independently.

Tycho Emacs Config Kit

So I made a thing, or at least, I took a thing I've built and made it presentable for other people. I'm talking, of course, about my Emacs configuration.

Check it out at github.com/tychoish/.emacs.d! The README is pretty complete, and giving it a whirl is simple, just do something like:

mv ~/.emacs.d/ ~/emacs.d-archive
git clone --recurse-submodules git@github.com:tychoish/.emacs.d.git ~/.emacs.d/

If you're using Emacs 27, you might also be able to clone it into ~/.config/emacs, for similar effect. It doesn't matter much to me. Let me know what you think!


The tl;dr of "what's special about this config," is that, it has:

  • a good set of defaults out of the box.
  • a lot of great wiz-bang features enabled and configured.
  • very quick start times, thanks to lazy-loading of libraries.
  • great support for running as a daemon, and even running multiple daemons at once.

I've sporadically produced copies of my emacs config for other folks to use, but those were always ad hoc, and difficult to reproduce and maintain, and I've been working on and off to put more polish on things, so making it usable for other people seemed like a natural step at this point.

I hope other people find it useful, but also I think it's a good exercise for me in keeping things well maintained, and it doesn't bother me much one way or another.

Discussion

I think a lot about developer experience: I've spent my career, thus far, working on infrastructure products (databases, CI systems, build tools, release engineering,) that help improve developer experience and productivity, and even my day-to-day work tends to focus mostly on problems that impact the way that my teams write software: system architecture, service construction, observability, test infrastructure, and similar.

No matter how you slice it, editor experience is a huge factor in people's experience, and can have a big impact on productivity, and there are so many different editors with wildly different strengths. Editors experience is also really hard: unlike other kinds of developer infrastructure (like buildsystems, or even programming languages) the field conceptualizes editors as personal: your choice in editor is seen to reflect on you (it probably doesn't), and your ability to use your editor well is seen to be a reflection of your skills as a developer (it isn't). It's also the case that because editors are so personal, it's very difficult to produce an editor with broad appeal, and the most successful editors tend to be very configurable and can easily lack defaults, which means that even great editors are terrible out of the box, which mostly affects would-be and new developers.

Nevertheless, time being able to have an editor that you're comfortable with and that you can use effectively without friction does make it easier to build software, so even if folks often conceptualize of editors in the wrong way, improving the editing experience is important. I think that there are two areas for improvement:

  • editor configurations should--to some extent--be maintained at the project (or metaproject) level, rather than on the level individual engineer. The hard part here is how to balance individual preference with providing a consistent experience, particularly for new developers. [1]
  • there should be more "starter kits" (like this one! or the many other starter kits, but also for (neo)vim, vscode, and others.) that help bootstrap the experience, while also figuring out ways to allow layering other project-based extensions on top of a base configuration.

Also, I want to give chemacs a shout out, for folks who want to try out other base configurations.

[1]There are two kinds of new developers with different experiences but some overlap: folks who have development experience in general but no experience with a specific project, and folks who are new to all development.