New Beginnings: Deciduous Platform

I left my job at MongoDB (8.5 years!) at the beginning of the summer, and started a new job at the beginning of the month. I'll be writing and posting more about my new gig, career paths in general, reflections on what I accomplished on my old team, the process of interviewing as a software engineer, as well as the profession and industry over time. For now, though, I want to write about one of the things I've been working on this summer: making a bunch of the open source libraries that I worked on more generally useable. I've been calling this the deciduous platform, [2] which now has its own github organization! So it must be real.

The main modification in these forks, aside from adding a few features that had been on my list for a while, has been to update the buildsystem to use go modules [3] and rewrite the history of the repository to remove all of the old vendoring. I expect to continue development on some aspects of these over time, though the truth is that these libraries were quite stable and were nearly in maintenance mode anyway.

Background

The team was responsible for a big monolith (or so) application: development had begun in 2013, which was early for Go, and while everything worked, it was a bit weird. My efforts when I joined in 2015 focused mostly on stabilization, architecture, and reliability. While the application worked, mostly, it was clear that it suffered from a few problem, which I believe were the result of originating early in the history of Go: First, because no one had tried to write big applications yet, the patterns weren't well established, and so the team ended up writing code that worked but that was difficult to maintain, and ended up with bespoke solutions to a number of generic problems like running workloads in the background or managing Apia. Second, Go's standard library tends to be really solid, but also tends towards being a little low level for most day-to-day tasks, so things like logging and process management end up requiring more code [4] than is reasonable.

I taught myself to write Go by working on a logging library, and worked on a distributed queue library. One of the things that I realized early, was that breaking the application into "microservices," would have been both difficult and offered minimal benefit, [5] so I went with the approach of creating a well factored monolith, which included a lot of application specific work, but also building a small collection of libraries and internal services to provide useful abstractions and separations for application developers and projects.

This allowed for a certain level of focus, both for the team creating the infrastructure, but also for the application itself: the developers working on the application mostly focused on the kind of high level core business logic that you'd expect, while the infrastructure/platform team really focused on these libraries and various integration problems. The focus wasn't just organizational: the codebases became easier to maintain and features became easier to develop.

This experience has lead me to think that architecture decisions may not be well captured by the monolith/microservice dichotomy, but rather there's' this third option that centers on internal architecture, platforms, and the possibility for developer focus and velocity.

Platform Overview

While there are 13 or so repositories in the platform, really there are 4 major libraries: grip, a logging library; jasper, a process management framework; amboy, a (possibly distributed) worker queue; and gimlet, a collection of tools for building HTTP/REST services.

The tools all work pretty well together, and combine to provide an environment where you can focus on writing the business logic for your HTTP services and background tasks, with minimal boilerplate to get it all running. It's pretty swell, and makes it possible to spin up (or spin out) well factored services with similar internal architectures, and robust internal infrastructure.

I wanted to write a bit about each of the major components, addressing why I think these libraries are compelling and the kinds of features that I'm excited to add in the future.

Grip

Grip is a structured-logging friendly library, and is broadly similar to other third-party logging systems. There are two main underlying interfaces, representing logging targets (Sender) and messages, as well as a higher level "journal" interface for use during programming. It's pretty easy to write new message or bakcends, which means you can use grip to capture all kinds of arbitrary messages in consistent manners, and also send those messages wherever they're needed.

Internally, it's quite nice to be able to just send messages to specific log targets, using configuration within an application rather than needing to operationally manage log output. Operations folks shouldn't be stuck dealing with just managing logs, after all, and it's quite nice to just send data directly to Splunk or Sumologic. We also used the same grip fundamentals to send notifications and alerts to Slack channels, email lists, or even to create Jira Issues, minimizing the amount of clunky integration code.

There are some pretty cool projects in and around grip:

  • support for additional logging targets. The decudous version of grip adds twitter as an output format as well as creating desktop notifications (e.g. growl/libnotify,) but I think it would also be interesting to add fluent/logstash connections that don't have to transit via standard error.'
  • While structured logging is great, I noticed that we ended up logging messages automatically in the background as a method of metrics collection. It would be cool to be able to add some kind of "intercepting sender" that handled some of these structured metrics, and was able to expose this data in a format that the conventional tools these days (prometheus, others,) can handle. Some of this code would clearly need to be in Grip, and other aspects clearly fall into other tools/libraries.

Amboy

Amboy is an interface for doing things with queues. The interfaces are simple, and you have:

  • a queue that has some way of storing and dispatching jobs.
  • implementations of jobs which are responsible for executing your business logic, and with a base implemention that you can easily compose, into your job types, all you need to implement, really is a Run() method.
  • a queue "group" which provides a higher level abstraction on top of queues to support segregating workflows/queues in a single system to improve quality of service. Group queues function like other queues but can be automatically managed by the processes.
  • a runner/pool implementation that provides the actual thread pool.

There's a type registry for job implementations and versioning in the schema for jobs so that you can safely round-trip a job between machines and update the implementation safely without ensuring the queue is empty.

This turns out to be incredibly powerful for managing background and asynchronous work in applications. The package includes a number of in-memory queues for managing workloads in ephemeral utilities, as well as a distributed MongoDB backed-queue for running multiple copies of an application with a shared queue(s). There's also a layer of management tools for introspecting, managing, the state of jobs.

While Amboy is quite stable, there is a small collection of work that I'm interested in:

  • a queue implementation that store jobs to a local Badger database on-disk to provide single-system restartabilty for jobs.
  • a queue implementation that stores jobs in a PostgreSQL, mirroring the MongoDB job functionality, to be able to meet job backends.
  • queue implementations that use messaging systems (Kafka, AMPQ) for backends. There exists an SQS implementation, but all of these systems have less strict semantics for process restarts than the database options, and database can easily handle on the order of a hundred of thousand of jobs an hour.
  • changes to the queue API to remove a few legacy methods that return channels instead of iterators.
  • improve the semantics for closing a queue.

While Amboy has provisions for building architectures with workers running on multiple processes, rather than having queues running multiple threads within the same process, it would be interesting to develop more fully-fledged examples of this.

Jasper

Jasper provides a high level set of tools for managing subprocesses in Go, adding a highly ergonomic API (in Go,) as well as exposing process management as a service to facilitate running processes on remote machines. Jasper also manages/tracks the state of running processes, and can reduce pressures on calling code to track the state of processes.

The package currently exposes Jasper services over REST, gRPC, and MongoDB's wire protocol, and there is also code to support using SSH as a transport so that you don't need to expose remote these services publically.

Jasper is, perhaps, the most stable of the libraries, but I am interested in thinking about a couple of extensions:

  • using jasper as PID 1 within a container to be able to orchestrate workloads running on containers, and contain (some) support for lower level container orchestration.
  • write configuration file-based tools for using jasper to orchestrate buildsystems and distributed test orchestration.

I'm also interested in cleaning up some of the MongoDB-specific code (i.e. the code that downloads MongoDB versions for use in test harnesses,) and perhaps reinvisioning that as client code that uses Jasper rather than as a part of Jasper.

Gimlet

I've written about gimlet here before when I started the project, and it remains a pretty useful and ergonomic way to define and regester HTTP APIs, in the past few years, its grown to add more authentication features, as well as a new "framework" for defining routes. This makes it possible to define routes by implementing an interface that:

  • makes it very easy to produce paginated routes, and provides some helpers for managing content
  • separates the parsing of inputs from executing the results, which can make route definitions easy to test without integration tests.
  • rehome functionality on top of chi router. The current implementation uses Negroni and gorilla mux (but neither are exposed in the interface), but I think it'd be nice to have this be optional, and chi looks pretty nice.

Other Great Tools

The following libraries are defiantly smaller, but I think they're really cool:

  • birch is a builder for programatically building BSON documents, and MongoDB's extended JSON format. It's built upon an earlier version of the BSON library. While it's unlikely to be as fast at scale, for many operations (like finding a key in a document), the interface is great for constructing payloads.
  • ftdc provides a way to generate (and read,) MongoDB's diagnostic data format, which is a highly compressed timeseries data format. While this implementation could drift from the internal implementation over time, the format and tool remain useful for arbitrary timeseries data.
  • certdepot provides a way to manage a certificate authority with the certificates stored in a centralized store. I'd like to add other storage backends over time.

And more...

Notes

[1]Though, given my usual publication lag, I'm writing this a couple days before starting.
[2]My old team built a continuous integration tool called evergreen which is itself a pun (using "green" to indicate passing builds, most CI systems are not ever-green.) Many of the tools and libraries that we built had got names with tree puns, and somehow "deciduous" seems like the right plan.
[3]For an arcane reason, all of these tools had to build with an old version of Go (1.10) that didn't support modules, so we had an arcane and annoying vendoring solution that wasn't compatible with modules.
[4]Go tends to be a pretty verbose language, and I think most of the time this creates clarity; however, for common tasks it has the feeling of offering a poor abstraction, or forcing you to write duplicated code. While I don't believe that more-terse-code is better, I think there's a point where the extra verbosity for route operations just creates the possibility for more errors.
[5]The team was small, and as an internal tools team, unlikely to grow to the size where microservices offered any kind of engineering efficiency (at some cost,) and there weren't significant technical gains that we could take advantage of: the services of the application didn't need to be globally distributed and the boundaries between components didn't need to scale independently.
comments powered by Disqus