New Beginnings: Deciduous Platform

I left my job at MongoDB (8.5 years!) at the beginning of the summer, and started a new job at the beginning of the month. [1] I'll be writing and posting more about my new gig, career paths in general, reflections on what I accomplished on my old team, the process of interviewing as a software engineer, as well as the profession and industry over time. For now, though, I want to write about one of the things I've been working on this summer: making a bunch of the open source libraries that I worked on more generally usable. I've been calling this the deciduous platform, [2] which now has its own github organization! So it must be real.

The main modification in these forks, aside from adding a few features that had been on my list for a while, has been to update the build systems to use go modules [3] and to rewrite the history of the repositories to remove all of the old vendoring. I expect to continue development on some aspects of these over time, though the truth is that these libraries were quite stable and were nearly in maintenance mode anyway.

Background

The team was responsible for a big monolithic application: development had begun in 2013, which was early for Go, and while everything worked, it was a bit weird. My efforts when I joined in 2015 focused mostly on stabilization, architecture, and reliability. While the application mostly worked, it was clear that it suffered from a few problems, which I believe were the result of originating early in the history of Go. First, because no one had tried to write big applications in Go yet, the patterns weren't well established, and so the team ended up writing code that worked but was difficult to maintain, with bespoke solutions to a number of generic problems like running workloads in the background or managing APIs. Second, Go's standard library tends to be really solid, but also tends toward being a little low level for most day-to-day tasks, so things like logging and process management end up requiring more code [4] than is reasonable.

I taught myself to write Go by working on a logging library, and worked on a distributed queue library. One of the things I realized early on was that breaking the application into "microservices" would have both been difficult and offered minimal benefit, [5] so I went with the approach of creating a well factored monolith, which included a lot of application-specific work, but also building a small collection of libraries and internal services to provide useful abstractions and separations for application developers and projects.

This allowed for a certain level of focus, both for the team creating the infrastructure, but also for the application itself: the developers working on the application mostly focused on the kind of high level core business logic that you'd expect, while the infrastructure/platform team really focused on these libraries and various integration problems. The focus wasn't just organizational: the codebases became easier to maintain and features became easier to develop.

This experience has led me to think that architecture decisions may not be well captured by the monolith/microservice dichotomy; rather, there's this third option that centers on internal architecture, platforms, and the possibility for developer focus and velocity.

Platform Overview

While there are 13 or so repositories in the platform, really there are 4 major libraries: grip, a logging library; jasper, a process management framework; amboy, a (possibly distributed) worker queue; and gimlet, a collection of tools for building HTTP/REST services.

The tools all work pretty well together, and combine to provide an environment where you can focus on writing the business logic for your HTTP services and background tasks, with minimal boilerplate to get it all running. It's pretty swell, and makes it possible to spin up (or spin out) well factored services with similar internal architectures, and robust internal infrastructure.

I wanted to write a bit about each of the major components, addressing why I think these libraries are compelling and the kinds of features that I'm excited to add in the future.

Grip

Grip is a structured-logging friendly library, and is broadly similar to other third-party logging systems. There are two main underlying interfaces, representing logging targets (Sender) and messages, as well as a higher level "journal" interface for use during programming. It's pretty easy to write new message types or backends, which means you can use grip to capture all kinds of arbitrary messages in a consistent manner, and also send those messages wherever they're needed.
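To make this concrete, here's a minimal sketch of configuring a Sender and logging a structured message through the journal interface. The import paths point at my forks, and the constructor and setter names are from memory, so treat the specifics as assumptions:

import (
    "github.com/tychoish/grip"
    "github.com/tychoish/grip/level"
    "github.com/tychoish/grip/message"
    "github.com/tychoish/grip/send"
)

func setupLogging() error {
    // log to the standard library's native backend; any other
    // Sender implementation (Splunk, Slack, etc.) could be swapped in.
    sender, err := send.NewNativeLogger("myapp",
        send.LevelInfo{Default: level.Info, Threshold: level.Debug})
    if err != nil {
        return err
    }
    return grip.SetSender(sender)
}

func doWork() {
    // the journal interface, with a structured message.
    grip.Info(message.Fields{"op": "sync", "items": 42})
}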

Internally, it's quite nice to be able to just send messages to specific log targets, using configuration within an application rather than needing to operationally manage log output. Operations folks shouldn't be stuck dealing with just managing logs, after all, and it's quite nice to just send data directly to Splunk or Sumologic. We also used the same grip fundamentals to send notifications and alerts to Slack channels, email lists, or even to create Jira Issues, minimizing the amount of clunky integration code.

There are some pretty cool projects in and around grip:

  • support for additional logging targets. The deciduous version of grip adds twitter as an output target, as well as creating desktop notifications (e.g. growl/libnotify,) but I think it would also be interesting to add fluent/logstash connections that don't have to transit via standard error.
  • While structured logging is great, I noticed that we ended up logging messages automatically in the background as a method of metrics collection. It would be cool to add some kind of "intercepting sender" that handled some of these structured metrics and exposed the data in a format that the conventional tools these days (prometheus, others,) can handle. Some of this code would clearly need to be in grip, and other aspects clearly fall into other tools/libraries.

Amboy

Amboy is an interface for doing things with queues. The interfaces are simple, and you have:

  • a queue that has some way of storing and dispatching jobs.
  • implementations of jobs, which are responsible for executing your business logic. With a base implementation that you can easily compose into your job types, all you really need to implement is a Run() method (see the sketch after this list.)
  • a queue "group" which provides a higher level abstraction on top of queues to support segregating workflows/queues in a single system to improve quality of service. Group queues function like other queues but can be automatically managed by the processes.
  • a runner/pool implementation that provides the actual thread pool.
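Here's a rough sketch of what a job implementation looks like; the import paths, the exact shape of job.Base, and the registry call are from memory and may not match the current code:

import (
    "context"

    "github.com/tychoish/amboy"
    "github.com/tychoish/amboy/job"
    "github.com/tychoish/amboy/registry"
    "github.com/tychoish/grip"
)

type sayHelloJob struct {
    Name     string `bson:"name" json:"name" yaml:"name"`
    job.Base `bson:"metadata" json:"metadata" yaml:"metadata"`
}

func init() {
    // register the type so queues can round-trip jobs between machines.
    registry.AddJobType("say-hello", func() amboy.Job { return &sayHelloJob{} })
}

// Run holds the business logic; the embedded job.Base provides IDs,
// status tracking, and serialization.
func (j *sayHelloJob) Run(ctx context.Context) {
    defer j.MarkComplete()
    grip.Infof("hello, %s", j.Name)
}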

There's a type registry for job implementations and versioning in the schema for jobs, so that you can round-trip a job between machines and update the implementation safely without first draining the queue.

This turns out to be incredibly powerful for managing background and asynchronous work in applications. The package includes a number of in-memory queues for managing workloads in ephemeral utilities, as well as a distributed, MongoDB-backed queue for running multiple copies of an application with a shared queue. There's also a layer of management tools for introspecting and managing the state of jobs.
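Dispatching work then looks something like the following, continuing the job sketch from above; the constructor name and the Put/Wait signatures have shifted between versions, so this is illustrative rather than definitive:

ctx := context.Background()

// a local, in-memory queue with 4 workers and room for 1024 pending jobs.
q := queue.NewLocalLimitedSize(4, 1024)
if err := q.Start(ctx); err != nil {
    return err
}

if err := q.Put(ctx, &sayHelloJob{Name: "kip"}); err != nil {
    return err
}

amboy.Wait(ctx, q) // block until all pending jobs are complete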

While Amboy is quite stable, there is a small collection of work that I'm interested in:

  • a queue implementation that stores jobs in a local Badger database on disk, to provide single-system restartability for jobs.
  • a queue implementation that stores jobs in PostgreSQL, mirroring the MongoDB job functionality, to provide another distributed backend option.
  • queue implementations that use messaging systems (Kafka, AMQP) as backends. There is an existing SQS implementation, but all of these systems have less strict semantics for process restarts than the database options, and the databases can easily handle on the order of a hundred thousand jobs an hour.
  • changes to the queue API to remove a few legacy methods that return channels instead of iterators.
  • improve the semantics for closing a queue.

While Amboy has provisions for building architectures where workers run in multiple processes, rather than as multiple threads within a single process, it would be interesting to develop more fully-fledged examples of this.

Jasper

Jasper provides a high level set of tools for managing subprocesses in Go, adding a highly ergonomic API (in Go,) as well as exposing process management as a service to facilitate running processes on remote machines. Jasper also manages/tracks the state of running processes, which reduces the pressure on calling code to track process state.

The package currently exposes Jasper services over REST, gRPC, and MongoDB's wire protocol, and there is also code to support using SSH as a transport so that you don't need to expose these remote services publicly.

Jasper is, perhaps, the most stable of the libraries, but I am interested in thinking about a couple of extensions:

  • using jasper as PID 1 within a container to be able to orchestrate workloads running in containers, adding (some) support for lower level container orchestration.
  • writing configuration-file-based tools for using jasper to orchestrate builds and distributed tests.

I'm also interested in cleaning up some of the MongoDB-specific code (i.e. the code that downloads MongoDB versions for use in test harnesses,) and perhaps re-envisioning that as client code that uses Jasper rather than as a part of Jasper.

Gimlet

I've written about gimlet here before, when I started the project, and it remains a pretty useful and ergonomic way to define and register HTTP APIs. In the past few years, it's grown to add more authentication features, as well as a new "framework" for defining routes. This makes it possible to define routes by implementing an interface that:

  • makes it very easy to produce paginated routes, and provides some helpers for managing content
  • separates the parsing of inputs from executing the results, which can make route definitions easy to test without integration tests (see the sketch below.)
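A sketch of that interface, as I remember it (the method names and the Responder type are approximate):

import (
    "context"
    "net/http"

    "github.com/tychoish/gimlet"
)

type helloRouteHandler struct {
    name string
}

func (h *helloRouteHandler) Factory() gimlet.RouteHandler { return &helloRouteHandler{} }

// Parse extracts and validates input from the request...
func (h *helloRouteHandler) Parse(ctx context.Context, r *http.Request) error {
    h.name = gimlet.GetVars(r)["name"]
    return nil
}

// ...while Run executes the route's logic, so the two halves can be
// tested separately.
func (h *helloRouteHandler) Run(ctx context.Context) gimlet.Responder {
    return gimlet.NewJSONResponse(map[string]string{"hello": h.name})
}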
In the future, I'd also like to rehome this functionality on top of the chi router. The current implementation uses Negroni and gorilla mux (though neither is exposed in the interface), but I think it'd be nice to make this pluggable, and chi looks pretty nice.

Other Great Tools

The following libraries are definitely smaller, but I think they're really cool:

  • birch is a builder for programmatically building BSON documents, and MongoDB's extended JSON format. It's built upon an earlier version of the BSON library. While it's unlikely to be as fast at scale, for many operations (like finding a key in a document), the interface is great for constructing payloads.
  • ftdc provides a way to generate (and read,) MongoDB's diagnostic data format, which is a highly compressed timeseries data format. While this implementation could drift from the internal implementation over time, the format and tool remain useful for arbitrary timeseries data.
  • certdepot provides a way to manage a certificate authority with the certificates stored in a centralized store. I'd like to add other storage backends over time.

And more...

Notes

[1]Though, given my usual publication lag, I'm writing this a couple days before starting.
[2]My old team built a continuous integration tool called evergreen, which is itself a pun (using "green" to indicate passing builds; most CI systems are not ever-green.) Many of the tools and libraries that we built got names with tree puns, and somehow "deciduous" seemed like the right plan.
[3]For an arcane reason, all of these tools had to build with an old version of Go (1.10) that didn't support modules, so we had an arcane and annoying vendoring solution that wasn't compatible with modules.
[4]Go tends to be a pretty verbose language, and I think most of the time this creates clarity; however, for common tasks it has the feeling of offering a poor abstraction, or forcing you to write duplicated code. While I don't believe that more-terse code is better, I think there's a point where the extra verbosity for rote operations just creates the possibility for more errors.
[5]The team was small and, as an internal tools team, unlikely to grow to the size where microservices offered any kind of engineering efficiency (at some cost,) and there weren't significant technical gains that we could take advantage of: the services of the application didn't need to be globally distributed, and the components didn't need to scale independently.

In Favor of an Application Infrastructure Framework

The byproduct of a lot of my work on Evergreen over the past few years has been that I've amassed a small collection of reusable components, in the form of libraries that address important but not particularly core functionality. While I think the actual features and scale that we've achieved are "real" accomplishments, the infrastructure that we built has been particularly exciting.

It turns out that I've already written about a number of these components here. Though my initial posts were about these components in their proof-of-concept stages, now (finally!) we're using them all in production, so they're a bit more hardened.

The first, grip, is a logging framework. Initially, I thought a high-level logging framework with pluggable backends was going to be really compelling. While configurable backends have been good for using grip as the primary toolkit for messaging and user-facing alerting, the most compelling feature has been structured logging.

Most of the logging that we do now (thanks to grip,) passes structures (e.g. maps) to the logger with key/value data. In combination with log aggregation services/tools (like ELK, splunk, or sumologic,) we can take care of nearly all of our application observability (monitoring) use cases in one stop. It includes easy to use system and golang runtime metrics collection, all using a push-based collection model, and can also power alert escalation. After having maintained an application using this kind of event driven structured logging system, I have a hard time thinking about running applications without it.
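Concretely, a typical call passes a message.Fields map rather than a formatted string; the field names here are just illustrative:

// log key/value data that the aggregation tools can index and query.
grip.Info(message.Fields{
    "message":     "request complete",
    "path":        r.URL.Path,
    "status":      status,
    "duration_ms": time.Since(start).Milliseconds(),
})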

Next we have amboy, which is a queue system. Like grip, all of the components are pluggable, so it supports in-memory (ephemeral) queues, distributed queues, dependency graph systems, and priority queue implementations, as well as a number of different execution models. The most powerful thing that amboy affords us is a single, clear abstraction for defining "background" execution and workloads.

In Go it's easy to spin up a goroutine to do some work in the background, and it's super easy to implement worker pools to parallelize the processing of simple tasks. The problem is that as systems grow, it becomes pretty hard to track this complexity in your own code, and we discovered that our application was essentially bifurcated between offline (e.g. background) and online (e.g. request-driven) work. To address this problem, we defined all of the background work as small, independent units of work, which can be easily tested, and as a result there is essentially no ad-hoc concurrency in the application except what runs in the queues.

The end result of having a unified way to characterize background work is that scaling the application becomes much less complicated. We can build new queue implementations without needing to think about the business logic of the background work itself, and we can add capacity by increasing the resources of worker machines without needing to think about the architecture of the system. Delightfully, the queue metaphor is independent of external services, so we can run the queue in memory, backed by a heap or hash map with executors running in dedicated goroutines if we want, and also scale it out to use databases or dedicated queue services with additional process-based workers, as needed.

The last component, gimlet, addresses building HTTP interfaces, and provides tools for registering routes, writing responses, managing middleware and authentication, and defining routes in a way that's easy to test. Gimlet is just a wrapper around some established tools like negroni and gorilla/mux, all built on established standard-library foundations. Gimlet has allowed us to unify a bunch of different approaches to these problems, and has lowered the barrier to entry for most of our interfaces.

There are other infrastructural problems still on the table: tools for building inter-system communication and RPC when you can't communicate via a queue or a shared database (I've been thinking a lot about gRPC and protocol buffers for this,) and also about object-mapping and database access patterns, which I don't really have an answer for. [1]

Nevertheless, with the observability, background task, and HTTP interface problems well understood and supported, it definitely frees developers to spend more of their time focused on the core problems of importance to users and the goals of the project. Which is a great place to be.

[1]I built a database migration tool called anser which is mostly focused on integrating migration workflows into production systems so that migrations are part of the core code and can run without affecting production traffic, and while these tools have been useful, I haven't seen a clear path between this project and meaningfully simplifying the way we manage access to data.

Cache Maintenance

Twice this fall I've worked on code that takes a group of files and ensures that the total size of the files is less than a given size. The operation is pretty simple: identify all the files and their sizes (recursively, or not, but accounting for the size of directories,) sort them, and delete files from the front or back of the list until you've reached the desired size.

If you have a cache and you're constantly adding content to it, eventually you will either need an infinite amount of storage or you'll have to delete something.

But what to delete? And how?

Presumably you use some items in the cache more often than others, and some files change very often while others change very rarely, and in many cases, use and change frequency are orthogonal.

For the cases that I've worked on, the first property, frequency of use, is the one we're interested in. If we haven't used a file in a while, relative to the other files, chances are it's safe to delete.

The problem with access time (atime) is that while most file systems have a concept of atime, most of them don't update it. Which makes sense: if every time you read a file you have to update its metadata, then every read operation becomes a write operation, and everything becomes slow.

Relative access time, or relatime, helps some. Here atime is updated, but only if you're writing to the file or if it's been more than 24 hours since the last update. The problem, of course, is that if a cache is write-once-read-many and operates with a time granularity of less than a day, then relatime is often just creation time. That's no good.

The approach I've been taking is to use the last modification time (mtime), and to intentionally update mtime (e.g. using touch or a similar operation,) after cache access. It's slightly less elegant than it could be, but it works really well and requires very little overhead.
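In Go, "touching" a file after a cache hit takes only a couple of lines with the standard library:

// bump mtime (and atime) to now after reading a cached file.
now := time.Now()
if err := os.Chtimes(path, now, now); err != nil {
    return err
}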

Armed with these decisions, all you need is a thing that crawls a file system, collects objects, and stores their size and time, so we know how large the cache is and can maintain an ordered list of file objects by mtime. The ordered list of files should arguably be a heap, but the truth is that you build and sort the structure once, remove the "lowest" (oldest) items until the cache is the right size, and then throw it all away, so you're not really doing many heap-ish operations.
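The whole operation fits comfortably in a sketch like the following, using only the standard library (error handling abbreviated):

import (
    "os"
    "path/filepath"
    "sort"
)

type cacheItem struct {
    path  string
    size  int64
    mtime int64
}

func prune(root string, maxSize int64) error {
    var items []cacheItem
    var total int64

    // crawl the tree, collecting sizes and modification times.
    err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() {
            return err
        }
        items = append(items, cacheItem{path, info.Size(), info.ModTime().Unix()})
        total += info.Size()
        return nil
    })
    if err != nil {
        return err
    }

    // sort once, oldest first, then delete from the front until
    // the cache fits in the budget.
    sort.Slice(items, func(i, j int) bool { return items[i].mtime < items[j].mtime })
    for _, it := range items {
        if total <= maxSize {
            break
        }
        if err := os.Remove(it.path); err != nil {
            return err
        }
        total -= it.size
    }
    return nil
}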


Therefore, I present lru. Earlier this summer I wrote a less generic implementation of the same principle, and was elbows deep into another project when I realized I needed another cache pruning tool. Sensing a trend, I decided to put a little more time into the project and build it out as a library that other people can use, though frankly I'm mostly concerned about my future self.

The package has two types: a Cache type that incorporates the core functionality, and FileObject, which represents items in the cache.

Operation is simple. You can construct and add items to the cache manually, or you can use DirectoryContents or TreeContents, which build caches from a starting point in the file system. DirectoryContents looks at the contents of a single directory (optionally skipping sub-directories) and returns a Cache object with those contents. If you do not skip directories, each directory appears in the cache with the total size of its contents.

TreeContents recurses through the tree and ignores directories, and returns a Cache object with all of those elements. TreeContents does not clean up empty directories.

Once you have a Cache object, use its Prune method with the maximum size of the cache (in bytes), any objects to exclude, and an optional dry-run flag, to prune the cache down until it's less than or equal to the max size.
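Putting it together, usage looks roughly like this; the signatures are my approximation of the API rather than a copy of its godoc:

import "github.com/tychoish/lru"

// build a cache from a directory tree, then prune it down to 2 GB.
cache, err := lru.TreeContents("/srv/cache")
if err != nil {
    return err
}

const maxSize = 2 * 1024 * 1024 * 1024 // bytes

// pass objects to exclude and a dry-run flag, per the description above.
if err := cache.Prune(maxSize, nil, false); err != nil {
    return err
}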

Done.


I'm not planning any substantive changes to the library at this time, as it meets most of my needs, but there are some obvious features:

  • a daemon mode where the cache object can "watch" a file system (using inotify or similar) and add items to or update existing items in the cache, potentially using fsnotify.
  • an option to delete empty directories encountered during pruning.
  • options to use other time data from the file system when possible, potentially using the times library.

With luck, I can go a little while longer without doing this again. With a little more luck, you'll find lru useful.

Shimgo Hugo

In an effort to relaunch tychoish with a more contemporary theme and a publishing tool that (hopefully) will support a more regular posting schedule, I also wrote a nifty go library for dealing with reStructuredText, which may be useful and I think illustrates something about build systems.

In my (apparently still) usual style, there's some narrative lead-in that takes a bit to get through.


Over the past couple of weeks, I redesigned and redeployed my blog. The system it replaced was somewhat cobbled together, was missing a number of features (e.g. archives, rss feeds, social features, etc.) and, to add insult to injury, publishing was pretty slow and it was difficult to manage a pipeline of posts.

In short, I didn't post much. I've written things from time to time, but I haven't done a great job of actually posting them, and it was hard to actually get people to read them, which was further demotivating. I've been reading a lot of interesting things, I'm not writing that much for work any more, and I've been doing enough things recently that I want to write about them. See this twitter thread I had a bit ago on the topic.

So I started playing around again. Powering this blog is hard, because I have a lot of content [1] and I very much want to use reStructuredText. [2] There's this thing called hugo which seems to be pretty popular. I've been using static site generators for years, and prefer the approach. It's also helpful that I worked with Steve (hugo's original author) during its initial development, and either by coincidence or as a result of our conversations and a couple of very small early contributions, a number of things I cared about were included in its design:

  • support for multiple text markup formats (including reStructuredText,) (I cobbled together the rst support.)
  • customizable page metadata formats. (I think I pushed for support of alternate front-matter formats, specifically YAML, and might have made a few prototype commits on this project.)
  • the ability to schedule posts in the future, (I think we talked about this.)

I think I also whinged a bunch in those days about performance. I've written about this here before, but one of the classic problems with static site generators is that no one expects sites with one or two thousand posts/content atoms, and so they're developed against relatively small corpora and then have performance that doesn't really scale.

Hugo is fast, but mostly because go is fast, which I think is, in most cases, good enough, but not in my case, and particularly not with the rst implementation as it stood. After all this preamble, we've gotten to the interesting part: a tool I'm calling shimgo.


The initial support for rst in hugo is straightforward. Every time hugo encounters an rst file, it calls the shell rst2html utility that is installed with docutils, passing it the content of the file on standard input, and parsing the content we need from its output. It's not pretty, it's not smart, but it works.

Slowly: to publish all of tychoish it took about 3 minutes.

I attempted an rst-to-markdown translation of my existing content and then ran that through the markdown parsers in hugo, just to get comparative timings: 3ish seconds.

reStructuredText is a bit slower to parse than markdown, on account of its comparative strictness and the fact that the toolchain is in python and not go, but this difference seemed absurd.

There's a go-rst project to write a pure-go implementation of reStructuredText, but I've kept my eye on that project for a couple of years, and it's a lot of work that is pretty far off. While I do want to do more to support this project, I wanted to get a new blog up and running in a few weeks, not years.

Based on the differences in timing, and some intuition from years of writing build systems, I made a wager with myself: while the python rst implementation is likely really slow, it's not that slow, and I was losing a lot of time to process creation, teardown, and context switching: processing a single file is pretty quick, but the overhead gets to be too much at scale.

I built a little prototype where I ran a very small HTTP service that took rst as a POST request and returned processed HTML. Now there was one process running, and instead of calling fork/exec a bunch, we just had a little bit of (local) network overhead.
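The client side of that prototype is only a few lines of Go; the endpoint URL here is made up for illustration:

// POST the raw rst document to the long-running converter service
// and read back the rendered HTML.
resp, err := http.Post("http://localhost:5000/render", "text/x-rst", bytes.NewReader(rst))
if err != nil {
    return nil, err
}
defer resp.Body.Close()

return ioutil.ReadAll(resp.Body)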

Faster: 20 seconds.

I decided I could deal with it.

What remains is making it production worthy for hugo. While it was good enough for me, I very much don't want to get into the position of needing to maintain a single-feature fork of a software project in active development, and frankly the existing rst support already has a difficult-to-express external dependency. Adding an HTTP service would be a hard sell.

This brings us to shimgo: the idea is to package everything needed to implement the above solution in an external go package, behind a functional interface, so that hugo maintainers don't need to know anything about how it works.

Isn't abstraction wonderful?

So here we are. I'm still working on getting this patch mainlined, and there is some polish for shimgo itself (mostly the README file and some documentation), but it works, and if you're doing anything with reStructuredText in go, then you ought to give shimgo a try.

[1]While I think it would be reasonable to start afresh, I think the whole point of having archives is that you mostly just leave them around.
[2]It's not the most popular markup language, but I've used it more than any other text markup, and I find the fact that other languages (e.g. markdown) vary a lot between implementations to be distressing. Admittedly the fact that there aren't other implementations of rst is also distressing, but on the balance it is somewhat less distressing.

Going Forward

I wrote a post about moving on from being a technical writer, and I've definitely written some since then about programming and various side projects, but I haven't really done the kind of public reflection on this topic that I've done historically about many other things.

When I switched to a programming team, I knew some things about computers, and I was a decent Python programmer. The goal, then, was to teach myself a second programming language (Go,) and learn how to make "real" software with other people, or on teams with other people. Both of those projects are going well: I think I've become pretty solid as a Go programmer, and although it's hard to say what "real" software is, or if I'm good at making it, all indications are positive.

This weekend, for various reasons, I've been reviving a project that I did some work on this fall and winter, and had abandoned for about 6 months. It's been both troubling (there are parts that are truly terrible,) and kind of rewarding to see how much I've grown as a programmer just from looking at the code.

Cue, then, I guess, the self-reflective interlude.

My reason for wanting to learn--really learn--a second programming language was to make sure that all the things I knew about system design, algorithms, and data structures were generalizable, and not rooted in the semantics of a specific language or even a specific implementation of that language. I was also interested in learning more about the process of learning new programming languages, so that I had some experience with the learning process, which may come in handy in the future.

Learning Go, I think, helped me achieve these goals. While I haven't really set out to learn a third language yet, it feels tractable. I've also noticed some changes and differences in some other aspects of my interests.

I used to be really interested in programming qua programming, and I thought a lot about programming languages. While I can still evaluate programming languages, and have my own share of opinions about "the way things work," I'm less concerned with the specific syntax or implementation of a language. I think a lot about its build tools, platform support, deployment models, and distribution methods and stories, rather than what it can do or how you have to write it. Or: how you make it, ship it, and run it.

I've also gotten less interested in UNIX-esque systems administration and operations, which is historically a thing I've been quite interested in. These days, I find myself thinking more about the following kinds of problems:

  • build systems, the tools that build software from source files (and sometimes test it!) and the ways to do this super efficiently and sensibly. Build systems are quite hard because in a lot of ways they're the point through which your software (as software) interacts with all of the platforms it runs on. Efficient build systems have a huge impact on developer productivity, which is a big interest of mine.
  • developer productivity, which is a big catch-all category, but it's almost always true that people are more expensive than computers, so working on tools and features (like better build systems, or automating various aspects of the development process,) that save people time is almost always worth the investment.
  • continuous integration and deployment, again connected to developer productivity, but taking the "automate building and testing" story to its logical conclusion. CD environments mean you deploy changes much more often, but they also require and force you to trust the automated systems, and to make sure that project leadership and management are just as automated as the development experience.
  • internal infrastructure, as in "internal services and tools that all applications need," like logging, queuing systems, abstractions for persistence, deployment systems, testing, and exposed interfaces (e.g. RPC systems, REST/HTTP, or command line option parsing). Having good tools for these generic aspects of an application makes writing actual features for users easier. I'm also increasingly convinced that the way to improve applications and systems is to improve these lower level components and their interfaces.

Free Software and open source are still important, as is UNIX, but these kinds of developer productivity and automation issues are a level above that. I've changed in the last 5 years, software has changed in the last five years, the way we run software on systems has changed in the last 5 years. I'm super excited to see what kinds of things I can do in this space, and where I end up in 5 years.

I'm also interested in thinking about ways to write about this. I've written drafts of a number of posts about learning how to program and about systems administration, and now that I'm finding and making more time for writing, one of the things I don't really know is what kind of writing on these topics I'm interested in doing, or how to do it in a way that anyone would be interested in reading.

We shall see. Regardless, I hope that I'm back, now.

Get a Grip

I made another Go(lang) thing. Grip is a set of logging tools modeled on Go's standard logging system, with some additional (related) features, including:

  • level-based logging, with the ability to set a minimum threshold to exclude log messages based on priority (i.e. debugging.)
  • Error capture/logging, to log Go error objects.
  • Error aggregation, for continue-on-error situations: where you want to perform a bunch of operations, collect any errors they return, and report them at the end, rather than returning after the first operation fails (see the sketch after this list.)
  • Logging to the systemd journal with fallback to standard library logging to standard output.
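The aggregation pattern looks something like this sketch; the constructor name is from memory of an early version and may have changed:

// collect errors from a series of operations and resolve them into
// a single error (or nil) at the end.
catcher := grip.NewCatcher()
for _, op := range operations {
    catcher.Add(op())
}
return catcher.Resolve()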

There are helper functions for logging using different kinds of default string formatting, as well as functions that take error objects, and a "lazy" logging method that takes a simple interface for building log messages at log-time rather than at operation-time.

None of these features are terribly exciting, and the systemd support wraps the systemd library from CoreOS. I'm a big fan of log levels and priority filtering, so it's nice to have a tool for that.

In the future, I'd like to add more generic syslog support if that's useful, and potentially tools for better categorical logging. There's also a good deal of repeated code, and it might be nice to use this as an excuse to write a code generator using the go tool.

Pull requests and feedback are, of course, welcome.

Have a Gimlet: A Go JSON/HTTP API Toolkit

Look folks, I made a thing!

It's called gimlet, and it's a Go(lang) tool for making JSON/HTTP APIs (i.e. REST with JSON). Give it a whirl!

It's actually even less a tool and more of a toolkit, or just "a place to put all of the annoying infrastructure that you'll inevitably need when you want to build a JSON/HTTP interface, but that has nothing to do with whatever your API/application does": routing, and serializing and deserializing JSON.

Nothing hard, nothing terribly interesting, and certainly not anything you couldn't do another way, but it's almost certainly true that this layer of application infrastructure is totally orthogonal to whatever your application is actually doing, so you should focus on that, and probably use something like Gimlet.

Background

I'm using the term HTTP/JSON APIs for services where you send and receive JSON data over HTTP. Sometimes people call these REST APIs, and that's not inaccurate, but I think REST is a bit more complicated, and not exactly the core paradigm that I'm pursuing with Gimlet.

Sending and receiving JSON over HTTP makes a lot of sense: there are great tools for parsing JSON, and HTTP is a decent high level protocol for interprocess communication between simple data applications. Look up "microservices" at your leisure.

Go is a great language for this: it has a lot of tooling that anticipates these kinds of applications, and the deployment model is really friendly to operations teams and systems. Also, the static typing and reasonable separation of private and public interfaces are particularly lovely.

So it should be no surprise that there are a lot of tools for building web applications, frameworks even. Things like gorilla and negroni are great and provide a very useful set of tools for building Go web apps. Indeed, Gimlet uses components of each of these tools.

The issue, and reason for Gimlet, is that all of these tools assume that you're building a web application, with web pages, static resources, form handling, session state handling, and other things that are totally irrelevant to writing JSON/HTTP interfaces.

So then, Gimlet is a tool to build these kinds of APIs: it's simple, uses Negroni and Gorilla's mux, and does pretty much everything you need except actually write your code.

Example

Set up the app with some basic configuration:

import "github.com/tychoish/gimlet"

app := gimlet.NewApp()
app.SetPort(9001)
app.SetDefaultVersion(1)

This sets the port on which the HTTP server will listen for requests, and configures the default version of the API. You do want all of your endpoints prefixed with "/v<number>", right? The default version of the API is also available without the prefix, as are routes registered with version 0.

Then register some routes:

app.AddRoute("/<path>").Version(<int>).Get().Handler(http.HandlerFunc)
app.AddRoute("/<path>").Version(<int>).Post().Handler(http.HandlerFunc)

app.AddRoute returns an API route object with a set of chainable methods for defining the route. If you add multiple HTTP methods (GET, POST, and the like,) then Gimlet automatically defines multiple routes, with the same handler for each method.

For handlers, I typically just write functions that take arguments from the top level context (database connections, application configuration, etc.) and return http.HandlerFunc objects. For example:

func helloWorld(config *Configuration) http.HandlerFunc {
     return func(w http.ResponseWriter, r *http.Request) {
          input := make(map[string]interface{})
          response := make(map[string]interface{})

          err := gimlet.GetJSON(r, &input)

          // do stuff here, handling err and populating the response

          gimlet.WriteJSON(w, response)
     }
}

Gimlet has the following functions that parse JSON out of the body of a request, or add JSON output to the body of a response:

  • WriteJSONResponse(w http.ResponseWriter, code int, data interface{})
  • GetJSON(r *http.Request, data interface{})

These read or write data into the interface{} object (typically a struct.) The following three provide consistent response writers for common status codes:

  • WriteJSON(w http.ResponseWriter, data interface{}) // 200
  • WriteErrorJSON(w http.ResponseWriter, data interface{}) // 400
  • WriteInternalErrorJSON(w http.ResponseWriter, data interface{}) // 500

Finally, when you've written your app, kick it all off, with the following:

err := app.Run()
if err != nil {
   fmt.Println(err)
   os.Exit(1)
}

And that's it. Enjoy, and tell me in the comments or on the issues feed if you find something broken or confusing. Contributions welcome, of course.