New Beginnings: Deciduous Platform

I left my job at MongoDB (8.5 years!) at the beginning of the summer, and started a new job at the beginning of the month. [1] I'll be writing and posting more about my new gig, career paths in general, reflections on what I accomplished on my old team, the process of interviewing as a software engineer, as well as the profession and industry over time. For now, though, I want to write about one of the things I've been working on this summer: making a bunch of the open source libraries that I worked on more generally usable. I've been calling this the deciduous platform, [2] which now has its own github organization! So it must be real.

The main modification in these forks, aside from adding a few features that had been on my list for a while, has been to update the build systems to use go modules [3] and to rewrite the history of the repositories to remove all of the old vendoring. I expect to continue development on some aspects of these over time, though the truth is that these libraries were quite stable and were nearly in maintenance mode anyway.

Background

The team was responsible for a big monolithic application: development had begun in 2013, which was early for Go, and while everything worked, it was a bit weird. My efforts when I joined in 2015 focused mostly on stabilization, architecture, and reliability. While the application mostly worked, it was clear that it suffered from a few problems, which I believe were the result of originating early in the history of Go. First, because no one had tried to write big applications in Go yet, the patterns weren't well established, and so the team ended up writing code that worked but was difficult to maintain, with bespoke solutions to a number of generic problems like running workloads in the background or managing APIs. Second, Go's standard library tends to be really solid, but also tends toward being a little low level for most day-to-day tasks, so things like logging and process management end up requiring more code [4] than is reasonable.

I taught myself to write Go by working on a logging library, and worked on a distributed queue library. One of the things I realized early on was that breaking the application into "microservices" would have both been difficult and offered minimal benefit, [5] so I went with the approach of creating a well factored monolith, which included a lot of application-specific work, but also building a small collection of libraries and internal services to provide useful abstractions and separations for application developers and projects.

This allowed for a certain level of focus, both for the team creating the infrastructure, but also for the application itself: the developers working on the application mostly focused on the kind of high level core business logic that you'd expect, while the infrastructure/platform team really focused on these libraries and various integration problems. The focus wasn't just organizational: the codebases became easier to maintain and features became easier to develop.

This experience has led me to think that architecture decisions may not be well captured by the monolith/microservice dichotomy; rather, there's this third option that centers on internal architecture, platforms, and the possibility for developer focus and velocity.

Platform Overview

While there are 13 or so repositories in the platform, really there are 4 major libraries: grip, a logging library; jasper, a process management framework; amboy, a (possibly distributed) worker queue; and gimlet, a collection of tools for building HTTP/REST services.

The tools all work pretty well together, and combine to provide an environment where you can focus on writing the business logic for your HTTP services and background tasks, with minimal boilerplate to get it all running. It's pretty swell, and makes it possible to spin up (or spin out) well factored services with similar internal architectures, and robust internal infrastructure.

I wanted to write a bit about each of the major components, addressing why I think these libraries are compelling and the kinds of features that I'm excited to add in the future.

Grip

Grip is a structured-logging friendly library, and is broadly similar to other third-party logging systems. There are two main underlying interfaces, representing logging targets (Sender) and messages, as well as a higher level "journal" interface for use during programming. It's pretty easy to write new message types or backends, which means you can use grip to capture all kinds of arbitrary messages in a consistent manner, and also send those messages wherever they're needed.
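To make this concrete, here's a minimal sketch of configuring a Sender and logging a structured message through the journal interface. The import paths point at my forks, and the constructor and setter names are from memory, so treat the specifics as assumptions:

import (
    "github.com/tychoish/grip"
    "github.com/tychoish/grip/level"
    "github.com/tychoish/grip/message"
    "github.com/tychoish/grip/send"
)

func setupLogging() error {
    // log to the standard library's native backend; any other
    // Sender implementation (Splunk, Slack, etc.) could be swapped in.
    sender, err := send.NewNativeLogger("myapp",
        send.LevelInfo{Default: level.Info, Threshold: level.Debug})
    if err != nil {
        return err
    }
    return grip.SetSender(sender)
}

func doWork() {
    // the journal interface, with a structured message.
    grip.Info(message.Fields{"op": "sync", "items": 42})
}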

Internally, it's quite nice to be able to just send messages to specific log targets, using configuration within an application rather than needing to operationally manage log output. Operations folks shouldn't be stuck dealing with just managing logs, after all, and it's quite nice to just send data directly to Splunk or Sumologic. We also used the same grip fundamentals to send notifications and alerts to Slack channels, email lists, or even to create Jira Issues, minimizing the amount of clunky integration code.

There are some pretty cool projects in and around grip:

  • support for additional logging targets. The deciduous version of grip adds twitter as an output target, as well as creating desktop notifications (e.g. growl/libnotify,) but I think it would also be interesting to add fluent/logstash connections that don't have to transit via standard error.
  • While structured logging is great, I noticed that we ended up logging messages automatically in the background as a method of metrics collection. It would be cool to add some kind of "intercepting sender" that handled some of these structured metrics and exposed the data in a format that the conventional tools these days (prometheus, others,) can handle. Some of this code would clearly need to be in grip, and other aspects clearly fall into other tools/libraries.

Amboy

Amboy is an interface for doing things with queues. The interfaces are simple, and you have:

  • a queue that has some way of storing and dispatching jobs.
  • implementations of jobs, which are responsible for executing your business logic. With a base implementation that you can easily compose into your job types, all you really need to implement is a Run() method (see the sketch after this list.)
  • a queue "group" which provides a higher level abstraction on top of queues to support segregating workflows/queues in a single system to improve quality of service. Group queues function like other queues but can be automatically managed by the processes.
  • a runner/pool implementation that provides the actual thread pool.
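Here's a rough sketch of what a job implementation looks like; the import paths, the exact shape of job.Base, and the registry call are from memory and may not match the current code:

import (
    "context"

    "github.com/tychoish/amboy"
    "github.com/tychoish/amboy/job"
    "github.com/tychoish/amboy/registry"
    "github.com/tychoish/grip"
)

type sayHelloJob struct {
    Name     string `bson:"name" json:"name" yaml:"name"`
    job.Base `bson:"metadata" json:"metadata" yaml:"metadata"`
}

func init() {
    // register the type so queues can round-trip jobs between machines.
    registry.AddJobType("say-hello", func() amboy.Job { return &sayHelloJob{} })
}

// Run holds the business logic; the embedded job.Base provides IDs,
// status tracking, and serialization.
func (j *sayHelloJob) Run(ctx context.Context) {
    defer j.MarkComplete()
    grip.Infof("hello, %s", j.Name)
}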

There's a type registry for job implementations and versioning in the schema for jobs, so that you can round-trip a job between machines and update the implementation safely without first draining the queue.

This turns out to be incredibly powerful for managing background and asynchronous work in applications. The package includes a number of in-memory queues for managing workloads in ephemeral utilities, as well as a distributed, MongoDB-backed queue for running multiple copies of an application with a shared queue. There's also a layer of management tools for introspecting and managing the state of jobs.
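Dispatching work then looks something like the following, continuing the job sketch from above; the constructor name and the Put/Wait signatures have shifted between versions, so this is illustrative rather than definitive:

ctx := context.Background()

// a local, in-memory queue with 4 workers and room for 1024 pending jobs.
q := queue.NewLocalLimitedSize(4, 1024)
if err := q.Start(ctx); err != nil {
    return err
}

if err := q.Put(ctx, &sayHelloJob{Name: "kip"}); err != nil {
    return err
}

amboy.Wait(ctx, q) // block until all pending jobs are complete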

While Amboy is quite stable, there is a small collection of work that I'm interested in:

  • a queue implementation that stores jobs in a local Badger database on disk, to provide single-system restartability for jobs.
  • a queue implementation that stores jobs in PostgreSQL, mirroring the MongoDB job functionality, to provide another distributed backend option.
  • queue implementations that use messaging systems (Kafka, AMQP) as backends. There is an existing SQS implementation, but all of these systems have less strict semantics for process restarts than the database options, and the databases can easily handle on the order of a hundred thousand jobs an hour.
  • changes to the queue API to remove a few legacy methods that return channels instead of iterators.
  • improve the semantics for closing a queue.

While Amboy has provisions for building architectures where workers run in multiple processes, rather than as multiple threads within a single process, it would be interesting to develop more fully-fledged examples of this.

Jasper

Jasper provides a high level set of tools for managing subprocesses in Go, adding a highly ergonomic API (in Go,) as well as exposing process management as a service to facilitate running processes on remote machines. Jasper also manages/tracks the state of running processes, which reduces the pressure on calling code to track process state.

The package currently exposes Jasper services over REST, gRPC, and MongoDB's wire protocol, and there is also code to support using SSH as a transport so that you don't need to expose these remote services publicly.

Jasper is, perhaps, the most stable of the libraries, but I am interested in thinking about a couple of extensions:

  • using jasper as PID 1 within a container to be able to orchestrate workloads running in containers, adding (some) support for lower level container orchestration.
  • writing configuration-file-based tools for using jasper to orchestrate builds and distributed tests.

I'm also interested in cleaning up some of the MongoDB-specific code (i.e. the code that downloads MongoDB versions for use in test harnesses,) and perhaps re-envisioning that as client code that uses Jasper rather than as a part of Jasper.

Gimlet

I've written about gimlet here before, when I started the project, and it remains a pretty useful and ergonomic way to define and register HTTP APIs. In the past few years, it's grown to add more authentication features, as well as a new "framework" for defining routes. This makes it possible to define routes by implementing an interface that:

  • makes it very easy to produce paginated routes, and provides some helpers for managing content
  • separates the parsing of inputs from executing the results, which can make route definitions easy to test without integration tests (see the sketch below.)
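A sketch of that interface, as I remember it (the method names and the Responder type are approximate):

import (
    "context"
    "net/http"

    "github.com/tychoish/gimlet"
)

type helloRouteHandler struct {
    name string
}

func (h *helloRouteHandler) Factory() gimlet.RouteHandler { return &helloRouteHandler{} }

// Parse extracts and validates input from the request...
func (h *helloRouteHandler) Parse(ctx context.Context, r *http.Request) error {
    h.name = gimlet.GetVars(r)["name"]
    return nil
}

// ...while Run executes the route's logic, so the two halves can be
// tested separately.
func (h *helloRouteHandler) Run(ctx context.Context) gimlet.Responder {
    return gimlet.NewJSONResponse(map[string]string{"hello": h.name})
}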
In the future, I'd also like to rehome this functionality on top of the chi router. The current implementation uses Negroni and gorilla mux (though neither is exposed in the interface), but I think it'd be nice to make this pluggable, and chi looks pretty nice.

Other Great Tools

The following libraries are definitely smaller, but I think they're really cool:

  • birch is a builder for programmatically building BSON documents, and MongoDB's extended JSON format. It's built upon an earlier version of the BSON library. While it's unlikely to be as fast at scale, for many operations (like finding a key in a document), the interface is great for constructing payloads.
  • ftdc provides a way to generate (and read,) MongoDB's diagnostic data format, which is a highly compressed timeseries data format. While this implementation could drift from the internal implementation over time, the format and tool remain useful for arbitrary timeseries data.
  • certdepot provides a way to manage a certificate authority with the certificates stored in a centralized store. I'd like to add other storage backends over time.

And more...

Notes

[1]Though, given my usual publication lag, I'm writing this a couple days before starting.
[2]My old team built a continuous integration tool called evergreen, which is itself a pun (using "green" to indicate passing builds; most CI systems are not ever-green.) Many of the tools and libraries that we built got names with tree puns, and somehow "deciduous" seemed like the right plan.
[3]For an arcane reason, all of these tools had to build with an old version of Go (1.10) that didn't support modules, so we had an arcane and annoying vendoring solution that wasn't compatible with modules.
[4]Go tends to be a pretty verbose language, and I think most of the time this creates clarity; however, for common tasks it has the feeling of offering a poor abstraction, or forcing you to write duplicated code. While I don't believe that more-terse code is better, I think there's a point where the extra verbosity for rote operations just creates the possibility for more errors.
[5]The team was small and, as an internal tools team, unlikely to grow to the size where microservices offered any kind of engineering efficiency (at some cost,) and there weren't significant technical gains that we could take advantage of: the services of the application didn't need to be globally distributed, and the components didn't need to scale independently.

In Favor of an Application Infrastructure Framework

The byproduct of a lot of my work on Evergreen over the past few years has been that I've amassed a small collection of reusable components, in the form of libraries that address important but not particularly core functionality. While I think the actual features and scale that we've achieved are "real" accomplishments, the infrastructure that we built has been particularly exciting.

It turns out that I've already written about a number of these components here. Though my initial posts were about these components in their proof-of-concept stages, now (finally!) we're using them all in production, so they're a bit more hardened.

The first, grip, is a logging framework. Initially, I thought a high-level logging framework with pluggable backends was going to be really compelling. While configurable backends have been good for using grip as the primary toolkit for messaging and user-facing alerting, the most compelling feature has been structured logging.

Most of the logging that we do now (thanks to grip,) passes structures (e.g. maps) to the logger with key/value data. In combination with log aggregation services/tools (like ELK, splunk, or sumologic,) we can take care of nearly all of our application observability (monitoring) use cases in one stop. It includes easy to use system and golang runtime metrics collection, all using a push-based collection model, and can also power alert escalation. After having maintained an application using this kind of event driven structured logging system, I have a hard time thinking about running applications without it.
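Concretely, a typical call passes a message.Fields map rather than a formatted string; the field names here are just illustrative:

// log key/value data that the aggregation tools can index and query.
grip.Info(message.Fields{
    "message":     "request complete",
    "path":        r.URL.Path,
    "status":      status,
    "duration_ms": time.Since(start).Milliseconds(),
})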

Next we have amboy, which is a queue system. Like grip, all of the components are pluggable, so it supports in-memory (ephemeral) queues, distributed queues, dependency graph systems, and priority queue implementations, as well as a number of different execution models. The most powerful thing that amboy affords us is a single, clear abstraction for defining "background" execution and workloads.

In Go it's easy to spin up a goroutine to do some work in the background, and it's super easy to implement worker pools to parallelize the processing of simple tasks. The problem is that as systems grow, it becomes pretty hard to track this complexity in your own code, and we discovered that our application was essentially bifurcated between offline (e.g. background) and online (e.g. request-driven) work. To address this problem, we defined all of the background work as small, independent units of work, which can be easily tested, and as a result there is essentially no ad-hoc concurrency in the application except what runs in the queues.

The end result of having a unified way to characterize background work is that scaling the application becomes much less complicated. We can build new queue implementations without needing to think about the business logic of the background work itself, and we can add capacity by increasing the resources of worker machines without needing to think about the architecture of the system. Delightfully, the queue metaphor is independent of external services, so we can run the queue in memory, backed by a heap or hash map with executors running in dedicated goroutines if we want, and also scale it out to use databases or dedicated queue services with additional process-based workers, as needed.

The last component, gimlet, addresses building HTTP interfaces, and provides tools for registering routes, writing responses, managing middleware and authentication, and defining routes in a way that's easy to test. Gimlet is just a wrapper around some established tools like negroni and gorilla/mux, all built on established standard-library foundations. Gimlet has allowed us to unify a bunch of different approaches to these problems, and has lowered the barrier to entry for most of our interfaces.

There are other infrastructural problems still on the table: tools for building inter-system communication and RPC when you can't communicate via a queue or a shared database (I've been thinking a lot about gRPC and protocol buffers for this,) and also about object-mapping and database access patterns, which I don't really have an answer for. [1]

Nevertheless, with the observability, background task, and HTTP interface problems well understood and supported, it definitely frees developers to spend more of their time focused on the core problems of importance to users and the goals of the project. Which is a great place to be.

[1]I built a database migration tool called anser which is mostly focused on integrating migration workflows into production systems so that migrations are part of the core code and can run without affecting production traffic, and while these tools have been useful, I haven't seen a clear path between this project and meaningfully simplifying the way we manage access to data.

Cache Maintenance

Twice this fall I've worked on code that takes a group of files and ensures that the total size of the files is less than a given size. The operation is pretty simple: identify all the files and their sizes (recursively, or not, but accounting for the size of directories,) sort them, and delete files from the front or back of the list until you've reached the desired size.

If you have a cache and you're constantly adding content to it, eventually you will either need an infinite amount of storage or you'll have to delete something.

But what to delete? And how?

Presumably you use some items in the cache more often than others, and some files change very often while others change very rarely, and in many cases, use and change frequency are orthogonal.

For the cases that I've worked on, the first property, frequency of use, is the one we're interested in. If we haven't used a file in a while, relative to the other files, chances are it's safe to delete.

The problem with access time (atime) is that while most file systems have a concept of atime, most of them don't update it. Which makes sense: if every time you read a file you have to update its metadata, then every read operation becomes a write operation, and everything becomes slow.

Relative access time, or relatime, helps some. Here atime is updated, but only if you're writing to the file or if it's been more than 24 hours since the last update. The problem, of course, is that if a cache is write-once-read-many and operates with a time granularity of less than a day, then relatime is often just creation time. That's no good.

The approach I've been taking is to use the last modification time (mtime), and to intentionally update mtime (e.g. using touch or a similar operation,) after cache access. It's slightly less elegant than it could be, but it works really well and requires very little overhead.
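In Go, "touching" a file after a cache hit takes only a couple of lines with the standard library:

// bump mtime (and atime) to now after reading a cached file.
now := time.Now()
if err := os.Chtimes(path, now, now); err != nil {
    return err
}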

Armed with these decisions, all you need is a thing that crawls a file system, collects objects, and stores their size and time, so we know how large the cache is and can maintain an ordered list of file objects by mtime. The ordered list of files should arguably be a heap, but the truth is that you build and sort the structure once, remove the "lowest" (oldest) items until the cache is the right size, and then throw it all away, so you're not really doing many heap-ish operations.
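The whole operation fits comfortably in a sketch like the following, using only the standard library (error handling abbreviated):

import (
    "os"
    "path/filepath"
    "sort"
)

type cacheItem struct {
    path  string
    size  int64
    mtime int64
}

func prune(root string, maxSize int64) error {
    var items []cacheItem
    var total int64

    // crawl the tree, collecting sizes and modification times.
    err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil || info.IsDir() {
            return err
        }
        items = append(items, cacheItem{path, info.Size(), info.ModTime().Unix()})
        total += info.Size()
        return nil
    })
    if err != nil {
        return err
    }

    // sort once, oldest first, then delete from the front until
    // the cache fits in the budget.
    sort.Slice(items, func(i, j int) bool { return items[i].mtime < items[j].mtime })
    for _, it := range items {
        if total <= maxSize {
            break
        }
        if err := os.Remove(it.path); err != nil {
            return err
        }
        total -= it.size
    }
    return nil
}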


Therefore, I present lru. Earlier this summer I wrote a less generic implementation of the same principle, and was elbows deep into another project when I realized I needed another cache pruning tool. Sensing a trend, I decided to put a little more time into the project and build it out as a library that other people can use, though frankly I'm mostly concerned about my future self.

The package has two types: a Cache type that incorporates the core functionality, and FileObject, which represents items in the cache.

Operation is simple. You can construct and add items to the cache manually, or you can use DirectoryContents or TreeContents, which build caches from a starting point in the file system. DirectoryContents looks at the contents of a single directory (optionally skipping sub-directories) and returns a Cache object with those contents. If you do not skip directories, each directory appears in the cache with the total size of its contents.

TreeContents recurses through the tree and ignores directories, and returns a Cache object with all of those elements. TreeContents does not clean up empty directories.

Once you have a Cache object, use its Prune method with the maximum size of the cache (in bytes), any objects to exclude, and an optional dry-run flag, to prune the cache down until it's less than or equal to the max size.
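Putting it together, usage looks roughly like this; the signatures are my approximation of the API rather than a copy of its godoc:

import "github.com/tychoish/lru"

// build a cache from a directory tree, then prune it down to 2 GB.
cache, err := lru.TreeContents("/srv/cache")
if err != nil {
    return err
}

const maxSize = 2 * 1024 * 1024 * 1024 // bytes

// pass objects to exclude and a dry-run flag, per the description above.
if err := cache.Prune(maxSize, nil, false); err != nil {
    return err
}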

Done.


I'm not planning any substantive changes to the library at this time, as it meets most of my needs, but there are some obvious features:

  • a daemon mode where the cache object can "watch" a file system (using inotify or similar) and add items to or update existing items in the cache, potentially using fsnotify.
  • an option to delete empty directories encountered during pruning.
  • options to use other time data from the file system when possible, potentially using the times library.

With luck, I can go a little while longer without doing this again. With a little more luck, you'll find lru useful.

Shimgo Hugo

In an effort to relaunch tychoish with a more contemporary theme and a publishing tool that (hopefully) will support a more regular posting schedule, I also wrote a nifty go library for dealing with reStructuredText, which may be useful and I think illustrates something about build systems.

In my (apparently still) usual style, there's some narrative lead-in that takes a bit to get through.


Over the past couple of weeks, I redesigned and redeployed my blog. The system it replaced was somewhat cobbled together, was missing a number of features (e.g. archives, rss feeds, social features, etc.) and, to add insult to injury, publishing was pretty slow and it was difficult to manage a pipeline of posts.

In short, I didn't post much. I've written things from time to time, but I haven't done a great job of actually posting them, and it was hard to actually get people to read them, which was further demotivating. I've been reading a lot of interesting things, I'm not writing that much for work any more, and I've been doing enough things recently that I want to write about them. See this twitter thread I had a bit ago on the topic.

So I started playing around again. Powering this blog is hard, because I have a lot of content [1] and I very much want to use reStructuredText. [2] There's this thing called hugo which seems to be pretty popular. I've been using static site generators for years, and prefer the approach. It's also helpful that I worked with Steve (hugo's original author) during its initial development, and either by coincidence or as a result of our conversations and a couple of very small early contributions, a number of things I cared about were included in its design:

  • support for multiple text markup formats (including reStructuredText,) (I cobbled together the rst support.)
  • customizable page metadata formats. (I think I pushed for support of alternate front-matter formats, specifically YAML, and might have made a few prototype commits on this project.)
  • the ability to schedule posts in the future, (I think we talked about this.)

I think I also whinged a bunch in those days about performance. I've written about this here before, but one of the classic problems with static site generators is that no one expects sites with one or two thousand posts/content atoms, and so they're developed against relatively small corpora and then have performance that doesn't really scale.

Hugo is fast, but mostly because go is fast, which I think is, in most cases, good enough, but not in my case, and particularly not with the rst implementation as it stood. After all this preamble, we've gotten to the interesting part: a tool I'm calling shimgo.


The initial support for rst in hugo is straightforward. Every time hugo encounters an rst file, it calls the shell rst2html utility that is installed with docutils, passing it the content of the file on standard input, and parsing the content we need from its output. It's not pretty, it's not smart, but it works.

Slowly: to publish all of tychoish it took about 3 minutes.

I attempted an rst-to-markdown translation of my existing content and then ran that through the markdown parsers in hugo, just to get comparative timings: 3ish seconds.

reStructuredText is a bit slower to parse than markdown, on account of its comparative strictness and the fact that the toolchain is in python and not go, but this difference seemed absurd.

There's a go-rst project to write a pure-go implementation of reStructuredText, but I've kept my eye on that project for a couple of years, and it's a lot of work that is pretty far off. While I do want to do more to support this project, I wanted to get a new blog up and running in a few weeks, not years.

Based on the differences in timing, and some intuition from years of writing build systems, I made a wager with myself: while the python rst implementation is likely really slow, it's not that slow, and I was losing a lot of time to process creation, teardown, and context switching: processing a single file is pretty quick, but the overhead gets to be too much at scale.

I built a little prototype where I ran a very small HTTP service that took rst as a POST request and returned processed HTML. Now there was one process running, and instead of calling fork/exec a bunch, we just had a little bit of (local) network overhead.
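The client side of that prototype is only a few lines of Go; the endpoint URL here is made up for illustration:

// POST the raw rst document to the long-running converter service
// and read back the rendered HTML.
resp, err := http.Post("http://localhost:5000/render", "text/x-rst", bytes.NewReader(rst))
if err != nil {
    return nil, err
}
defer resp.Body.Close()

return ioutil.ReadAll(resp.Body)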

Faster: 20 seconds.

I decided I could deal with it.

What remains is making it production worthy for hugo. While it was good enough for me, I very much don't want to get into the position of needing to maintain a single-feature fork of a software project in active development, and frankly the existing rst support already has a difficult-to-express external dependency. Adding an HTTP service would be a hard sell.

This brings us to shimgo: the idea is to package everything needed to implement the above solution in an external go package, behind a functional interface, so that hugo maintainers don't need to know anything about how it works.

Isn't abstraction wonderful?

So here we are. I'm still working on getting this patch mainlined, and there is some polish for shimgo itself (mostly the README file and some documentation), but it works, and if you're doing anything with reStructuredText in go, then you ought to give shimgo a try.

[1]While I think it would be reasonable to start afresh, I think the whole point of having archives is that you mostly just leave them around.
[2]It's not the most popular markup language, but I've used it more than any other text markup, and I find the fact that other languages (e.g. markdown) vary a lot between implementations to be distressing. Admittedly the fact that there aren't other implementations of rst is also distressing, but on the balance it is somewhat less distressing.

Going Forward

I wrote a post about moving on from being a technical writer, and I've definitely written some since then about programming and various side projects, but I haven't really done the kind of public reflection on this topic that I've done historically about many other things.

When I switched to a programming team, I knew some things about computers, and I was a decent Python programmer. The goal, then, was to teach myself a second programming language (Go,) and learn how to make "real" software with other people, or on teams with other people. Both of those projects are going well: I think I've become pretty solid as a Go programmer, and although it's hard to say what "real" software is, or if I'm good at making it, all indications are positive.

This weekend, for various reasons, I've been reviving a project that I did some work on this fall and winter, and had abandoned for about 6 months. It's been both troubling (there are parts that are truly terrible,) and kind of rewarding to see how much I've grown as a programmer just from looking at the code.

Cue, then, I guess, the self-reflective interlude.

My reason for wanting to learn--really learn--a second programming language was to make sure that all the things I knew about system design, algorithms, and data structures were generalizable, and not rooted in the semantics of a specific language or even a specific implementation of that language. I was also interested in learning more about the process of learning new programming languages, so that I had some experience with the learning process, which may come in handy in the future.

Learning Go, I think, helped me achieve these goals. While I haven't really set out to learn a third language yet, it feels tractable. I've also noticed some changes and differences in some other aspects of my interests.

I used to be really interested in programming qua programming, and I thought a lot about programming languages. While I can still evaluate programming languages, and have my own share of opinions about "the way things work," I'm less concerned with the specific syntax or implementation of a language. I think a lot about its build tools, platform support, deployment models, and distribution methods and stories, rather than what it can do or how you have to write it. Or: how you make it, ship it, and run it.

I've also gotten less interested in UNIX-esque systems administration and operations, which is historically a thing I've been quite interested in. These days, I find myself thinking more about the following kinds of problems:

  • build systems, the tools that build software from source files (and sometimes test it!) and the ways to do this super efficiently and sensibly. Build systems are quite hard because in a lot of ways they're the point through which your software (as software) interacts with all of the platforms it runs on. Efficient build systems have a huge impact on developer productivity, which is a big interest of mine.
  • developer productivity, which is a big catch-all category, but it's almost always true that people are more expensive than computers, so working on tools and features (like better build systems, or automating various aspects of the development process,) that save people time is almost always worth the investment.
  • continuous integration and deployment, again connected to developer productivity, but taking the "automate building and testing" story to its logical conclusion. CD environments mean you deploy changes much more often, but they also require and force you to trust the automated systems, and to make sure that project leadership and management are just as automated as the development experience.
  • internal infrastructure, as in "internal services and tools that all applications need," like logging, queuing systems, abstractions for persistence, deployment systems, testing, and exposed interfaces (e.g. RPC systems, REST/HTTP, or command line option parsing). Having good tools for these generic aspects of an application makes writing actual features for users easier. I'm also increasingly convinced that the way to improve applications and systems is to improve these lower level components and their interfaces.

Free Software and open source are still important, as is UNIX, but these kinds of developer productivity and automation issues are a level above that. I've changed in the last 5 years, software has changed in the last five years, the way we run software on systems has changed in the last 5 years. I'm super excited to see what kinds of things I can do in this space, and where I end up in 5 years.

I'm also interested in thinking about ways to write about this. I've written drafts of a number of posts about learning how to program and about systems administration, and now that I'm finding and making more time for writing, one of the things I don't really know is what kind of writing on these topics I'm interested in doing, or how to do it in a way that anyone would be interested in reading.

We shall see. Regardless, I hope that I'm back, now.

Get a Grip

I made another Go(lang) thing. Grip is a set of logging tools modeled on Go's standard logging system, with some additional (related) features, including:

  • level-based logging, with the ability to set a minimum threshold to exclude log messages based on priority (i.e. debugging.)
  • Error capture/logging, to log Go error objects.
  • Error aggregation, for continue-on-error situations: where you want to perform a bunch of operations, collect any errors they return, and report them at the end, rather than returning after the first operation fails (see the sketch after this list.)
  • Logging to the systemd journal with fallback to standard library logging to standard output.
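The aggregation pattern looks something like this sketch; the constructor name is from memory of an early version and may have changed:

// collect errors from a series of operations and resolve them into
// a single error (or nil) at the end.
catcher := grip.NewCatcher()
for _, op := range operations {
    catcher.Add(op())
}
return catcher.Resolve()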

There are helper functions for logging using different kinds of default string formatting, as well as functions that take error objects, and a "lazy" logging method that takes a simple interface for building log messages at log-time rather than at operation-time.

None of these features are terribly exciting, and the systemd support wraps the systemd library from CoreOS. I'm a big fan of log levels and priority filtering, so it's nice to have a tool for that.

In the future, I'd like to add more generic syslog support if that's useful, and potentially tools for better categorical logging. There's also a good deal of repeated code, and it might be nice to use this as an excuse to write a code generator using the go tool.

Pull requests and feedback are, of course, welcome.

Have a Gimlet: A Go JSON/HTTP API Toolkit

Look folks, I made a thing!

It's called gimlet, and it's a Go(lang) tool for making JSON/HTTP APIs (i.e. REST with JSON). Give it a whirl!

It's actually even less a tool and more of a toolkit, or just "a place to put all of the annoying infrastructure that you'll inevitably need when you want to build a JSON/HTTP interface, but that has nothing to do with whatever your API/application does": routing, and serializing and deserializing JSON.

Nothing hard, nothing terribly interesting, and certainly not anything you couldn't do another way, but it's almost certainly true that this layer of application infrastructure is totally orthogonal to whatever your application is actually doing, so you should focus on that, and probably use something like Gimlet.

Background

I'm using the term HTTP/JSON APIs for services where you send and receive JSON data over HTTP. Sometimes people call these REST APIs, and that's not inaccurate, but I think REST is a bit more complicated, and not exactly the core paradigm that I'm pursuing with Gimlet.

Sending and receiving JSON over HTTP makes a lot of sense: there are great tools for parsing JSON, and HTTP is a decent high level protocol for interprocess communication between simple data applications. Look up "microservices" at your leisure.

Go is a great language for this: it has a lot of tooling that anticipates these kinds of applications, and the deployment model is really friendly to operations teams and systems. Also, the static typing and reasonable separation of private and public interfaces are particularly lovely.

So it should be no surprise that there are a lot of tools for building web applications, frameworks even. Things like gorilla and negroni are great and provide a very useful set of tools for building Go web apps. Indeed, Gimlet uses components of each of these tools.

The issue, and reason for Gimlet, is that all of these tools assume that you're building a web application, with web pages, static resources, form handling, session state handling, and other things that are totally irrelevant to writing JSON/HTTP interfaces.

So then, Gimlet is a tool to build these kinds of APIs: it's simple, uses Negroni and Gorilla's mux, and does pretty much everything you need except actually write your code.

Example

Set up the app with some basic configuration:

import "github.com/tychoish/gimlet"

app := gimlet.NewApp()
app.SetPort(9001)
app.SetDefaultVersion(1)

This sets the port on which the HTTP server will listen for requests, and configures the default version of the API. You do want all of your endpoints prefixed with "/v<number>", right? The default version of the API is also available without the prefix, as are routes registered with version 0.

Then register some routes:

app.AddRoute("/<path>").Version(<int>).Get().Handler(http.HandlerFunc)
app.AddRoute("/<path>").Version(<int>).Post().Handler(http.HandlerFunc)

app.AddRoute returns an API route object with a set of chainable methods for defining the route. If you add multiple HTTP methods (GET, POST, and the like,) then Gimlet automatically defines multiple routes, with the same handler for each method.

For handlers, I typically just write functions that take arguments from the top level context (database connections, application configuration, etc.) and return http.HandlerFunc objects. For example:

func helloWorld(config *Configuration) http.HandlerFunc {
     return func(w http.ResponseWriter, r *http.Request) {
          input := make(map[string]interface{})
          response := make(map[string]interface{})

          err := gimlet.GetJSON(r, &input)

          // do stuff here, handling err and populating the response

          gimlet.WriteJSON(w, response)
     }
}

Gimlet has the following functions that parse JSON out of the body of a request, or add JSON output to the body of a response:

  • WriteJSONResponse(w http.ResponseWriter, code int, data interface{})
  • GetJSON(r *http.Request, data interface{})

These read or write data into the interface{} object (typically a struct.) The following three provide consistent response writers for common status codes:

  • WriteJSON(w http.ResponseWriter, data interface{}) // 200
  • WriteErrorJSON(w http.ResponseWriter, data interface{}) // 400
  • WriteInternalErrorJSON(w http.ResponseWriter, data interface{}) // 500

Finally, when you've written your app, kick it all off, with the following:

err := app.Run()
if err != nil {
   fmt.Println(err)
   os.Exit(1)
}

And that's it. Enjoy, and tell me in the comments or on the issues feed if you find something broken or confusing. Contributions welcome, of course.