A Common Failure

I’ve been intermittently working on a Common Lisp library to produce a binary encoding of arbitrary objects, and I think I’m going to be abandoning the project. This is an explanation of that decision and a reflection on the experience.

Why Common Lisp?

First, some background. I’ve always thought that Common Lisp is a language with a bunch of cool features and selling points, but I’d never really written more than some one-off bits of code in CL. This project was a good kick in the pants: a chance to really dig into writing and managing a conceptually larger piece of software, and to learn more in the process.

The things I like:

  • the implementations of the core runtime are really robust and high quality, and make it possible to imagine running your code in a bunch of different contexts. Even though it’s a language with relatively few users, the implementations feel mature. The most common ones can also produce fully self-contained static binaries (like Go, say), which makes the thought of distributing software seem reasonable.
  • quicklisp, a package/library management tool from the last decade or so, has really raised the level of the ecosystem. It’s not as complete as I’d expect in many ways, but quicklisp changed CL from something quaint to something you could actually imagine using.
  • the object system is really nice. There isn’t compile-time type checking on the values of slots (attributes) of objects by default, though you can opt in. My general feeling is that I can get much of the rigor of statically typed code with all of the freedom of writing dynamic code.
  • multiple dispatch, and the conceptual approach to genericism, is amazing and really simplifies flow control in a lot of cases. You implement the methods you need for the appropriate types/objects, write the logic you need, and the function-call machinery just does the right thing. There’s surprisingly little conditional logic as a result.

Things I don’t like:

  • there are all sorts of things that don’t quite have existing libraries, so I find myself doing work that requires more effort than it should. This project to write a binary encoding tool would have been a component in service of a much larger project. It’d be nice to be able to skip some of the lower-level components, and not to have design choices so constrained by gaps in infrastructure.
  • at the same time, the library ecosystem is pretty fractured, and there isn’t really consensus around common tools. Why are there so many half-finished YAML and JSON libraries? There are a bunch of HTTP server (!) implementations, but really, you need two, not five.
  • looping/iteration isn’t intuitive, and it’s difficult to get common patterns to work. The answer, in most cases, is to use (map) with lambdas rather than loops, but there’s this pitfall where you reach for (loop) and, really, that’s rarely the right answer.
  • implicit returns seem like an oversight; hilariously, Rust makes the same error. Implicit returns also make it quite hard to reason about what type a function or method returns.

Writing an Encoder

So the project was an attempt to write really object-oriented code as a way of writing an object encoder for a JSON-like binary format. Simple enough: I had a good mental model of the format, and my general approach to any kind of binary format processing is to write a crap ton of unit tests and work somewhat iteratively.

I had a lot of fun with the project, and it gave me a bunch of key experiences that make me comfortable saying I can write Common Lisp, even if it’s not a language I feel maximally comfortable in (yet?). The experiences that really helped included:

  • producing enough code that I really had to think about how packaging and code organization worked. I’d written a function here and there before, but never something where I needed to really understand and use the library/module/packaging infrastructure (e.g. systems and libraries).
  • writing and running lots of tests. I don’t always follow test-driven development closely, but writing lots of tests is part of my process, and being able to watch the layers of this code come together was a lot of fun and very instructive.
  • this project, for me, was mostly about API design, and it was nice to have a project that didn’t require much design in terms of actual functionality, as object encoding is pretty straightforward.

From an educational perspective all of my goals were achieved.

Failure Mode

The problem is that, in the final analysis, it didn’t work out. While the library I constructed was able to encode and decode objects and was internally correct, I never got it to produce encodings that other implementations of the same specification could reliably read, and its ability to read data encoded by other libraries only worked in trivial cases. In the end:

  • this was mostly a project designed to help me gain competence in a programming language I don’t really know, and in that I was successful.
  • adding this encoding format isn’t particularly useful to any project I’m thinking of working on in the short term, and doesn’t directly enable me to do anything in particular.
  • the architecture of the library would not be particularly performant in practice: the encoding process didn’t use a buffer pool of any kind, backfilling one would have been harder than starting over, and I wasn’t particularly interested in that work.
  • it didn’t work, and the effort to debug the issue would be more substantial than I’m really interested in investing at this point, particularly given the limited utility.

So here we are. Onto a different project!

There's No Full Stack

Software engineers use terms like “backend” and “frontend” to describe areas of expertise and focus; the thought is that these terms map roughly onto “underlying systems and business logic” and “user interfaces,” that these are different kinds of work, and that no person can really specialize in everything.

But it’s all about perspective. Software is built in layers: there are frontends and backends at almost every level, so the classification breaks down if you look at it too hard. It’s also the case that logical features, from the perspective of the product and user, require the efforts of both disciplines. Often development organizations struggle to hand projects off between front-end and back-end teams.1

Backend/frontend is also a poor way to organize work, as it often forces a needless boundary between people and teams working on related projects. Backend work (usually) has to be completed first, and if that slips (or estimation is off) then the frontend work has to happen in a crunch. Even if timing goes well, it’s difficult to maintain engineering continuity through the handoff, and context is often lost in the process.

In response to splitting projects and teams into frontend and backend, engineers have developed the idea of “full stack” engineering, which typically means “integrated frontend and backend development.” A noble approach: keep the same engineer on the project from start to finish, and avoid an awkward handoff or a context reset halfway through. Historic concerns about frontend and backend being in different languages are reduced both by the advent of backend JavaScript and by the realization that programmers often work in multiple languages.

While full stack sounds great, it’s a total lie. First, engineers by and large cannot maintain context on all aspects of a system, so boundaries end up appearing in different places. A full stack engineer might end up writing the frontend and the backend APIs the frontend depends on, but not the application logic that supports the feature; or an engineer might focus on a very specific set of features, but not be able to branch out very broadly. Second, specialization is important for allowing engineers to focus and be productive, and while moving projects between engineers involves context switching, having engineers who must regularly switch between different disciplines is bad for those engineers. In short, you can’t just declare that engineers will be able to do it all.

Some teams and products, particularly larger ones, can get around the issue entirely by dividing ownership and specialization along functional boundaries rather than by engineering discipline, but there can be real technical limitations, and getting a team to move to this kind of ownership model is super difficult. Therefore, I’d propose a different way of dividing projects and engineering that avoids both “frontend/backend” and the idea of “full stack”:

  • feature or product engineers, who focus on core functionality delivered to users. This includes UI, supporting backend APIs, and core functionality. The users of these teams are the users of the product. These jobs have the best parts of the “full stack” orientation, but draw an effective “lower” boundary of responsibility and allow feature-based specialization.
  • infrastructure or product platform engineers, who focus on deployment, operations, and supporting internal APIs. These teams and engineers should see feature and product engineers as their users. They fall somewhere between “backend engineers” and the “devops”- and “SRE”-type roles of the last decade, covering the area “above” systems (e.g. not inclusive of machine management and access provisioning) and below features.

This framework helps teams scale up as needs and requirements change: feature teams can be divided and parallelized along functionality slices, while infrastructure teams divide easily into specialties (e.g. networking, storage, databases, internal libraries, queues, etc.) and along service boundaries. Teams are in a better position to maintain continuity on projects, and engineers can keep context and operate using more agile methods. I suspect that, if we look carefully, many organizations and teams have this kind of de facto organization, even if they use different terminology.

Thoughts?


  1. In truth, the problem of coordination between frontend and backend teams is that it forces a waterfall-like process between teams, which is always awkward. The problem isn’t that backend engineers can’t write frontend code, but that having different teams requires a handoff that is difficult to manage correctly, and processes and management grow up around that handoff. ↩︎

What is it That You Do?

The longer that I have this job, the more difficult it is to explain what I do. I say, “I’m a programmer,” and you’d think that I write code all day, but that doesn’t map onto what my days look like; the longer I do this, the less code I actually seem to write. I think the complexity of this seemingly simple question grows from the fact that building software involves a lot more than writing code, particularly as projects become more complex.

I’d venture to say that most code is written and maintained by one person, and typically used by a very small number of people (often on behalf of many more), though this is difficult to quantify. Single-maintainer software is still software, and there are lots of interesting problems there, but as much as anything else I’m interested in the problems adjacent to multi-author code bases and multi-operator software development.1

Fundamentally, I’m interested in the following questions:

  • How can (sometimes larger) groups of people collaborate to build something that’s bigger than the scope of any of their work?
  • How can we build software in a way that lets individual developers focus, most of the time, on the features and concerns that are most important to them and their users?2

The software development process, regardless of the scope of the problem, has a number of different aspects:

  • Operations: How does this software execute, and how do we know that it’s successful when it runs?
  • Behavior: What does it do, and how do we ensure it has the correct behavior?
  • Interface: How will users interact with the process, and how do we ensure a consistent experience across versions and users' environment?
  • Product: Who are the users? What features do they want? Which features are the most important?

Sometimes we can address these questions by writing code, but often there’s a lot of talking to users, other developers, and other people who work in software development organizations (e.g. product managers, support, etc.), not to mention writing a lot of English (documentation, specs, and the like).

I still don’t think that I’ve successfully answered the framing question, except to paint a large picture of the kinds of work that go into making software and to describe some of my specific domain interests. It ends up boiling down to:

  • I write a lot of documents describing new features and improvements to our software. [product]
  • I think a lot about how our product works as it grows (scaling), and what kinds of changes we can make now to make that process smoother. [operations]
  • How can I help the more junior members of my team focus on the aspects of their jobs that they enjoy most, and help illustrate broader contexts for them? [mentoring]
  • How can we take the problems we’re solving today and build solutions that balance immediate requirements with longer-term maintainability and reuse? [operations/infrastructure]

What I actually spend my time on boils down to reading a bunch of code, meeting with my teammates, and meeting with users (who are also coworkers). And sometimes writing code. If I’m lucky.


  1. I think the single-author and/or single-operator class is super interesting and valuable, particularly because it covers a lot of software outside the conventional disciplinary boundaries of the field: macros, spreadsheets, small-scale databases, and IT/operations (“scripting”) work. ↩︎

  2. It’s very easy to spend most of your time as a developer writing infrastructure code of some sort, to address either internal concerns (logging, data management and modeling, integrating with services) or project/process automation (build, test, operations) concerns. Infrastructure isn’t bad, but it isn’t the same as working on product features. ↩︎

Non-Trad Software Engineer

It happened gradually, and it wasn’t entirely an intentional thing, but at some point I became a software engineer. While a lot of people become software engineers, many of them have formal backgrounds in engineering, or have taken classes or done programs to support this retooling (e.g. bootcamps or programming institutes.)

I skipped that part.

I wrote scripts from time to time for myself, because there were things I wanted to automate. Then I was working as a technical writer and had to read code that other people had written for my job. Somewhere in there I became responsible for managing the publication workflow, and wrote a couple of build systems.

And then it happened.

I don’t think it’s the kind of thing that’s right for everyone, but I was your typical nerdy/bookish kid who wasn’t great in math class, and I suspect that making software is the kind of thing a lot of people could do. I don’t think my experience is particularly replicable, but I have learned a number of useful (and important) things along the way, and as I’ve started writing more about what I’m working on now, I realize that I’ve skipped over some of the fundamentals.1

Formal education in programming, from what I’ve been able to gather, strikes me as really weird. There are two main ways of teaching people about software and computer science: option one is to start with a very theoretical background that focuses on data structures, the performance of algorithms, or the internals of core technologies (operating systems, compilers, databases, etc.); option two is to spend a lot of time learning about a programming language and about how to solve problems using programming.

The first is difficult because the theory2 is not particularly applicable except in very rare cases, and then only at the highest level, where it is easy to back-fill as needed. The second is also challenging, as idioms change between languages and most generic programming tasks are easily delegated to libraries. The crucial skill in programming is the ability to learn new languages and solve problems in the context of existing systems, and developing a curriculum that builds those skills is hard.

The topics that I’d like to write about include:

  • Queue behavior, particularly in the context of distributed systems.
  • Observability/Monitoring and Logging, particularly for reasonable operations at scale.
  • Build systems and build automation.
  • Unit testing, test automation, and continuous integration.
  • Interface design for users and other programmers.
  • Maintaining and improving legacy systems.

These are, of course, primarily focused on the project of making software rather than on computer science or computing in the abstract. I’m particularly interested, practically, in figuring out what kinds of experiences and patterns are important for new programmers to learn, regardless of background.3 I hope you all find it interesting as well!


  1. This is, at least in part, because I mostly didn’t blog very much during this process. Time being finite and all. ↩︎

  2. In practice, theoretical insights come up pretty infrequently and are mostly useful for providing shorthand for characterizing a problem in more abstract terms. Most of the time, you’re better off intuiting things anyway, because programming is predominantly a pragmatic exercise. For the exceptions, there are a lot of nerds around (both at most companies and on the internet) who can figure out the proper name for a phenomenon, and then you can look it up on Wikipedia. ↩︎

  3. A significant portion of my day-to-day work recently has involved mentoring new programmers. Some have traditional backgrounds or formal technical education, and many don’t. While everyone has something to learn, I often find that because my own background is so atypical, it can be hard for me to outline the things I think are important, and to identify the high-level concepts that matter within more specific sets of experiences. ↩︎

Get a Grip

I made another Go(lang) thing. Grip is a set of logging tools modeled on Go’s standard logging system, with some additional (related) features, including:

  • level-based logging, with the ability to set a minimum threshold that excludes messages below a given priority (e.g. debugging messages).
  • error capture/logging, to log Go error objects.
  • error aggregation, for continue-on-error situations where you want to perform a bunch of operations, keep going when one fails, and report any accumulated errors at the end (see the sketch after this list).
  • logging to the systemd journal, with fallback to standard-library logging to standard output.
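
Grip’s actual API differs in the details, but the error aggregation pattern itself, independent of grip, looks roughly like this minimal Go sketch (the errorCollector type, its methods, and doTask are all invented here for illustration):

package main

import (
	"errors"
	"fmt"
	"strings"
)

// errorCollector accumulates errors from a sequence of operations so
// the caller can continue past individual failures and report
// everything at the end. (Illustrative; not grip's interface.)
type errorCollector struct {
	errs []error
}

func (c *errorCollector) Add(err error) {
	if err != nil {
		c.errs = append(c.errs, err)
	}
}

// Resolve returns nil if nothing failed, or a single error that
// joins all of the collected messages.
func (c *errorCollector) Resolve() error {
	if len(c.errs) == 0 {
		return nil
	}
	msgs := make([]string, 0, len(c.errs))
	for _, err := range c.errs {
		msgs = append(msgs, err.Error())
	}
	return errors.New(strings.Join(msgs, "; "))
}

func doTask(name string) error {
	if name == "two" {
		return fmt.Errorf("task %s failed", name)
	}
	return nil
}

func main() {
	catcher := &errorCollector{}
	for _, task := range []string{"one", "two", "three"} {
		// continue-on-error: record any failure and keep going.
		catcher.Add(doTask(task))
	}
	if err := catcher.Resolve(); err != nil {
		fmt.Println("errors:", err)
	}
}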

There are helper functions for logging with different kinds of default string formatting, functions that take error objects, and a “lazy” logging method that takes a simple interface for building log messages at log time rather than at operation time.
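
The lazy method deserves a sketch. The idea is that the logger only renders a message if its priority clears the threshold, so expensive formatting is deferred until (and unless) it’s actually needed. The interface and names below are illustrative, not grip’s actual API:

package main

import (
	"fmt"
	"log"
)

// Composer is a message that knows how to render itself and report
// its priority. The logger calls Resolve() only when a message
// clears the threshold, so formatting happens at log time.
// (Illustrative names; not grip's actual interface.)
type Composer interface {
	Resolve() string
	Priority() int
}

type lazyMessage struct {
	priority int
	build    func() string
}

func (m lazyMessage) Resolve() string { return m.build() }
func (m lazyMessage) Priority() int   { return m.priority }

type leveledLogger struct {
	threshold int
}

// Log drops messages below the threshold before their content is
// ever built.
func (l leveledLogger) Log(m Composer) {
	if m.Priority() < l.threshold {
		return
	}
	log.Println(m.Resolve())
}

func expensiveDump() []int { return []int{1, 2, 3} }

func main() {
	logger := leveledLogger{threshold: 30}

	// dropped: the closure, and expensiveDump, never run
	logger.Log(lazyMessage{priority: 10, build: func() string {
		return fmt.Sprintf("debug detail: %v", expensiveDump())
	}})

	// emitted
	logger.Log(lazyMessage{priority: 40, build: func() string {
		return "something actually important"
	}})
}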

None of these features are terribly exciting, and the journal support wraps the systemd library from CoreOS. I’m a big fan of log levels and priority filtering, so it’s nice to have a tool for that.

In the future, I’d like to add more generic syslog support, if that’s useful, and potentially tools for better categorical logging. There’s also a good deal of repeated code, and it might be nice to use this as an excuse to write a code generator with the go tool.

Pull requests and feedback are, of course, welcome.

Have a Gimlet: A Go JSON/HTTP API Toolkit

Look folks, I made a thing!

It’s called gimlet, and it’s a Go(lang) tool for making JSON/HTTP APIs (i.e. REST with JSON). Give it a whirl!

It’s actually less a tool and more of a toolkit, or just “a place to put all of the annoying infrastructure that you’ll inevitably need when you want to build a JSON/HTTP interface, but that has nothing to do with whatever your API/application does”: routing, and serializing and deserializing JSON.

Nothing hard, nothing terribly interesting, and certainly nothing you couldn’t do another way; but it’s almost certainly true that this layer of application infrastructure is totally orthogonal to whatever your application is actually doing, so you should focus on that, and probably use something like Gimlet.

Background

I’m using the term HTTP/JSON APIs for services where you send and receive JSON data over HTTP. Sometimes people call these REST APIs, and that’s not inaccurate, but I think REST is a bit more complicated, and not exactly the core paradigm that I’m pursuing with Gimlet.

Sending and receiving JSON over HTTP makes a lot of sense: there are great tools for parsing JSON, and HTTP is a decent high-level protocol for interprocess communication between simple data applications. Look up “microservices” at your leisure.

Go is a great language for this: it has a lot of tooling that anticipates these kinds of applications, and the deployment model is really friendly to operations teams and systems. Also, the static typing and reasonable separation of private and public interfaces are particularly lovely.

So it should be no surprise that there are a lot of tools, frameworks even, for building web applications in Go. Things like gorilla and negroni are great and provide a very useful set of tools for building Go web apps; indeed, Gimlet uses components of each.

The issue, and the reason for Gimlet, is that all of these tools assume you’re building a web application, with web pages, static resources, form handling, session state, and other things that are totally irrelevant to writing JSON/HTTP interfaces.
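
The alternative to a framework is plain net/http, which means repeating the same decode/encode/status-code ceremony in every endpoint. Roughly (a hypothetical handler; all names here are invented for illustration):

package main

import (
	"encoding/json"
	"net/http"
)

type payload struct {
	Name string `json:"name"`
}

// With only the standard library, every endpoint repeats the same
// decode/encode/status-code ceremony that a toolkit can factor out.
func hello(w http.ResponseWriter, r *http.Request) {
	var in payload
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
		http.Error(w, "malformed input", http.StatusBadRequest)
		return
	}

	out := map[string]string{"greeting": "hello " + in.Name}

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	_ = json.NewEncoder(w).Encode(out) // error ignored for brevity
}

func main() {
	http.HandleFunc("/v1/hello", hello)
	http.ListenAndServe(":9001", nil)
}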

So then, Gimlet is a tool to build these kinds of APIs: it’s simple, uses Negroni and Gorilla’s mux, and does pretty much everything you need except actually write your code.

Example

Set up the app with some basic configuration:

import "github.com/tychoish/gimlet"

app := gimlet.NewApp()
app.SetPort(9001)
app.SetDefaultVersion(1)

This sets the port on which the HTTP server will listen for requests and configures the default version of the API. You do want all of your endpoints prefixed with “/v<number>”, right? Routes in the default version are also available without the prefix, as are routes registered with version 0.

Then register some routes:

app.AddRoute("/<path>").Version(<int>).Get().Handler(http.HandlerFunc)
app.AddRoute("/<path>").Version(<int>).Post().Handler(http.HandlerFunc)

app.AddRoute returns an API route object with a set of chainable methods for defining the route. If you add multiple HTTP methods (GET, POST, and the like), Gimlet automatically defines a route with the same handler for each method.
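
For instance, assuming the method calls chain the way the rest of the interface does (statusHandler is a hypothetical handler):

app.AddRoute("/status").Version(2).Get().Post().Handler(statusHandler)

This would yield both GET /v2/status and POST /v2/status routed to statusHandler.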

For handlers, I typically write functions that take arguments from the top-level context (database connections, application configuration, etc.) and return http.HandlerFunc objects. For example:

func helloWorld(config *Configuration) http.HandlerFunc {
     return func(w http.ResponseWriter, r *http.Request) {
          input := make(map[string]interface{})
          response := make(map[string]interface{})

          // parse the JSON in the request body into input; GetJSON
          // takes the request, per the signature below
          err := gimlet.GetJSON(r, &input)
          if err != nil {
               gimlet.WriteErrorJSON(w, map[string]string{"error": err.Error()})
               return
          }

          // do stuff here

          gimlet.WriteJSON(w, response)
     }
}

Gimlet has functions that parse JSON out of the body of a request, or add JSON output to the body of a response:

  • WriteJSONResponse(w http.ResponseWriter, code int, data interface{})
  • GetJSON(r *http.Request, data interface{})

These read or write data into the interface{} object (typically a struct). The following three provide consistent response writers for common status codes:

  • WriteJSON(w http.ResponseWriter, data interface{}) // 200
  • WriteErrorJSON(w http.ResponseWriter, data interface{}) // 400
  • WriteInternalErrorJSON(w http.ResponseWriter, data interface{}) // 500

Finally, when you’ve written your app, kick it all off with the following:

err := app.Run()
if err != nil {
   fmt.Println(err)
   os.Exit(1)
}
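
Putting the pieces together, a minimal program using only the calls shown above might look like this (helloWorld is the handler-returning function from earlier, and Configuration stands in for whatever your application uses):

package main

import (
	"fmt"
	"os"

	"github.com/tychoish/gimlet"
)

func main() {
	app := gimlet.NewApp()
	app.SetPort(9001)
	app.SetDefaultVersion(1)

	// helloWorld and Configuration are defined as in the
	// handler example above.
	conf := &Configuration{}
	app.AddRoute("/hello").Version(1).Get().Handler(helloWorld(conf))

	if err := app.Run(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}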

And that’s it. Enjoy, and tell me in the comments or on the issues feed if you find something broken or confusing. Contributions welcome, of course.

Distributed Bug Tracking

The free software/open source/software development world needs a distributed bug tracking story. Because the current one sucks.

The State of the Art

There are a number of tools, written between 2006 and 2010 or so, that provide partial or incomplete solutions to the problem; almost isn’t quite good enough. The “resources” section of this post contains an overview of the most important (in my judgment) representatives of current work in the area, with a bit of editorializing.

In general these solutions are good starts, and they give us (or me) a good starting point for thinking about what distributed bug tracking could be like. Someday.

Bug tracking needs are diverse, which creates a significant design challenge for any system in this space. There are many existing solutions, everyone hates them, and I suspect most would-be developers and innovators in the space would like to avoid opening this can of worms.

Another factor is that, while most people have come to the conclusion that distributed source control tools are the “serious” contemporary tools for managing source code, the benefits of distributed bug tracking haven’t yet propagated in the same way. Many folks have begun to come to terms with the fact that some amount of tactical centralization is inevitable, required, and even desirable1 in the context of an issue tracking system.

Add to this the frequent requirement that non-developer users be able to track and create issues, and the result is that we’ve arrived at something of an impasse.

Requirements

A distributed bug tracking system would need:

  • A good way to provide short, unique identifiers for individual issues and comments, so that users can discuss issues canonically (a git-style content-hash scheme is sketched after this list).

  • An interface contained in a single application, script, or binary, that you could distribute with the application.

  • A simple/lightweight web-based interface so that users can (at least) review, search, and reference issues from a web browser.

    Write access would also be good, but is less critical. It might also be more practical (both from a design and a workflow perspective) to have users submit bugs on the web into a read-only “staging queue” that developers/administrators would then formally import into the project. This formalizes a triage approach that many projects may find useful.

  • To be separable from the source code history, either by using a branch, or by using pre-commit hooks to ensure that you never commit changes to code/content and the bugs at the same time.

  • To be editable, and to interact with commonly accessible tools that users already use. Email, command line tools, the version control systems, potentially documentation systems, build systems, testing frameworks and so forth.

  • Built on reliable tools.2

  • To provide an easy way to customize your “views” on bugs for a particular team or project. In other words, each team can freely decide which extra fields get attached to their bugs, along with which fields are visible by default, which are required, and so on--without interfering with other projects.
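
On the identifier requirement above: the obvious approach borrows from git and derives identifiers from a content hash, letting people use short unambiguous prefixes in discussion. A minimal sketch, assuming SHA-1 and seven-character prefixes are acceptable:

package main

import (
	"crypto/sha1"
	"fmt"
)

// issueID derives a stable identifier from an issue's initial
// content, git-style; the first seven hex characters are usually
// enough to refer to an issue unambiguously.
func issueID(title, body string) (full, short string) {
	sum := sha1.Sum([]byte(title + "\x00" + body))
	full = fmt.Sprintf("%x", sum)
	return full, full[:7]
}

func main() {
	full, short := issueID("crash on startup", "segfault when config file is missing")
	fmt.Println(full)  // canonical identifier
	fmt.Println(short) // what people type and say
}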

The Future of the Art

  1. We (all) need to work on building new and better tools to help solve the distributed issue tracking problem. This will involve:
    • learning from the existing attempts,
    • continuing to develop and solidify the above requirements,
    • (potentially) testing and developing a standard (YAML/JSON?) data storage format that is easy to parse and easy to merge, which multiple tools can use (a strawman record is sketched after this list),
    • developing some simple prototype tools, potentially as a suite of related utilities (a la early versions of git) that facilitate interaction with the git database, with an eye towards flexibility and extensibility.
  2. While there are implications for free software hosting, as well as for vendor independence and network service autonomy (a la the Franklin Street Statement, http://autonomo.us/2008/07/franklin-street-statement/), I think the primary reason to pursue distributed bug tracking has more to do with productivity and better engineering practices than with policy. In summary:
    • Bug database systems that run locally and are fast3 and always available.
    • Tools that permit offline interaction with issue database.
    • Tools that allow users to connect issues to branches.
    • Tools that make it possible to componentize bug databases in parallel with the software.
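
As a strawman for the data-format point above, the record wants to be flat, trivially parseable, and mergeable: something like one document per issue, with comments as discrete appended entries. A hypothetical rendering of such a record as Go types (all field names invented for illustration):

package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Issue is a hypothetical flat, easily merged record: one file per
// issue, comments appended as discrete entries, everything else a
// scalar field that merges cleanly.
type Issue struct {
	ID       string    `json:"id"`
	Title    string    `json:"title"`
	Status   string    `json:"status"`
	Created  time.Time `json:"created"`
	Comments []Comment `json:"comments"`
}

type Comment struct {
	ID     string    `json:"id"`
	Author string    `json:"author"`
	Date   time.Time `json:"date"`
	Body   string    `json:"body"`
}

func main() {
	issue := Issue{
		ID:      "a1b2c3d",
		Title:   "crash on startup",
		Status:  "open",
		Created: time.Now(),
		Comments: []Comment{
			{ID: "e4f5a6b", Author: "sam", Date: time.Now(), Body: "can reproduce on linux"},
		},
	}
	out, _ := json.MarshalIndent(issue, "", "  ")
	fmt.Println(string(out))
}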

Resources

(With commentary.)

  • dist-bugs mailing list

    This is the canonical source for discussion around distributed bug tracking.

  • Bugs Everywhere

    This is among the most well-developed solutions, holistically speaking. “be” is written in Python and can generate output for the web. It uses its own data format and has a pretty good command line tool. The HTML output it generates is probably not very fast at scale (none of them are), but I have not tested it.

  • Ditz

    Ditz is a very well-developed solution: implemented in Ruby, with a web interface, a command line tool, and a basic YAML data format, storing data in a branch. Current development is slow, getting it up and running is non-trivial, and my sense is that there isn’t a very active community of contributors. There are likely reasons for this, but they are beyond the scope of this overview.

  • pitz

    Pitz is a Python re-implementation of Ditz, and while the developer(s?) have produced a “release,” the “interface” is a Python shell, and to interact with the database you basically have to write commands in Python syntax. From a data perspective, however, Pitz, like Ditz, is quite developed. Though it stores data in-tree, I think it’s an important source of ideas/examples/scaffolding.

  • Artemis

    This is a really clever solution that uses Maildirs to store issues. As a result you can interact with Artemis issues from your existing email client: pull down changes and see new bugs in your email, without any complicated email and list server setup.

    The huge caveat is that it’s implemented as a plugin for Mercurial, and so can’t be used with git projects. Also, all data resides in the tree.

  • git-issues

    In most ways, git-issues is my favorite: it’s two Python files and 1700 lines of code, stores issues outside of the source branch, and has a good command line interface. On the downside, it uses XML (which shouldn’t matter, but I think it probably does, at least in terms of attracting developers) and doesn’t have a web-based interface. It’s also currently unmaintained.

  • Prophet/sd

    SD, which is based on a distributed database named Prophet, is a great solution. The primary issue is that it’s currently unmaintained and is not as feature-complete as it should be. Also, a lot of SD focuses on synchronizing with existing centralized issue trackers, potentially at the expense of developing other tools.


  1. It seems that you want centralized issue databases, or at least that the appearance of a canonical issue database is a major selling point for issue tracking software in general. Otherwise, everyone would have their own text file with a bunch of issues, and that would suck. ↩︎

  2. Because I don’t program (much) and it’s easy to criticize architectural decisions from afar, I don’t want to explicitly say “we need to write this in Python for portability reasons” or something similarly unfounded. At the same time, adoption and ease of use are crucial here, both for developers and users. Java and Ruby (and maybe Perl), for various reasons, add friction to adoption. ↩︎

  3. “Is Jira/Bugzilla/etc. slow for you today?” ↩︎