The Most Forgotten CI Feature
Developers love to complain about their tools, particularly continuous integration (CI) software. Everyone has a pet idea of how to make it faster to run, or how to make the results easier to interpret or more useful. They're all wrong. [1]
I spent a few years as the, I guess, tech lead(?) [2] of a team that was developing a CI tool/platform. It was a cool project: big scale (IO-bound workloads! Cloud!) in an organization that grew rapidly, and that was exactly the right age to believe in CI but also to predate really mature off-the-shelf CI systems (e.g. GitHub Actions).
We did a lot of cool things, but I think the most useful thing we ever built was a notification system that let people know when their tests were done running.
That's it.
See, people were always asking for things like, "I want my test to run in 5 minutes, or 10 minutes, so that I don't lose too much time waiting for the results and I can avoid getting distracted or losing focus." You could spend a lot of time making things faster, and in some cases this is a great idea: slow things are bad, compute time is expensive (particularly in aggregate), and for trivial things you do want a fast response time.
The problem is that sometimes things can't be made all that much faster without an exceptional amount of effort. And while compute time is expensive, sometimes you end up spending significantly more on faster machines, or on more machines for increased parallelism, for only modest gains.
This of course misses the point: human attention spans are incredibly fickle, and while really well-focused and disciplined engineers might be able to wait for 5 minutes, with anything longer than that most people will have moved on, and at that point it might as well have taken an hour. There are some upper bounds and pragmatic limits on this, because if a build takes 2 hours (say) a developer only really has time to try 1 or 2 things out in a given work day, but it does mean that execution times between a minute and about 20 minutes are functionally the same.
So, just notify people when their build is done. Don't beat distraction by being really fast, beat distraction by interrupting the distraction. People don't need to spend their day looking at a dashboard if the dashboard tells them (gently) to come back when their task is done.
Why doesn't every CI tool do this? It might be the feature that every developer wants, and yet there's no really good "tell me when my build is done" feature in any of the popular tools. It's a hard feature to get right, and there are a lot of tricky bits:
- People generally don't want to get emails; emails would be easy, but they're not a good way to send a quick--largely ephemeral--reminder. While you can pull emails out of commits (which isn't a great strategy most of the time,) or (presumably) usernames if you're on a platform like GitHub, there are some important user settings that you have to store somewhere.
- Who to notify for a particular build is a little hard, it turns out. People often want to opt into notifications and to receive only the notifications that are important to them. Sometimes the person who wants the notification isn't one of the authors of a commit, or the user that submitted the branch/pull request.
- When to send a notification isn't clear either. When all tasks in the build complete? What if some tasks are blocked from starting because one thing failed? Are "do what I mean" notifications something like "notify me on the first failure, or when all (runnable) tasks succeed"? If a task fails (and generates a notification), and then the task is restarted and the build goes on to succeed, do you send a second notification (probably?) Not only are these hard questions, but different teams might want different semantics.
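To make the "when" question concrete, here is a minimal sketch in Python of one possible policy: notify on the first failure, notify when all runnable tasks succeed, and send a second notification if a restarted build later goes green. None of this is how the tool we built actually worked; `Status`, `Build`, and `next_notification` are hypothetical names invented for illustration, and a different team could reasonably pick different rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    BLOCKED = "blocked"  # cannot start because something it depends on failed


@dataclass
class Build:
    # Hypothetical model: task name -> current status.
    tasks: dict[str, Status]
    # The last outcome we notified about: "failure", "success", or None.
    last_notified: Optional[str] = None


def next_notification(build: Build) -> Optional[str]:
    """Return "failure" or "success" if a notification is due, else None.

    Assumed policy: any failed task means "failure"; "success" requires
    every runnable (non-blocked) task to have succeeded. The same outcome
    is never sent twice in a row, so repeated failures stay quiet.
    """
    statuses = list(build.tasks.values())
    runnable = [s for s in statuses if s is not Status.BLOCKED]

    if any(s is Status.FAILED for s in statuses):
        outcome = "failure"
    elif runnable and all(s is Status.SUCCEEDED for s in runnable):
        outcome = "success"
    else:
        outcome = None  # still in progress, nothing to say yet

    if outcome is None or outcome == build.last_notified:
        return None
    build.last_notified = outcome
    return outcome
```

Calling `next_notification` after every task status change would yield one "failure" message when the first task fails, nothing while the task is being rerun, and one "success" message if the rerun and everything downstream pass. Even this small sketch bakes in decisions (a second, unrelated failure is silent, for example) that some teams would want to make differently.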
It's one of those things that seems simple, but there's just enough complexity that it's hard to build and hard to get right. It's a bit easier to do when all of your CI platform is developed in house, but only a bit, and (probably for the better) there aren't many specific tasks Anyway, I hope someone builds something like this. I'd certainly use it.
[1] Well, not all of them. The problem is that developers are very likely to get stuck spinning in weird local optimizations and it's really hard to think about CI features as a user. Developing CI software is also a fun ride because all your users are also engineers, which is a tough crowd.
[2] In retrospect I realize I was doing tech-lead things, but my title and how we talked about what I was doing at the time weren't indicative of this reality. Ah well.