Today's Bottleneck

Computers are always getting faster. From the perspective of the casual observer, it may seem like every year all of the various specs keep going up and systems get faster.1 In truth, progress isn't uniform across systems and subsystems, and thinking about this progression of technology gives us a chance to think about the constraints that developers2 and other people who build technology face.

For most of the past year, I've used a single laptop for all of my computing work, and while it's been great, in that time I lost touch with the comparative speed of systems. No great loss, but I found myself surprised to learn that not all computers are the same speed: it wasn't until I started using other machines on a regular basis that I remembered that hardware could affect performance.

For most of the past decade, processors have been fast. While some processors are theoretically faster and some have other features, like virtualization extensions and better multitasking capacities (e.g., hyperthreading and multi-core systems), the improvements have been incremental at best.

Memory (RAM) manages to mostly keep up with the processors, so there's no real bottleneck between RAM and the processor. Although RAM capacities are growing, at this point extra RAM mostly means that services and systems which once had to be distributed across machines for want of memory can now all run on one server. In general: “ho hum.”

Disks are another story altogether.

While disks did get faster over this period, they didn't get much faster, and so for a long time disks were the bottleneck in computing speed. To address this problem, a number of things changed:

  • We designed systems for asynchronous operation.

Basically, folks spilled a lot of blood and energy to make sure that systems could continue to do work while waiting for the disk to read or write data. This involves a lot of event loops, queuing systems, and so forth.

These systems are really cool; the only problem is that they require us to be smarter about some aspects of software design and deployment. This doesn't fix the tons of legacy code sitting around, or the fact that a lot of tools and programmers are struggling to keep up.
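To make the event-loop idea concrete, here's a minimal sketch using Python's asyncio (a modern convenience, not something from the era this post describes). The file names and delays are hypothetical stand-ins for slow disk operations.

    import asyncio

    async def slow_read(name, seconds):
        # Stand-in for a slow disk read: asyncio.sleep() yields control
        # to the event loop instead of blocking the whole program.
        await asyncio.sleep(seconds)
        return "%s: done" % name

    async def main():
        # Both "reads" are in flight at once; the loop is free to do
        # other work while each waits, so this takes ~2 seconds, not 3.
        results = await asyncio.gather(
            slow_read("file-a", 2),
            slow_read("file-b", 1),
        )
        print(results)

    asyncio.run(main())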

  • We started to build more distributed systems so that any individual spinning disk is responsible for writing/reading less data.

  • We hacked disks themselves to get better performance.

    There are some ways you can eke out a bit of extra performance from spinning disks: namely RAID-10, hardware RAID controllers, and smaller platters. RAID approaches use multiple drives (typically four) to provide simple redundancy and roughly double performance. Smaller platters require less movement of the disk arm, so you get a bit more out of the hardware.

    Now, with affordable solid state disks (SSDs), all of these disk-related speed problems are basically moot. So what are the next bottlenecks for computers and performance?

  • Processors. It may be that processors become the next slow-to-develop bottleneck. There are a lot of expectations on processors these days: high speed, low power consumption, low temperature, and a high degree of parallelism (cores and hyperthreading). These expectations necessarily conflict.

    The main route to innovation is to make the processors themselves smaller, which does increase performance and helps control heat and power consumption, but there is a practical limit to how small a processor can get.

    Also, no matter how fast you make the processor, it's irrelevant unless the software can take advantage of its features.

  • Software.

    We’re still not great at building software with asynchronous components. “Non-blocking” systems do make it easier to build systems that work well with slower disks. Still, we don’t have a lot of software that does a great job of using the parallelism of a processor, so some operations are slow and will remain slow, because a single-threaded process must grind through a long task and can’t share the work. (A small sketch of this appears at the end of this list.)

  • Network overhead.

    While I think better software is a huge problem, network throughput could be a huge issue too. Internet endpoints (your connection) have gotten much faster in the past few years. That’s a good thing, indeed, but there are a number of problems:

  • Transfer speeds aren’t keeping up with data growth or data storage, and if that trend continues, we’re going to end up with a lot of data that exists in only one physical location, which invites catastrophic data loss.

    I think we’ll get back to a point where moving physical media around will begin to make sense. Again.

  • Wireless data speeds and architectures (particularly 802.11x, but also wide-area wireless) have become ubiquitous, but aren’t really sufficient for serious use. The fact that our homes, public places, and even offices (in some cases) aren’t wired to give people the opportunity to plug in will begin to hurt.
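Returning to the software point above: here's a minimal sketch of the difference between a single-threaded grind and parallel execution, using Python's multiprocessing module. The workload and chunk sizes are arbitrary placeholders, not a benchmark.

    import multiprocessing

    def busy(n):
        # Stand-in for a long CPU-bound task.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        chunks = [5_000_000] * 4
        # Serial: a single process grinds through everything in turn.
        serial = [busy(n) for n in chunks]
        # Parallel: the same work spread across four processes, which
        # only helps because the task can actually be split.
        with multiprocessing.Pool(4) as pool:
            parallel = pool.map(busy, chunks)
        assert serial == parallel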

Thoughts? Other bottlenecks? Different reading of the history?


  1. By contrast, software seems like it’s always getting slower, and while this is partially true, there are additional factors at play, including feature growth, programmer efficiency, and legacy support requirements. ↩︎

  2. Because developers control, at least to some extent, how everyone uses and understands technology, the constraints on the way they use computers are important to everyone. ↩︎

Github without Github

Github is great, and I think they’ve done a lot--for the better--to change and shape the way that everyone uses and does really awesome things with git.

But I worry about lock-in, I worry about having a project that relies on some feature of github that can’t be easily accomplished on another platform.

This post is an index of “git ecosystem” tools that let you get something that looks a bit like github on your own servers. Feel free to edit this page (it’s a wiki!) if you have other tools you like or can recommend!

Permissions Control

Wiki / Pages

Github has a wiki system that’s open source. I’ve never played around with it, because there’s ikiwiki, which is better anyway.

Their pages functionality is also open source: it’s Jekyll. There’s no particular shortage of programs that do this kind of thing, and most aren’t that good, but that’s an orthogonal point.

Hosted Solutions

There are about a million different repository viewers, but the magic of the github website is that there’s a lot of other integrated functionality (bug tracking, merge request queues, automatic forking/branching, etc.).

  • gitorious - functional but inelegant.
  • gitlab - promising but untested.
  • repo.or.cz - functional but not practical for casual administration.

Web hooks

I’ve only recently found notify-webhook, but it basically implements something like github’s service hooks as a traditional git post-receive hook.
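For illustration, here's a minimal sketch of what such a post-receive hook can look like. The endpoint URL and the payload shape are hypothetical placeholders, not notify-webhook's actual output or github's schema.

    #!/usr/bin/env python
    import json
    import sys
    import urllib.request

    # Hypothetical endpoint; a real hook would read this from git config.
    WEBHOOK_URL = "https://example.com/hooks/git"

    def main():
        # git passes one "<old-sha> <new-sha> <refname>" line per
        # updated ref on standard input.
        for line in sys.stdin:
            old, new, ref = line.strip().split()
            payload = json.dumps({"before": old, "after": new, "ref": ref})
            request = urllib.request.Request(
                WEBHOOK_URL,
                data=payload.encode("utf-8"),
                headers={"Content-Type": "application/json"},
            )
            urllib.request.urlopen(request)

    if __name__ == "__main__":
        main()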

The Rest

There’s no great standalone merge (pull) request system. Other code review tools are uneven, but the truth is that pull requests are not a sufficiently advanced code review tool.

Patchwork might work, but it’s a bit rustic for contemporary workflows.

Integrated issue tracking--hell, any kind of issue tracking--remains an unsolved problem, but I think github’s approach is a good start, and that feature set isn’t as easily available from other projects/products/tools.

Knitting Shopping

… and planning

I did a little bit of holiday knitting shopping. Given how infrequently I buy yarn and knitting supplies, and the fact that this kind of shopping correlates strongly with my project planning, it seems worth sharing:

  • I got a cone of merino/tencel lace weight yarn in a steel blue color to knit a long plain tube to wear as a neck tube/scarf thing. I bought one of these a few months ago knit out of a jersey tencel knit, and I adore it, so it makes sense to knit something similar.

    Hopefully knitting these scarves will prove successful and useful. I’m not much of a knitted-sock wearer, I find most flat scarves dreadful to knit, I find shawls difficult to pull off, and I enjoy knitted hats but don’t find them windproof enough for common use. Having good, small, lightweight, and plain knitting projects would probably be a very good thing indeed.

  • I bought a couple of carbon fiber knitting needles, in sizes 2.5 (the size that my sweaters have been and will be for a little while) and 0 (for the scarves and hem facings as needed).

    I’m a chronic needle bender and like sharp points and reasonably slippery needles. I also have a set of carbon fiber needles for socks which are great. Very much looking forward to trying these out.

  • Finally, I’ve procured a few cones of HD Shetland to complement some of my leftovers. On the sweater queue:

    • a few cardigans. One for my mother, and a second one for me?
    • maybe something with shoulders/sleeves in a different color? I’ve used this kind of shading on otherwise plain sweaters, but it seems interesting to see how it might look on a two color sweater.

Sweater Evolutions

I’ve been knitting! Here’s an update:

  • I finished a sweater. Still need to block it, but it looks great so far. It’s a (near) duplicate of a sweater that I made a few years ago in blues. The biggest difference in construction is that I did the hem in a very slightly different way. Other than that, it’s very much the exemplar of “the default tychoish sweater.”

    Note to self, it would be good to have a version of this sweater in brown.

  • I started another sweater. This one is a cardigan in a medium gray and a light blue-gray. Rather than do hems, the idea with this one is:

    • purl occasionally instead of working a ribbed or hemmed border.

      This is an Elizabeth Zimmermann and Meg Swansen technique for colorwork that avoids ribbing or hems. If you purl occasionally in the first few inches, you can prevent rolling. It seems to work well enough, and it makes the bottom edge less bulky and more integrated into the sweater.

    • Crocheted front steek with knitted cord (i-cord) edge, using the steek as facing.

      Another Meg Swansen technique where you use the steek as facing: crochet along the edges and then knit an i-cord to cause the steek to “fold” under and act as a facing. Blocking does the rest. Again, the end result is lightweight, flexible, and easy to handle.

    I have about five inches done, and I expect slow but steady progress on this over the next few months.

Documentation Maximalism

You may hear people, particularly people who don’t like to write documentation, say something like:

Users need minimalist documentation that only answers their questions, and there’s no point in overwhelming users with bloated, maximalist documentation that they’ll never read.

Which sounds great, but doesn’t reflect reality or best practice. Consider the following:

  • Documentation is as much for the producers of the software as it is for the users. Having extensive documentation contributes to, and reflects, a sane design process. Collecting and curating the documentation helps ensure that the software is usable and knowable.
  • Having complete documentation reduces support costs, both by reducing the volume of support requests and by lowering the complexity of the work associated with support.
  • Good extensive documentation drives adoption of software. Products with better documentation will always see better adoption than comparable products with worse documentation.

Without users, software is useless.

Does this mean that you shouldn’t value minimalism, particularly conceptual and visual minimalism? Does this mean you should avoid customizing the documentation to fit the needs and patterns of your users?

No, of course not.

But minimalism for the sake of minimalism, without a particular strategy, is an awful and ill-founded ideology.

Even if it looks good, and particularly if it sounds good.

Information Debts

Like technical debt, information debt is a huge problem for all kinds of organizations, and one that all technical writers need to be aware of and able to combat directly. Let’s back up a little…

Information debt is what happens when there aren’t proper systems, tools, and processes in place to maintain and create high quality information resources. A number of unfortunate and expensive things result:

  • People spend time recreating documents, pages, and research that already exist. This is incredibly inefficient and leads to the following:
  • Inaccurate information propagates throughout the organization and to the public.
  • Information and style “drift” when information and facts exist in many places.
  • Organizations spend more money on infrastructure and tools as a band-aid when data is poorly organized.
  • People lose confidence in information resources and stop relying on them, preferring to ask other people for information. This increases communication overhead and noise, and takes longer for everyone than consulting a good resource would.

To help resolve information debt:

  • Dedicate resources to paying back information debts. It takes time to build really good resources, to collect and consolidate information, and to keep it all up to date. But given the costs of the debt, it’s often worth it.
  • Documents must be “living” and usefully versioned, and there must be a process for updating them. Furthermore, while it doesn’t make sense to actually limit editing privileges, it’s important that responsibility for editing and maintaining documents isn’t diffused and thus neglected.
  • Information resources must have an “owner” within an organization or group who is responsible for keeping them up to date and making sure that people know they exist. You can have the best repository of facts, but if no one uses it and the documents are not up to date, it’s worthless.
  • Minimize the number of information resources. While it doesn’t always make sense to keep all information in the same resource or system, the more “silos” where a piece of information or document might live, the less likely a reader/user is to find it.

… and more.

I’m working on adding a lot of writing on information debt in the technical writing section of the wiki. I’ll blog more about this, while I continue to work through some of these ideas, but I’m quite interested in hearing your thoughts on this post and on the information-debt pages as well.

Onward and Upward!

Novel Automation

This post is a follow up to the interlude in the /posts/programming-tutorials post, which is part of an ongoing series of posts on programmer training and related issues in technological literacy and education.

In short, creating novel automations is hard. The process would have to look something like:

  1. Realize that you have an unfulfilled software need.
  2. Decide what the proper solution to that need is. Make sure the solution is sufficiently flexible to be able to support all required complexity.
  3. Then sit down, open an empty buffer and begin writing code.

Not easy.1

Something I’ve learned in the past few years is that the above process is relatively uncommon for actual working programmers: most of the time you’re adding a few lines here and there, testing various changes or adding small features built upon other existing systems and features.

If this is how programming work is actually done, then the kinds of methods we use to teach programmers how to program should hold some resemblance to the actual work that programmers do. As an attempt at a case study, my own recent experience:

I’ve been playing with Buildbot for a few weeks now out of personal curiosity, and because it may be useful to automate some stuff for the Cyborg Institute. Buildbot has its merits and frustrations, but this post isn’t really about buildbot. Rather, the experience of doing buildbot work has taught me something about programming and about “building things,” including:

  • When you set up buildbot, it generates a Python configuration file where all buildbot configuration and “programming” goes. (A minimal sketch of such a configuration appears at the end of this post.)

    As a bit of a sidebar: I’ve been using a base configuration derived from the buildbot configuration for buildbot itself, which is cleaner than the default configuration, and I’d assumed that I was configuring a buildbot in the “normal” way.

    Turns out I wasn’t, and this hurts my (larger) argument slightly.

    I like the idea of having a very programmatic interface for systems that must integrate with other components, and I really like the idea of a system that produces a good starting template. I’m not sure what this does for overall maintainability in the long term, but it makes getting started and using the software in a meaningful way much more approachable.

  • Organizing my buildbot configuration as I have, modeled on the “metabuildbot,” has nicely illustrated the idea that software is just a collection of modules that interact with each other in a defined way. Nothing more, nothing less.

  • Distributed systems are incredibly difficult for anyone to conceptualize properly, and I think most of the frustration with buildbot stems from this.

  • Buildbot provides an immediate object lesson on the trade-offs between simplicity and terseness on the one hand and maintainability and complexity on the other.

    This point relates to the previous one. Because distributed systems are hard, it’s easy to configure something that’s too complex and that isn’t what you want at all in your Buildbot before you realize that what you actually need is something else entirely.

    That’s not to say there aren’t nightmarish Buildbot configs--there are--but the lesson is quite valuable.

  • There’s something interesting and instructive in the way that Buildbot’s user experience lies somewhere between “an application,” that you install and use, and a program that you write using a toolkit.

    It’s clearly not exactly either, and both at the same time.

I suspect some web-programming systems may be similar, but I have relatively little experience with systems like these. And frankly, I have little need for these kinds of systems in any of my current projects.
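As promised above, here's a minimal sketch of what a buildbot master configuration can look like. Buildbot's configuration API has changed over the years, so this assumes a recent release with the buildbot.plugins interface; the worker name, password, and repository URL are placeholders, not anything from my actual setup.

    # master.cfg -- a minimal, hypothetical buildbot configuration.
    from buildbot.plugins import schedulers, steps, util, worker

    c = BuildmasterConfig = {}

    # One worker that connects to the master.
    c['workers'] = [worker.Worker("example-worker", "example-password")]
    c['protocols'] = {'pb': {'port': 9989}}

    # A build factory: clone the repository, then run the test suite.
    factory = util.BuildFactory()
    factory.addStep(steps.Git(repourl="https://example.com/project.git",
                              mode="incremental"))
    factory.addStep(steps.ShellCommand(command=["make", "test"]))

    c['builders'] = [
        util.BuilderConfig(name="tests",
                           workernames=["example-worker"],
                           factory=factory),
    ]

    # Rebuild whenever a change lands on the default branch.
    c['schedulers'] = [
        schedulers.SingleBranchScheduler(
            name="on-commit",
            change_filter=util.ChangeFilter(branch="master"),
            treeStableTimer=60,
            builderNames=["tests"],
        ),
    ]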

Thoughts?


  1. Indeed, this may be why people so often write code, get it working, and then rewrite it from the ground up: writing things from scratch is an objectively hard thing, while rewriting and iterating is considerably easier. And the end result is often, but not always, better. ↩︎

Programming Tutorials

This post is a follow up to my /posts/coding-pedagogy post. This “series” addresses how people learn how to program, the state of the technical materials that support this education process, and the role of programming in technology development.

I’ve wanted to learn how to program for a while, and I’ve been perpetually frustrated by pretty much every lesson or document I’ve ever encountered in this search. This is hyperbolic, but it’s pretty close to the truth. Teaching people how to program is hard, and the materials are written by people who either:

  • don’t really remember how they learned to program.

Many programming tutorials were written by these kinds of programmers, and the resulting materials tend to be decent in and of themselves, but they fail to teach people who don’t already know how to program.

If you already know how to program, or have learned to program in a few different languages, it’s easy to substitute “learning how to program” with “learning how to program in a new language,” because that experience is fresher and easier to understand.

These kinds of materials will teach the novice programmer a lot about programming languages and fundamental computer science topics, but not the things you actually need in order to learn how to write code.

  • don’t really know how to program.

People who don’t know how to program tend to assume that you can teach by example, using guided tutorials. You can’t, really. Examples are good for demonstrating syntax and procedure, and for answering tactical questions, but they aren’t sufficient for teaching the required higher-order problem solving skills. Focusing on the concrete aspects of programming syntax, the standard library, and the process for executing code isn’t enough.

These kinds of documents can be very instructive, and outsider perspectives are quite useful, but if the document can’t convey how to solve real problems with code, you’ll be hard pressed to learn how to write useful programs from these guides.

In essence, we have a chicken and egg problem.


Interlude:

Even six months ago, when people asked me “are you a programmer?” (or engineer,) I’d often object strenuously. Now, I wave my hand back and forth and say “sorta, I program a bit, but I’m the technical writer.” I don’t write code on a daily basis and I’m not very nimble at starting to write programs from scratch, but sometimes when the need arises, I know enough to write code that works, to figure out the best solution to fix at least some of the problems I run into.

I still ask other people to write programs or fix problems I’m having, but it’s usually more because I don’t have time to figure out an existing system that I know they’re familiar with and less because I’m incapable of making the change myself.

Despite these advances, I still find it hard to sit down with a blank buffer and write code from scratch, even if I have a pretty clear idea of what it needs to do. Increasingly, I’ve come to believe that this is the case for most people who write code, even very skilled engineers.

This will be the subject of an upcoming post.


The solution(s):

1. Teach people how to code by forcing people to debug programs and make trivial modifications to code.

People pick up syntax pretty easily, but struggle more with the problem-solving aspects of code. While there are some subtle aspects of syntax, the compiler or interpreter does enough to teach people syntax. The larger challenge is getting people to understand the relationship between their changes and the resulting behavior, and between any single change and the rest of a piece of code.
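As a concrete (and entirely hypothetical) example of the kind of exercise this suggests, consider handing a learner a short function with a planted bug:

    # Exercise: this function should return the average of a list of
    # numbers, but it has a one-line bug. Explain why average([2, 4, 6])
    # returns 6.0 instead of 4.0, then fix it.
    def average(numbers):
        total = 0
        for n in numbers:
            total += n
        return total / (len(numbers) - 1)  # bug: should be len(numbers)

    print(average([2, 4, 6]))

The point of the exercise isn't the arithmetic; it's forcing the learner to trace the relationship between one line and the program's behavior.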

2. Teach people how to program by getting them to solve actual problems using actual tools, libraries, and packages.

Too often, programming tutorials and examples attempt to be self-contained or unrealistically simple. While this makes sense from a number of perspectives (easier to create, easier to explain, fewer dependency problems for users), such self-contained work is incredibly uncommon in practice, and it probably leads people to think that a lot of programming revolves around re-implementing solutions to solved problems.

I’m not making an argument about computer science education or formal engineering training, with which I have very little experience or interest. Programming is relevant for most people as contemporary, technically literate actors in digital systems.

I’m convinced that many people do a great deal of work that is effectively programming: manipulating tools, identifying and recording procedures, collecting information about the environment, performing analysis, and taking action based on collected data. Editing macros, mail filtering systems, and spreadsheets are obvious examples, though there are others.
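To see why this counts as programming, consider what a mail-filtering rule looks like written down as code. This is a hypothetical sketch, not any particular mail client's API: the same conditional logic people routinely build with a filtering UI.

    # A mail filter expressed as a function: given a message, decide
    # which folder it belongs in.
    def route(message):
        if "lists@example.com" in message.get("to", ""):
            return "Lists"
        if message.get("subject", "").startswith("[URGENT]"):
            return "Priority"
        return "Inbox"

    # The message is a plain dict here; a real system would parse email.
    print(route({"to": "lists@example.com", "subject": "weekly digest"}))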

Would teaching these people how programming worked and how they could use programming tools improve their digital existences? Possibly.

Would general productivity improve if more people knew how to think about automation and were able to do some of their own programming? Almost certainly.

Would having more casual programmers create additional problems and challenges in technology? Yes. These would be interesting problems to solve as well.