Git Sync

With the new laptop, I once again have more than one computer, and with it a need to keep the current state of my work synchronized between them. This is a crucial function, and pretty difficult to do right. I’ve had multiple systems before, and I consolidated everything onto one laptop because I didn’t want the headache of sitting down in front of a computer and cursing the fact that the one thing I needed to work on was stuck somewhere I couldn’t get to.

I store most of my files and projects in git version control repositories for a number of reasons, and the fact that this enables a pretty natural backup and synchronization system was a significant inspiration for working this way. But capability is more than a stone’s throw from a working implementation. The last time I tried to manage using more than one computer on a regular basis, I thought “oh, it’s all in git, I’ll just push and pull those repositories around and it’ll be dandy.” It wasn’t. The problem is that if you keep different projects in different repositories (as you should when using git), remembering to commit and push every repository before moving between computers is a headache.

In the end, synchronization is a rote task, and it seemed like the kind of thing worth automating. There are a number of different approaches to this. What I’ve done is write a very basic bash/zsh script[1] that takes care of the whole syncing process. I call it “git sync,” and you may use all or some of it as you see fit.

git sync lib

The first piece of the puzzle is a few variables and functions. I decided to store this in multiple files for two reasons: First, I wanted access to the plain functions in the shell. Second, I wanted the ability to roll per-machine configurations using the components described within. Consider the source.

The only really complex assumption here is that, given a number of git repositories, there are three kinds. Some you want to commit and publish changes to regularly and automatically. Some you want to fetch new updates for regularly but don’t want to commit to. A bunch you want to monitor but probably want to interact with manually. In my case, I want to monitor a large list of repositories, automatically fetch changes from a subset of those repositories, and automatically publish changes to a subset of that subset.
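One way to express these three tiers at the top of the library is sketched below. It is a hypothetical example. Only force_sync_repo appears in this post, and it is presumably the automatically published tier, since syncup pulls and pushes it. The names fetch_repo and monitor_repo, and all of the paths, are made-up placeholders. Building each list from the previous one guarantees that every published repository is also fetched, and every fetched repository is also monitored.

```bash
#!/bin/bash
# Hypothetical configuration block for the top of git-sync-lib.
# force_sync_repo is the post's name; fetch_repo, monitor_repo and all
# paths are illustrative placeholders.

# tier 1: commit and publish automatically
force_sync_repo=(~/work ~/personal)

# tier 2: fetch automatically but never commit (includes every tier-1 repo)
fetch_repo=("${force_sync_repo[@]}" ~/src/dotfiles)

# tier 3: monitor only (includes every tier-2 repo)
monitor_repo=("${fetch_repo[@]}" ~/src/upstream-project)
```

The same syntax also works in zsh, although zsh arrays are indexed from 1 rather than 0.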

Insert the following line into your .zshrc:

source /path/to/git-sync-lib

Then configure the beginning of the git-sync-lib file with references to your git repositories. When that is done, you will have access to the following functions in your shell: gss (provides a system-wide git status), autoci (automatically pulls new content and commits local changes to the appropriate repository), and syncup (pulls new content from the repositories and publishes any committed changes).
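Here is a rough sketch of what gss and autoci might look like. It is not the actual library code. The real functions read their repository lists from variables. These hypothetical versions take repository paths as arguments instead, which keeps them self-contained. They assume git 1.8.5 or later, which is when git -C was added.

```bash
#!/bin/bash
# Hypothetical sketches of gss and autoci (not the real git-sync-lib code).

# gss REPO...: report uncommitted changes across several repositories
gss(){
    local repo changes
    for repo in "$@"; do
        if ! git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
            echo "-- $repo is not a git repository"
            continue
        fi
        changes=$(git -C "$repo" status --short)
        # print only repositories that have something to report
        if [ -n "$changes" ]; then
            echo "-- $repo"
            echo "$changes"
        fi
    done
}

# autoci REPO...: pull (only when an upstream is configured), then commit
# every local change with a host-stamped message
autoci(){
    local repo
    for repo in "$@"; do
        if git -C "$repo" rev-parse --abbrev-ref '@{u}' >/dev/null 2>&1; then
            git -C "$repo" pull -q
        fi
        git -C "$repo" add -A
        # commit only when something is actually staged
        if ! git -C "$repo" diff --cached --quiet; then
            git -C "$repo" commit -q -m "${HOSTNAME:-localhost}: automatic commit"
        fi
    done
}
```

For example, running gss on ~/work and ~/personal lists any uncommitted changes in both, and running autoci on ~/work commits everything outstanding in ~/work.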

syncup and autoci do their work in a pretty straightforward for [...] done loop. That is great, unless you need some repositories to publish only in some situations (e.g. when you’re connected to a specific VPN). You can modify this section to account for this case. The function takes the following basic form:

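syncup(){
    # remember where we started, so we can return there afterward
    CURRENT=$(pwd)

    # force_sync_repo lists the repositories to pull and push automatically
    for repo in $force_sync_repo; do
        cd "$repo" || continue

        echo -- syncing $repo
        git pull -q
        git push -q
    done

    cd "$CURRENT"
}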

Simply insert some logic into the for loop, like so:

for repo in $force_sync_repo; do
    cd "$repo" || continue
    if [ "$repo" = ~/work ]; then
        # netcfg current lists the active network profiles
        if netcfg current | grep -q "vpn"; then
            echo -- syncing $repo on work vpn
            git pull -q
            git push -q dev internal
        else
            echo -- $repo skipped because lacking vpn connection
        fi
    elif [ "$repo" = ~/personal ]; then
        if netcfg current | grep -q "homevpn"; then
            echo -- syncing $repo with homevpn
            git pull -q
            git push -q
        else
            echo -- $repo skipped because lacking homevpn connection
        fi
    else
        echo -- syncing $repo
        git pull -q
        git push -q
    fi
done

Basically, for two repositories we check that a particular network profile is connected before operating on them. All other repositories are handled as in the first example. The check uses the output of “netcfg current”. netcfg is Arch Linux’s network configuration tool, so you will need a different test if you are not using Arch Linux.
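If you aren’t on Arch, one alternative is to look for the VPN’s network interface rather than asking netcfg. The sketch below makes that check under some assumptions. The helper names iface_present and sync_if_iface are hypothetical. It assumes Linux, which lists every network interface under /sys/class/net. It also assumes a VPN that creates its own interface only while it is connected, such as the tun0 that OpenVPN typically uses.

```bash
#!/bin/bash
# Hypothetical netcfg-free check: treat a VPN as connected when its
# interface exists. Assumes Linux (/sys/class/net) and a VPN that creates
# a dedicated interface (e.g. tun0) only while it is up.
NET_SYSFS=${NET_SYSFS:-/sys/class/net}

# iface_present NAME: succeed if network interface NAME currently exists
iface_present(){
    [ -e "$NET_SYSFS/$1" ]
}

# sync_if_iface REPO IFACE: pull and push REPO only when IFACE exists
sync_if_iface(){
    if iface_present "$2"; then
        echo "-- syncing $1 over $2"
        # subshell, so the caller's working directory is left alone
        (cd "$1" && git pull -q && git push -q)
    else
        echo "-- $1 skipped because $2 is not present"
    fi
}
```

Inside the loop, a line such as sync_if_iface "$repo" tun0 could replace one of the netcfg branches. The ~/work case pushes to a different remote, so it would still need its own push command.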

git sync

You can use the functions provided by the “library” and skip this part if you don’t need to automate your backup and syncing process. Still, the whole point of this project was to automate exactly this kind of thing, so this part, though short, is kind of the cool part. You can download git sync here.

Put this script in your $PATH (e.g. “/usr/bin” or “/usr/local/bin”). I keep a “~/bin” directory in my path for personal scripts like this, and you might enjoy doing the same. You will then have access to the following commands at any shell prompt:

git-sync backup
git-sync half
git-sync full

Backup calls a function in git-sync to back up some site-specific files (e.g. crontabs) to a git repository. The half sync only downloads new changes, and is meant to run silently at a regular interval: I cron this every five minutes. The full sync runs the backup, commits local changes, downloads new changes, and sends me an xmpp message to log when it finishes successfully: I run this a couple of times an hour. There is one exception: if the laptop isn’t connected to a Wi-Fi or ethernet network, git sync skips syncing entirely. If you’re offline, you’re not syncing. If you’re connected over 3G tethering, you’re not syncing either.
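The control flow described above can be sketched roughly as follows. This is a hypothetical skeleton, not the downloadable script. sync_backup, sync_commit, sync_fetch and notify are placeholder stubs that you would replace with real work, for example library functions and xmppipe. The sketch treats “online” as one of the interfaces in LAN_IFACES reporting “up” under /sys/class/net (Linux). Tethered links that appear under other names, such as ppp0 or usb0, therefore do not count.

```bash
#!/bin/bash
# Hypothetical git-sync dispatcher skeleton; the stubs and interface
# names are assumptions, not the real script's internals.
NET_SYSFS=${NET_SYSFS:-/sys/class/net}
LAN_IFACES=${LAN_IFACES:-"eth0 wlan0"}

# on_lan: succeed if any listed wired or wireless interface is up
on_lan(){
    local iface state
    for iface in $LAN_IFACES; do
        [ -r "$NET_SYSFS/$iface/operstate" ] || continue
        read -r state < "$NET_SYSFS/$iface/operstate"
        if [ "$state" = up ]; then
            return 0
        fi
    done
    return 1
}

# stubs that only print their names, to make the control flow visible
sync_backup(){ echo backup; }
sync_commit(){ echo commit; }
sync_fetch(){ echo fetch; }
notify(){ echo "notify: $*"; }

# git_sync MODE: MODE is backup, half or full
git_sync(){
    case "$1" in
        backup)
            sync_backup ;;
        half|full)
            if ! on_lan; then
                # stay quiet for the cron-driven half sync
                if [ "$1" = full ]; then
                    echo "-- no wifi or ethernet connection; skipping sync"
                fi
                return 0
            fi
            if [ "$1" = full ]; then
                sync_backup && sync_commit && sync_fetch &&
                    notify "full sync complete"
            else
                sync_fetch
            fi ;;
        *)
            echo "usage: git-sync {backup|half|full}" >&2
            return 2 ;;
    esac
}
```

As a standalone script, the last line would call git_sync "$@". The five-minute half sync would then be a crontab entry such as */5 * * * * ~/bin/git-sync half. Many systems now name their interfaces like enp0s25 or wlp3s0, so adjust LAN_IFACES to match yours.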

That’s it! Feedback is of course welcome. If anyone wants these files in their own git repository so they can modify and hack on them, I’m more than willing to provide that; just ask.

Onward and Upward!


  1. I wrote this as a bash script but discovered that something with the way I was handling arrays was apparently a zsh-ism. Not a big fuss for me, because I use zsh on all my machines, but if you don’t use zsh or don’t have it installed, you’ll need to modify something in the array or install zsh (which you might enjoy anyway).

Jekyll Publishing

I wrote about my efforts to automate my publishing workflow a couple of weeks ago, (egad!) and I wanted to follow that up with a somewhat more useful elucidation of how all of the gears work around here.

At first I had a horrible scheme set up that depended on regular builds triggered by cron. It is a functional, if inelegant, solution. You can give a lot of tasks the appearance of “real-time” responsiveness by running cruder jobs often enough. The truth, however, is that it’s not quite the same, and I knew there was a better way.

Basically, the “right way” to solve this problem is to use the “hooks” provided by the git repositories that store the source of the website. Hooks, in this context, are scripts that git optionally runs before or after various repository operations. They let you attach actions to the operations you perform on your repositories. In effect, you can say “when I git push, do these other things” or “before I git commit, check for these conditions, and if they’re not met, reject the commit,” and so forth. The possibilities can be a bit staggering.
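Here is a small, hypothetical illustration of the “reject the commit” case. git aborts a commit whenever the pre-commit hook exits with a non-zero status. The function below writes such a hook into a repository; install_whitespace_hook is a made-up name. The condition it checks is just an example: the staged changes must not introduce whitespace errors or leftover conflict markers, which is what git diff --cached --check reports. The sketch assumes a regular (non-bare) repository with its .git directory at the top.

```bash
#!/bin/bash
# install_whitespace_hook REPO: write an example pre-commit hook into REPO.
# The check (git diff --cached --check) is an illustrative choice.
install_whitespace_hook(){
    local hooks="$1/.git/hooks"
    mkdir -p "$hooks"
    cat > "$hooks/pre-commit" <<'EOF'
#!/bin/sh
# git aborts the commit whenever pre-commit exits non-zero
if ! git diff --cached --check; then
    echo "pre-commit: fix the problems above (or bypass with --no-verify)" >&2
    exit 1
fi
EOF
    # git only runs hooks that are executable
    chmod +x "$hooks/pre-commit"
}
```

If you really need to commit past the check, git commit --no-verify skips the pre-commit hook.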

In this case, here’s what happens. When I commit to the tychoish.com repository, a script synchronizes the appropriate local branches and publishes the changes to the server. It then sends me an xmpp message saying that this operation is in progress. This runs as the post-commit hook, and for smaller sites it could simply be “git push origin master”. Because tychoish is a large site, and I don’t want to be rebuilding it constantly, I do the following:

#!/bin/bash

# This script is meant to be run in a cron job to perform a rebuilding
# of the slim contents of a jekyll site.
#
# This script can be run several times an hour to greatly simplify the
# publishing routine of a jekyll site.

cd ~/sites/tychoish.com/

# Saving and Fetching Remote Updates from tychoish.com
git pull >/dev/null &&

# Local Adding and Committing
git checkout master >/dev/null 2>&1
git add .
git commit -a -q -m "$HOSTNAME: changes prior to a slim rebuild" >/dev/null 2>&1

# Local "full-build" Branch Mangling
git checkout full-build >/dev/null 2>&1 &&
git merge master &&

# Local "slim-build" Branch Mangling and Publishing
git checkout slim-build >/dev/null 2>&1 &&
git merge master &&
git checkout master >/dev/null 2>&1 &&
git push --all

# echo done

Then, on the server, once the server’s copy of the repository has received the pushed changes (i.e. in the post-update hook), the following code runs:

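#!/bin/bash
#
# post-update hook: after a push, update the build copy of the site
# and start a rebuild in the background.
#
# To enable this hook, make this file executable by "chmod +x post-update".

# git runs hooks with GIT_DIR pointing at the repository that was pushed
# to; clear it (and GIT_WORK_TREE, in case it is set) so the commands
# below operate on the build repository instead.
unset GIT_DIR
unset GIT_WORK_TREE

cd /path/to/build/tychoish.com || exit 1
git pull origin

# rebuild in the background so the push can return immediately
/path/to/scripts/jekyll-rebuild-tychoish-auto-slim &

exit 0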
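Why does the hook clear GIT_DIR before doing anything else? git exports GIT_DIR (and related variables) to hooks so that git commands run by the hook find the repository that triggered it. The throwaway demonstration below shows the effect. It uses a hypothetical function, and its temporary repositories stand in for the real bare and build repositories. A command run inside the build directory still resolves to the bare repository until GIT_DIR is unset.

```bash
#!/bin/bash
# Hypothetical demonstration of GIT_DIR leaking into commands run from a
# hook; the temporary repositories are stand-ins for the real ones.
demo_git_dir_leak(){
    local tmp
    tmp=$(mktemp -d)
    git init -q --bare "$tmp/site.git"
    git init -q "$tmp/build"

    # simulate the environment git sets up before running a hook
    export GIT_DIR="$tmp/site.git"
    echo "before unset: $(cd "$tmp/build" && git rev-parse --git-dir)"

    # what the post-update hook does before touching the build copy
    unset GIT_DIR
    echo "after unset: $(cd "$tmp/build" && git rev-parse --git-dir)"

    rm -rf "$tmp"
}
```

Calling demo_git_dir_leak prints the bare repository’s path on the first line. The second line shows .git, the build directory’s own repository.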

When the post-update hook runs, it runs in the context of the repository you just pushed to. Unless you do the magic (technical term, it seems), the GIT_DIR variable that git sets for the hook (and GIT_WORK_TREE, if it is set) stays in the environment, and the git commands you run fail. So basically this is a fancy git pull in a third repository (the one the site is built from). The script jekyll-rebuild-tychoish-auto-slim looks like this:

#!/bin/bash
# to be run on the server

# setting the variables
SRCDIR=/path/to/build/tychoish.com/
DSTDIR=/path/to/public/tychoish/
SITENAME=tychoish
BUILDTYPE=slim
DEFAULTBUILD=slim

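build-site(){
 # switch to the branch for this build type and bring it up to date
 cd "${SRCDIR}" || return 1
 git checkout "${BUILDTYPE}-build" >/dev/null 2>&1
 git pull source >/dev/null 2>&1

 # regenerate the site, then report back over xmpp
 /var/lib/gems/1.8/bin/jekyll "${SRCDIR}" "${DSTDIR}" >/dev/null 2>&1
 echo "<jekyll> completed *${BUILDTYPE}* build of ${SITENAME}" | xmppipe garen@tychoish.com

 # leave the build copy on the default branch
 git checkout "${DEFAULTBUILD}-build" >/dev/null 2>&1
}

build-site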

This does the needful site rebuilding and sends me an xmpp message when the build has completed. The xmppipe command I use is really the following script:

#!/usr/bin/perl
# pipes standard in to an xmpp message, sent to the JIDs on the commandline
#
# usage: bash$ echo "message body" | xmppipe garen@tychoish.com
#
# code shamelessly stolen from:
# http://stackoverflow.com/questions/170503/commandline-jabber-client/170564#170564

use strict;
use warnings;

use Net::Jabber qw(Client);

my $server = "tychoish.com";
my $port = "5222";
my $username = "bot";
my $password = "";    # fill in the bot account's password
my $resource = "xmppipe";
my @recipients = @ARGV;

my $clnt = new Net::Jabber::Client;

my $status = $clnt->Connect(hostname=>$server, port=>$port);

if (!defined($status)) {
  die "Jabber connect error ($!)\n";
}
my @result = $clnt->AuthSend(username=>$username,
password=>$password,
resource=>$resource);

if ($result[0] ne "ok") {
  die "Jabber auth error: @result\n";
}

my $body = '';
while (<STDIN>) {
  $body .= $_;
}
chomp($body);

foreach my $to (@recipients) {
 $clnt->MessageSend(to=>$to,
 subject=>"",
 body=>$body,
 type=>"chat",
 priority=>10);
}

$clnt->Disconnect();

Mark the above as executable and put it in your path somewhere. You’ll want to install the Net::Jabber Perl module, if you haven’t already.

One final note: if you’re using a tool like gitosis to manage your git repositories, all of the hooks will be executed by the gitosis user. This means that this user will need write access to the “build” copy of the repository and to the public directory as well. You may be able to finesse this with the +s permission bits, or some clever use of the gitosis user group. Note that Linux ignores the setuid bit on shell scripts. Setting the setgid bit on the shared directories and giving the group write access is usually the more workable route.
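For the group route, a hypothetical helper along these lines could prepare the build copy and the public directory. It assumes that the hook user (e.g. gitosis) belongs to the group you name. It also assumes that you run it as a user allowed to change the group of those directories, usually root or their owner. The name share_with_group is made up for illustration.

```bash
#!/bin/bash
# share_with_group GROUP DIR...: give GROUP shared write access to each
# directory tree (hypothetical helper).
share_with_group(){
    local group="$1" dir
    shift
    for dir in "$@"; do
        chgrp -R "$group" "$dir" || return 1
        # group members may read and write files and enter directories
        chmod -R g+rwX "$dir" || return 1
        # setgid on directories: new files inherit GROUP instead of the
        # creating user's primary group
        find "$dir" -type d -exec chmod g+s {} + || return 1
    done
}
```

For example, as root: share_with_group gitosis /path/to/build/tychoish.com /path/to/public/tychoish. The hook user’s umask also has to leave group write enabled (e.g. umask 002). Otherwise, files created during a rebuild won’t be writable by the rest of the group.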

The End.