Key Git Concepts

Git is a very... different kind of software. It's explicitly designed against the paradigm of other programs like it (version control/source management), and to make matters worse, most of its innovations and eccentricities are very difficult to build metaphors and analogies around. This is likely because it takes a non-prescriptive approach to workflow (you can work with your collaborators in any way that makes sense for you) and, more importantly, because it lets people do away with linearity. Git makes it possible for creators to give up the idea of a singular or linear authorship process, and perhaps even encourages it.

That sounds great (or is at least passable) in conversation, but in practice it remains unclear. Even when you sit down and interact with a "real" git repository, it can still be fairly difficult to "grok." And to make matters worse, there are a number of key concepts that regular users of git acclimate to but that are difficult to grasp from the outset. This post, then, attempts to illuminate a couple of these concepts more practically, in hopes of making future interactions with git less painful. Consider the following:

The Staging Area

HEAD refers to the most recent commit on the branch you have checked out, and so to the state of every committed object (i.e. file) as of that commit. Every commit has a unique identifying hash that you can see when you run git log.

The working tree, or checkout, is the set of files you interact with inside the local repository. You can check out different branches so that you're not working in the "master" (default or trunk) branch of the repository, which mostly matters when collaborating with other people.

If you want to commit something to the repository, it must first be "staged," or added, with the git add command. Use git status to see which files are staged and which are not. The output of git diff shows the difference between HEAD plus all staged changes (that is, the staging area) and the working tree; in other words, it shows your unstaged changes. To see the difference between HEAD and all staged changes, use git diff --cached.
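To make the two kinds of diff concrete, here is a minimal sketch that builds a throwaway repository and stages only part of a change. The file name notes.txt, the author identity, and the temporary directory are assumptions made up for this illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical example setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# First commit, so that HEAD exists.
printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make one change and stage it, then make a second change and leave it unstaged.
printf 'second line\n' >> notes.txt
git add notes.txt
printf 'third line\n' >> notes.txt

git status --short    # notes.txt has both staged and unstaged changes
git diff              # staging area vs. working tree: only "third line" is added
git diff --cached     # HEAD vs. staging area: only "second line" is added
```

Committing at this point would record only "second line"; "third line" stays in the working tree until it is staged too.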

The staging area makes it possible to construct commits in very granular ways. It lets you use commits less like "snapshots" of the entire tree of a repository and more like discrete objects that each contain a single atomic change set. This relationship to commits is enhanced by the ability to do "squash merges" and to squash a series of commits in a rebase, but it starts with the staging area.

If you've staged files incorrectly, you can use the git reset command to undo the staging. Used alone, with no arguments, reset is a non-destructive operation: it unstages changes but leaves the files in your working tree as they are.
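A small sketch of that undo, again in a throwaway repository. The file name a.txt and the author identity are placeholders invented for the example.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical example setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'draft\n' > a.txt
git add a.txt
git commit -q -m "Add a.txt"

# Edit the file and stage it "by mistake".
printf 'edit\n' >> a.txt
git add a.txt

# Plain reset: clears the staging area, leaves the working tree alone.
git reset -q

git diff --cached --quiet   # succeeds, because nothing is staged any more
cat a.txt                   # the edit is still in the file
```

Only the bare form is shown here; reset takes options that do touch the working tree, and those are a different matter.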

Branches

The ability to work effectively in branches is the fundamental function of git, and probably also the easiest to be confused by. A branch in git, fundamentally, is just a different HEAD in the same repository. Branches within a single repository allow you to work on specific sets of changes (e.g. "topics") and track other people's changes, without needing to make modifications to the "master" or default branch of the repository.

The major confusion of branches springs from git's ability to treat every branch of every potentially related repository as a branch of each other. Therefore it's possible to push to and pull from multiple remote branches from a single remote repository and to push to and pull from multiple repositories. Ignore this for a moment (or several) and remember:

A branch just means your local repository has more than one "HEAD" against which you can create commits and "diff" your working checkout. When something happens in one of these branches that's worth saving, preserving, or sharing, you can either publish the branch itself or merge it into the "master" branch and publish those changes.

The goal of git is to construct a sequence of commits that represents the progress of a project. Branches are a tool that allows you to isolate changes within trees until you're ready to merge them together. When the differences between HEAD and your working copy become too difficult to manage using git add and git reset, create a branch and go from there.
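Here is a rough sketch of that create-a-branch-and-go workflow in a throwaway repository. The branch name topic, the file chapter.txt, and the author identity are assumptions for illustration; the default branch name is read from the repository rather than assumed to be "master".

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical example setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

# Whatever this git installation calls its default branch.
default_branch=$(git symbolic-ref --short HEAD)

printf 'intro\n' > chapter.txt
git add chapter.txt
git commit -q -m "Start chapter"

# Isolate an experiment on its own branch.
git checkout -q -b topic
printf 'idea\n' >> chapter.txt
git commit -q -a -m "Try an idea"

# Back on the default branch the experiment is not visible...
git checkout -q "$default_branch"
cat chapter.txt

# ...until the topic branch is merged in.
git merge -q topic
git log --oneline
```

Because the default branch had no new commits of its own, this merge simply moves it forward to the tip of the topic branch.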

Rebase

Rebasing git repositories is scary, because the operation forces you to rewrite the history of a repository to "cherry pick" and reorder commits in a way that leads to a useful progression and collection of atomic moments in a project's history. As opposed to the tools that git replaces, "the git way" suggests that one ought to "commit often." Because all commits are local operations, this makes it possible to use the commit history to facilitate experimentation and very small change sets that the author of a body of code (or text!) can revert or amend over time.

Rebasing allows you to take past history objects, presumably created frequently during the process of working (i.e. to save a current state), and compress this history into a set of changes (patches) that reflect a more usable history once the focus of work has moved on. I've read and heard objections to git on the basis that it allows developers to "rewrite history," and that individuals shouldn't be able to perform destructive operations on the history of a repository. The answer to this is twofold:

  • Git, and version control in general, isn't necessarily supposed to provide a consistently reliable history of a project's code. It's supposed to manage the code and provide useful tools for managing and using the history of a project. Because of the way the staging area works, sometimes commits are made out of order, or a "logical history object" is split into two actual objects. Rebasing makes these non-issues.
  • Features like rebasing are really intended to happen before commits are published, in most cases. Developers will make a series of commits and then, while still working locally, rebase the repository to build a useful history before publishing those changes to their collaborators. So it's not so much that rebasing allows or encourages revisionist histories, but that it allows developers to control the state of their present or the relatively near future.
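As a sketch of tidying local history before publishing, the following squashes a small follow-up commit into the commit it belongs to. It uses one particular route, git commit --fixup followed by git rebase -i --autosquash, with the editors set to true so nothing opens interactively. The file and branch names and the author identity are made up for this example.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical example setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"
default_branch=$(git symbolic-ref --short HEAD)

printf 'one\n' > draft.txt
git add draft.txt
git commit -q -m "Base"

# Work locally on a topic branch, committing often.
git checkout -q -b topic
printf 'two\n' >> draft.txt
git commit -q -a -m "Add second line"

# A small follow-up that logically belongs to the previous commit.
printf 'three\n' >> draft.txt
git commit -q -a --fixup=HEAD

git log --oneline   # three commits, including the "fixup!" one

# Rewrite the unpublished commits: autosquash folds the fixup into its target.
# Setting the editors to "true" accepts the generated plan without prompting.
GIT_EDITOR=true GIT_SEQUENCE_EDITOR=true \
    git rebase -q -i --autosquash "$default_branch"

git log --oneline   # two commits: "Base" and "Add second line"
```

Run by hand without the editor overrides, the same rebase opens the list of commits so you can reorder, squash, or drop them yourself.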

Bonus: The git stash

The git stash isn't so much a concept that's difficult to grasp as a tool for interacting with the features described above, and it's pretty easy to get. Imagine one of the following cases:

You're making changes to a repository and you're not ready to commit, but someone writes you an email saying that they need you to quickly change 10 or 12 strings in a couple of files (some of which you're in the middle of editing), and they need this change published very soon. You can't commit what you've edited, as that might break something you're unwilling to risk breaking. How do you make the changes you need to make without committing your in-progress changes?

You're working in a topic branch, you've changed a number of files, and you suddenly realize that you need to be working in a different branch. You can't commit your changes and merge them into the branch you need to be using, because that would disrupt the history of that branch. How do you save your current changes and then import them into another branch without committing?

Basically, invoke git stash, which saves the difference between HEAD and the current state of the working directory (including anything you've staged) and then returns the working directory to a clean state. Then do whatever you need to do (change branches, pull new changes, do some other work), invoke git stash pop, and everything that was included in your stash will be applied to the new working copy. It's sort of like a temporary commit. There's a lot of additional functionality within git stash, but that's an entirely distinct bag of worms.
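The first scenario might look something like this sketch. The file names strings.txt and feature.txt and the author identity are invented for the example, and the urgent fix deliberately touches a different file from the in-progress work so that the pop applies cleanly.

```shell
#!/bin/sh
set -e

# Throwaway repository (hypothetical example setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'v1\n' > strings.txt
printf 'feature base\n' > feature.txt
git add strings.txt feature.txt
git commit -q -m "Initial files"

# In-progress work that is not ready to commit.
printf 'half-finished\n' >> feature.txt

# Set it aside; the working tree is clean again.
git stash -q
git status --short      # prints nothing

# Make and commit the urgent change.
printf 'v2\n' > strings.txt
git commit -q -a -m "Urgent string fix"

# Bring the in-progress work back on top of the new commit.
git stash pop -q
git status --short      # feature.txt shows up as modified again
```

If the urgent change had touched the same lines as the stashed work, the pop could stop with a conflict to resolve, much as a merge would.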

Onward and Upward!
