Stuntman And Lover Of Cheese

by Dan Bostonweeks

Subversion to Git Roadbump: Canonical Version Numbers

Recently, a major project at work that’s still under active development was switched from Subversion to Git. There was a bit of wailing, and some gnashing of teeth, but with a lot of planning and testing beforehand the switch went smoothly. Of course, even with the best of intentions of making life easier for the engineers, there were some rocky spots after the dust settled. The major bump we’ve seen so far is the lack of canonical revision numbers and the resulting confusion among the less technical people who rely on the code base.

First, a little background on Subversion. Subversion assigns an incrementing positive integer to every commit to the central repository. A commit could be one line in one file or one thousand lines in twenty files. Each one is a new revision: r101, r102, r54234, you get the idea. Somewhere along the way, after the company started using Subversion, people began to rely on these revision numbers. A producer could tell whether a build marked with a revision number contained a fix or not. QA was able to assign or close bugs based on what build they were testing, as long as the engineers told them the fix revision.

All was right with the world and then Git came along and made it even better. Git is a far superior experience for the engineers (and that’s a whole other article). Git identifies revisions by SHA1 hash instead of by number. The long form is 40 characters like 96132ce6bab4c27aa4cb40d8655cc8563f19b46b and the short form is usually the first seven characters like 96132ce. Far more complex than the revision numbers from Subversion, and that’s where the plot thickens.

In the end it turns out that SHA1 hashes cause some people’s eyes to cross and their brains to smoke. In the ensuing chaos bugs were getting closed and re-opened incorrectly. People were having trouble communicating what builds to look at, or even telling if a build contained a fix or not. Compare r101 and r102: if you know the fix is in r101, then r102 or higher has the fix. In Git it’s 96132ce and eaf872a, and you can’t tell at a glance whether the latter is newer than the former.

The production staff was at the breaking point and came to us with a plea: make it easier to compare revisions. While we could have ignored it and forced them to learn to deal with Git revisions, that is a path to darkness and anger. We don’t do that. We’re co-workers and we like each other and together we make cool stuff. So digging into Git I went, sure that someone had solved this issue in an elegant way already.

Sure enough, even the Git project itself has a canonicalish revision numbering scheme. Taking a cue from the Git masters I looked at git describe. The describe functionality of Git finds the most recent tag reachable from your current revision, counts the commits made since that tag, and then presents the tag, that count, and a short format revision hash (prefixed with a “g”) to the user. Cool, we’re looking for counting. Running it on a release branch we get a silly thing like “beforemergeafterpatches-123-ge47ac1”.
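To make the three pieces of that string concrete, here is a minimal Python sketch that splits a describe string apart. The names Described and parse_describe are my own for illustration, not anything Git provides. It assumes the tag-count-ghash shape shown above; when a commit sits exactly on a tag, git describe prints only the tag name, and this sketch returns None for that case.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str            # nearest tag reachable from the commit
    commits_since: int  # commits made on top of that tag
    short_hash: str     # abbreviated commit hash, without the "g" prefix


# Shape: <tag>-<count>-g<hex>. The tag itself may contain dashes,
# so the pattern is anchored on the end of the string.
_DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<hash>[0-9a-f]+)$")


def parse_describe(text: str) -> Optional[Described]:
    """Split a `git describe` string into its parts, or return None if it doesn't fit."""
    match = _DESCRIBE_RE.match(text.strip())
    if match is None:
        return None
    return Described(match["tag"], int(match["count"]), match["hash"])
```

Feeding it the string above gives the tag beforemergeafterpatches, a count of 123, and the hash e47ac1.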

That’s great, but that tag isn’t necessarily interesting from a production point of view. Thankfully Git lets us specify a match parameter for the tag name. The Git project tags versions with vX.Y.Z when they start working on that revision. When they match against that tag pattern they get something like “v1.0.1-45-g67e4ad”. That’s much more reasonable and very much what we’re looking for. Something along the lines of:

% git describe --match 'v[0-9]*' --abbrev=4
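If a build script needs the same string, one way to get it is to shell out to Git. This is only a sketch: describe_command and describe are names I made up, and it assumes git is on the PATH and that the repository has at least one annotated tag matching the pattern.

```python
import subprocess
from typing import List


def describe_command(pattern: str = "v[0-9]*", abbrev: int = 4) -> List[str]:
    """Build the argv for the `git describe` call shown above."""
    # Passed as a list, so the glob reaches git without any shell quoting.
    return ["git", "describe", "--match", pattern, f"--abbrev={abbrev}"]


def describe(repo_dir: str = ".", pattern: str = "v[0-9]*") -> str:
    """Run `git describe` inside repo_dir and return its output as a single line."""
    result = subprocess.run(
        describe_command(pattern),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError when git exits non-zero, e.g. no matching tag
    )
    return result.stdout.strip()
```

Keeping the command builder separate from the call that runs it makes it easy to check the arguments without needing a repository on hand.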

The project we’re working on is following its own branching model that’s been worked out ahead of time, and each release to be worked on is in its own branch (if you don’t have a solid branching model I suggest you check out nvie for a good place to start). At the creation of the branch the engineer also makes an annotated tag that follows a set pattern we can match against. This gives us a nice and tidy string like ‘gamename_1.23-58-g3fad587’, perfect for our producer and QA needs while also giving engineers enough information to get to a spot in the repository. As a bonus we also know what version of the app this commit is working towards, and that helps with builds.
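The question producers and QA actually ask is “does this build have the fix?”, and with these strings that can be roughly sketched as a comparison of the counts. The function names here are hypothetical, and the big assumption is that both strings come from the same release branch with linear history. Once merges are involved the count is only a hint, and asking Git directly with git merge-base --is-ancestor is the reliable check.

```python
from typing import Tuple


def split_build(build: str) -> Tuple[str, int]:
    """Return (tag, commits_since_tag) for a string like 'gamename_1.23-58-g3fad587'."""
    # Split from the right so dashes inside the tag name are left alone.
    # A bare tag with no count or hash raises ValueError here.
    tag, count, _short_hash = build.rsplit("-", 2)
    return tag, int(count)


def build_contains_fix(build: str, fix: str) -> bool:
    """True if `build` is at or past `fix`, assuming one linear release branch."""
    build_tag, build_count = split_build(build)
    fix_tag, fix_count = split_build(fix)
    if build_tag != fix_tag:
        # Counts are relative to a tag, so different tags can't be compared this way.
        raise ValueError("builds are described against different tags")
    return build_count >= fix_count
```

So if a fix landed at commit 41 after the gamename_1.23 tag, the build at commit 58 has it and a build at commit 40 does not, which is the same comparison people used to make with r101 and r102.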

Now we have a canonicalish bit that can be passed around from engineering to production to QA and back. Engineers (and other repository users) can generate this information on the fly. Builds will be tagged with this string so they match what the engineers are telling people. People will be happier. Rainbows will shine. Puppies will lick your face. You’ll get to move on and do cool stuff.