Git commit hygiene
Suppose you have read about the distinction between public and private git history, and have decided to reduce the commit noise on your public branch. Or maybe you just stumbled in by serendipity. Either way, shipping a clean git history requires modifying at least the private git history.
Discussions around modifying commit history often center on the use of git rebase. The upshot of many rebase discussions seems to be two-fold:
- “rebasing is too hard to understand,” and
- “rebasing changes commit history which is Bad so rebasing is pointless anyway.”
Item 2 is addressed by understanding the difference between public and private git history. Clarifying git rebase to address Item 1 is the subject of the following.
For a nice, clean git history, mastering only a few techniques is sufficient:
- Squashing on merge
- Interactive rebasing
- Amending commits
- Force pushing
- Retrieving from the reflog
None of these is any more difficult than reconciling merge conflicts in long running branches, a process which can take hours and require communication with one or more other developers. Each technique is useful and worth learning on its own. Competence in all these techniques allows rebasing with impunity and achieving an elegant, clean git commit history.
Before proceeding any further, a caveat: these techniques are proven to work in “normal sized” git repositories, that is, repositories small enough that normal git operations proceed quickly. Apparently, some very large companies run “monorepos.” This article does not address anything to do with monorepos.
Squashing on merge
GitHub provides three ways to merge one branch into another:
- Merge: this is the “classic” technique prevalent in most git tutorials. Merging copies all of the commits in one branch into the other branch. This can produce a complex history of commits, as each branch and its commits form a separate path through the repository.
- Merge and squash: this technique squashes all the commits in a branch to a single commit, which is then merged as usual.
- Rebase and squash: according to Accelerate, this is the method preferred by high performing teams. Branches are first rebased onto the head of master, then merged with fast forward. Fast forward does not add a separate merge commit to the history; it simply applies the commit to the head of master. The result is a clean linear history in which every commit passes all continuous integration criteria and is easily revertible.
Using the third method eliminates private history from the master branch, that is, history consisting of code with typos, failed tests, and any other issues which would fail a code review.
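A minimal local sketch of the third method, run in a throwaway repository. The file names, commit messages, and identity are hypothetical, and `git init -b` assumes Git 2.28 or newer. The squash here uses `git reset --soft`, which is one of several ways to collapse a branch and not necessarily what GitHub does behind its merge button.

```shell
set -eu
# Throwaway repository so the sketch never touches real work.
repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

echo "base" > app.txt
git add app.txt
git commit -q -m "initial commit"

# Private history on a feature branch: two noisy commits.
git checkout -q -b feature
echo "feature, frist draft" > feature.txt
git add feature.txt
git commit -q -m "wip"
echo "feature, final" > feature.txt
git commit -q -a -m "fix typo"

# Meanwhile master moves ahead.
git checkout -q master
echo "hotfix" >> app.txt
git commit -q -a -m "hotfix on master"

# 1. Rebase the feature branch onto the head of master.
git checkout -q feature
git rebase -q master

# 2. Squash: keep the final tree, replace the noisy commits with one.
git reset -q --soft master
git commit -q -m "storyname - add feature file"

# 3. Fast-forward master; no merge commit is created.
git checkout -q master
git merge -q --ff-only feature

git log --oneline
```

The log on master ends up linear: the initial commit, the hotfix, and a single feature commit. The “wip” and “fix typo” commits never reach the public branch.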
Interactive rebasing
Of everything to do with Git, in my experience, rebasing generates the most fear and loathing. I am not sure why. Rebasing is no more difficult than merging, and in many cases keeping a clean, rebased history is much easier than managing a complex merge history.
Rebase can (and usually does) “rewrite history.” Which is a good thing!
Committing early and often provides a margin of safety when developing. Rebasing allows condensing all the sidetracks, failed experiments, and untested commits into one, or at least a few, well-organized and clean commits suitable for public presentation.
Putting it another way, merging a complete, private git history into the public shared branch is somewhat akin to publishing a book with all its rough drafts. It’s just not necessary.
When to squash
I find squashing local commits is really useful in the following two cases:
- When the branch has a sequence of non-informative commits.
- When cleaning up local history before making a pull request.
The first case is self-explanatory, more or less “take out the trash.”
The second case extends the first case to restructure or reorder commits on a branch. For example, if I want to demonstrate (say) how test-driven development proceeds, I’ll squash like commits into each other, possibly amending along the way, as will be discussed next.
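Squashing like commits into each other can be sketched without an editor session. In everyday use you would run `git rebase -i master` and change `pick` to `squash` or `fixup` by hand; the sketch below instead records the late fix with `git commit --fixup` and lets `--autosquash` arrange the todo list, with `GIT_SEQUENCE_EDITOR=true` accepting that list unchanged. The repository, file names, and commit messages are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

echo "base" > README
git add README
git commit -q -m "initial commit"

git checkout -q -b feature
echo "def test_schedule(): pass" > test_schedule.py
git add test_schedule.py
git commit -q -m "storyname - add scheduling time test"

echo "def schedule(): pass" > schedule.py
git add schedule.py
git commit -q -m "storyname - implement scheduling"

# A late fix that belongs in the first feature commit (HEAD~1 at this point).
echo "def test_schedule(): assert True" > test_schedule.py
git commit -q -a --fixup=HEAD~1

# Interactive rebase with the generated todo list accepted as-is:
# --autosquash moves the fixup under its target commit and folds it in.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

git log --oneline master..feature
```

After the rebase the branch carries two commits instead of three, and the test commit already contains the fix.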
Amending commits
Amend works on HEAD; use interactive rebase to modify commits further back in the commit history. Amending is useful for:
- rewriting the commit message on local HEAD with git commit --amend. This will open your usual commit editor, where you can change the commit message. Saving and quitting as usual also changes the commit's SHA, so if the original commit was already pushed you will need to force push to the remote.
- fixing typos, errors, and any other non-informative change.
- keeping the working branch clean with git commit --amend --no-edit. This is phenomenally useful for keeping private changes private. There is no need to build a commit history with a long sequence of “fixed typo,” “added test,” and similar. Nobody cares about that stuff!
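As a small illustration of the last item, the following sketch folds a typo fix into the previous commit in a throwaway repository. The file name, message, and identity are hypothetical.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

echo "helo world" > greeting.txt
git add greeting.txt
git commit -q -m "storyname - add greeting"
before="$(git rev-parse HEAD)"

# Fix the typo and fold it into the previous commit, keeping its message.
echo "hello world" > greeting.txt
git commit -q -a --amend --no-edit
after="$(git rev-parse HEAD)"

git log --oneline
echo "SHA changed: $before -> $after"
```

The history still shows one commit with the original message, but under a new SHA, which is why an already-pushed commit needs a force push after amending.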
Atlassian has an excellent article on rewriting git history.
Force pushing
Force pushing is another topic which seems to induce strong opinions along the lines “Don’t Ever Force Push.” But that’s simplistic advice. Instead, let’s consider when force pushing is inadvisable. There are at least two cases:
- Long running shared branches are more difficult to collaborate on when any of the contributors force push changes.
- The trunk branch, typically called “master” or “main,” which ships to production should only be force pushed in extraordinary circumstances.
Force pushing is a useful part of the git toolkit, and plays a legitimate and critical part in many workflows. It’s a useful skill to acquire.
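The sketch below shows a force push of a rewritten private branch against a local bare repository standing in for the remote. It uses `--force-with-lease`, a more cautious variant that this article does not otherwise discuss: it refuses to overwrite the remote branch if someone else has pushed to it since your last fetch or push. Paths, branch names, and messages are hypothetical.

```shell
set -eu
work="$(mktemp -d)"
# A local bare repository plays the role of the remote.
git init -q --bare "$work/remote.git"
git init -q -b feature "$work/local"
cd "$work/local"
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"
git remote add origin "$work/remote.git"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "wip"
git push -q -u origin feature

# Rewrite the already-pushed commit locally.
echo "final" > notes.txt
git commit -q -a --amend -m "storyname - add notes"

# A plain push is rejected because local and remote history have diverged.
if git push -q origin feature 2>/dev/null; then
  echo "unexpected: plain push succeeded"
fi

# Overwrite the remote branch, but only if it still points where we last saw it.
git push -q --force-with-lease origin feature

git ls-remote origin feature
```

On a branch only you work on, this is all a force push amounts to: the remote branch now points at the rewritten commit.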
Retrieving from the reflog
This is not the place for an extensive explanation of the git reflog; the man page has the gory details. What is appropriate here are a few use cases:
- Recovering a bad amended commit. Once in a while I amend a commit onto the wrong branch. For example, I may add work to master when it should be on a feature branch. One way to rectify this is to check out the feature branch, then find the relevant commit in the reflog and cherry pick it onto the feature branch.
- Recovering an unfortunate squash. Since every commit is in the reflog, it’s easy to check out just the commit which should not have been squashed.
Finding commits in the reflog is one of many reasons I always prefix commits with a word or two providing context to the commit. For example, consider two commit messages: “storyname - fix scheduling time test” is a lot easier to find in the reflog than “fix test.”
I’m sure there are other use cases, but these are the two which I find myself using a few times a year.
Upshot: the reflog is as useful as you make it, so make it easy to use!
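Recovering an unfortunate squash might look like the following sketch, which also shows why a descriptive commit message pays off: the lost commit is located by searching the reflog for its message with `git log -g --grep`. The repository and messages are hypothetical; in practice you could equally read `git reflog` by eye and pick an entry such as `HEAD@{2}`.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q -b master
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Dev"

echo "one" > file.txt
git add file.txt
git commit -q -m "storyname - first change"
echo "two" >> file.txt
git commit -q -a -m "storyname - fix scheduling time test"
good="$(git rev-parse HEAD)"

# An unfortunate squash: collapse both commits into one.
git reset -q --soft HEAD~1
git commit -q --amend -m "storyname - everything"

# The reflog still records every position HEAD has held.
git reflog -n 5

# Find the pre-squash commit by its message and move the branch back to it.
lost="$(git log -g --grep="fix scheduling time test" --format=%H -n 1)"
git reset -q --hard "$lost"

git log --oneline
```

The branch is back to its two original commits. Searching for a message like “fix test” would have been far less selective in a busy reflog.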
Scaling
All the above is well and good, but leaves the question of scaling unanswered: “How large of a team or code base will this support?”
- Collapsing private history works at any scale. This can be done locally, with the collapsed history force pushed when a pull request is opened.
- From personal experience, rebasing on HEAD of master works well into the dozens of developers merging dozens of pull requests (which are then automatically deployed to production). It does require that the CI pipeline be fast and reliable. Flaky tests or a suite which runs longer than 10 minutes are triggers to invest effort in CI.
If this process breaks at scale, it might be worth reflecting on the size of the code base and team committing to the code base.
Practice!
If you aren’t familiar with git’s reflog, it’s worth some time reading the reflog man page. A great followup is the rebase man page.
- Look through your reflog.
- Squash a bunch of commits into one commit.
- Reset the head of a branch one commit back, and redo a commit sequence using git add -p.
Here is a great article from Atlassian on git rebase.
Summary
Adopting a practice of squashing and rebasing commits has the following effects:
- It moves the responsibility for commit hygiene further to the edge: it becomes the engineer’s responsibility, not the build engineer’s or operations person’s, to ensure pull requests are clean, conflict-free, and ready to merge. This may meet resistance when rebase is perceived as “hard.”
- It may encourage atomic and minimal pull requests, which are easier to review and easier to debug.
This has been an excellent writing experience. While I’m a huge fan of rebasing, I’ve come to the conclusion while writing this article that it’s not something many people will choose to have in their toolkits. And that’s fine.
The upshot is that if I don’t have to maintain the git history, if I’m not responsible for merging and solving merge conflicts, then I really don’t care.