Rebase Merge Software Development Version Control Git
March 2nd, 2020 - Dave Strock

Rebase vs Merge, Part 3

In Rebase vs Merge Part 2 we talked about the advantages and disadvantages of the two main strategies for handling branching software development. We showed the power of being able to travel backward in the timeline of your software’s construction and alter that timeline to do things like maintain independence while fixing code conflicts, and to craft a clear story that surfaces the important details without overwhelming readers with every construction step. However, we didn’t come to a conclusion on which strategy is best.

Best of Both Worlds

We didn’t come to a conclusion because there is no single “best” strategy. Since each method has advantages, it's likely that the best solution is to use both. Many teams find it convenient to use web-based code collaboration tools like GitLab and GitHub, largely for the simplicity they bring to the branching process. They allow teams to push up a branch of changes and create a special area, usually called a “Pull Request” or “Merge Request”, for performing code reviews and discussions, and then offer a nice “Merge” button to get that branch’s changes down to the master branch. Making changes out in the open like this is one of the best ways to catch bugs and bring people up to speed quickly, but if we stick solely to the merge strategy we will end up with all of its negatives: conflicts have to be dealt with in a way that isn’t as easily testable, and the development history gets cluttered with minor “fixed it” style messages that add no value for future readers.

This is where a mixed approach becomes particularly useful. We can continue getting the benefits of these web tools, while also crafting cleaner, more easily tested code changes by employing the rebase strategy. One particularly effective strategy is to create new branches and Merge Requests for features immediately, before any code is even written. This way, as the code is changed, the commits can be pushed up for immediate review. This can quickly catch mistakes in understanding or facilitate discussion of better approaches to achieving the desired results, but it doesn’t work if developers feel that every commit they push has to be perfect. At this early stage of development, we know as little about the feature as we’re likely to ever know, so it’s a bad idea to have anything resembling permanence floating around. Instead, we want developers to feel free enough to push up incomplete or messy changes to get feedback early and often.

This is where the ability to rebase changes on a branch to clean up the history becomes so powerful. You get the same clean history as if you had taken extra time to be meticulous at every step, without the requirement to maintain a meticulous workflow at all times. Combine that with the ability to handle conflicts without slowing down changes to master and you have a well-defined software development process that is tailored to the specific goals and needs of the separate parts of that process.
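The workflow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it builds a throwaway repository in a temporary directory, the branch name my-feature and the file feature.txt are hypothetical, and it uses fixup commits with --autosquash as one possible way to fold messy work into a clean commit without opening an editor.

```shell
#!/bin/sh
set -e
# Throwaway sandbox: a bare "remote" plus a working clone (all names hypothetical).
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Initial Commit"
git branch -M master
git push -q -u origin master

# 1. Open the branch early and push unfinished work so it can be reviewed.
git checkout -q -b my-feature
echo "first attempt" > feature.txt
git add feature.txt
git commit -q -m "Add my feature"
git push -q -u origin my-feature

# 2. Keep pushing small corrections; --fixup marks them as belonging to HEAD.
echo "second attempt" > feature.txt
git commit -q -a --fixup HEAD
git push -q origin my-feature

# 3. Once the feature has settled, fold the corrections into the real commit.
#    GIT_SEQUENCE_EDITOR=true accepts the todo list that --autosquash prepared.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master

# 4. The branch history was rewritten, so the push must be forced.
#    --force-with-lease refuses if someone else pushed in the meantime.
git push -q --force-with-lease origin my-feature

feature_commits=$(git rev-list --count master..my-feature)
echo "commits on my-feature after cleanup: $feature_commits"
```

The forced push in step 4 is only reasonable here because the branch is assumed to belong to a single developer; the "What not to do" section covers the case where it does not.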

Lifecycle Differences

It can even sometimes make sense to vary the usage of these strategies over time, based on exactly what is needed at that point in the code base’s lifecycle. Early in a feature's development, it can be very useful to meticulously lay every card out on the table, making commits for each new thing that is learned or piece of functionality that is added. As noted above, this can be a large boon to team communication, but it can also help developers gauge exactly where they are in their understanding of the feature.

Are you effortlessly making perfect commits? You probably understand the problem well.

Are you making lots of commits with comments like “WIP”, “fixed it”, and “really fixed it this time”? Maybe your understanding is still increasing rapidly and you haven’t plateaued to the point where perfect commits are yet possible.

These are both great bits of feedback that are very difficult to get back from others, and nearly impossible to get when you save all your committing for the very end, but it’s very easy to get a large-grained feel for the current state by just looking at your commit history.

What not to do

So far we’ve discussed the different strategies and ways you can use them to your benefit, and we’ve even discussed a few drawbacks on each, but we haven’t yet discussed what you should avoid in these strategies.

The biggest rule of thumb is “Don’t change shared history”.

Remember from Part 2 that the time traveling abilities of rebase are a very sharp tool that can easily cut you or your team if you’re not aware of its effects. Say we notice a bug in the latest commit on master and, since we don’t like those ugly “fixed it” commits, we decide to use our rebase power to fix it. It doesn’t seem like any harm was done.

            
$ git log --all
* 7c8bea0    (35 minutes)  <Dave Strock>   (HEAD -> master) Add 6
* 37432bc    (2 minutes)   <Dave Strock>   Add five correctly
* bd0f5cd    (57 minutes)  <Dave Strock>   Added 4.5
* ecd8aa2    (23 hours)    <Dave Strock>   Four is more
* 3bd999e    (23 hours)    <Dave Strock>   3 is what we need
*   b5a1877  (23 hours)    <Dave Strock>   Merge branch 'add-2'
|\
| * b18a734  (24 hours)    <Dave Strock>   Even better file
* | f0ee5c2  (23 hours)    <Dave Strock>   Start at 0
|/
* 8785557    (25 hours)    <Dave Strock>   The best file
* 95878fc    (27 hours)    <Dave Strock>   Initial Commit
$ cat file
0
1
2
3
4
4.5
5
66
$ emacs file
$ git add file
$ git commit --amend -C HEAD
$ git log --all
* 707ae1b    (39 minutes)  <Dave Strock>   (HEAD -> master) Add 6
* 37432bc    (2 minutes)   <Dave Strock>   Add five correctly
* bd0f5cd    (57 minutes)  <Dave Strock>   Added 4.5
* ecd8aa2    (23 hours)    <Dave Strock>   Four is more
* 3bd999e    (23 hours)    <Dave Strock>   3 is what we need
*   b5a1877  (23 hours)    <Dave Strock>   Merge branch 'add-2'
|\
| * b18a734  (24 hours)    <Dave Strock>   Even better file
* | f0ee5c2  (23 hours)    <Dave Strock>   Start at 0
|/
* 8785557    (25 hours)    <Dave Strock>   The best file
* 95878fc    (27 hours)    <Dave Strock>   Initial Commit
            
          

Whoa, wait, what is this --amend -C HEAD stuff? The --amend flag to commit is just a shorthand that means “Create a new commit that combines the changes I’m committing now with the changes in the previous commit, and use it in place of the previous commit.” It is identical in effect to the example in Part 2 where we created a commit to fix the bug and then used interactive rebase to squash it into the previous commit. The -C HEAD part just means “use the commit message from HEAD”, which is the commit we want to fix.
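To make the effect of --amend concrete, here is a minimal sketch that runs in a throwaway repository (the two-line file is a simplified, hypothetical stand-in for the file in the example). It records the commit hash before and after amending to show that the message survives but the commit itself is replaced.

```shell
#!/bin/sh
set -e
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Commit the buggy version.
printf '5\n66\n' > file
git add file
git commit -q -m "Add 6"
before=$(git rev-parse HEAD)

# Fix the typo and fold the fix into the previous commit, reusing its message.
printf '5\n6\n' > file
git add file
git commit -q --amend -C HEAD
after=$(git rev-parse HEAD)

echo "before:  $before"
echo "after:   $after"
echo "message: $(git log -1 --format=%s)"
echo "commits: $(git rev-list --count HEAD)"
```

There is still exactly one commit with the same message, but its hash differs, which is why anyone holding the old hash is now on a different timeline.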

So now we have a fixed bug. However, while we were fixing that bug on master, our teammate created a branch off of the original commit on master, the one we found the bug in. They went on to implement a big feature with many commits.

            
> git log --all
* c516341    (8 minutes)   <Dave Strock>   (HEAD -> add-7) Simplify to 7
* 0f5d81c    (9 minutes)   <Dave Strock>   Make it Seven
* ed0c134    (10 minutes)  <Dave Strock>   Add seven
* 7c8bea0    (40 minutes)  <Dave Strock>   (origin/master, origin/HEAD, master) Add 6
* 37432bc    (2 minutes)   <Dave Strock>   Add five correctly
* bd0f5cd    (57 minutes)  <Dave Strock>   Added 4.5
* ecd8aa2    (23 hours)    <Dave Strock>   Four is more
* 3bd999e    (23 hours)    <Dave Strock>   3 is what we need
*   b5a1877  (23 hours)    <Dave Strock>   Merge branch 'add-2'
|\
| * b18a734  (24 hours)    <Dave Strock>   Even better file
* | f0ee5c2  (23 hours)    <Dave Strock>   Start at 0
|/
* 8785557    (25 hours)    <Dave Strock>   The best file
* 95878fc    (27 hours)    <Dave Strock>   Initial Commit
> git push origin add-7
            
          

Notice how our teammate’s “Add 6” commit has a hash of 7c8bea0 rather than the 707ae1b hash of our fixed commit. Once they are ready to merge their changes in, they realize there is a problem: Two “Add 6” commits!

            
            
          

This is because we removed the old commit and replaced it with a new one, but only in our version of the repository. Our teammate’s timeline didn’t change like that, so their changes still depend on the original commit. We changed shared history. When our teammate pushed their changes, git saw 37432bc as the common parent, so in the log the add-7 branch appears to start one commit earlier than it actually did. Git also saw that 7c8bea0 didn’t exist on origin/master, so when our teammate pushed the branch, git pushed 7c8bea0 as well.

This can be fixed with a little effort (see below), but it’s much better if your processes help you avoid the need for such a thing. Any time you are thinking about doing a rebase that would rewrite commits, ask yourself whether it's possible for someone else to have started work based on any of those commits. Many teams simply disallow rebasing on master to avoid this problem, but it can be a problem anywhere. If you are working on a branch with another developer and you are ready to rebase for whatever reason, it's a good idea to have a quick chat with the other developer to let them know your plan. This lets them avoid starting work off of the commits that are going to change, and lets them tell you if they already did. Sometimes it can even make sense to treat a branch being changed by multiple developers like master and have everyone create branches off of the shared branch. Then the shared branch is changed only via merges.
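One way to ask git that question before rewriting anything is sketched below, in a throwaway repository with hypothetical commit contents. It assumes the remote is called origin, and it can only report what your last fetch knows about; a commit that a teammate copied some other way will not show up.

```shell
#!/bin/sh
set -e
# Throwaway sandbox: a bare "remote" plus a working clone.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"

git commit -q --allow-empty -m "Add five correctly"
git branch -M master
git push -q -u origin master             # this commit is now shared
git commit -q --allow-empty -m "Add 6"   # this commit exists only locally

# Commits reachable from HEAD but not from origin/master have not been pushed,
# so nobody can have fetched them from the remote yet.
unpushed=$(git log --format=%s origin/master..HEAD)
echo "not yet pushed: $unpushed"

# Remote-tracking branches that already contain the previous commit.
# Anything listed here means rewriting that commit changes shared history.
shared_on=$(git branch -r --contains HEAD~1)
echo "HEAD~1 is already on:$shared_on"
```

If the commit you want to rewrite shows up on a remote-tracking branch, that is the cue to talk to your team first.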

How to fix changes to shared history

There are a few valid reasons to change shared history, but it's almost always a bad idea. The universe being what it is, if a process allows a problem to exist, it will probably happen at some point, so we have to be prepared for the event in which someone changes shared history. We keep knives and sharpeners in the kitchen, but we also keep bandages.

Good team communication can mitigate most of this, so if you are struggling with this problem frequently you probably want to address that rather than try to disallow the usage of the rebase strategy.

Let's say your teammate notified you that they are going to rebase a feature branch to clean some things up. If you have any outstanding changes that you have not yet pushed up to the branch, let them know and, if possible, try to get your changes in before the rebase is done. If you don’t have any changes in the wings, consider taking a short break from changes to give your teammate time to clean up the branch.

When you find out that a rebase has been done and you don’t have any orphaned commits, the fix is quite easy. We just use the reset command to tell git to make our local repository look like the remote repository that was just changed:

            
$ git fetch
$ git checkout feature-branch
$ git reset --hard origin/feature-branch
            
          

Sidebar: Fetch, Don’t Pull

Astute readers may notice that we used the git fetch command above rather than git pull. This is because git pull is actually a combination of git fetch and git merge. As we’ve pointed out numerous times, we prefer to keep separate things separate, so we’d rather not do both at the same time. If it were up to us, we’d probably remove the git pull command altogether, because it can be extremely dangerous if not understood well.

It's a good idea to get in the habit of only using fetch. For teams using GitLab or similar this makes even more sense, because they do all their merging through the web UI, so there is never a need to both fetch new commits and merge them in a single step. The fetch command is also great for doing code reviews and other things where you aren’t yet sure that you want to merge the code, but that is beyond the scope of this series.

The --hard option tells reset to be ruthless in forcing your current branch and the working directory to look exactly like the specified branch. In this case, we’re telling git to make our local feature-branch look exactly like the remote feature-branch, which we refer to by prefixing the branch name with the remote name: origin/feature-branch.

It's important to understand that if you have any local commits on your local branch, or uncommitted changes in your working directory, the reset will drop those commits from the branch and discard those changes. This is the worst case of dealing with time traveling: unpredicted timeline changes.
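A cautious version of the reset might first look at what is about to disappear. The following is a sketch in a throwaway repository; the backup-feature-branch name is hypothetical and simply leaves a pointer at the old tip so the commits stay easy to find.

```shell
#!/bin/sh
set -e
# Throwaway sandbox: a bare "remote" plus a working clone.
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/remote.git"
git clone -q "$sandbox/remote.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git commit -q --allow-empty -m "Shared commit"
git branch -M feature-branch
git push -q -u origin feature-branch

# Local work that exists nowhere else.
git commit -q --allow-empty -m "My unpushed work"
git fetch -q

# 1. Uncommitted changes that --hard would discard (prints nothing if clean).
git status --short

# 2. Commits on the local branch that are not on the remote branch.
at_risk=$(git log --format=%s origin/feature-branch..feature-branch)
echo "commits the reset would drop: $at_risk"

# 3. Leave a pointer at the old tip, then reset.
git branch backup-feature-branch
git reset -q --hard origin/feature-branch

now=$(git log -1 --format=%s)
kept=$(git log -1 --format=%s backup-feature-branch)
echo "feature-branch is now at: $now"
echo "backup-feature-branch keeps: $kept"
```

Even without the backup branch, git reflog will usually still list the old tip for a while, but uncommitted changes that were discarded by the reset are not recorded anywhere.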

When Reset Isn’t Enough

When a reset would remove commits you don’t want removed, the first thing to do is communicate with your team to make sure it was intentional. Since these tools are so sharp, it's worth commenting to your teammate that you found blood, as they may not have noticed that they cut themselves.

Returning to our add-7 example from above, if we tried to merge the add-7 branch to master now, there would be a conflict we would have to resolve. If 'Add 6' was a simple change, maybe it’s easy to resolve the conflict, but we’d rather avoid that hassle. Instead, we can use rebase to just remove the extra ‘Add 6’ commit, which will remove the conflict.

            
$ git checkout add-7
$ git rebase -i 37432bc
pick 7c8bea0 Add 6
pick ed0c134 Add seven
pick 0f5d81c Make it Seven
pick c516341 Simplify to 7


# Rebase 37432bc..c516341 onto 37432bc (4 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
            
          

This time, instead of the squash command, we just want to remove a commit. As the comments show, this is done by simply deleting the line with the commit we want to remove (changing its pick to drop has the same effect).

            
pick ed0c134 Add seven
pick 0f5d81c Make it Seven
pick c516341 Simplify to 7


# Rebase 37432bc..c516341 onto 37432bc (4 commands)
#
# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# b, break = stop here (continue rebase later with 'git rebase --continue')
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# .       create a merge commit using the original merge commit's
# .       message (or the oneline, if no original merge commit was
# .       specified). Use -c <commit> to reword the commit message.
#
# These lines can be re-ordered; they are executed from top to bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
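
The same removal can also be done without opening the editor by using git rebase --onto, which replays everything after one commit on top of another. The sketch below runs in a throwaway repository and is deliberately simplified: each commit touches its own hypothetical file (five, six, seven) so that the replay is conflict-free, which is not true of the single-file example in this article.

```shell
#!/bin/sh
set -e
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo 5 > five
git add five
git commit -q -m "Add five correctly"
base=$(git rev-parse HEAD)

echo 66 > six
git add six
git commit -q -m "Add 6"
unwanted=$(git rev-parse HEAD)

git checkout -q -b add-7
echo 7 > seven
git add seven
git commit -q -m "Add seven"

# Replay every commit after $unwanted onto $base, leaving $unwanted out.
git rebase -q --onto "$base" "$unwanted" add-7

subjects=$(git log --format=%s)
echo "$subjects"
```

In the article's scenario the equivalent command would name 37432bc as the new base and 7c8bea0 as the commit to leave out.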
            
          

So now it seems like we’re good to go. We have a master branch with the correct ‘Add 6’ commit, and we have an ‘add-7’ branch that doesn’t contain the conflicting original ‘Add 6’. However, we haven’t really fixed it yet, because each of the commits that add 7 relied on the new entry being added after ‘66’. Git replays each commit as a textual diff; it doesn’t actually understand the changes you’re making. So now if you try to merge the ‘add-7’ branch, you’ll find that there are still conflicts. First, it will ask you to correct adding ‘seven’ after ‘66’, then it will ask about adding ‘Seven’ after ‘66’. The conflicting diff will look something like this:

            

0
1
2
3
4
4.5
5
66
Seven

            
          

This becomes a bigger pain the more commits you have after the correction. You’ll have to fix each commit up the chain by effectively applying the same patch to each subsequent commit. With this trivial example that’s not too bad, but as soon as you have code that requires thought it’s easy to screw up such a repetitive task. So, let’s abort the merge and let git help us out again.

Rerere - Automated repetition

Git has a handy tool for helping with repetitive patch applications in cases like this, but it is disabled by default. There are probably some cases where enabling this tool is problematic, but beyond a slight performance hit we’ve not run into them, so that is beyond the scope of this article.

The command is called ‘rerere’, which stands for “Reuse recorded resolution”. Enable it by editing your .gitconfig file (usually in your home directory) to add enabled = true under a [rerere] section, which is the file form of the rerere.enabled setting.
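If you would rather not edit the file by hand, git config can write the setting for you. The sketch below writes to a scratch file (an assumption made so the example does not touch your real settings); for real use, replace --file "$cfg" with --global.

```shell
#!/bin/sh
set -e
# Scratch config file standing in for ~/.gitconfig.
cfg=$(mktemp)

# Turn rerere on.
git config --file "$cfg" rerere.enabled true

# Show what the file now contains: a [rerere] section with enabled = true.
cat "$cfg"

# Read the value back to confirm it took effect.
enabled=$(git config --file "$cfg" --get rerere.enabled)
echo "rerere.enabled is: $enabled"
```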

After a conflict is found on a merge or rebase, you can use git rerere to record the state of the conflict. Then once it has been resolved, git rerere will record the resolution. The power comes in when we’re resolving a string of conflicts that all need to be resolved in the same way, as in our add-7 problem above, where we need to change ‘66’ to ‘6’ in a bunch of commits so that the diffs can find where to make the changes. If we let git rerere record the changes we make while resolving the first conflict, we can then let git just reuse those changes on subsequent commits.

The best part is that git doesn’t even force you to type git rerere at the appropriate times, because it runs automatically, if it’s enabled, any time git hits a conflict that could be resolved by the reusable patch. This allows us to resolve this entire add-7 conflict chain by going all the way back to when it was first noticed and then just doing a single merge:

            
$ git checkout master
Previous HEAD position was c516341 Simplify to 7
Switched to branch 'master'
$ git merge add-7
Auto-merging file
CONFLICT (content): Merge conflict in file
Recorded preimage for 'file'
Automatic merge failed; fix conflicts and then commit the result.
$ emacs file
$ git add file; git commit -m "Merge add-7"
Recorded resolution for 'file'.
[master 3cf4e41] Merge add-7
$ git log
*   3cf4e41  (85 seconds)  <Dave Strocl>   (HEAD -> master) Merge add-7
|\
| * c516341  (3 months)    <Dave Strocl>   (add-7) Simplify to 7
| * 0f5d81c  (3 months)    <Dave Strocl>   Make it Seven
| * ed0c134  (3 months)    <Dave Strocl>   Add seven
| * 7c8bea0  (3 months)    <Dave Strocl>   Add 6
* | 707ae1b  (3 months)    <Dave Strocl>   Add 6
|/
* 37432bc    (3 months)    <Dave Strocl>   Add five correctly the first time
            
          

Notice the new line “Recorded preimage for ‘file’”. This is git rerere telling us that it stored the current state of the conflict so that it can determine the resolution. Then once we’ve resolved the conflict and committed it, we see “Recorded resolution for ‘file’”, which is git rerere recording the resolution. What it doesn’t show you is that it used that stored resolution to resolve merging in the final 2 commits (0f5d81c and c516341) without your involvement. This almost completely removes one of the biggest cautions that come with using the rebase idea.
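To see a recorded resolution being reused, the following sketch reproduces a small version of the conflict in a throwaway repository, resolves it once, throws the merge away, and then repeats the merge. The two-line file and the "Fix 6" commit message are simplified stand-ins rather than the article's exact history, and rerere is enabled only for this scratch repository.

```shell
#!/bin/sh
set -e
# Throwaway repository with rerere switched on locally.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config rerere.enabled true

printf '5\n66\n' > file
git add file
git commit -q -m "Add 6"
git branch -M master

# The teammate's branch builds on the buggy commit.
git checkout -q -b add-7
printf '5\n66\n7\n' > file
git commit -q -a -m "Add seven"

# Meanwhile the bug is fixed on master.
git checkout -q master
printf '5\n6\n' > file
git commit -q -a -m "Fix 6"

# First merge: conflict. rerere records the preimage.
git merge add-7 >/dev/null 2>&1 || true
printf '5\n6\n7\n' > file          # resolve by hand
git add file
git commit -q -m "Merge add-7"     # rerere records the resolution

# Throw the merge away and repeat it. The merge still stops, but rerere has
# already rewritten the file using the recorded resolution.
git reset -q --hard HEAD~1
git merge add-7 >/dev/null 2>&1 || true
resolved=$(cat file)
echo "$resolved"
```

After the second merge the file no longer contains conflict markers; all that remains is to review it, git add it, and commit.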

Conclusion

As we’ve shown, git rebase is an incredibly powerful tool that can greatly increase the capabilities available to software developers. With that power comes opportunity for both benefit and detriment, but hopefully we’ve shown enough of both the advantages and the ways to resolve the rare problems that can arise that you will be able to use these tools without much downside.

