Jan 24th, 2020 - Dave Strock

Rebase vs Merge, Part 1: Is Time Travel Already Possible?

Thankfully, revision control systems have become a ubiquitous and expected tool in modern software development. No longer are we arguing about whether or not we should use revision control. Instead we are arguing about *how* we should use revision control.

This was a very large step for the industry that often goes unnoticed. Not having to deal with multiple copies of a code base or manually apply code patches is a huge win for productivity, and removing the drudgery of those error-prone processes is a boon for morale. Thanks to advancements in revision control technology, many of us are now on the other side of yet another large step forward, in which revision control becomes less of a tool to control the source code and more of a tool to help craft the source code into its best form.

The first concept that took large steps in this direction was branches, which gave developers a bit of independence from the changes of the mainline code base. Allowing developers to not constantly re-integrate their code with upstream changes can save a lot of repetitive effort that adds no real value. This gave developers the ability to more easily think at a higher level about what changes are ongoing and when they will make their way to the product.

We primarily work with Ruby on Rails here at Entrision, but these principles apply to any codebase in any language. All code bases tell a story. Without revision control, that history was much harder to find and often took the form of changelogs, roadmaps, and other such files that happened to be left around in the copy you were looking through. When that changed to a full accounting of every change to the code base, the quality of that story started to diverge wildly. Early on, many repositories became collections of minutiae: simple lists of each file changed, maybe with a brief note.

This blog post is about the ways in which rebase and merge can be used not only to make development easier, but more importantly to convey more meaning into the future.

First, The Basics

This blog post will refer generically to the concepts of rebase and merge, as they are applicable to many revision control systems, but git will be the tool used for examples.

We’ll assume you know the basics of git, but we’ll lay down enough to set a solid starting point. In git, a branch is made from the current point in the repository with the checkout command. First we’ll create a repo with a single file, then create a branch:

            
$ git init .
$ echo "1" > file                       # Gotta start somewhere
$ git add file
$ git commit -m "The best file!"
$ git checkout -b add-2                 # The add-2 branch will be even better!

Assuming that development continues on both master and add-2,

  
$ echo "2" >> file                       # 2 is better than 1
$ git add file
$ git commit -m "Even better file!"
$
$ git checkout master
$ echo -e "0\n1" > file                  # Forgot to start at 0
$ git add file
$ git commit -m "Start at zero"

there will come a point in time that we need to get the changes from add-2 on to the master branch.

            
$ git log --all
* cba8efd    (3 seconds)   <Dave Strock>   (HEAD -> master) Start at zero
| * 575b69c  (70 seconds)  <Dave Strock>   (add-2) Even better file!
|/
* 59e4575    (2 minutes)   <Dave Strock>   The best file!
* 763d779    (4 minutes)   <Dave Strock>   Initial commit

The simplest way to do this is with the merge command, which will merge the branch given as an argument into whatever branch is currently checked out:

            
$ git checkout master
$ git merge add-2
Auto-merging file
Merge made by the 'recursive' strategy.
file | 1 +
1 file changed, 1 insertion(+)

This results in a single file containing three lines: 0, 1, and 2. The repository history then looks like this:

            

$ git log --all
*   02ac03a  (18 minutes)  <Dave Strock>   (HEAD -> master) Merge branch 'add-2'
|\
| * 575b69c  (21 minutes)  <Dave Strock>   (add-2) Even better file!
* | cba8efd  (20 minutes)  <Dave Strock>   Start at zero
|/
* 59e4575    (22 minutes)  <Dave Strock>   The best file!
* 763d779    (24 minutes)  <Dave Strock>   Initial commit

Notice the new commit "Merge branch 'add-2'" listed as the head commit. We didn’t tell git to make a commit, nor did we give it a message. Instead, git had to create this commit itself in order to merge the content of the two branches (master and add-2).
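To make the "new commit" point concrete, here is a minimal sketch that replays the merge above in a throwaway repository and then asks git how many parents the head commit has. The temporary directory, the "Example Author" identity, and the variable names are assumptions made for illustration; `printf` stands in for `echo -e` purely for portability.

```shell
#!/bin/sh
set -eu

# Throwaway repository (location and identity are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this post
git config user.name "Example Author"
git config user.email "author@example.com"

echo "1" > file
git add file
git commit -q -m "The best file!"

# One commit on the branch...
git checkout -q -b add-2
echo "2" >> file
git commit -q -am "Even better file!"

# ...and a different commit on master.
git checkout -q master
printf '0\n1\n' > file
git commit -q -am "Start at zero"

# --no-edit accepts the default "Merge branch 'add-2'" message.
git merge -q --no-edit add-2

merged_content=$(cat file)
# %P prints the parent hashes of a commit, separated by spaces.
parent_count=$(( $(git log -1 --format=%P HEAD | wc -w) ))

echo "file now contains:"
echo "$merged_content"
echo "head commit has $parent_count parents"
```

An ordinary commit has one parent; a merge commit like this one has two, which is what draws the fork-and-join shape in the log.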

The other option for merging two branches is to use rebase. Instead of merging the contents of both branches together by creating a new commit, what if we could rearrange the commits to simply show the changes that were made, even if they were made on a branch? For that, we’ll need time travel, which is exactly what git rebase is capable of.

In this example, the line with 2 was added (on add-2) *before* the line with 0 was added (on master). You can see this in the log output, which is out of chronological order: "Even better file!" was created 21 minutes ago, while "Start at zero" was created 20 minutes ago, even though it is listed right after (that is, above) "The best file!", which was created 22 minutes ago.
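A side note on the listings: a plain `git log --all` prints full multi-line entries, so the compact graphs shown here presumably come from a custom format or alias that this post does not spell out. As a rough sketch, assuming a throwaway repository and a format string of our own choosing (the placeholders `%h`, `%cr`, `%an`, `%d` and `%s` are standard git ones, but the exact layout is a guess), something like this gets close:

```shell
#!/bin/sh
set -eu

# Throwaway repository (location and identity are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"
git config user.email "author@example.com"

echo "1" > file
git add file
git commit -q -m "The best file!"

# --graph draws the branch lines; the format string prints
# short hash, relative commit time, author name, ref names, and subject.
log_output=$(git log --all --graph --pretty=format:'%h  (%cr)  <%an> %d %s')
echo "$log_output"
```

Note that `%cr` prints times such as "3 seconds ago", so the output will not match the listings here character for character.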

Ideally, we’d have added 0 to start with, before we even created the add-2 branch. Then we wouldn’t even need a merge, since only one of the two branches changed. What if we could just change that history of the repository to make things cleaner?

First, let’s make two new changes similar to the first set, by creating a new branch and adding a new value to the file. Then we make a different change to the file on master:


$ git checkout -b add-4
$ echo "4" >> file
$ git add file && git commit -m "Four is more"
$ git checkout master
$ echo "3" >> file
$ git add file && git commit -m "Three is what we need"
$ git log --all
* 3bd999e    (5 minutes)   <Dave Strock>   (HEAD -> master) Three is what we need
| * 8c4492e  (10 minutes)  <Dave Strock>   (add-4) Four is more
|/
*   02ac03a  (20 minutes)  <Dave Strock>   Merge branch 'add-2'
|\
| * 575b69c  (23 minutes)  <Dave Strock>   Even better file!
* | cba8efd  (22 minutes)  <Dave Strock>   Start at zero
|/
* 59e4575    (24 minutes)  <Dave Strock>   The best file!
* 763d779    (26 minutes)  <Dave Strock>   Initial commit

Now, instead of using git merge to create a merge commit, we want to alter the repository’s timeline to make it look like the add-4 branch was created after the "Three is what we need" change was made. Then, when we go to merge add-4, there won’t be any changes on master that are not also on add-4, so there is no chance of a conflict. However, the important part is that we’re not just changing the repository log: we’re actually going back in time and altering the timeline so that when we get back to the present, things will have worked out differently and we’ll see a different result.

First, we’ll check out the branch whose timeline we want to change. Then we’ll tell git which timeline we want to move add-4 relative to, in this case master. We’re going to move the commit made on add-4 into the future slightly so that it was created after "Three is what we need" instead of before.


$ git checkout add-4
$ git rebase master
First, rewinding head to replay your work on top of it...
Applying: Four is more
Using index info to reconstruct a base tree...
M    file
Falling back to patching base and 3-way merge...
Auto-merging file
CONFLICT (content): Merge conflict in file
error: Failed to merge in the changes.
Patch failed at 0001 Four is more

Well, that doesn’t feel good, does it? We just wanted to muck about with time; we weren’t expecting nasty repercussions like merge conflicts. However, it's impossible to ignore conflicting changes (unless you want to delete one of them), so we’ll have to deal with them sooner or later. Rebase forces, or allows, you to deal with merge conflicts up front, before the code is written to master. So let’s edit our file to resolve the conflict and see what the repository looks like:



$ emacs file
$ git add file
$ git rebase --continue
Applying: Four is more
$ git log --all
* ecd8aa2    (6 minutes)  <Dave Strock>    (HEAD -> add-4) Four is more
* 3bd999e    (6 minutes)  <Dave Strock>    (master) Three is what we need
*   02ac03a  (21 minutes)  <Dave Strock>   Merge branch 'add-2'
|\
| * 575b69c  (24 minutes)  <Dave Strock>   Even better file!
* | cba8efd  (23 minutes)  <Dave Strock>   Start at zero
|/
* 59e4575    (25 minutes)  <Dave Strock>   The best file!
* 763d779    (27 minutes)  <Dave Strock>   Initial commit
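If you would rather replay the conflict and its resolution without opening an editor, the following is a minimal sketch under a few assumptions: a throwaway repository, a starting file that already holds 0, 1 and 2, and a resolution that simply keeps both new lines in numeric order. `GIT_EDITOR=true` is there because newer git versions may open an editor for the commit message on `--continue`.

```shell
#!/bin/sh
set -eu

# Throwaway repository (location and identity are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this post
git config user.name "Example Author"
git config user.email "author@example.com"

printf '0\n1\n2\n' > file
git add file
git commit -q -m "Starting point"

# Both branches append a line at the same spot, which is what conflicts.
git checkout -q -b add-4
echo "4" >> file
git commit -q -am "Four is more"

git checkout -q master
echo "3" >> file
git commit -q -am "Three is what we need"

git checkout -q add-4
# The rebase is expected to stop with a conflict and a non-zero exit status.
if git rebase master >/dev/null 2>&1; then
    echo "unexpected: rebase finished without a conflict" >&2
    exit 1
fi

# List the files git considers unmerged.
conflicted=$(git diff --name-only --diff-filter=U)
echo "conflicted: $conflicted"

# Resolve by writing the intended content, stage it, then continue.
printf '0\n1\n2\n3\n4\n' > file
git add file
GIT_EDITOR=true git rebase --continue >/dev/null

cat file
```

If the resolution goes wrong part way through, `git rebase --abort` puts the branch back where it was before the rebase started.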

The astute reader will notice that the "Four is more" change is not yet part of the master branch, but we’ll get to that. The most important things to notice are the timestamp and commit hash of the "Four is more" commit, both of which have changed (the hash went from 8c4492e to ecd8aa2). This is because we successfully changed the timeline! In our new reality, the add-4 branch was created after our file was modified by the "Three is what we need" commit.
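The changed hash is easy to check for yourself. The sketch below uses a deliberately conflict-free setup (a hypothetical `topic` branch and two made-up files) so that the rebase runs straight through, then compares the commit before and after. It also prints the author date, which a default rebase keeps; the time that moves is the committer date, and that is presumably what the relative times in the listings reflect.

```shell
#!/bin/sh
set -eu

# Throwaway repository (location, identity, branch and file names are
# illustrative assumptions, not taken from the example above).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > file
git add file
git commit -q -m "Base"

git checkout -q -b topic
echo "topic" > topic.txt
git add topic.txt
git commit -q -m "Topic change"

git checkout -q master
echo "main" > main.txt
git add main.txt
git commit -q -m "Mainline change"

git checkout -q topic
hash_before=$(git rev-parse HEAD)
author_date_before=$(git log -1 --format=%ad HEAD)

git rebase -q master

hash_after=$(git rev-parse HEAD)
author_date_after=$(git log -1 --format=%ad HEAD)

echo "before rebase: $hash_before"
echo "after rebase:  $hash_after"
# %ad is the author date, %cd the committer date.
git log -1 --format='author date:    %ad%ncommitter date: %cd' HEAD
```

The rebased commit is a new object whose parent is the tip of master, which is why its hash cannot stay the same even though the change it carries is identical.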

Since there are no changes on master that are not on add-4, we know that merging add-4 to master cannot have any conflicts. So let's merge it.


$ git checkout master
$ git merge add-4
Updating 3bd999e..ecd8aa2
Fast-forward
file | 1 +
1 file changed, 1 insertion(+)
$ git log --all
* ecd8aa2    (6 minutes)  <Dave Strock>    (HEAD -> master, add-4) Four is more
* 3bd999e    (6 minutes)  <Dave Strock>    Three is what we need
*   02ac03a  (21 minutes)  <Dave Strock>   Merge branch 'add-2'
|\
| * 575b69c  (24 minutes)  <Dave Strock>   Even better file!
* | cba8efd  (23 minutes)  <Dave Strock>   Start at zero
|/
* 59e4575    (25 minutes)  <Dave Strock>   The best file!
* 763d779    (27 minutes)  <Dave Strock>   Initial commit

Notice that the only thing that changed is that master now points to our "Four is more" commit, instead of "Three is what we need". No conflicts, no worries, and no additional merge commits required. Just the commits we actually created. This is what git means when it outputs "Fast-forward"; it was able to just move the master branch pointer and didn’t have to make any other changes to the repository.
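As a small sketch of that pointer move, assuming a throwaway repository with one extra commit on add-4, the script below counts commits before and after the merge. It uses `--ff-only`, a standard `git merge` option that refuses to merge at all unless a fast-forward is possible, which is a handy guard when you expect no merge commit.

```shell
#!/bin/sh
set -eu

# Throwaway repository (location and identity are illustrative assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # pin the branch name used in this post
git config user.name "Example Author"
git config user.email "author@example.com"

printf '0\n1\n2\n3\n' > file
git add file
git commit -q -m "Three is what we need"

# add-4 is one commit ahead of master; master has nothing new.
git checkout -q -b add-4
echo "4" >> file
git commit -q -am "Four is more"

git checkout -q master
commits_before=$(git rev-list --count --all)

# Fails instead of creating a merge commit if master has diverged.
git merge -q --ff-only add-4

commits_after=$(git rev-list --count --all)

echo "commits before: $commits_before, after: $commits_after"
echo "master: $(git rev-parse master)"
echo "add-4:  $(git rev-parse add-4)"
```

The commit count does not change and both branch names end up on the same hash, which is all a fast-forward does.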

At first you may be thinking "Sure, time travel is fun, but we just did more work for nearly the same result", and you’d be right. However, this technique becomes more useful, maybe even essential, as projects grow larger and more complex and as more people are involved in changing them.

Check back for part 2, where we dive into the power of this technique and the responsibilities that come along with it.