What does ‘git merge --abort is equivalent to git reset --merge when MERGE_HEAD is present.’ of Git merge man page mean?

I guess MERGE_HEAD is only present when a merge is in progress which also is the case when I am resolving conflicts.

Correct. Git knows that a git merge is happening (and has not yet finished) because that file exists. (That there are, or at least were, conflicts is usually how you got there. You may or may not have resolved them all by now: the conflict information is stored in what Git calls, variously depending on who / what is doing the calling, the index, the staging area, or sometimes the cache.)

When the file does exist, it contains the hash ID of the other commit (the one that’s not the HEAD commit) that git merge is merging into the commit that is the HEAD commit. (The hash ID of the merge-base commit is not stored anywhere.) The git commit command uses this hash ID to know how to set the second parent hash ID in the next commit you make. Using git merge --continue runs git commit for you, with the added bonus that it won’t run git commit if you aren’t doing a merge. In other words, the --continue option is a more user-friendly way to spell the git commit that finishes the merge after fixing conflicts.
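As a rough illustration of the "is a merge in progress?" test, here is a minimal Python sketch that looks for MERGE_HEAD and returns whatever hash IDs it holds. The function name merge_heads is made up for this example, and the sketch assumes you pass it the actual Git directory (usually .git; linked worktrees keep their MERGE_HEAD elsewhere). It is not how Git itself implements the check.

```python
from pathlib import Path
from typing import List


def merge_heads(git_dir: str) -> List[str]:
    """Return the hash IDs recorded in MERGE_HEAD, or [] if the file is absent.

    Hypothetical helper: `git_dir` is assumed to be the repository's Git
    directory (typically ".git"), not the work-tree root.
    """
    merge_head = Path(git_dir) / "MERGE_HEAD"
    if not merge_head.is_file():
        # No MERGE_HEAD: as far as this check goes, no merge is in progress.
        return []
    # Split on whitespace so that a file holding several hash IDs,
    # one per line, yields one list element per hash.
    return merge_head.read_text().split()
```

Calling merge_heads(".git") from the top of a work-tree returns an empty list when no merge is in progress, which is the situation in which git merge --abort refuses to run.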

What additional feature has … git merge --abort when MERGE_HEAD is gone?

None. In fact, it just fails in this case:

$ git merge --abort
fatal: There is no merge to abort (MERGE_HEAD missing).

If MERGE_HEAD isn’t missing, git merge --abort just runs git reset --merge. In other words, it’s just a more user-friendly way to spell that command, with the added bonus that it won’t run it if you aren’t doing a merge.

What additional feature has git reset --merge … when MERGE_HEAD is gone?

The manual page tries to explain this with the following text:

Resets the index and updates the files in the working tree that are different between <commit> and HEAD, but keeps those which are different between the index and working tree (i.e. which have changes which have not been added). If a file that is different between <commit> and the index has unstaged changes, reset is aborted.

This, alas, is full of Git jargon. The <commit> here is the argument you gave to git reset, e.g.:

git reset --merge a123456

says to do a reset to the commit with hash ID a123456.... If you omit it—as you would for this case—it defaults to HEAD, as if you ran:

git reset --merge HEAD

so where the documentation says <commit> you can just think of this as saying HEAD, which is the commit you are on right now.

How the index works, especially during merges

To understand the rest of it requires understanding what is really in the index. It may help out a bit if you run:

git ls-files --stage

This produces a lot of output in a big repository. I’ve done it here with my copy of a Git repository for Git, and snipped out just a little bit of the output:

100755 5b927b76fe58e5284d7b74e2f9fd8cc0b1f07764 0 t/t0041-usage.sh

These lines show (much of—use --debug to get even more) what goes into each index entry:

  • a file mode and Git blob hash ID (100644 c3c976d... or 100755 5b927b7...)
  • stage number, usually zero
  • the file’s full path name

These “blob” object hash IDs are how Git really stores files. Instead of starting with the name and then giving the contents, Git starts with the name—well, the commit hash plus the name—and finds a blob hash ID. The contents for file builtin/merge.c or t/t0041-usage.sh are stored separately, as that object of type blob. (A commit, with its hash ID, is an object of type commit.)

The hash ID for a blob object is determined entirely by the file content—well, more precisely, by computing a checksum over the string blob 3269\0..., where the ... part is the file contents, and 3269 is the size of the contents in bytes. If a later commit needs to re-use t/t0041-usage.sh unchanged—and almost every Git commit in the Git repository for Git does—it just says to use 5b927b7... again. So if a file never changes once it gets into a Git repository, it’s only ever stored once, no matter how many commits use it. Of course, if we change the file, Git stores a new blob object under a new blob hash—or, if we’ve changed the file to match any file ever previously stored, Git re-uses that existing blob.
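The header-plus-contents checksum described above can be reproduced in a few lines of Python. This is a sketch that assumes a repository using SHA-1 object names (the long-standing default); the function name git_blob_hash is made up for the example.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Compute the blob hash ID for `content`, assuming SHA-1 object names."""
    # The header is the word "blob", a space, the size in bytes as decimal
    # text, and a NUL byte; the file contents follow immediately.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Same bytes in, same hash out, whatever the file happens to be called.
print(git_blob_hash(b"hello world\n"))
# prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The result should agree with what git hash-object prints for a file with the same contents, which is why two identical files anywhere in the history share one blob.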

In other words, the blob’s hash ID uniquely represents the file’s content. Every file has been reduced from its real text, whatever that is, to a short-ish, unique hash ID. This has an obvious potential problem (see How does the newly found SHA-1 collision affect Git?) but in practice, it works fine.

The stage number is how Git decides if there are any merge conflicts. Each file name, in the index, has up to four slots. These slots are numbered: zero means unconflicted file, and 1, 2, or 3 means conflicted file. Slot number 1 holds the blob hash of the file that is from the merge base commit. Slot 2 holds the blob hash of the file from the current (HEAD or --ours) commit, and slot 3 holds the blob hash of the file from the other (--theirs or MERGE_HEAD) commit.
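To make the slot idea concrete, here is a small Python sketch that reads git ls-files --stage style text and reports which paths have entries in slots 1-3. It assumes the usual output layout (mode, hash and stage number separated by spaces, then a TAB before the path) and ignores the quoting Git applies to unusual path names. The function names, the path conflicted.txt and the repeated-digit hashes are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def parse_stage_listing(text: str) -> Dict[str, Dict[int, str]]:
    """Map each path to {stage number: blob hash}."""
    entries: Dict[str, Dict[int, str]] = defaultdict(dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Everything before the TAB is "<mode> <hash> <stage>".
        meta, path = line.split("\t", 1)
        _mode, blob, stage = meta.split()
        entries[path][int(stage)] = blob
    return dict(entries)


def conflicted_paths(entries: Dict[str, Dict[int, str]]) -> List[str]:
    """Paths that occupy any nonzero slot, i.e. still have a conflict recorded."""
    return sorted(p for p, slots in entries.items() if any(s != 0 for s in slots))


# Hypothetical listing: one clean file and one file with all three slots filled.
SAMPLE = (
    "100755 5b927b76fe58e5284d7b74e2f9fd8cc0b1f07764 0\tt/t0041-usage.sh\n"
    "100644 " + "1" * 40 + " 1\tconflicted.txt\n"
    "100644 " + "2" * 40 + " 2\tconflicted.txt\n"
    "100644 " + "3" * 40 + " 3\tconflicted.txt\n"
)

print(conflicted_paths(parse_stage_listing(SAMPLE)))
```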

When you run git merge other, Git extracts all the files (really, all the blob hashes) from each of the three inputs to the merge. The merge base is the common commit from which both you and they started. The --ours commit is the commit you’re on; these files were already in the index, at slot 0, so they’re just moved to slot 2. The --theirs commit is from your other argument, and its files are put into the index at slot 3. The merge code can then start¹ by comparing blob hashes:

  • If the hashes in slots 2 and 3 are the same, you and they did the same thing to that file. It doesn’t even matter what’s in slot 1: either all three hashes match and nobody did anything, or hash #1 differs from the other two but we both made the same change, so the merge is trivial. This file’s merge is easy-peasy: just drop the right hash into slot 0 and erase slots 1-3.
  • Otherwise, the hashes in slots 2 and 3 are different.
    • Is the hash in slot 1 the same as that in slot 2? If so, use the blob from slot 3. That is, we didn’t do anything to the file, but they did something: take theirs.
    • Is the hash in slot 1 the same as that in slot 3? If so, use the blob from slot 2. That is, they didn’t do anything to the file, but we did something: take ours.

The remaining case is the hard one, which can (not necessarily “does”) result in conflicts. The hashes in all three slots differ, so both we and they made some change(s) to this file, vs the common starting-point merge-base version of this file. Here, Git will do the usual line-by-line git diff to figure out who changed which line(s). This is a low level merge, implemented by ll-merge.c (the high level part was figuring out the three commit hashes, and getting the right blob hashes into the right staging slots).
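The hash-only part of this decision fits in a few lines. The following Python sketch models the comparison described above; it is not Git's actual code, and the function name resolve_by_hash is invented here. It assumes all three slots are filled, so it does not cover files that are missing from one of the inputs.

```python
from typing import Optional


def resolve_by_hash(base: str, ours: str, theirs: str) -> Optional[str]:
    """Pick the winning blob hash when no content merge is needed.

    Returns None when all three hashes differ, meaning a low-level,
    line-by-line merge would have to run.
    """
    if ours == theirs:
        # Both sides ended up with the same content; slot 1 is irrelevant.
        return ours
    if base == ours:
        # We left the file alone, they changed it: take theirs.
        return theirs
    if base == theirs:
        # They left the file alone, we changed it: take ours.
        return ours
    return None
```

Only the None case ever needs to look inside the blobs; every other case is settled by comparing hash IDs alone.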

Wherever we touched different lines of the merge base copy, Git combines these changes by taking one copy of each. Wherever we touched the same lines, Git combines these by marking them up as a conflict, unless of course we made exactly the same change to those lines. The merged file, with conflict markers, goes into the work-tree: the three copies in the three staging slots remain there.
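The combining rule can be illustrated with a deliberately simplified Python sketch. It assumes that all three versions have the same number of lines, so that lines can be compared position by position; the real low-level merge works from diffs and copes with inserted and deleted lines, which this toy does not. The function name and the marker labels "ours" and "theirs" are made up for the example.

```python
from typing import List, Tuple


def merge_aligned_lines(
    base: List[str], ours: List[str], theirs: List[str]
) -> Tuple[List[str], bool]:
    """Merge three equally long line lists; return (merged lines, had_conflict)."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged: List[str] = []
    conflict = False
    for b, o, t in zip(base, ours, theirs):
        if o == t:
            # Untouched by both, or both made exactly the same change.
            merged.append(o)
        elif b == o:
            # Only they touched this line.
            merged.append(t)
        elif b == t:
            # Only we touched this line.
            merged.append(o)
        else:
            # Both touched the same line differently: mark it up.
            conflict = True
            merged += ["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"]
    return merged, conflict
```

For example, with base ["a", "b", "c"], ours ["A", "b", "c"] and theirs ["a", "b", "C"], the two changes touch different lines and the result is ["A", "b", "C"] with no conflict.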

If the low-level merge finds no conflict, Git puts the merged file into slot zero and erases the three higher-numbered entries. If there really was a conflict, Git stops the merge with conflicts and leaves you with this mess in the index and work-tree, and you have to fix the problem. Once you have fixed it up, you run git add file, and that erases slots 1-3 (after writing the new-or-reused blob hash to slot zero, and of course storing the blob in the repository database if needed).

There are a few more special cases I have not covered here. In particular, it’s possible for a file to not exist in some of the commits, or to be renamed from an old commit to a new one. In these cases, some of the index slots wind up “empty”: for instance, if we remove file bad.txt and they make a change to bad.txt, the #2 slot will be empty. These result in high level merge conflicts (such as “modify/delete”)—the low level merge code never runs at all, and the merge stops with a conflict. In this case, the emptiness of an index slot indicates that the file wasn’t there after all (“there” being whichever of the three inputs it was—merge base, ours, or theirs).

¹ Internally, for efficiency, Git doesn’t actually do any of this slot-shuffling until it has to. That is, there’s some code that has the three hashes in separate variables. It does the comparing first, and only for the potential-conflict need-to-run-low-level-merge case, does it do the index-slot-writing. Otherwise it just picks the winning blob ID and writes slot zero’s hash where it already lives. It’s easier, though, to imagine it as separate phases.

Now we can understand the Git jargon

We now know what these various index slots are and do, and that the index holds hashes (blob IDs) while the work-tree holds the actual file contents, decanted out of the blob objects. We also know that commits store files by storing blob hashes and file-name pairs. (We’ve glossed over a detail here: commits use a third object type, the tree object, to do this. The index is what you get if you combine all the sub-trees listed in a top level tree, and the trees are made by splitting each sub-directory that’s in the index into its own separate object. We can mostly ignore this, because except for the extra staging slots available in an index, the two are easily inter-converted.)

So, let’s go back to the man-page text:

Resets the index and updates the files in the working tree that are different between <commit> and HEAD, but keeps those which are different between the index and working tree (i.e. which have changes which have not been added).

In other words, for each index entry:

  • If slots 1-3 are in use, discard them: just put the blob hash from the commit back into slot zero and replace the work-tree file with the one from the commit.

If a file that is different between <commit> and the index has unstaged changes, reset is aborted.

  • But if there’s something in slot zero, and we’d have to replace it with something from the specified commit, compare the index and work-tree content (expand the blob and compare it to the work-tree file). If those don’t match, don’t do anything at all (stop before resetting any slots-1-through-3 stuff).
  • Otherwise—i.e., if we do have to replace a slot zero entry, and we’ve verified that it’s OK because there are no unstaged changes—go ahead and replace the slot-zero entry and the work-tree file.

In other words, the text might be better (though a bit longer) if it said:²

  • First, run a test on each existing index entry that’s at slot zero:
    • If the index entry matches the hash we’ll take from <commit>, we’re OK.
    • Otherwise, make sure the blob in the index matches the actual file in the work-tree. If not, abort this reset entirely.
  • Then, having made sure all is OK here, run this for each index entry:
    • If the index entry/ies is/are not at slot zero, replace them with one that is, taken from <commit>, replacing the work-tree file in the process. (If that file doesn’t exist in <commit>, remove the index entry and work-tree file entirely.)
    • Otherwise (index entry is at slot zero), if the index blob does not match the commit’s, replace the index entry and the work-tree file. (If that file doesn’t exist in <commit>, remove the index entry and work-tree file entirely.)
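The two-phase description above can be modelled in Python. This is a sketch of the description, not of Git's implementation, so it inherits any corner-case mistakes in that description. The data model is an assumption made for the example: the index maps each path to {slot: blob hash}, while the work-tree and the commit each map a path to the blob hash of its contents. All names are invented, and paths that exist in the commit but have no index entry at all are ignored.

```python
from typing import Dict, Tuple

Index = Dict[str, Dict[int, str]]  # path -> {slot number: blob hash}
Files = Dict[str, str]             # path -> blob hash of the contents


class ResetAborted(Exception):
    """Raised when a slot-zero entry has unstaged changes we would overwrite."""


def reset_merge(index: Index, worktree: Files, commit: Files) -> Tuple[Index, Files]:
    # Phase 1: check every slot-zero entry before touching anything.
    for path, slots in index.items():
        if 0 not in slots:
            continue
        if slots[0] == commit.get(path):
            continue  # nothing to replace here, so any unstaged edit is kept
        if worktree.get(path) != slots[0]:
            raise ResetAborted(path)

    # Phase 2: rebuild the index and update the work-tree where needed.
    new_index: Index = {}
    new_worktree: Files = dict(worktree)
    for path, slots in index.items():
        target = commit.get(path)
        if 0 in slots and slots[0] == target:
            new_index[path] = {0: target}  # already matches; leave the file alone
        elif target is None:
            new_worktree.pop(path, None)   # not in the commit: drop entry and file
        else:
            new_index[path] = {0: target}  # conflicted or differing entry
            new_worktree[path] = target
    return new_index, new_worktree
```

In this model a conflicted path (slots 1-3) is always put back to the commit's version, a slot-zero path with unstaged edits survives as long as its index entry already matches the commit, and a staged change with further unstaged edits on top raises ResetAborted before anything is modified.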

In any case, this git reset, if not aborted, removes the MERGE_HEAD file if it exists, so that the merge is no longer happening.

² My longer description here is much more explicit, and therefore possibly incorrect in some corner cases. I have not specifically tested whether, given entries in slots 1, 2, and/or 3, Git does any comparing of work-tree contents. Instead, I’ve just assumed that since the point of git reset --merge is to abort a merge, Git itself assumes that “dirty” work-tree files are from a low-level merge, and wipes them out. That has to be the case if all three slots are full, but maybe if some slots are empty and there was a high-level conflict, Git is more careful.
