Patchwork merge: add automatic tag merge algorithm

login
register
mail settings
Submitter Angel Ezquerra
Date Feb. 18, 2014, 11:28 p.m.
Message ID <f3eb8304d9bb59e78b50.1392766117@Angel-PC.localdomain>
Download mbox | patch
Permalink /patch/3692/
State Deferred
Headers show

Comments

Angel Ezquerra - Feb. 18, 2014, 11:28 p.m.
# HG changeset patch
# User Angel Ezquerra <angel.ezquerra@gmail.com>
# Date 1392597815 -3600
#      Mon Feb 17 01:43:35 2014 +0100
# Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
# Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
merge: add automatic tag merge algorithm

Try to automatically merge conflicting .hgtags files using the following
algorithm:
- keep all the tags that are on only one of the merge parents
- out of the common tags:
    - keep those that are exactly the same (including history) in both parents
    - keep those whose "full tag history" (i.e. current node plus tag history)
    is a super-set of the other merge parent "full tag history"
- If there are no tags left the merge was successful.
- If there are any tags left there was a conflict and the automated merge failed
    - When this happens, fall back to a regular file merge, which in most cases
    should open a merge tool (at which point the user can manually resolve the
    conflicts).

# Notes:
- The algorithm assumes that tags are never removed from the .hgtags file. This
should be true in most cases (particularly if the user does not normally modify
the .hgtags file on its own). We could improve this merge algorithm to handle
that case as well.
- When the automated merge algorithm is successful, the merged tags are written
in alphabetical order. There main reason is that tags._readtags returns a
regular unordered python dict. We could change it to a sorteddict instead.
If we keep it this way we should probably change the hg tag command so that it
sorts tags when it modifies the .hgtags file as well.
- All tests pass without modification. The added test passes as well (while it
would fail without this change)
Mads Kiilerich - Feb. 22, 2014, 4:59 p.m.
On 02/19/2014 12:28 AM, Angel Ezquerra wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1392597815 -3600
> #      Mon Feb 17 01:43:35 2014 +0100
> # Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
> # Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
> merge: add automatic tag merge algorithm
>
> Try to automatically merge conflicting .hgtags files using the following
> algorithm:
> - keep all the tags that are on only one of the merge parents
> - out of the common tags:
>      - keep those that are exactly the same (including history) in both parents
>      - keep those whose "full tag history" (i.e. current node plus tag history)
>      is a super-set of the other merge parent "full tag history"
> - If there are no tags left the merge was successful.
> - If there are any tags left there was a conflict and the automated merge failed
>      - When this happens, fall back to a regular file merge, which in most cases
>      should open a merge tool (at which point the user can manually resolve the
>      conflicts).
>
> # Notes:
> - The algorithm assumes that tags are never removed from the .hgtags file. This
> should be true in most cases (particularly if the user does not normally modify
> the .hgtags file on its own). We could improve this merge algorithm to handle
> that case as well.
> - When the automated merge algorithm is successful, the merged tags are written
> in alphabetical order. There main reason is that tags._readtags returns a
> regular unordered python dict. We could change it to a sorteddict instead.
> If we keep it this way we should probably change the hg tag command so that it
> sorts tags when it modifies the .hgtags file as well.
> - All tests pass without modification. The added test passes as well (while it
> would fail without this change)

I think there are some essential disadvantages with this approach:
* It will completely rewrite the .hgtags, giving a messy history and 
making it almost impossible to merge manually with old un-sorted .hgtags.
* Manual fixing of .hgtags will no longer be possible ... and it will 
regress in handling of cases where it has been done.

I think it would be better if the algorithm started out by checking that 
both parents starts out as the ancestor and only have been appended to. 
If that is the case your new checks don't show any conflicts, it should 
take the new changes from one side and apply that on top of the other. 
That would be less intrusive and mostly address my concerns.

I have clarified http://mercurial.selenic.com/wiki/TagDesign a bit - it 
should be updated if a patch like this is accepted.

The tests should make sure to cover tricky cases like divergent change 
and removal of tags and that it is handled 'safely'.

/Mads
Angel Ezquerra - Feb. 23, 2014, 2:19 a.m.
On Sat, Feb 22, 2014 at 5:59 PM, Mads Kiilerich <mads@kiilerich.com> wrote:
> On 02/19/2014 12:28 AM, Angel Ezquerra wrote:
>>
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1392597815 -3600
>> #      Mon Feb 17 01:43:35 2014 +0100
>> # Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
>> # Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
>> merge: add automatic tag merge algorithm
>>
>> Try to automatically merge conflicting .hgtags files using the following
>> algorithm:
>> - keep all the tags that are on only one of the merge parents
>> - out of the common tags:
>>      - keep those that are exactly the same (including history) in both
>> parents
>>      - keep those whose "full tag history" (i.e. current node plus tag
>> history)
>>      is a super-set of the other merge parent "full tag history"
>> - If there are no tags left the merge was successful.
>> - If there are any tags left there was a conflict and the automated merge
>> failed
>>      - When this happens, fall back to a regular file merge, which in most
>> cases
>>      should open a merge tool (at which point the user can manually
>> resolve the
>>      conflicts).
>>
>> # Notes:
>> - The algorithm assumes that tags are never removed from the .hgtags file.
>> This
>> should be true in most cases (particularly if the user does not normally
>> modify
>> the .hgtags file on its own). We could improve this merge algorithm to
>> handle
>> that case as well.
>> - When the automated merge algorithm is successful, the merged tags are
>> written
>> in alphabetical order. There main reason is that tags._readtags returns a
>> regular unordered python dict. We could change it to a sorteddict instead.
>> If we keep it this way we should probably change the hg tag command so
>> that it
>> sorts tags when it modifies the .hgtags file as well.
>> - All tests pass without modification. The added test passes as well
>> (while it
>> would fail without this change)
>
>
> I think there are some essential disadvantages with this approach:
> * It will completely rewrite the .hgtags, giving a messy history and making
> it almost impossible to merge manually with old un-sorted .hgtags.
> * Manual fixing of .hgtags will no longer be possible ... and it will
> regress in handling of cases where it has been done.
>
> I think it would be better if the algorithm started out by checking that
> both parents starts out as the ancestor and only have been appended to. If
> that is the case your new checks don't show any conflicts, it should take
> the new changes from one side and apply that on top of the other. That would
> be less intrusive and mostly address my concerns.
>
> I have clarified http://mercurial.selenic.com/wiki/TagDesign a bit - it
> should be updated if a patch like this is accepted.
>
> The tests should make sure to cover tricky cases like divergent change and
> removal of tags and that it is handled 'safely'.
>
> /Mads

Mads,

thanks a lot for your review and your comments.

Sorting the tags when writing them is not strictly necessary. I was
expecting to get negative comments on that particular detail, but I
wanted to get a feel about the general direction of the patch.

The main reason why I sorted the merged tags when writing them to the
.hgtags file is that the _readtags method in tags.py loads tags into a
regular, unsorted python dictionary. Thus it does not preserve the
original tag order. There are several ways to work around this. I
think the best one would be to change tags._readtags to use a sortdict
instead of a regular dict. The sortdict class is currently defined in
config.py so perhaps it would be best to first move it to its own
separate file if we did so.

If we used a sortdict there would be no need to sort the merged tags
when saving them. I don't think there would be any serious performance
impact of doing so thanks to all the tag catching that we do.

I have reworked this patch into a patch series in which I first move
sortdict into its own file, then I use this sortdict in tags._readtags
and finally I introduce the automated merge, with the key difference
that there is no need to sort the tags after a successful merge
anymore. I will send this V2 patch series soon (unless you give me
some new comments before I do).

A nice side benefit of this change would be that we could also change
(and improve!) the way the tag command reads and writes tags. We could
reuse the tags._readtags function and create a a new writetags
function as well. Currently the tag command is "append only" in the
sense that it just adds new entries at the end of the .hgtags file.
This has several undesirable consequences:

- It makes merge conflicts very common (this is what this patch we are
discussing addresses) so in a sense this is moot.
- It creates duplicate entries in the .hgtags file which make manually
merging .hgtags files very confusing. For example, if you first add a
tag A, then add a tag B and finally move the tag A the final .hgtags
file will look something like this:

FIRST_A_REVID A
FIRST_B_REVID B
FIRST_A_REVID A
SECOND_A_REVID A

This is quite weird and hard to understand. Instead it would be best
(IMHO) if the result were:

FIRST_A_REVID A
SECOND_A_REVID A
FIRST_B_REVID B

This would be easier to understand, and easier to merge.

I think this would be an easy (and desirable) fix. I also think that
it would be backwards compatible. The only drawback that I can think
of is that we would need to write the whole .hgtags file when tagging.
I doubt this would be a problem even with a huge number of tags
(specially since tagging is not such a common operation anyway (maybe
at Facebook's scale this would be a problem though?). I will write a
patch for this and add it to my V2 patch series of this patch, unless
you think it is a bad idea.

Also, I agree that I need to add a few more tests to check the cases
in which the merge algorithm is expected to succeed and fail.

Cheers,

Angel
Mads Kiilerich - Feb. 23, 2014, 1:01 p.m.
On 02/23/2014 03:19 AM, Angel Ezquerra wrote:
> On Sat, Feb 22, 2014 at 5:59 PM, Mads Kiilerich <mads@kiilerich.com> wrote:
> Sorting the tags when writing them is not strictly necessary.

Then I suggest leaving it out ... or saving it for later where it can be 
discussed by its own merits.

> I have reworked this patch into a patch series in which I first move
> sortdict into its own file, then I use this sortdict in tags._readtags
> and finally I introduce the automated merge, with the key difference
> that there is no need to sort the tags after a successful merge
> anymore. I will send this V2 patch series soon (unless you give me
> some new comments before I do).

It is not clear to me how using a sortdict would solve the problem ... 
and whether it would be better than alternative simpler solutions. 
Patches might explain it to me.

Another comment would be that I think it would be better to have this in 
filemerge, perhaps as 'internal:tagsmerge' where it in first milestone 
could be enabled with [merge-patterns] .hgignore=internal:tagsmerge .

> A nice side benefit of this change would be that we could also change
> (and improve!) the way the tag command reads and writes tags. We could
> reuse the tags._readtags function and create a a new writetags
> function as well. Currently the tag command is "append only" in the
> sense that it just adds new entries at the end of the .hgtags file.
> This has several undesirable consequences:
>
> - It makes merge conflicts very common (this is what this patch we are
> discussing addresses) so in a sense this is moot.
> - It creates duplicate entries in the .hgtags file which make manually
> merging .hgtags files very confusing. For example, if you first add a
> tag A, then add a tag B and finally move the tag A the final .hgtags
> file will look something like this:
>
> FIRST_A_REVID A
> FIRST_B_REVID B
> FIRST_A_REVID A
> SECOND_A_REVID A
>
> This is quite weird and hard to understand. Instead it would be best
> (IMHO) if the result were:
>
> FIRST_A_REVID A
> SECOND_A_REVID A
> FIRST_B_REVID B
>
> This would be easier to understand, and easier to merge.

You are aware that it was a deliberate design decision to do it that 
way? That ensures that you do get a conflict in some cases that 
otherwise would merge cleanly but incorrectly.

> I think this would be an easy (and desirable) fix. I also think that
> it would be backwards compatible. The only drawback that I can think
> of is that we would need to write the whole .hgtags file when tagging.
> I doubt this would be a problem even with a huge number of tags
> (specially since tagging is not such a common operation anyway (maybe
> at Facebook's scale this would be a problem though?). I will write a
> patch for this and add it to my V2 patch series of this patch, unless
> you think it is a bad idea.

Tagging could have been done differently with different trade-offs. I 
think the changes you are proposing are too invasive and not 
sufficiently backward compatible for this user-facing file, especially 
considering that they don't add significant value. Automatic tag merging 
could however still be nice and valuable.

If we were to make allmost backward compatible changes to .hgtags then I 
would suggest considering taking the full step and leaving history out 
of the sorted file.

/Mads
Matt Mackall - Feb. 23, 2014, 6:10 p.m.
On Wed, 2014-02-19 at 00:28 +0100, Angel Ezquerra wrote:
> # HG changeset patch
> # User Angel Ezquerra <angel.ezquerra@gmail.com>
> # Date 1392597815 -3600
> #      Mon Feb 17 01:43:35 2014 +0100
> # Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
> # Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
> merge: add automatic tag merge algorithm

FYI, this is perhaps the third time a patch to do this has been
presented. Have you looked at the previous attempts? How does this
compare?
Angel Ezquerra - Feb. 25, 2014, 8:09 p.m.
On Sun, Feb 23, 2014 at 7:10 PM, Matt Mackall <mpm@selenic.com> wrote:
> On Wed, 2014-02-19 at 00:28 +0100, Angel Ezquerra wrote:
>> # HG changeset patch
>> # User Angel Ezquerra <angel.ezquerra@gmail.com>
>> # Date 1392597815 -3600
>> #      Mon Feb 17 01:43:35 2014 +0100
>> # Node ID f3eb8304d9bb59e78b50b42a9341a2063e1cb451
>> # Parent  7648e9aef6eeab00a0946e877690e94fb12d389b
>> merge: add automatic tag merge algorithm
>
> FYI, this is perhaps the third time a patch to do this has been
> presented. Have you looked at the previous attempts? How does this
> compare?

Thanks for the heads up. This has led me into a nice archeological
expedition down mercurial's history. It is quite interesting to see
how we got to the current global tag resolution algorithm. I think
that now I understand it better and I appreciate the subtle corner
cases that must be taken into account when calculating the global tag
set.

Probably my search-fu failed me, but I did not find any actual patches
proposing a concrete hagtags merge algorithm. However I found a lot of
discussions on how one such algorithm should behave. I also read a lot
about the different ways in which a potential tag merge algorithm
could fail.

To answer your original question, the algorithm I propose has the
advantage that it does not try to resolve all possible tag merge
conflict scenarios. Instead it focuses and automatically merging a
couple of conflict types which have obvious solutions (i.e that could
be done mechanically by a user by following a simple recipe). One of
them is IMHO the most common type of tag merge conflict, so fixing it
will give a lot of bang for the buck. In particular, the proposed
algorithm limits itself to automatically handling the following two
scenarios:

1. Two (or more) _different_ tags are added (and potentially also
moved / removed) on top of the existing tags in the two different
topological branches being merged: Currently this _always_ results in
a silly merge conflict, one that a user can mechanically resolve. The
algorithm just detects that particular scenario and makes the obvious
decision, which is to put these new "tag histories" at the end of the
hgtags file.

2. The same tag is added / removed / moved back an forth in the two
branches that are being merged. However, the set of operations on that
particular tag in one branch is a subset of the operations done on the
other branch. In that case the algorithm will also do the obvious,
which is to keep the biggest set of tag ops.

In all other cases in which we detect a conflict we revert to a
regular text merge (i.e. the current behavior). In particular I
explicitly avoided handling the case in which a given tag diverges
between the two branches being merged. That is the case that seems to
pop up most often on all past discussions on this matter. I have a
couple ideas on how this could be handled in some particular cases
(e.g. we could take the base .hgtags file into account and/or we could
try to make sure that the merged .hgtags is a subset of the "global
tags" that is calculated in tags.py). However that can be left for
another time.

Note that contrary to what my V1 patch says the algorithm does not
depend at all on changing the ordering of the .hgtags file. My V1
patch does sort the tags alphabetically but that was only because
tags._readtags does not return the tag info in order. I am preparing
V2 of my patch which does not need to do this anymore (by using a
sortdict rather than a regular dict).

Cheers,

Angel

Patch

diff --git a/mercurial/merge.py b/mercurial/merge.py
--- a/mercurial/merge.py
+++ b/mercurial/merge.py
@@ -10,6 +10,7 @@ 
 from mercurial import obsolete
 import error, util, filemerge, copies, subrepo, worker, dicthelpers
 import errno, os, shutil
+import tags
 
 class mergestate(object):
     '''track 3-way merge state of individual files'''
@@ -519,7 +520,13 @@ 
                 subrepo.submerge(repo, wctx, mctx, wctx.ancestor(mctx),
                                  overwrite)
                 continue
-            audit(fd)
+            elif fd == '.hgtags':  # tags need merging
+                if not tags.tagmerge(repo, wctx, mctx):
+                    merged += 1
+                    ms.mark(fd, 'r')
+                    continue
+            else:
+                audit(fd)
             r = ms.resolve(fd, wctx, mctx)
             if r is not None and r > 0:
                 unresolved += 1
diff --git a/mercurial/tags.py b/mercurial/tags.py
--- a/mercurial/tags.py
+++ b/mercurial/tags.py
@@ -298,3 +298,53 @@ 
         cachefile.close()
     except (OSError, IOError):
         pass
+
+def tagmerge(repo, p1ctx, p2ctx):
+    '''Merge the tags of two revisions'''
+    ui = repo.ui
+    ui.note(_('merging .hgtags'))
+    p1tags = _readtags(
+        ui, repo, p1ctx['.hgtags'].data().splitlines(), "p1 tags")
+    p2tags = _readtags(
+        ui, repo, p2ctx['.hgtags'].data().splitlines(), "p2 tags")
+
+    numconflicts = 0
+    mergedtags = p1tags.copy()
+    for name, nodehist in p2tags.iteritems():
+        if name not in mergedtags:
+            mergedtags[name] = nodehist
+            continue
+        # there is no conflict unless both tags point to different revisions
+        # and have a non overlapping tag history
+        p1node, p1hist = mergedtags[name]
+        p2node, p2hist = nodehist
+        p1fullhist = [p1node] + p1hist
+        p2fullhist = [p2node] + p2hist
+        if p1fullhist == p2fullhist:
+            # mergetags already contains this tag data which it got from p1
+            continue
+        if p1fullhist == p2fullhist[:len(p1fullhist)]:
+            # p1 tag history is a subset of p2 tag history
+            mergednode, mergedhist = p2node, p2hist
+        elif p2fullhist == p1fullhist[:len(p2fullhist)]:
+            # p2 tag history is a subset of p1 tag history
+            mergednode, mergedhist = p1node, p1hist
+        else:
+            numconflicts += 1
+            continue
+        mergedtags[name] = mergednode, mergedhist
+
+    if not numconflicts:
+        # Write the merged .hgtags file
+        fp = repo.wfile('.hgtags', 'wb')
+        for name in sorted(mergedtags):
+            node, hist = mergedtags[name]
+            hist.append(node)
+            for nd in hist:
+                fp.write('%s %s\n' % (hex(nd), name))
+        fp.close()
+        ui.note(_('.hgtags merged successfully'))
+        return None
+
+    ui.warn(_('.hgtags merge failed (%d tag conflicts)') % numconflicts)
+    return numconflicts
diff --git a/tests/test-tag.t b/tests/test-tag.t
--- a/tests/test-tag.t
+++ b/tests/test-tag.t
@@ -271,6 +271,53 @@ 
   abort: cannot tag null revision
   [255]
 
+merging tags automatically
+
+  $ hg up -C new-topo-head
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ hg tag -f new-topo-head-2
+  $ hg status -mard
+  $ hg merge -r 15
+  0 files updated, 1 files merged, 0 files removed, 0 files unresolved
+  (branch merge, don't forget to commit)
+  $ hg status -mard
+  M .hgtags
+  $ hg resolve -l
+  R .hgtags
+  $ cat .hgtags
+  75a534207be6b03576e0c7a4fa5708d045f1c876 custom-tag
+  acb14030fe0a21b60322c440ad2d20cf7685a376 foobar
+  0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head
+  0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head-2
+  a0eea09de1eeec777b46f2085260a373b2fbc293 newline
+  fc93d2ea1cd78e91216c6cfbbf26747c10ce11ae tag-and-branch-same-name
+  $ hg diff -r 'p1()' .hgtags
+  diff -r 0b98c6042b51 .hgtags
+  --- a/.hgtags	Thu Jan 01 00:00:00 1970 +0000
+  +++ b/.hgtags	Tue Feb 18 23:21:31 2014 +0000
+  @@ -1,5 +1,6 @@
+  +75a534207be6b03576e0c7a4fa5708d045f1c876 custom-tag
+   acb14030fe0a21b60322c440ad2d20cf7685a376 foobar
+  +0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head
+  +0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head-2
+   a0eea09de1eeec777b46f2085260a373b2fbc293 newline
+   fc93d2ea1cd78e91216c6cfbbf26747c10ce11ae tag-and-branch-same-name
+  -75a534207be6b03576e0c7a4fa5708d045f1c876 custom-tag
+  -0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head-2
+  $ hg diff -r 'p2()' .hgtags
+  diff -r ae5915ac112b .hgtags
+  --- a/.hgtags	Thu Jan 01 00:00:00 1970 +0000
+  +++ b/.hgtags	Tue Feb 18 23:21:31 2014 +0000
+  @@ -1,5 +1,6 @@
+  +75a534207be6b03576e0c7a4fa5708d045f1c876 custom-tag
+   acb14030fe0a21b60322c440ad2d20cf7685a376 foobar
+  +0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head
+  +0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head-2
+   a0eea09de1eeec777b46f2085260a373b2fbc293 newline
+   fc93d2ea1cd78e91216c6cfbbf26747c10ce11ae tag-and-branch-same-name
+  -75a534207be6b03576e0c7a4fa5708d045f1c876 custom-tag
+  -0f26aaea6f74c3ed6c4aad8844403c9ba128d23a new-topo-head
+
   $ cd ..
 
 tagging on an uncommitted merge (issue2542)