Patchwork backout: handle file moves correctly (issue1932)

login
register
mail settings
Submitter Mateusz Kwapich
Date Jan. 8, 2015, 9:39 p.m.
Message ID <b140af105987b9db7a29.1420753140@dev1429.prn1.facebook.com>
Download mbox | patch
Permalink /patch/7393/
State Changes Requested
Headers show

Comments

Mateusz Kwapich - Jan. 8, 2015, 9:39 p.m.
# HG changeset patch
# User Mateusz Kwapich <mitrandir@fb.com>
# Date 1420752870 28800
#      Thu Jan 08 13:34:30 2015 -0800
# Node ID b140af105987b9db7a296d24584a557bdc67fc6e
# Parent  7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
backout: handle file moves correctly (issue1932)

The merge logic (ab)used by backout wasn't detecting file moves when the
parameter which should be common ancestor of merge heads was in fact child of
one of them.

This commit is fixing that by if-ing that exact case. Only backout is affected
by this change.
Matt Mackall - Jan. 8, 2015, 11:06 p.m.
On Thu, 2015-01-08 at 13:39 -0800, Mateusz Kwapich wrote:
> # HG changeset patch
> # User Mateusz Kwapich <mitrandir@fb.com>
> # Date 1420752870 28800
> #      Thu Jan 08 13:34:30 2015 -0800
> # Node ID b140af105987b9db7a296d24584a557bdc67fc6e
> # Parent  7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
> backout: handle file moves correctly (issue1932)

> The merge logic (ab)used by backout

It's actually not abuse. It's very important core functionality.

>  wasn't detecting file moves when the
> parameter which should be common ancestor of merge heads was in fact child of
> one of them.

In the abstract, merge is simply the operation that given three state
vectors A, X, Y, creates a new state M = A + (X - A) + (Y - A) (where +
is "patch" and - is "diff"). It doesn't matter what the relationships
between the three inputs are. Don't get too distracted that we call 'A'
the 'ancestor' in the code, it's simply more convenient to reason about
in terms of a normal merge.

However, various parts of the merge and copy code were indeed originally
written with the assumption that a would be an ancestor of x and y and
made various assumptions about revlog ordering, which
has implications for every single thing that calls merge in a
non-vanilla fashion: backout, rebase, graft, evolve, histedit, etc..

Most of that is fixed. The primary offender at this point is
mergecopies(). I started down the path to replacing that by writing
pathcopies() a few years back, which doesn't care about direction in
history and deals with most of our non-merge copy-detection needs.

But finishing the job requires rewriting mergecopies() based on
pathcopies() or otherwise making it direction-agnostic... which is what
I've told everyone who's asked about merge+copy bugs for the past few
years.

Since you only mention "backout" in your subject and only hit a very
specific special case, I suspect you're missing most of the related
problems here and are just making things that much more confusing for
whoever comes along to do the larger fix.
Mateusz Kwapich - Jan. 8, 2015, 11:26 p.m.
On 1/8/15, 3:06 PM, "Matt Mackall" <mpm@selenic.com> wrote:

>On Thu, 2015-01-08 at 13:39 -0800, Mateusz Kwapich wrote:
>> # HG changeset patch
>> # User Mateusz Kwapich <mitrandir@fb.com>
>> # Date 1420752870 28800
>> #      Thu Jan 08 13:34:30 2015 -0800
>> # Node ID b140af105987b9db7a296d24584a557bdc67fc6e
>> # Parent  7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
>> backout: handle file moves correctly (issue1932)
>
>> The merge logic (ab)used by backout
>
>It's actually not abuse. It's very important core functionality.
>
>>  wasn't detecting file moves when the
>> parameter which should be common ancestor of merge heads was in fact
>>child of
>> one of them.
>
>In the abstract, merge is simply the operation that given three state
>vectors A, X, Y, creates a new state M = A + (X - A) + (Y - A) (where +
>is "patch" and - is "diff"). It doesn't matter what the relationships
>between the three inputs are. Don't get too distracted that we call 'A'
>the 'ancestor' in the code, it's simply more convenient to reason about
>in terms of a normal merge.
>
>However, various parts of the merge and copy code were indeed originally
>written with the assumption that a would be an ancestor of x and y and
>made various assumptions about revlog ordering, which
>has implications for every single thing that calls merge in a
>non-vanilla fashion: backout, rebase, graft, evolve, histedit, etc..
>
>Most of that is fixed. The primary offender at this point is
>mergecopies(). I started down the path to replacing that by writing
>pathcopies() a few years back, which doesn't care about direction in
>history and deals with most of our non-merge copy-detection needs.
>
>But finishing the job requires rewriting mergecopies() based on
>pathcopies() or otherwise making it direction-agnostic... which is what
>I've told everyone who's asked about merge+copy bugs for the past few
>years.
>
>Since you only mention "backout" in your subject and only hit a very
>specific special case, I suspect you're missing most of the related
>problems here and are just making things that much more confusing for
>whoever comes along to do the larger fix.
>
>-- 
>Mathematics is the supreme nostalgia of our time.

Hello Matt,

Thanks for detailed info about merge logic. I see that mergecopies() needs
a direction agnostic rewrite. This patch came from the need for working
backouting moves and copies. It¹s definitely more hotfix than a permanent
solution and yes: it¹s not making the mergecopies() more readable. But I
think I works good in this particular case and it¹s fixing a backout
behavior to match users expectations. So I think before somebody rewrites
mergecopies() in near future (maybe me? :-) ) it will be good to have
backout working correctly.

What do you think?

Best,
Mateusz
Pierre-Yves David - Jan. 9, 2015, 6:47 a.m.
On 01/08/2015 03:26 PM, Mateusz Kwapich wrote:
>
> On 1/8/15, 3:06 PM, "Matt Mackall" <mpm@selenic.com> wrote:
>
>> On Thu, 2015-01-08 at 13:39 -0800, Mateusz Kwapich wrote:
>>> # HG changeset patch
>>> # User Mateusz Kwapich <mitrandir@fb.com>
>>> # Date 1420752870 28800
>>> #      Thu Jan 08 13:34:30 2015 -0800
>>> # Node ID b140af105987b9db7a296d24584a557bdc67fc6e
>>> # Parent  7ad155e13f0f51df8e986a0ec4e58ac9a0ccedbb
>>> backout: handle file moves correctly (issue1932)
>>
>>> The merge logic (ab)used by backout
>>
>> It's actually not abuse. It's very important core functionality.
>>
>>>   wasn't detecting file moves when the
>>> parameter which should be common ancestor of merge heads was in fact
>>> child of
>>> one of them.
>>
>> In the abstract, merge is simply the operation that given three state
>> vectors A, X, Y, creates a new state M = A + (X - A) + (Y - A) (where +
>> is "patch" and - is "diff"). It doesn't matter what the relationships
>> between the three inputs are. Don't get too distracted that we call 'A'
>> the 'ancestor' in the code, it's simply more convenient to reason about
>> in terms of a normal merge.
>>
>> However, various parts of the merge and copy code were indeed originally
>> written with the assumption that a would be an ancestor of x and y and
>> made various assumptions about revlog ordering, which
>> has implications for every single thing that calls merge in a
>> non-vanilla fashion: backout, rebase, graft, evolve, histedit, etc..
>>
>> Most of that is fixed. The primary offender at this point is
>> mergecopies(). I started down the path to replacing that by writing
>> pathcopies() a few years back, which doesn't care about direction in
>> history and deals with most of our non-merge copy-detection needs.
>>
>> But finishing the job requires rewriting mergecopies() based on
>> pathcopies() or otherwise making it direction-agnostic... which is what
>> I've told everyone who's asked about merge+copy bugs for the past few
>> years.
>>
>> Since you only mention "backout" in your subject and only hit a very
>> specific special case, I suspect you're missing most of the related
>> problems here and are just making things that much more confusing for
>> whoever comes along to do the larger fix.
>>
>> --
>> Mathematics is the supreme nostalgia of our time.
>
> Hello Matt,
>
> Thanks for detailed info about merge logic. I see that mergecopies() needs
> a direction agnostic rewrite. This patch came from the need for working
> backouting moves and copies. It¹s definitely more hotfix than a permanent
> solution and yes: it¹s not making the mergecopies() more readable. But I
> think I works good in this particular case and it¹s fixing a backout
> behavior to match users expectations. So I think before somebody rewrites
> mergecopies() in near future (maybe me? :-) ) it will be good to have
> backout working correctly.

Matt have a good point, this copy logic is broken for age and someone 
needs to eventually fix it once and for all. There is a lot of other bug 
affecting people right now related to this (graft, histedit, rebase, 
etc). As Mateusz already spent time analyzing the issue and 
understanding the code, Mateusz is a good candidate for building a fix.

However, If Mateusz want ammo to be lazy, he could point that this issue 
with rename is probably a regression from 3.0.
Pierre-Yves David - Jan. 9, 2015, 7:04 a.m.
On 01/08/2015 01:39 PM, Mateusz Kwapich wrote:
> +
> +def _related(f1, f2, limit):
> +    # Walk back to common ancestor to see if the two files originate
> +    # from the same file. Since workingfilectx's rev() is None it messes
> +    # up the integer comparison logic, hence the pre-step check for
> +    # None (f1 and f2 can only be workingfilectx's initially).
> +
> +    if f1 == f2:
> +        return f1 # a match
> +
> +    g1, g2 = f1.ancestors(), f2.ancestors()
> +    try:
> +        f1r, f2r = f1.rev(), f2.rev()
> +
> +        if f1r is None:
> +            f1 = g1.next()
> +        if f2r is None:
> +            f2 = g2.next()
> +
> +        while True:
> +            f1r, f2r = f1.rev(), f2.rev()
> +            if f1r > f2r:
> +                f1 = g1.next()
> +            elif f2r > f1r:
> +                f2 = g2.next()
> +            elif f1 == f2:
> +                return f1 # a match
> +            elif f1r == f2r or f1r < limit or f2r < limit:
> +                return False # copy no longer relevant
> +    except StopIteration:
> +        return False
> +
> +
>   def checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy):
>       """
>       check possible copies of f from m1 to m2
> @@ -391,37 +439,6 @@
>
>       ma = ca.manifest()
>
> -    def _related(f1, f2, limit):
> -        # Walk back to common ancestor to see if the two files originate
> -        # from the same file. Since workingfilectx's rev() is None it messes
> -        # up the integer comparison logic, hence the pre-step check for
> -        # None (f1 and f2 can only be workingfilectx's initially).
> -
> -        if f1 == f2:
> -            return f1 # a match
> -
> -        g1, g2 = f1.ancestors(), f2.ancestors()
> -        try:
> -            f1r, f2r = f1.rev(), f2.rev()
> -
> -            if f1r is None:
> -                f1 = g1.next()
> -            if f2r is None:
> -                f2 = g2.next()
> -
> -            while True:
> -                f1r, f2r = f1.rev(), f2.rev()
> -                if f1r > f2r:
> -                    f1 = g1.next()
> -                elif f2r > f1r:
> -                    f2 = g2.next()
> -                elif f1 == f2:
> -                    return f1 # a match
> -                elif f1r == f2r or f1r < limit or f2r < limit:
> -                    return False # copy no longer relevant
> -        except StopIteration:
> -            return False
> -

Note: having the extraction of the _related function in the same patch 
is harmful to this patch readability. It is hard to understand what is 
really happening here. The usual way is to extract the closure in a 
previous changeset to clarify the patch with more complex semantic.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -271,8 +271,23 @@ 
     for f in u1:
         checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy)
 
-    for f in u2:
-        checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy)
+    if c2 not in ca.parents():
+        for f in u2:
+            checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy)
+    else:
+        # we are abusing merge logic during backout and so need to
+        # do different and hacky copy detection in that case
+        st = c2.status(ca)
+        for f in st.added:
+            parents = ca[f].parents()
+            if len(parents) == 1:
+                # yay, it's copy
+                path = parents[0].path()
+                fullcopy[path] = f
+                if f in m1:
+                    if _related(c1[f], c2[path], c2.rev()):
+                        copy[path] = f
+                diverge.setdefault(f, []).append(path)
 
     renamedelete = {}
     renamedelete2 = set()
@@ -374,6 +389,39 @@ 
 
     return copy, movewithdir, diverge, renamedelete
 
+
+def _related(f1, f2, limit):
+    # Walk back to common ancestor to see if the two files originate
+    # from the same file. Since workingfilectx's rev() is None it messes
+    # up the integer comparison logic, hence the pre-step check for
+    # None (f1 and f2 can only be workingfilectx's initially).
+
+    if f1 == f2:
+        return f1 # a match
+
+    g1, g2 = f1.ancestors(), f2.ancestors()
+    try:
+        f1r, f2r = f1.rev(), f2.rev()
+
+        if f1r is None:
+            f1 = g1.next()
+        if f2r is None:
+            f2 = g2.next()
+
+        while True:
+            f1r, f2r = f1.rev(), f2.rev()
+            if f1r > f2r:
+                f1 = g1.next()
+            elif f2r > f1r:
+                f2 = g2.next()
+            elif f1 == f2:
+                return f1 # a match
+            elif f1r == f2r or f1r < limit or f2r < limit:
+                return False # copy no longer relevant
+    except StopIteration:
+        return False
+
+
 def checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy):
     """
     check possible copies of f from m1 to m2
@@ -391,37 +439,6 @@ 
 
     ma = ca.manifest()
 
-    def _related(f1, f2, limit):
-        # Walk back to common ancestor to see if the two files originate
-        # from the same file. Since workingfilectx's rev() is None it messes
-        # up the integer comparison logic, hence the pre-step check for
-        # None (f1 and f2 can only be workingfilectx's initially).
-
-        if f1 == f2:
-            return f1 # a match
-
-        g1, g2 = f1.ancestors(), f2.ancestors()
-        try:
-            f1r, f2r = f1.rev(), f2.rev()
-
-            if f1r is None:
-                f1 = g1.next()
-            if f2r is None:
-                f2 = g2.next()
-
-            while True:
-                f1r, f2r = f1.rev(), f2.rev()
-                if f1r > f2r:
-                    f1 = g1.next()
-                elif f2r > f1r:
-                    f2 = g2.next()
-                elif f1 == f2:
-                    return f1 # a match
-                elif f1r == f2r or f1r < limit or f2r < limit:
-                    return False # copy no longer relevant
-        except StopIteration:
-            return False
-
     of = None
     seen = set([f])
     for oc in ctx(f, m1[f]).ancestors():
diff --git a/tests/test-backout.t b/tests/test-backout.t
--- a/tests/test-backout.t
+++ b/tests/test-backout.t
@@ -125,6 +125,25 @@ 
   commit: (clean)
   update: (current)
 
+backout should handle moves correctly
+  $ echo one > a
+  $ hg commit -A -d '4 0' -m "recreate a"
+  adding a
+  $ hg mv a b
+  $ hg commit -d '5 0' -m "move"
+  $ echo two >> b
+  $ hg commit -d '6 0' -m "append b"
+  $ hg backout -d '7 0' -r 'desc("append b")'
+  reverting b
+  changeset 7:08b301958a07 backs out changeset 6:106def198235
+  $ hg backout -r 'desc("move")'
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  changeset ceebf4199a3f backed out, don't forget to commit.
+  $ hg status -C
+  A a
+    b
+  R b
+
 across branch
 
   $ cd ..