Patchwork [STABLE] graft: support grafting across renames (issue4028)

login
register
mail settings
Submitter Gábor Stefanik
Date July 25, 2016, 6:16 p.m.
Message ID <395d3fa2ac89fad199c9.1469470592@waste.org>
Download mbox | patch
Permalink /patch/15985/
State Changes Requested
Headers show

Comments

Gábor Stefanik - July 25, 2016, 6:16 p.m.
# HG changeset patch
# User Gábor Stefanik <gabor.stefanik@nng.com>
# Date 1469470573 -7200
#      Mon Jul 25 20:16:13 2016 +0200
# Branch stable
# Node ID 395d3fa2ac89fad199c99cc137bf801502292325
# Parent  9c2cc107547fd701a7604349632f2f590366f17c
graft: support grafting across renames (issue4028)

Graft performs a merge with a false common ancestor, which must be taken into
account when tracking renames. Compute the real common ancestor in this case,
and track renames between the real and false common ancestors in reverse.
Yuya Nishihara - July 26, 2016, 3:07 p.m.
On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote:
> # HG changeset patch
> # User Gábor Stefanik <gabor.stefanik@nng.com>
> # Date 1469470573 -7200
> #      Mon Jul 25 20:16:13 2016 +0200
> # Branch stable
> # Node ID 395d3fa2ac89fad199c99cc137bf801502292325
> # Parent  9c2cc107547fd701a7604349632f2f590366f17c
> graft: support grafting across renames (issue4028)
>
> Graft performs a merge with a false common ancestor, which must be taken into
> account when tracking renames. Compute the real common ancestor in this case,
> and track renames between the real and false common ancestors in reverse.

I don't know the detail of the copy tracing, I can't tell if this strategy
is correct.

Many tests fails with this patch.

> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -327,13 +327,22 @@
>          return {}, {}, {}, {}
>      repo.ui.debug("  searching for copies back to rev %d\n" % limit)
>  
> +    # graft gives us a false common ancestor, we need to find a real one
> +    true_ca = ca
> +    if not (ca.descendant(c1) and ca.descendant(c2)):
> +        true_ca = c1.ancestor(c2)

It appears that ca.descendant(c1) doesn't work if c1 is a workingctx. And
ctx.descendant() won't be an instant operation.

>      for f in u1:
> -        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1)
> +        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a)
> +        checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b)
> +        copy1 = dict((set(copy1a.items()) & set(copy1b.items())) |
> +            (set([(y, x) for (x, y) in copy1a.items()])-
> +             set([(y, x) for (x, y) in copy1b.items()])))

I'm not sure if we can invert a dest-src dict which is keyed by dest. Also,
it smells in that copy1 is built from (x, y) dict and (y, x) dict.

And I guess checkcopies() isn't fast. If ca == true_ca, 2nd checkcopies()
call is redundant.
Matt Mackall - July 26, 2016, 5:39 p.m.
On Wed, 2016-07-27 at 00:07 +0900, Yuya Nishihara wrote:
> On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote:
> > 
> > # HG changeset patch
> > # User Gábor Stefanik <gabor.stefanik@nng.com>
> > # Date 1469470573 -7200
> > #      Mon Jul 25 20:16:13 2016 +0200
> > # Branch stable
> > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325
> > # Parent  9c2cc107547fd701a7604349632f2f590366f17c
> > graft: support grafting across renames (issue4028)
> > 
> > Graft performs a merge with a false common ancestor, which must be taken
> > into
> > account when tracking renames. Compute the real common ancestor in this
> > case,
> > and track renames between the real and false common ancestors in reverse.
> I don't know the detail of the copy tracing, I can't tell if this strategy
> is correct.
> 
> Many tests fails with this patch.

Yes, I'm afraid it's much trickier than this.

Basically, the main copy tracing function for merges assumes all merges go
forward in history because it was written before we had any form of rebasing or
grafting.

  ancestor --> a - merge
    \            /
     --------> b 

But when we graft or rebase, we're doing a merge that looks like this:

   ancestor--<--a_parent-> a - b'
        \                     /
         ------------------> b

..and twisting the graph to look like this:

a_parent -------------> a - b'
    \                      /
     --->-ancestor -----> b

The stretch between a_parent and ancestor is backwards in the real history but
gets flipped for our effective merge DAG. That means a rename of x to y in that
span effectively becomes a rename from y to x for the purposes of our operation.
But the spans from a_parent to a and from ancestor to b remain unflipped.
Teaching mergecopies() how to do this amounts to a complete rewrite.

The situation with copies is even more confusing (search the list archives for a
discussion of "ypoc"s).

And it's actually even trickier than this, since the merge ancestor is actually
not a sufficiently large scope for finding relevant copies! See the comment
here:

https://www.selenic.com/hg/file/tip/mercurial/copies.py#l19

Working out how to properly combine -that- with the DAG twist above is non-
trivial.

(You may be wondering: but how does Git do it? Git doesn't store rename/copy
data, it uses heuristics to guess which files have been renamed without even
inspecting the DAG. We'll probably add this capability at some point.)
-- 
Mathematics is the supreme nostalgia of our time.
Augie Fackler - July 27, 2016, 1:14 p.m.
On Mon, Jul 25, 2016 at 01:16:32PM -0500, Gábor Stefanik wrote:
> # HG changeset patch
> # User Gábor Stefanik <gabor.stefanik@nng.com>
> # Date 1469470573 -7200
> #      Mon Jul 25 20:16:13 2016 +0200
> # Branch stable
> # Node ID 395d3fa2ac89fad199c99cc137bf801502292325
> # Parent  9c2cc107547fd701a7604349632f2f590366f17c
> graft: support grafting across renames (issue4028)

Can you construct a test? This feels like the sort of thing that is very likely to regress.

>
> Graft performs a merge with a false common ancestor, which must be taken into
> account when tracking renames. Compute the real common ancestor in this case,
> and track renames between the real and false common ancestors in reverse.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -327,13 +327,22 @@
>          return {}, {}, {}, {}
>      repo.ui.debug("  searching for copies back to rev %d\n" % limit)
>
> +    # graft gives us a false common ancestor, we need to find a real one
> +    true_ca = ca
> +    if not (ca.descendant(c1) and ca.descendant(c2)):
> +        true_ca = c1.ancestor(c2)
> +
>      m1 = c1.manifest()
>      m2 = c2.manifest()
> -    ma = ca.manifest()
> +    ma = true_ca.manifest()
>
> -    copy1, copy2, = {}, {}
> +    copy1, copy2 = {}, {}
> +    copy1a, copy2a = {}, {}
> +    copy1b, copy2b = {}, {}
>      movewithdir1, movewithdir2 = {}, {}
>      fullcopy1, fullcopy2 = {}, {}
> +    fullcopy1a, fullcopy2a = {}, {}
> +    fullcopy1b, fullcopy2b = {}, {}
>      diverge = {}
>
>      # find interesting file sets from manifests
> @@ -343,10 +352,18 @@
>      bothnew = sorted(addedinm1 & addedinm2)
>
>      for f in u1:
> -        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1)
> +        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a)
> +        checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b)
> +        copy1 = dict((set(copy1a.items()) & set(copy1b.items())) |
> +            (set([(y, x) for (x, y) in copy1a.items()])-
> +             set([(y, x) for (x, y) in copy1b.items()])))
>
>      for f in u2:
> -        checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2)
> +        checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2a, fullcopy2a)
> +        checkcopies(c2, f, m2, m1, true_ca, limit, diverge, copy2b, fullcopy2b)
> +        copy2 = dict((set(copy2a.items()) & set(copy2b.items())) |
> +            (set([(y, x) for (x, y) in copy2a.items()])-
> +             set([(y, x) for (x, y) in copy2b.items()])))
>
>      copy = dict(copy1.items() + copy2.items())
>      movewithdir = dict(movewithdir1.items() + movewithdir2.items())
> @@ -373,6 +390,8 @@
>      for f in bothnew:
>          checkcopies(c1, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
>          checkcopies(c2, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
> +        checkcopies(c1, f, m1, m2, true_ca, limit, bothdiverge, _copy, _fullcopy)
> +        checkcopies(c2, f, m2, m1, true_ca, limit, bothdiverge, _copy, _fullcopy)
>      for of, fl in bothdiverge.items():
>          if len(fl) == 2 and fl[0] == fl[1]:
>              copy[fl[0]] = of # not actually divergent, just matching renames
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - July 27, 2016, 3:05 p.m.
On Tue, 26 Jul 2016 12:39:50 -0500, Matt Mackall wrote:
> On Wed, 2016-07-27 at 00:07 +0900, Yuya Nishihara wrote:
> > On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote:
> > > # HG changeset patch
> > > # User Gábor Stefanik <gabor.stefanik@nng.com>
> > > # Date 1469470573 -7200
> > > #      Mon Jul 25 20:16:13 2016 +0200
> > > # Branch stable
> > > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325
> > > # Parent  9c2cc107547fd701a7604349632f2f590366f17c
> > > graft: support grafting across renames (issue4028)
> > > 
> > > Graft performs a merge with a false common ancestor, which must be taken
> > > into
> > > account when tracking renames. Compute the real common ancestor in this
> > > case,
> > > and track renames between the real and false common ancestors in reverse.
> > I don't know the detail of the copy tracing, I can't tell if this strategy
> > is correct.
> > 
> > Many tests fails with this patch.
> 
> Yes, I'm afraid it's much trickier than this.
> 
> Basically, the main copy tracing function for merges assumes all merges go
> forward in history because it was written before we had any form of rebasing or
> grafting.
> 
>   ancestor --> a - merge
>     \            /
>      --------> b 
> 
> But when we graft or rebase, we're doing a merge that looks like this:
> 
>    ancestor--<--a_parent-> a - b'
>         \                     /
>          ------------------> b
> 
> ..and twisting the graph to look like this:
> 
> a_parent -------------> a - b'
>     \                      /
>      --->-ancestor -----> b
> 
> The stretch between a_parent and ancestor is backwards in the real history but
> gets flipped for our effective merge DAG. That means a rename of x to y in that
> span effectively becomes a rename from y to x for the purposes of our operation.
> But the spans from a_parent to a and from ancestor to b remain unflipped.
> Teaching mergecopies() how to do this amounts to a complete rewrite.

Ah, thanks. I think I finally understand the basic model.

> The situation with copies is even more confusing (search the list archives for a
> discussion of "ypoc"s).

For reference,

https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-December/047183.html

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -327,13 +327,22 @@ 
         return {}, {}, {}, {}
     repo.ui.debug("  searching for copies back to rev %d\n" % limit)
 
+    # graft gives us a false common ancestor, we need to find a real one
+    true_ca = ca
+    if not (ca.descendant(c1) and ca.descendant(c2)):
+        true_ca = c1.ancestor(c2)
+    
     m1 = c1.manifest()
     m2 = c2.manifest()
-    ma = ca.manifest()
+    ma = true_ca.manifest()
 
-    copy1, copy2, = {}, {}
+    copy1, copy2 = {}, {}
+    copy1a, copy2a = {}, {}
+    copy1b, copy2b = {}, {}
     movewithdir1, movewithdir2 = {}, {}
     fullcopy1, fullcopy2 = {}, {}
+    fullcopy1a, fullcopy2a = {}, {}
+    fullcopy1b, fullcopy2b = {}, {}
     diverge = {}
 
     # find interesting file sets from manifests
@@ -343,10 +352,18 @@ 
     bothnew = sorted(addedinm1 & addedinm2)
 
     for f in u1:
-        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1)
+        checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a)
+        checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b)
+        copy1 = dict((set(copy1a.items()) & set(copy1b.items())) |
+            (set([(y, x) for (x, y) in copy1a.items()])-
+             set([(y, x) for (x, y) in copy1b.items()])))
 
     for f in u2:
-        checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2)
+        checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2a, fullcopy2a)
+        checkcopies(c2, f, m2, m1, true_ca, limit, diverge, copy2b, fullcopy2b)
+        copy2 = dict((set(copy2a.items()) & set(copy2b.items())) |
+            (set([(y, x) for (x, y) in copy2a.items()])-
+             set([(y, x) for (x, y) in copy2b.items()])))
 
     copy = dict(copy1.items() + copy2.items())
     movewithdir = dict(movewithdir1.items() + movewithdir2.items())
@@ -373,6 +390,8 @@ 
     for f in bothnew:
         checkcopies(c1, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
         checkcopies(c2, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
+        checkcopies(c1, f, m1, m2, true_ca, limit, bothdiverge, _copy, _fullcopy)
+        checkcopies(c2, f, m2, m1, true_ca, limit, bothdiverge, _copy, _fullcopy)
     for of, fl in bothdiverge.items():
         if len(fl) == 2 and fl[0] == fl[1]:
             copy[fl[0]] = of # not actually divergent, just matching renames