Submitter | Gábor Stefanik |
---|---|
Date | July 25, 2016, 6:16 p.m. |
Message ID | <395d3fa2ac89fad199c9.1469470592@waste.org> |
Download | mbox | patch |
Permalink | /patch/15985/ |
State | Changes Requested |
Headers | show |
Comments
On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote: > # HG changeset patch > # User Gábor Stefanik <gabor.stefanik@nng.com> > # Date 1469470573 -7200 > # Mon Jul 25 20:16:13 2016 +0200 > # Branch stable > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325 > # Parent 9c2cc107547fd701a7604349632f2f590366f17c > graft: support grafting across renames (issue4028) > > Graft performs a merge with a false common ancestor, which must be taken into > account when tracking renames. Compute the real common ancestor in this case, > and track renames between the real and false common ancestors in reverse. I don't know the detail of the copy tracing, I can't tell if this strategy is correct. Many tests fails with this patch. > --- a/mercurial/copies.py > +++ b/mercurial/copies.py > @@ -327,13 +327,22 @@ > return {}, {}, {}, {} > repo.ui.debug(" searching for copies back to rev %d\n" % limit) > > + # graft gives us a false common ancestor, we need to find a real one > + true_ca = ca > + if not (ca.descendant(c1) and ca.descendant(c2)): > + true_ca = c1.ancestor(c2) It appears that ca.descendant(c1) doesn't work if c1 is a workingctx. And ctx.descendant() won't be an instant operation. > for f in u1: > - checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1) > + checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a) > + checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b) > + copy1 = dict((set(copy1a.items()) & set(copy1b.items())) | > + (set([(y, x) for (x, y) in copy1a.items()])- > + set([(y, x) for (x, y) in copy1b.items()]))) I'm not sure if we can invert a dest-src dict which is keyed by dest. Also, it smells in that copy1 is built from (x, y) dict and (y, x) dict. And I guess checkcopies() isn't fast. If ca == true_ca, 2nd checkcopies() call is redundant.
On Wed, 2016-07-27 at 00:07 +0900, Yuya Nishihara wrote: > On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote: > > > > # HG changeset patch > > # User Gábor Stefanik <gabor.stefanik@nng.com> > > # Date 1469470573 -7200 > > # Mon Jul 25 20:16:13 2016 +0200 > > # Branch stable > > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325 > > # Parent 9c2cc107547fd701a7604349632f2f590366f17c > > graft: support grafting across renames (issue4028) > > > > Graft performs a merge with a false common ancestor, which must be taken > > into > > account when tracking renames. Compute the real common ancestor in this > > case, > > and track renames between the real and false common ancestors in reverse. > I don't know the detail of the copy tracing, I can't tell if this strategy > is correct. > > Many tests fails with this patch. Yes, I'm afraid it's much trickier than this. Basically, the main copy tracing function for merges assumes all merges go forward in history because it was written before we had any form of rebasing or grafting. ancestor --> a - merge \ / --------> b But when we graft or rebase, we're doing a merge that looks like this: ancestor--<--a_parent-> a - b' \ / ------------------> b ..and twisting the graph to look like this: a_parent -------------> a - b' \ / --->-ancestor -----> b The stretch between a_parent and ancestor is backwards in the real history but gets flipped for our effective merge DAG. That means a rename of x to y in that span effectively becomes a rename from y to x for the purposes of our operation. But the spans from a_parent to a and from ancestor to b remain unflipped. Teaching mergecopies() how to do this amounts to a complete rewrite. The situation with copies is even more confusing (search the list archives for a discussion of "ypoc"s). And it's actually even trickier than this, since the merge ancestor is actually not a sufficiently large scope for finding relevant copies! See the comment here: https://www.selenic.com/hg/file/tip/mercurial/copies.py#l19 Working out how to properly combine -that- with the DAG twist above is non- trivial. (You may be wondering: but how does Git do it? Git doesn't store rename/copy data, it uses heuristics to guess which files have been renamed without even inspecting the DAG. We'll probably add this capability at some point.) -- Mathematics is the supreme nostalgia of our time.
On Mon, Jul 25, 2016 at 01:16:32PM -0500, Gábor Stefanik wrote: > # HG changeset patch > # User Gábor Stefanik <gabor.stefanik@nng.com> > # Date 1469470573 -7200 > # Mon Jul 25 20:16:13 2016 +0200 > # Branch stable > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325 > # Parent 9c2cc107547fd701a7604349632f2f590366f17c > graft: support grafting across renames (issue4028) Can you construct a test? This feels like the sort of thing that is very likely to regress. > > Graft performs a merge with a false common ancestor, which must be taken into > account when tracking renames. Compute the real common ancestor in this case, > and track renames between the real and false common ancestors in reverse. > > diff --git a/mercurial/copies.py b/mercurial/copies.py > --- a/mercurial/copies.py > +++ b/mercurial/copies.py > @@ -327,13 +327,22 @@ > return {}, {}, {}, {} > repo.ui.debug(" searching for copies back to rev %d\n" % limit) > > + # graft gives us a false common ancestor, we need to find a real one > + true_ca = ca > + if not (ca.descendant(c1) and ca.descendant(c2)): > + true_ca = c1.ancestor(c2) > + > m1 = c1.manifest() > m2 = c2.manifest() > - ma = ca.manifest() > + ma = true_ca.manifest() > > - copy1, copy2, = {}, {} > + copy1, copy2 = {}, {} > + copy1a, copy2a = {}, {} > + copy1b, copy2b = {}, {} > movewithdir1, movewithdir2 = {}, {} > fullcopy1, fullcopy2 = {}, {} > + fullcopy1a, fullcopy2a = {}, {} > + fullcopy1b, fullcopy2b = {}, {} > diverge = {} > > # find interesting file sets from manifests > @@ -343,10 +352,18 @@ > bothnew = sorted(addedinm1 & addedinm2) > > for f in u1: > - checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1) > + checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a) > + checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b) > + copy1 = dict((set(copy1a.items()) & set(copy1b.items())) | > + (set([(y, x) for (x, y) in copy1a.items()])- > + set([(y, x) for (x, y) in copy1b.items()]))) > > for f in u2: > - checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2) > + checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2a, fullcopy2a) > + checkcopies(c2, f, m2, m1, true_ca, limit, diverge, copy2b, fullcopy2b) > + copy2 = dict((set(copy2a.items()) & set(copy2b.items())) | > + (set([(y, x) for (x, y) in copy2a.items()])- > + set([(y, x) for (x, y) in copy2b.items()]))) > > copy = dict(copy1.items() + copy2.items()) > movewithdir = dict(movewithdir1.items() + movewithdir2.items()) > @@ -373,6 +390,8 @@ > for f in bothnew: > checkcopies(c1, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy) > checkcopies(c2, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy) > + checkcopies(c1, f, m1, m2, true_ca, limit, bothdiverge, _copy, _fullcopy) > + checkcopies(c2, f, m2, m1, true_ca, limit, bothdiverge, _copy, _fullcopy) > for of, fl in bothdiverge.items(): > if len(fl) == 2 and fl[0] == fl[1]: > copy[fl[0]] = of # not actually divergent, just matching renames > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Tue, 26 Jul 2016 12:39:50 -0500, Matt Mackall wrote: > On Wed, 2016-07-27 at 00:07 +0900, Yuya Nishihara wrote: > > On Mon, 25 Jul 2016 13:16:32 -0500, Gábor Stefanik wrote: > > > # HG changeset patch > > > # User Gábor Stefanik <gabor.stefanik@nng.com> > > > # Date 1469470573 -7200 > > > # Mon Jul 25 20:16:13 2016 +0200 > > > # Branch stable > > > # Node ID 395d3fa2ac89fad199c99cc137bf801502292325 > > > # Parent 9c2cc107547fd701a7604349632f2f590366f17c > > > graft: support grafting across renames (issue4028) > > > > > > Graft performs a merge with a false common ancestor, which must be taken > > > into > > > account when tracking renames. Compute the real common ancestor in this > > > case, > > > and track renames between the real and false common ancestors in reverse. > > I don't know the detail of the copy tracing, I can't tell if this strategy > > is correct. > > > > Many tests fails with this patch. > > Yes, I'm afraid it's much trickier than this. > > Basically, the main copy tracing function for merges assumes all merges go > forward in history because it was written before we had any form of rebasing or > grafting. > > ancestor --> a - merge > \ / > --------> b > > But when we graft or rebase, we're doing a merge that looks like this: > > ancestor--<--a_parent-> a - b' > \ / > ------------------> b > > ..and twisting the graph to look like this: > > a_parent -------------> a - b' > \ / > --->-ancestor -----> b > > The stretch between a_parent and ancestor is backwards in the real history but > gets flipped for our effective merge DAG. That means a rename of x to y in that > span effectively becomes a rename from y to x for the purposes of our operation. > But the spans from a_parent to a and from ancestor to b remain unflipped. > Teaching mergecopies() how to do this amounts to a complete rewrite. Ah, thanks. I think I finally understand the basic model. > The situation with copies is even more confusing (search the list archives for a > discussion of "ypoc"s). For reference, https://www.mercurial-scm.org/pipermail/mercurial-devel/2012-December/047183.html
Patch
diff --git a/mercurial/copies.py b/mercurial/copies.py --- a/mercurial/copies.py +++ b/mercurial/copies.py @@ -327,13 +327,22 @@ return {}, {}, {}, {} repo.ui.debug(" searching for copies back to rev %d\n" % limit) + # graft gives us a false common ancestor, we need to find a real one + true_ca = ca + if not (ca.descendant(c1) and ca.descendant(c2)): + true_ca = c1.ancestor(c2) + m1 = c1.manifest() m2 = c2.manifest() - ma = ca.manifest() + ma = true_ca.manifest() - copy1, copy2, = {}, {} + copy1, copy2 = {}, {} + copy1a, copy2a = {}, {} + copy1b, copy2b = {}, {} movewithdir1, movewithdir2 = {}, {} fullcopy1, fullcopy2 = {}, {} + fullcopy1a, fullcopy2a = {}, {} + fullcopy1b, fullcopy2b = {}, {} diverge = {} # find interesting file sets from manifests @@ -343,10 +352,18 @@ bothnew = sorted(addedinm1 & addedinm2) for f in u1: - checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1, fullcopy1) + checkcopies(c1, f, m1, m2, ca, limit, diverge, copy1a, fullcopy1a) + checkcopies(c1, f, m1, m2, true_ca, limit, diverge, copy1b, fullcopy1b) + copy1 = dict((set(copy1a.items()) & set(copy1b.items())) | + (set([(y, x) for (x, y) in copy1a.items()])- + set([(y, x) for (x, y) in copy1b.items()]))) for f in u2: - checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2, fullcopy2) + checkcopies(c2, f, m2, m1, ca, limit, diverge, copy2a, fullcopy2a) + checkcopies(c2, f, m2, m1, true_ca, limit, diverge, copy2b, fullcopy2b) + copy2 = dict((set(copy2a.items()) & set(copy2b.items())) | + (set([(y, x) for (x, y) in copy2a.items()])- + set([(y, x) for (x, y) in copy2b.items()]))) copy = dict(copy1.items() + copy2.items()) movewithdir = dict(movewithdir1.items() + movewithdir2.items()) @@ -373,6 +390,8 @@ for f in bothnew: checkcopies(c1, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy) checkcopies(c2, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy) + checkcopies(c1, f, m1, m2, true_ca, limit, bothdiverge, _copy, _fullcopy) + checkcopies(c2, f, m2, m1, true_ca, limit, bothdiverge, _copy, _fullcopy) for of, fl in bothdiverge.items(): if len(fl) == 2 and fl[0] == fl[1]: copy[fl[0]] = of # not actually divergent, just matching renames