Patchwork [3,of,3,STABLE] mergecopies: reuse ancestry context when traversing file history (issue4537)

login
register
mail settings
Submitter Pierre-Yves David
Date March 20, 2015, 7:41 a.m.
Message ID <6e10c9cb2244ea292f71.1426837295@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/8199/
State Accepted
Commit 9372180b8c9ba4b38f04ac1177190b33aa0ec8a9
Headers show

Comments

Pierre-Yves David - March 20, 2015, 7:41 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1426836635 25200
#      Fri Mar 20 00:30:35 2015 -0700
# Branch stable
# Node ID 6e10c9cb2244ea292f71d2a6aba462516e76e2ce
# Parent  2863d21af964d0ff594f9cfb7e2411280716398e
mergecopies: reuse ancestry context when traversing file history (issue4537)

Merge copies is traversing file history in search for copies and renames.
Since 3.3 we are doing "linkrev adjustment" to ensure duplicated filelog entry
does not confuse the traversal. This "linkrev adjustment" involved ancestry
testing and walking in the changeset graph. If we do such walk in the changesets
graph for each file, we end up with a 'O(<changesets>x<files>)' complexity
that create massive issue. For examples, grafting a changeset in Mozilla's repo
moved from 6 seconds to more than 3 minutes.

There is a mechanism to reuse such ancestors computation between all files. But
it has to be manually set up in situation were it make sense to take such
shortcut. This changesets set this mechanism up and bring back the graph time
from 3 minutes to 8 seconds.

To do so, we need a bigger control on the way 'filectx' are instantiated during
each 'checkcopies' calls that 'mergecopies' is doing. We add a new 'setupctx'
that configure and return a 'filectx' factory. The function make sure the
ancestry context is properly created and the factory make sure it is properly
installed on returned 'filectx'.
Matt Mackall - March 20, 2015, 7:22 p.m.
On Fri, 2015-03-20 at 00:41 -0700, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1426836635 25200
> #      Fri Mar 20 00:30:35 2015 -0700
> # Branch stable
> # Node ID 6e10c9cb2244ea292f71d2a6aba462516e76e2ce
> # Parent  2863d21af964d0ff594f9cfb7e2411280716398e
> mergecopies: reuse ancestry context when traversing file history (issue4537)

Queued for stable, thanks.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -244,18 +244,44 @@  def mergecopies(repo, c1, c2, ca):
         return {}, {}, {}, {}
     m1 = c1.manifest()
     m2 = c2.manifest()
     ma = ca.manifest()
 
-    def makectx(f, n):
-        if len(n) != 20: # in a working context?
-            if c1.rev() is None:
-                return c1.filectx(f)
-            return c2.filectx(f)
-        return repo.filectx(f, fileid=n)
 
-    ctx = util.lrucachefunc(makectx)
+    def setupctx(ctx):
+        """return a 'makectx' function suitable for checkcopies usage from ctx
+
+        We have to resetup the function building 'filectx' for each
+        'checkcopies' to ensure the linkrev adjustement is properly setup for
+        each. Linkrev adjustement is important to avoid bug in rename
+        detection. Moreover, having a proper '_ancestrycontext' setup ensures
+        the performance impact of this adjustment is kept limited. Without it,
+        each file could do a full dag traversal making the time complexity of
+        the operation explode (see issue4537)
+
+        This function exist here mostly to limit the impact on stable. Feel
+        free to refactor on default.
+        """
+        rev = ctx.rev()
+        ac = getattr(ctx, '_ancestrycontext', None)
+        if ac is None:
+            revs = [rev]
+            if rev is None:
+                revs = [p.rev() for p in ctx.parents()]
+            ac = ctx._repo.changelog.ancestors(revs, inclusive=True)
+            ctx._ancestrycontext = ac
+        def makectx(f, n):
+            if len(n) != 20:  # in a working context?
+                if c1.rev() is None:
+                    return c1.filectx(f)
+                return c2.filectx(f)
+            fctx = repo.filectx(f, fileid=n)
+            # setup only needed for filectx not create from a changectx
+            fctx._ancestrycontext = ac
+            fctx._descendantrev = rev
+            return fctx
+        return util.lrucachefunc(makectx)
     copy = {}
     movewithdir = {}
     fullcopy = {}
     diverge = {}
 
@@ -270,13 +296,15 @@  def mergecopies(repo, c1, c2, ca):
     if u2:
         repo.ui.debug("  unmatched files in other:\n   %s\n"
                       % "\n   ".join(u2))
 
     for f in u1:
+        ctx = setupctx(c1)
         checkcopies(ctx, f, m1, m2, ca, limit, diverge, copy, fullcopy)
 
     for f in u2:
+        ctx = setupctx(c2)
         checkcopies(ctx, f, m2, m1, ca, limit, diverge, copy, fullcopy)
 
     renamedelete = {}
     renamedelete2 = set()
     diverge2 = set()
@@ -295,11 +323,13 @@  def mergecopies(repo, c1, c2, ca):
     if bothnew:
         repo.ui.debug("  unmatched files new in both:\n   %s\n"
                       % "\n   ".join(bothnew))
     bothdiverge, _copy, _fullcopy = {}, {}, {}
     for f in bothnew:
+        ctx = setupctx(c1)
         checkcopies(ctx, f, m1, m2, ca, limit, bothdiverge, _copy, _fullcopy)
+        ctx = setupctx(c2)
         checkcopies(ctx, f, m2, m1, ca, limit, bothdiverge, _copy, _fullcopy)
     for of, fl in bothdiverge.items():
         if len(fl) == 2 and fl[0] == fl[1]:
             copy[fl[0]] = of # not actually divergent, just matching renames