Patchwork [V7] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date Aug. 12, 2015, 4:26 a.m.
Message ID <5cc00a1d99579c69a3d4.1439353593@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/10194/
State Accepted
Headers show

Comments

Durham Goode - Aug. 12, 2015, 4:26 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Branch stable
# Node ID 5cc00a1d99579c69a3d4807a58042c74055446f6
# Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
copy: add flag for disabling copy tracing

Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in
large repos (hundreds of thousands of files).  This provides the option of
turning off the majority of copy tracing. It does not turn off _forwardcopies()
since that is used to carry copy information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

The reason to have a flag instead of trying to fix the actual copy tracing
performance is that copy tracing is fundamentally an O(number of files in the
repo) operation.  In order to know if file X in the rebase source was copied
anywhere, we have to walk the filelog for every new file that exists in the
rebase destination (i.e. a file in the destination that is not in the common
ancestor).  Without an index that lets us trace forward (i.e. from file Y in the
common ancestor forward to the rebase destination), it will never be an O(number
of changes in my branch) operation.

In mozilla-central, rebasing a 3 commit stack across 20,000 revs goes from 39s
to 11s.
Augie Fackler - Aug. 12, 2015, 9:15 p.m.
On Tue, Aug 11, 2015 at 09:26:33PM -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Branch stable
> # Node ID 5cc00a1d99579c69a3d4807a58042c74055446f6
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> copy: add flag for disabling copy tracing
>
> Copy tracing can be up to 80% of rebase time when rebasing stacks of commits in
> large repos (hundreds of thousands of files).  This provides the option of
> turning off the majority of copy tracing. It does not turn off _forwardcopies()
> since that is used to carry copy information inside a commit across a rebase.
>
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue (possibly by redoing the rebase with
> this flag off).
>
> The reason to have a flag instead of trying to fix the actual copy tracing
> performance is that copy tracing is fundamentally an O(number of files in the
> repo) operation.  In order to know if file X in the rebase source was copied
> anywhere, we have to walk the filelog for every new file that exists in the
> rebase destination (i.e. a file in the destination that is not in the common
> ancestor).  Without an index that lets us trace forward (i.e. from file Y in the
> common ancestor forward to the rebase destination), it will never be an O(number
> of changes in my branch) operation.
>
> In mozilla-central, rebasing a 3 commit stack across 20,000 revs goes from 39s
> to 11s.

LGTM - thanks for adding some supporting documentation in the log message!

(I still hold out hope that some indexing work can be done on
copies/renames and that we'll be able to make that O(n) shrink and
disappear in the common case, and therefore not require
correctness-losing hack options in the future. But that's something we
can iterate on as we go from here.)

>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -185,6 +185,9 @@ def _forwardcopies(a, b, match=None):
>      return cm
>
>  def _backwardrenames(a, b):
> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}
> +
>      # Even though we're not taking copies into account, 1:n rename situations
>      # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>      # arbitrarily pick one of the renames.
> @@ -258,6 +261,12 @@ def mergecopies(repo, c1, c2, ca):
>      if c2.node() is None and c1.node() == repo.dirstate.p1():
>          return repo.dirstate.copies(), {}, {}, {}
>
> +    # Copy trace disabling is explicitly below the node == p1 logic above
> +    # because the logic above is required for a simple copy to be kept across a
> +    # rebase.
> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}, {}, {}, {}
> +
>      limit = _findlimit(repo, c1.rev(), c2.rev())
>      if limit is None:
>          # no common ancestor, no copies
> @@ -507,7 +516,12 @@ def duplicatecopies(repo, rev, fromrev,
>      copies between fromrev and rev.
>      '''
>      exclude = {}
> -    if skiprev is not None:
> +    if (skiprev is not None and
> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
> +        # disablecopytrace skips this line, but not the entire function because
> +        # the line below is O(size of the repo) during a rebase, while the rest
> +        # of the function is much faster (and is required for carrying copy
> +        # metadata across the rebase anyway).
>          exclude = pathcopies(repo[fromrev], repo[skiprev])
>      for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>          # copies.pathcopies returns backward renames, so dst might not
> diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
> --- a/tests/test-copy-move-merge.t
> +++ b/tests/test-copy-move-merge.t
> @@ -59,4 +59,107 @@ file c
>    1
>    2
>
> +Test disabling copy tracing
> +
> +- first verify copy metadata was kept
> +
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
> +  rebasing 2:add3f11052fa "other" (tip)
> +  merging b and a to b
> +  merging c and a to c
> +
> +  $ cat b
> +  0
> +  1
> +  2
> +
> +- next verify copy metadata is lost when disabled
> +
> +  $ hg strip -r . --config extensions.strip=
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 2:add3f11052fa "other" (tip)
> +  remote changed a which local deleted
> +  use (c)hanged version or leave (d)eleted? c
> +
> +  $ cat b
> +  1
> +  2
> +
>    $ cd ..
> +
> +Verify disabling copy tracing still keeps copies from rebase source
> +
> +  $ hg init copydisable
> +  $ cd copydisable
> +  $ touch a
> +  $ hg ci -Aqm 'add a'
> +  $ touch b
> +  $ hg ci -Aqm 'add b, c'
> +  $ hg cp b x
> +  $ echo x >> x
> +  $ hg ci -qm 'copy b->x'
> +  $ hg up -q 1
> +  $ touch z
> +  $ hg ci -Aqm 'add z'
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 add z
> +  |
> +  | o  2 copy b->x
> +  |/
> +  o  1 add b, c
> +  |
> +  o  0 add a
> +
> +  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 2:6adcf8c12e7d "copy b->x"
> +  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/6adcf8c12e7d-ce4b3e75-backup.hg (glob)
> +  $ hg up -q 3
> +  $ hg log -f x -T '{rev} {desc}\n'
> +  3 copy b->x
> +  1 add b, c
> +
> +  $ cd ../
> +
> +Verify we duplicate existing copies, instead of detecting them
> +
> +  $ hg init copydisable3
> +  $ cd copydisable3
> +  $ touch a
> +  $ hg ci -Aqm 'add a'
> +  $ hg cp a b
> +  $ hg ci -Aqm 'copy a->b'
> +  $ hg mv b c
> +  $ hg ci -Aqm 'move b->c'
> +  $ hg up -q 0
> +  $ hg cp a b
> +  $ echo b >> b
> +  $ hg ci -Aqm 'copy a->b (2)'
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 copy a->b (2)
> +  |
> +  | o  2 move b->c
> +  | |
> +  | o  1 copy a->b
> +  |/
> +  o  0 add a
> +
> +  $ hg rebase -d 2 -s 3 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 3:47e1a9e6273b "copy a->b (2)" (tip)
> +  saved backup bundle to $TESTTMP/copydisable3/.hg/strip-backup/47e1a9e6273b-2d099c59-backup.hg (glob)
> +
> +  $ hg log -G -f b
> +  @  changeset:   3:76024fb4b05b
> +  |  tag:         tip
> +  |  user:        test
> +  |  date:        Thu Jan 01 00:00:00 1970 +0000
> +  |  summary:     copy a->b (2)
> +  |
> +  o  changeset:   0:ac82d8b1f7c4
> +     user:        test
> +     date:        Thu Jan 01 00:00:00 1970 +0000
> +     summary:     add a
> +
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -185,6 +185,9 @@  def _forwardcopies(a, b, match=None):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -258,6 +261,12 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    # Copy trace disabling is explicitly below the node == p1 logic above
+    # because the logic above is required for a simple copy to be kept across a
+    # rebase.
+    if repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
@@ -507,7 +516,12 @@  def duplicatecopies(repo, rev, fromrev, 
     copies between fromrev and rev.
     '''
     exclude = {}
-    if skiprev is not None:
+    if (skiprev is not None and
+        not repo.ui.configbool('experimental', 'disablecopytrace')):
+        # disablecopytrace skips this line, but not the entire function because
+        # the line below is O(size of the repo) during a rebase, while the rest
+        # of the function is much faster (and is required for carrying copy
+        # metadata across the rebase anyway).
         exclude = pathcopies(repo[fromrev], repo[skiprev])
     for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
         # copies.pathcopies returns backward renames, so dst might not
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -59,4 +59,107 @@  file c
   1
   2
 
+Test disabling copy tracing
+
+- first verify copy metadata was kept
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ cat b
+  0
+  1
+  2
+
+- next verify copy metadata is lost when disabled
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+
+  $ cat b
+  1
+  2
+
   $ cd ..
+
+Verify disabling copy tracing still keeps copies from rebase source
+
+  $ hg init copydisable
+  $ cd copydisable
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ touch b
+  $ hg ci -Aqm 'add b, c'
+  $ hg cp b x
+  $ echo x >> x
+  $ hg ci -qm 'copy b->x'
+  $ hg up -q 1
+  $ touch z
+  $ hg ci -Aqm 'add z'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 add z
+  |
+  | o  2 copy b->x
+  |/
+  o  1 add b, c
+  |
+  o  0 add a
+  
+  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:6adcf8c12e7d "copy b->x"
+  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/6adcf8c12e7d-ce4b3e75-backup.hg (glob)
+  $ hg up -q 3
+  $ hg log -f x -T '{rev} {desc}\n'
+  3 copy b->x
+  1 add b, c
+
+  $ cd ../
+
+Verify we duplicate existing copies, instead of detecting them
+
+  $ hg init copydisable3
+  $ cd copydisable3
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ hg cp a b
+  $ hg ci -Aqm 'copy a->b'
+  $ hg mv b c
+  $ hg ci -Aqm 'move b->c'
+  $ hg up -q 0
+  $ hg cp a b
+  $ echo b >> b
+  $ hg ci -Aqm 'copy a->b (2)'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 copy a->b (2)
+  |
+  | o  2 move b->c
+  | |
+  | o  1 copy a->b
+  |/
+  o  0 add a
+  
+  $ hg rebase -d 2 -s 3 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 3:47e1a9e6273b "copy a->b (2)" (tip)
+  saved backup bundle to $TESTTMP/copydisable3/.hg/strip-backup/47e1a9e6273b-2d099c59-backup.hg (glob)
+
+  $ hg log -G -f b
+  @  changeset:   3:76024fb4b05b
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     copy a->b (2)
+  |
+  o  changeset:   0:ac82d8b1f7c4
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     add a
+