Patchwork [V4] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date April 14, 2015, 1:55 p.m.
Message ID <5a39559f0caa07ca1a44.1429019727@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/8648/
State Superseded
Commit 38f92d12357cb8d93576c52d5ed93e771e06ef03
Headers show

Comments

Durham Goode - April 14, 2015, 1:55 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Node ID 5a39559f0caa07ca1a44ee67f7f917138b54198f
# Parent  52ff737c63d2b2cb41185549aa9c35bc47317032
copy: add flag for disabling copy tracing

Copy tracing can be up to 80% of rebase time when rebasing stacks
of commits in large repos (hundreds of thousands of files).
This provides the option of turning off the majority of copy tracing. It
does not turn off _forwardcopies() since that is used to carry copy
information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
Martin von Zweigbergk - April 14, 2015, 4:35 p.m.
On Tue, Apr 14, 2015 at 6:56 AM Durham Goode <durham@fb.com> wrote:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Node ID 5a39559f0caa07ca1a44ee67f7f917138b54198f
> # Parent  52ff737c63d2b2cb41185549aa9c35bc47317032
> copy: add flag for disabling copy tracing
>
> Copy tracing can be up to 80% of rebase time when rebasing stacks
> of commits in large repos (hundreds of thousands of files).
>

Does most of that come from copies.mergecopies() or
copies.duplicatecopies()? I saw 10% in mergecopies() and 18% in
duplicatecopies(), but that was in a smaller repo and rebasing across a
smaller distance.


> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
>

How is this different from merge.followcopies? (I'm not saying it's not,
I'm just asking.) IIUC, merge.followcopies=False still calls
duplicatecopies(), which is supposed to "reproduce copies from fromrev to
rev in the dirstate". Does your new option turn off make us not duplicate
some copies that would have been found with merge.followcopies=False?


>
> This will affect the situation where a user edits a file, then rebases on
> top of
> commits that have moved that file. The move will not be detected and the
> user
> will have to manually resolve the issue (possibly by redoing the rebase
> with
> this flag off).
>
> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to
> 12s.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -186,6 +186,9 @@ def _forwardcopies(a, b):
>      return cm
>
>  def _backwardrenames(a, b):
> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}
> +
>      # Even though we're not taking copies into account, 1:n rename
> situations
>      # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>      # arbitrarily pick one of the renames.
> @@ -258,6 +261,13 @@ def mergecopies(repo, c1, c2, ca):
>      if c2.node() is None and c1.node() == repo.dirstate.p1():
>          return repo.dirstate.copies(), {}, {}, {}
>
> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
> +        # If we're just tracking against the parent, do a simple check.
> +        # This is necessary to ensure copies survive rebasing.
> +        if ca in c2.parents():
> +            return pathcopies(ca, c2), {}, {}, {}
> +        return {}, {}, {}, {}
> +
>      limit = _findlimit(repo, c1.rev(), c2.rev())
>      if limit is None:
>          # no common ancestor, no copies
> @@ -506,7 +516,8 @@ def duplicatecopies(repo, rev, fromrev,
>      copies between fromrev and rev.
>      '''
>      exclude = {}
> -    if skiprev is not None:
> +    if (skiprev is not None and
> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
>          exclude = pathcopies(repo[fromrev], repo[skiprev])
>      for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>          # copies.pathcopies returns backward renames, so dst might not
> diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
> --- a/tests/test-copy-move-merge.t
> +++ b/tests/test-copy-move-merge.t
> @@ -61,4 +61,73 @@ file c
>    1
>    2
>
> +Test disabling copy tracing
> +
> +- first verify copy metadata was kept
> +
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
> +  rebasing 2:add3f11052fa "other" (tip)
> +  merging b and a to b
> +  merging c and a to c
> +
> +  $ cat b
> +  0
> +  1
> +  2
> +
> +- next verify copy metadata is lost when disabled
> +
> +  $ hg strip -r . --config extensions.strip=
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to
> $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config
> experimental.disablecopytrace=True
> +  rebasing 2:add3f11052fa "other" (tip)
> +  remote changed a which local deleted
> +  use (c)hanged version or leave (d)eleted? c
> +
> +  $ cat b
> +  1
> +  2
> +
>    $ cd ..
> +
> +Verify disabling copy tracing still keeps copies from rebase source
> +
> +  $ hg init copydisable
> +  $ cd copydisable
> +  $ touch a
> +  $ hg ci -Aqm a
> +  $ touch b
> +  $ touch c
> +  $ hg ci -Aqm b
> +  $ hg cp b x
> +  $ hg cp c y
> +  $ echo x >> x
> +  $ hg ci -qm xy
> +  $ hg up -q 1
> +  $ hg cp b x
> +  $ touch z
> +  $ hg ci -Aqm xz
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 xz
> +  |
> +  | o  2 xy
> +  |/
> +  o  1 b
> +  |
> +  o  0 a
> +
> +  $ hg rebase -d . -b 2 --config extensions.rebase= --config
> experimental.disablecopytrace=True
> +  rebasing 2:002ca7a9f426 "xy"
> +  merging x
> +  saved backup bundle to
> $TESTTMP/copydisable/.hg/strip-backup/002ca7a9f426-19f9ad1e-backup.hg (glob)
> +  $ hg up -q 3
> +  $ hg log -f x -T '{rev} {desc}\n'
> +  3 xy
> +  2 xz
> +  1 b
> +  $ hg log -f y -T '{rev} {desc}\n'
> +  3 xy
> +  1 b
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Durham Goode - April 14, 2015, 6:43 p.m.
On 4/14/15 12:35 PM, Martin von Zweigbergk wrote:
>
>
> On Tue, Apr 14, 2015 at 6:56 AM Durham Goode <durham@fb.com 
> <mailto:durham@fb.com>> wrote:
>
>     # HG changeset patch
>     # User Durham Goode <durham@fb.com <mailto:durham@fb.com>>
>     # Date 1422386787 28800
>     #      Tue Jan 27 11:26:27 2015 -0800
>     # Node ID 5a39559f0caa07ca1a44ee67f7f917138b54198f
>     # Parent  52ff737c63d2b2cb41185549aa9c35bc47317032
>     copy: add flag for disabling copy tracing
>
>     Copy tracing can be up to 80% of rebase time when rebasing stacks
>     of commits in large repos (hundreds of thousands of files).
>
>
> Does most of that come from copies.mergecopies() or 
> copies.duplicatecopies()? I saw 10% in mergecopies() and 18% in 
> duplicatecopies(), but that was in a smaller repo and rebasing across 
> a smaller distance.
Both.  I'm seeing 47% mergecopies and 37% duplicatecopies. Admittedly 
we're doing our filelog history traversals through remotefilelog, so our 
perf characteristics may be slightly different from core, but the copy 
tracing operation is O(number of files change between rebase dest and 
common ancestor) so it's fundamentally not scalable on a large repo 
right now.
>
>     This provides the option of turning off the majority of copy
>     tracing. It
>     does not turn off _forwardcopies() since that is used to carry copy
>     information inside a commit across a rebase.
>
>
> How is this different from merge.followcopies? (I'm not saying it's 
> not, I'm just asking.) IIUC, merge.followcopies=False still calls 
> duplicatecopies(), which is supposed to "reproduce copies from fromrev 
> to rev in the dirstate". Does your new option turn off make us not 
> duplicate some copies that would have been found with 
> merge.followcopies=False?
followcopies only affects mergecopies(), not duplicatecopies() (like you 
mentioned).  It also disables copies entirely, while mine does not 
disable it for forward copies, which are needed to preserve copy 
information across rebases.  Originally my config option was a string 
instead of a boolean, so we could have different levels of disablement. 
But since we only have one level right now, Matt asked for it to be a 
bool.  That might come back in the future (like only doing copy 
detection if we noticed a modified file is not present in the rebase dest).
Martin von Zweigbergk - April 14, 2015, 10:53 p.m.
On Tue, Apr 14, 2015 at 6:56 AM Durham Goode <durham@fb.com> wrote:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Node ID 5a39559f0caa07ca1a44ee67f7f917138b54198f
> # Parent  52ff737c63d2b2cb41185549aa9c35bc47317032
> copy: add flag for disabling copy tracing
>
> Copy tracing can be up to 80% of rebase time when rebasing stacks
> of commits in large repos (hundreds of thousands of files).
> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
>
> This will affect the situation where a user edits a file, then rebases on
> top of
> commits that have moved that file. The move will not be detected and the
> user
> will have to manually resolve the issue (possibly by redoing the rebase
> with
> this flag off).
>
> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to
> 12s.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -186,6 +186,9 @@ def _forwardcopies(a, b):
>      return cm
>
>  def _backwardrenames(a, b):
> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}
> +
>      # Even though we're not taking copies into account, 1:n rename
> situations
>      # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>      # arbitrarily pick one of the renames.
> @@ -258,6 +261,13 @@ def mergecopies(repo, c1, c2, ca):
>      if c2.node() is None and c1.node() == repo.dirstate.p1():
>          return repo.dirstate.copies(), {}, {}, {}
>
> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
> +        # If we're just tracking against the parent, do a simple check.
> +        # This is necessary to ensure copies survive rebasing.
> +        if ca in c2.parents():
> +            return pathcopies(ca, c2), {}, {}, {}
> +        return {}, {}, {}, {}
> +
>      limit = _findlimit(repo, c1.rev(), c2.rev())
>      if limit is None:
>          # no common ancestor, no copies
> @@ -506,7 +516,8 @@ def duplicatecopies(repo, rev, fromrev,
>      copies between fromrev and rev.
>      '''
>      exclude = {}
> -    if skiprev is not None:
> +    if (skiprev is not None and
> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
>

I don't follow this bit, so I tried removing it to see what failed, but all
tests passed. What does it do?


>          exclude = pathcopies(repo[fromrev], repo[skiprev])
>      for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>          # copies.pathcopies returns backward renames, so dst might not
> diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
> --- a/tests/test-copy-move-merge.t
> +++ b/tests/test-copy-move-merge.t
> @@ -61,4 +61,73 @@ file c
>    1
>    2
>
> +Test disabling copy tracing
> +
> +- first verify copy metadata was kept
> +
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
> +  rebasing 2:add3f11052fa "other" (tip)
> +  merging b and a to b
> +  merging c and a to c
> +
> +  $ cat b
> +  0
> +  1
> +  2
> +
> +- next verify copy metadata is lost when disabled
> +
> +  $ hg strip -r . --config extensions.strip=
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to
> $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config
> experimental.disablecopytrace=True
> +  rebasing 2:add3f11052fa "other" (tip)
> +  remote changed a which local deleted
> +  use (c)hanged version or leave (d)eleted? c
> +
> +  $ cat b
> +  1
> +  2
> +
>    $ cd ..
> +
> +Verify disabling copy tracing still keeps copies from rebase source
> +
> +  $ hg init copydisable
> +  $ cd copydisable
> +  $ touch a
> +  $ hg ci -Aqm a
> +  $ touch b
> +  $ touch c
> +  $ hg ci -Aqm b
> +  $ hg cp b x
> +  $ hg cp c y
> +  $ echo x >> x
> +  $ hg ci -qm xy
> +  $ hg up -q 1
> +  $ hg cp b x
> +  $ touch z
> +  $ hg ci -Aqm xz
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 xz
> +  |
> +  | o  2 xy
> +  |/
> +  o  1 b
> +  |
> +  o  0 a
> +
> +  $ hg rebase -d . -b 2 --config extensions.rebase= --config
> experimental.disablecopytrace=True
> +  rebasing 2:002ca7a9f426 "xy"
> +  merging x
> +  saved backup bundle to
> $TESTTMP/copydisable/.hg/strip-backup/002ca7a9f426-19f9ad1e-backup.hg (glob)
> +  $ hg up -q 3
> +  $ hg log -f x -T '{rev} {desc}\n'
> +  3 xy
> +  2 xz
> +  1 b
> +  $ hg log -f y -T '{rev} {desc}\n'
> +  3 xy
> +  1 b
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Durham Goode - April 16, 2015, 9:07 p.m.
On 4/14/15 6:53 PM, Martin von Zweigbergk wrote:
>
>
> On Tue, Apr 14, 2015 at 6:56 AM Durham Goode <durham@fb.com 
> <mailto:durham@fb.com>> wrote:
>
>     # HG changeset patch
>     # User Durham Goode <durham@fb.com <mailto:durham@fb.com>>
>     # Date 1422386787 28800
>     #      Tue Jan 27 11:26:27 2015 -0800
>     # Node ID 5a39559f0caa07ca1a44ee67f7f917138b54198f
>     # Parent  52ff737c63d2b2cb41185549aa9c35bc47317032
>     copy: add flag for disabling copy tracing
>
>     Copy tracing can be up to 80% of rebase time when rebasing stacks
>     of commits in large repos (hundreds of thousands of files).
>     This provides the option of turning off the majority of copy
>     tracing. It
>     does not turn off _forwardcopies() since that is used to carry copy
>     information inside a commit across a rebase.
>
>     This will affect the situation where a user edits a file, then
>     rebases on top of
>     commits that have moved that file. The move will not be detected
>     and the user
>     will have to manually resolve the issue (possibly by redoing the
>     rebase with
>     this flag off).
>
>     This takes rebasing a 3 commit stack that's a couple weeks old
>     from 65s to 12s.
>
>     diff --git a/mercurial/copies.py b/mercurial/copies.py
>     --- a/mercurial/copies.py
>     +++ b/mercurial/copies.py
>     @@ -186,6 +186,9 @@ def _forwardcopies(a, b):
>          return cm
>
>      def _backwardrenames(a, b):
>     +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
>     +        return {}
>     +
>          # Even though we're not taking copies into account, 1:n
>     rename situations
>          # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>          # arbitrarily pick one of the renames.
>     @@ -258,6 +261,13 @@ def mergecopies(repo, c1, c2, ca):
>          if c2.node() is None and c1.node() == repo.dirstate.p1():
>              return repo.dirstate.copies(), {}, {}, {}
>
>     +    if repo.ui.configbool('experimental', 'disablecopytrace'):
>     +        # If we're just tracking against the parent, do a simple
>     check.
>     +        # This is necessary to ensure copies survive rebasing.
>     +        if ca in c2.parents():
>     +            return pathcopies(ca, c2), {}, {}, {}
>     +        return {}, {}, {}, {}
>     +
>          limit = _findlimit(repo, c1.rev(), c2.rev())
>          if limit is None:
>              # no common ancestor, no copies
>     @@ -506,7 +516,8 @@ def duplicatecopies(repo, rev, fromrev,
>          copies between fromrev and rev.
>          '''
>          exclude = {}
>     -    if skiprev is not None:
>     +    if (skiprev is not None and
>     +        not repo.ui.configbool('experimental', 'disablecopytrace')):
>
>
> I don't follow this bit, so I tried removing it to see what failed, 
> but all tests passed. What does it do?
Interesting, you're right.  The code path is hit, but the test results 
aren't changed.  I'm going to dig into this farther and resend once I 
have more details (and a test that is affected by that line).

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -186,6 +186,9 @@  def _forwardcopies(a, b):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -258,6 +261,13 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    if repo.ui.configbool('experimental', 'disablecopytrace'):
+        # If we're just tracking against the parent, do a simple check.
+        # This is necessary to ensure copies survive rebasing.
+        if ca in c2.parents():
+            return pathcopies(ca, c2), {}, {}, {}
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
@@ -506,7 +516,8 @@  def duplicatecopies(repo, rev, fromrev, 
     copies between fromrev and rev.
     '''
     exclude = {}
-    if skiprev is not None:
+    if (skiprev is not None and
+        not repo.ui.configbool('experimental', 'disablecopytrace')):
         exclude = pathcopies(repo[fromrev], repo[skiprev])
     for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
         # copies.pathcopies returns backward renames, so dst might not
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -61,4 +61,73 @@  file c
   1
   2
 
+Test disabling copy tracing
+
+- first verify copy metadata was kept
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ cat b
+  0
+  1
+  2
+
+- next verify copy metadata is lost when disabled
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+
+  $ cat b
+  1
+  2
+
   $ cd ..
+
+Verify disabling copy tracing still keeps copies from rebase source
+
+  $ hg init copydisable
+  $ cd copydisable
+  $ touch a
+  $ hg ci -Aqm a
+  $ touch b
+  $ touch c
+  $ hg ci -Aqm b
+  $ hg cp b x
+  $ hg cp c y
+  $ echo x >> x
+  $ hg ci -qm xy
+  $ hg up -q 1
+  $ hg cp b x
+  $ touch z
+  $ hg ci -Aqm xz
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 xz
+  |
+  | o  2 xy
+  |/
+  o  1 b
+  |
+  o  0 a
+  
+  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:002ca7a9f426 "xy"
+  merging x
+  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/002ca7a9f426-19f9ad1e-backup.hg (glob)
+  $ hg up -q 3
+  $ hg log -f x -T '{rev} {desc}\n'
+  3 xy
+  2 xz
+  1 b
+  $ hg log -f y -T '{rev} {desc}\n'
+  3 xy
+  1 b