Patchwork [V6] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date Aug. 10, 2015, 6:35 p.m.
Message ID <6ef23dd36dfd7ccd467b.1439231741@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/10186/
State Superseded
Commit 38f92d12357cb8d93576c52d5ed93e771e06ef03
Delegated to: Augie Fackler
Headers show

Comments

Durham Goode - Aug. 10, 2015, 6:35 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Branch stable
# Node ID 6ef23dd36dfd7ccd467b55f0ad59ccbe07e4b08c
# Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
copy: add flag for disabling copy tracing

Copy tracing can be up to 80% of rebase time when rebasing stacks
of commits in large repos (hundreds of thousands of files).
This provides the option of turning off the majority of copy tracing. It
does not turn off _forwardcopies() since that is used to carry copy
information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
Augie Fackler - Aug. 11, 2015, 7:59 p.m.
On Mon, Aug 10, 2015 at 11:35:41AM -0700, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Branch stable
> # Node ID 6ef23dd36dfd7ccd467b55f0ad59ccbe07e4b08c
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> copy: add flag for disabling copy tracing

In general I'm okay with this as an experimental flag, but I've got a
laundry list of questions that I think are all easily handled with an
updated commit message and maybe one extra round of benchmarks...

>
> Copy tracing can be up to 80% of rebase time when rebasing stacks
> of commits in large repos (hundreds of thousands of files).

Has anyone put any effort into fixing copy tracing? I'm assuming
there's some potential smarts that could be done by walking the
filelogs for the files in the rebase and looking at only those
changes? Or am I missing something (again, I think, as I seem to
remember chatting about this at a sprint) that might stand to be in
this commit message for future archaeologists?

> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.

*nod* One thought: could this be the default behavior when all edited
files exist in both the original and new base for the rebase? That
seems like it would fix the obvious move cases and still provide most
of the win?

>
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue (possibly by redoing the rebase with
> this flag off).
>
> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.

On what repository? What kind of scale? How about a 3-patch stack
that's a few thousand changes back on mozilla-central as a benchmark
that's more generally useful?

>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -185,6 +185,9 @@ def _forwardcopies(a, b, match=None):
>      return cm
>
>  def _backwardrenames(a, b):
> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}
> +
>      # Even though we're not taking copies into account, 1:n rename situations
>      # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>      # arbitrarily pick one of the renames.
> @@ -258,6 +261,12 @@ def mergecopies(repo, c1, c2, ca):
>      if c2.node() is None and c1.node() == repo.dirstate.p1():
>          return repo.dirstate.copies(), {}, {}, {}
>
> +    # Copy trace disabling is explicitly below the node == p1 logic above
> +    # because the logic above is required for a simple copy to be kept across a
> +    # rebase.
> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}, {}, {}, {}
> +
>      limit = _findlimit(repo, c1.rev(), c2.rev())
>      if limit is None:
>          # no common ancestor, no copies
> @@ -507,7 +516,12 @@ def duplicatecopies(repo, rev, fromrev,
>      copies between fromrev and rev.
>      '''
>      exclude = {}
> -    if skiprev is not None:
> +    if (skiprev is not None and
> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
> +        # disablecopytrace skips this line, but not the entire function because
> +        # the line below is O(size of the repo) during a rebase, while the rest
> +        # of the function is much faster (and is required for carrying copy
> +        # metadata across the rebase anyway).
>          exclude = pathcopies(repo[fromrev], repo[skiprev])
>      for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>          # copies.pathcopies returns backward renames, so dst might not
> diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
> --- a/tests/test-copy-move-merge.t
> +++ b/tests/test-copy-move-merge.t
> @@ -59,4 +59,107 @@ file c
>    1
>    2
>
> +Test disabling copy tracing
> +
> +- first verify copy metadata was kept
> +
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
> +  rebasing 2:add3f11052fa "other" (tip)
> +  merging b and a to b
> +  merging c and a to c
> +
> +  $ cat b
> +  0
> +  1
> +  2
> +
> +- next verify copy metadata is lost when disabled
> +
> +  $ hg strip -r . --config extensions.strip=
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 2:add3f11052fa "other" (tip)
> +  remote changed a which local deleted
> +  use (c)hanged version or leave (d)eleted? c
> +
> +  $ cat b
> +  1
> +  2
> +
>    $ cd ..
> +
> +Verify disabling copy tracing still keeps copies from rebase source
> +
> +  $ hg init copydisable
> +  $ cd copydisable
> +  $ touch a
> +  $ hg ci -Aqm 'add a'
> +  $ touch b
> +  $ hg ci -Aqm 'add b, c'
> +  $ hg cp b x
> +  $ echo x >> x
> +  $ hg ci -qm 'copy b->x'
> +  $ hg up -q 1
> +  $ touch z
> +  $ hg ci -Aqm 'add z'
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 add z
> +  |
> +  | o  2 copy b->x
> +  |/
> +  o  1 add b, c
> +  |
> +  o  0 add a
> +
> +  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 2:6adcf8c12e7d "copy b->x"
> +  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/6adcf8c12e7d-ce4b3e75-backup.hg (glob)
> +  $ hg up -q 3
> +  $ hg log -f x -T '{rev} {desc}\n'
> +  3 copy b->x
> +  1 add b, c
> +
> +  $ cd ../
> +
> +Verify we duplicate existing copies, instead of detecting them
> +
> +  $ hg init copydisable3
> +  $ cd copydisable3
> +  $ touch a
> +  $ hg ci -Aqm 'add a'
> +  $ hg cp a b
> +  $ hg ci -Aqm 'copy a->b'
> +  $ hg mv b c
> +  $ hg ci -Aqm 'move b->c'
> +  $ hg up -q 0
> +  $ hg cp a b
> +  $ echo b >> b
> +  $ hg ci -Aqm 'copy a->b (2)'
> +  $ hg log -G -T '{rev} {desc}\n'
> +  @  3 copy a->b (2)
> +  |
> +  | o  2 move b->c
> +  | |
> +  | o  1 copy a->b
> +  |/
> +  o  0 add a
> +
> +  $ hg rebase -d 2 -s 3 --config extensions.rebase= --config experimental.disablecopytrace=True
> +  rebasing 3:47e1a9e6273b "copy a->b (2)" (tip)
> +  saved backup bundle to $TESTTMP/copydisable3/.hg/strip-backup/47e1a9e6273b-2d099c59-backup.hg (glob)
> +
> +  $ hg log -G -f b
> +  @  changeset:   3:76024fb4b05b
> +  |  tag:         tip
> +  |  user:        test
> +  |  date:        Thu Jan 01 00:00:00 1970 +0000
> +  |  summary:     copy a->b (2)
> +  |
> +  o  changeset:   0:ac82d8b1f7c4
> +     user:        test
> +     date:        Thu Jan 01 00:00:00 1970 +0000
> +     summary:     add a
> +
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
Durham Goode - Aug. 12, 2015, 4:24 a.m.
On 8/11/15 12:59 PM, Augie Fackler wrote:
> On Mon, Aug 10, 2015 at 11:35:41AM -0700, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1422386787 28800
>> #      Tue Jan 27 11:26:27 2015 -0800
>> # Branch stable
>> # Node ID 6ef23dd36dfd7ccd467b55f0ad59ccbe07e4b08c
>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
>> copy: add flag for disabling copy tracing
> In general I'm okay with this as an experimental flag, but I've got a
> laundry list of questions that I think are all easily handled with an
> updated commit message and maybe one extra round of benchmarks...
>
>> Copy tracing can be up to 80% of rebase time when rebasing stacks
>> of commits in large repos (hundreds of thousands of files).
> Has anyone put any effort into fixing copy tracing? I'm assuming
> there's some potential smarts that could be done by walking the
> filelogs for the files in the rebase and looking at only those
> changes? Or am I missing something (again, I think, as I seem to
> remember chatting about this at a sprint) that might stand to be in
> this commit message for future archaeologists?
The problem is that copy tracing is fundamentally an O(number of files 
in the repo) operation.  In order to know if file X in the rebase source 
was copied anywhere, we have to walk the filelog for every new file that 
exists in the rebase destination (i.e. a file in the destination that is 
not in the common ancestor).  Without an index that lets us trace 
forward (i.e. from file Y in the common ancestor forward to the rebase 
destination), it will never be an O(number of changes in my branch) 
operation.

I'll add this to the commit message and resend.
>
>> This provides the option of turning off the majority of copy tracing. It
>> does not turn off _forwardcopies() since that is used to carry copy
>> information inside a commit across a rebase.
> *nod* One thought: could this be the default behavior when all edited
> files exist in both the original and new base for the rebase? That
> seems like it would fix the obvious move cases and still provide most
> of the win?
That won't catch situations where the file has been copied elsewhere. So 
the original still exists, and the modification wants to be applied to 
the original and the copy at the destination.
>
>> This will affect the situation where a user edits a file, then rebases on top of
>> commits that have moved that file. The move will not be detected and the user
>> will have to manually resolve the issue (possibly by redoing the rebase with
>> this flag off).
>>
>> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
> On what repository? What kind of scale? How about a 3-patch stack
> that's a few thousand changes back on mozilla-central as a benchmark
> that's more generally useful?
 From a repo with hundreds of thousands of files, and across >10,000 
commits.  I'll add a benchmark against mozilla-central to the commit 
message and resend.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -185,6 +185,9 @@  def _forwardcopies(a, b, match=None):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -258,6 +261,12 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    # Copy trace disabling is explicitly below the node == p1 logic above
+    # because the logic above is required for a simple copy to be kept across a
+    # rebase.
+    if repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
@@ -507,7 +516,12 @@  def duplicatecopies(repo, rev, fromrev, 
     copies between fromrev and rev.
     '''
     exclude = {}
-    if skiprev is not None:
+    if (skiprev is not None and
+        not repo.ui.configbool('experimental', 'disablecopytrace')):
+        # disablecopytrace skips this line, but not the entire function because
+        # the line below is O(size of the repo) during a rebase, while the rest
+        # of the function is much faster (and is required for carrying copy
+        # metadata across the rebase anyway).
         exclude = pathcopies(repo[fromrev], repo[skiprev])
     for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
         # copies.pathcopies returns backward renames, so dst might not
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -59,4 +59,107 @@  file c
   1
   2
 
+Test disabling copy tracing
+
+- first verify copy metadata was kept
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ cat b
+  0
+  1
+  2
+
+- next verify copy metadata is lost when disabled
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+
+  $ cat b
+  1
+  2
+
   $ cd ..
+
+Verify disabling copy tracing still keeps copies from rebase source
+
+  $ hg init copydisable
+  $ cd copydisable
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ touch b
+  $ hg ci -Aqm 'add b, c'
+  $ hg cp b x
+  $ echo x >> x
+  $ hg ci -qm 'copy b->x'
+  $ hg up -q 1
+  $ touch z
+  $ hg ci -Aqm 'add z'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 add z
+  |
+  | o  2 copy b->x
+  |/
+  o  1 add b, c
+  |
+  o  0 add a
+  
+  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:6adcf8c12e7d "copy b->x"
+  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/6adcf8c12e7d-ce4b3e75-backup.hg (glob)
+  $ hg up -q 3
+  $ hg log -f x -T '{rev} {desc}\n'
+  3 copy b->x
+  1 add b, c
+
+  $ cd ../
+
+Verify we duplicate existing copies, instead of detecting them
+
+  $ hg init copydisable3
+  $ cd copydisable3
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ hg cp a b
+  $ hg ci -Aqm 'copy a->b'
+  $ hg mv b c
+  $ hg ci -Aqm 'move b->c'
+  $ hg up -q 0
+  $ hg cp a b
+  $ echo b >> b
+  $ hg ci -Aqm 'copy a->b (2)'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 copy a->b (2)
+  |
+  | o  2 move b->c
+  | |
+  | o  1 copy a->b
+  |/
+  o  0 add a
+  
+  $ hg rebase -d 2 -s 3 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 3:47e1a9e6273b "copy a->b (2)" (tip)
+  saved backup bundle to $TESTTMP/copydisable3/.hg/strip-backup/47e1a9e6273b-2d099c59-backup.hg (glob)
+
+  $ hg log -G -f b
+  @  changeset:   3:76024fb4b05b
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     copy a->b (2)
+  |
+  o  changeset:   0:ac82d8b1f7c4
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     add a
+