Patchwork [V5] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date Aug. 3, 2015, 6:39 p.m.
Message ID <3ec836fca4d2f2b5d18a.1438627196@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/10089/
State Superseded
Headers show

Comments

Durham Goode - Aug. 3, 2015, 6:39 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Branch stable
# Node ID 3ec836fca4d2f2b5d18a7e1cbfec6ea8a1d93f7b
# Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
copy: add flag for disabling copy tracing

Copy tracing can be up to 80% of rebase time when rebasing stacks
of commits in large repos (hundreds of thousands of files).
This provides the option of turning off the majority of copy tracing. It
does not turn off _forwardcopies() since that is used to carry copy
information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue (possibly by redoing the rebase with
this flag off).

This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
Durham Goode - Aug. 3, 2015, 6:47 p.m.
On 8/3/15 11:39 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Branch stable
> # Node ID 3ec836fca4d2f2b5d18a7e1cbfec6ea8a1d93f7b
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> copy: add flag for disabling copy tracing
>
> Copy tracing can be up to 80% of rebase time when rebasing stacks
> of commits in large repos (hundreds of thousands of files).
> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
>
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue (possibly by redoing the rebase with
> this flag off).
>
> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
I sent the last version of this back in April-ish.  Martin pointed out 
that one of the changes didn't actually affect the test. This new 
version has updated tests that should test each of the changes.

We'll likely not disable copy tracing entirely inside our company. We'll 
probably tweak it so that if a rebase is over N000 commits disable copy 
tracing, but smaller rebases keep it.
Pierre-Yves David - Aug. 5, 2015, 1:10 a.m.
On 08/03/2015 11:39 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Branch stable
> # Node ID 3ec836fca4d2f2b5d18a7e1cbfec6ea8a1d93f7b
> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
> copy: add flag for disabling copy tracing
>
> Copy tracing can be up to 80% of rebase time when rebasing stacks
> of commits in large repos (hundreds of thousands of files).
> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
>
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue (possibly by redoing the rebase with
> this flag off).
>
> This takes rebasing a 3 commit stack that's a couple weeks old from 65s to 12s.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -185,6 +185,9 @@ def _forwardcopies(a, b, match=None):
>       return cm
>
>   def _backwardrenames(a, b):
> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}
> +

Is there any reason you don't just patch the four public methods in 
copies? I suspect there is some case we want to not disable it (like 
when preserving data in a rebase or an amend. Can we get details on that 
(why some method are patched or not)?

>       # Even though we're not taking copies into account, 1:n rename situations
>       # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>       # arbitrarily pick one of the renames.
> @@ -258,6 +261,9 @@ def mergecopies(repo, c1, c2, ca):
>       if c2.node() is None and c1.node() == repo.dirstate.p1():
>           return repo.dirstate.copies(), {}, {}, {}
>
> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
> +        return {}, {}, {}, {}
> +

(small-nit?) Any reason this is not the very first line of the method 
(for consistency?)

>       limit = _findlimit(repo, c1.rev(), c2.rev())
>       if limit is None:
>           # no common ancestor, no copies
> @@ -507,7 +513,8 @@ def duplicatecopies(repo, rev, fromrev,
>       copies between fromrev and rev.
>       '''
>       exclude = {}
> -    if skiprev is not None:
> +    if (skiprev is not None and
> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
>           exclude = pathcopies(repo[fromrev], repo[skiprev])

In the same spirit we probably want a comment explaining why we skip the 
first one, but not the second.

>       for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>           # copies.pathcopies returns backward renames, so dst might not
Durham Goode - Aug. 5, 2015, 2:03 a.m.
I can resend with more comments about the particular locations I touched.

On 8/4/15, 6:10 PM, "Pierre-Yves David" <pierre-yves.david@ens-lyon.org>
wrote:

>
>
>On 08/03/2015 11:39 AM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1422386787 28800
>> #      Tue Jan 27 11:26:27 2015 -0800
>> # Branch stable
>> # Node ID 3ec836fca4d2f2b5d18a7e1cbfec6ea8a1d93f7b
>> # Parent  79f0cb97d7537a7c2948f8f9b0a89148825a3a1d
>> copy: add flag for disabling copy tracing
>>
>> Copy tracing can be up to 80% of rebase time when rebasing stacks
>> of commits in large repos (hundreds of thousands of files).
>> This provides the option of turning off the majority of copy tracing. It
>> does not turn off _forwardcopies() since that is used to carry copy
>> information inside a commit across a rebase.
>>
>> This will affect the situation where a user edits a file, then rebases
>>on top of
>> commits that have moved that file. The move will not be detected and
>>the user
>> will have to manually resolve the issue (possibly by redoing the rebase
>>with
>> this flag off).
>>
>> This takes rebasing a 3 commit stack that's a couple weeks old from 65s
>>to 12s.
>>
>> diff --git a/mercurial/copies.py b/mercurial/copies.py
>> --- a/mercurial/copies.py
>> +++ b/mercurial/copies.py
>> @@ -185,6 +185,9 @@ def _forwardcopies(a, b, match=None):
>>       return cm
>>
>>   def _backwardrenames(a, b):
>> +    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
>> +        return {}
>> +
>
>Is there any reason you don't just patch the four public methods in
>copies? I suspect there is some case we want to not disable it (like
>when preserving data in a rebase or an amend. Can we get details on that
>(why some method are patched or not)?

Yea, as you said, the copy functionality is responsible for carrying basic
copy information across a rebase (like if a commit did a copy, we don't
just remember that information and reapply it, we rediscover that
information via a copy trace).  So disabling all public methods here
breaks even the most basic copy information movements.  It's a bit of a
pain.

>
>>       # Even though we're not taking copies into account, 1:n rename
>>situations
>>       # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>>       # arbitrarily pick one of the renames.
>> @@ -258,6 +261,9 @@ def mergecopies(repo, c1, c2, ca):
>>       if c2.node() is None and c1.node() == repo.dirstate.p1():
>>           return repo.dirstate.copies(), {}, {}, {}
>>
>> +    if repo.ui.configbool('experimental', 'disablecopytrace'):
>> +        return {}, {}, {}, {}
>> +
>
>(small-nit?) Any reason this is not the very first line of the method
>(for consistency?)

The lines above are responsible for carrying individual copy information
across certain operations.  I forget exactly which operations require it
(maybe it's part of revert?), but it is cheap since it's always between
the working copy and it's immediate parent.

>
>>       limit = _findlimit(repo, c1.rev(), c2.rev())
>>       if limit is None:
>>           # no common ancestor, no copies
>> @@ -507,7 +513,8 @@ def duplicatecopies(repo, rev, fromrev,
>>       copies between fromrev and rev.
>>       '''
>>       exclude = {}
>> -    if skiprev is not None:
>> +    if (skiprev is not None and
>> +        not repo.ui.configbool('experimental', 'disablecopytrace')):
>>           exclude = pathcopies(repo[fromrev], repo[skiprev])
>
>In the same spirit we probably want a comment explaining why we skip the
>first one, but not the second.
>
>>       for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
>>           # copies.pathcopies returns backward renames, so dst might not
>
>-- 
>Pierre-Yves David

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -185,6 +185,9 @@  def _forwardcopies(a, b, match=None):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -258,6 +261,9 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    if repo.ui.configbool('experimental', 'disablecopytrace'):
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
@@ -507,7 +513,8 @@  def duplicatecopies(repo, rev, fromrev, 
     copies between fromrev and rev.
     '''
     exclude = {}
-    if skiprev is not None:
+    if (skiprev is not None and
+        not repo.ui.configbool('experimental', 'disablecopytrace')):
         exclude = pathcopies(repo[fromrev], repo[skiprev])
     for dst, src in pathcopies(repo[fromrev], repo[rev]).iteritems():
         # copies.pathcopies returns backward renames, so dst might not
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -59,4 +59,107 @@  file c
   1
   2
 
+Test disabling copy tracing
+
+- first verify copy metadata was kept
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ cat b
+  0
+  1
+  2
+
+- next verify copy metadata is lost when disabled
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+
+  $ cat b
+  1
+  2
+
   $ cd ..
+
+Verify disabling copy tracing still keeps copies from rebase source
+
+  $ hg init copydisable
+  $ cd copydisable
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ touch b
+  $ hg ci -Aqm 'add b, c'
+  $ hg cp b x
+  $ echo x >> x
+  $ hg ci -qm 'copy b->x'
+  $ hg up -q 1
+  $ touch z
+  $ hg ci -Aqm 'add z'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 add z
+  |
+  | o  2 copy b->x
+  |/
+  o  1 add b, c
+  |
+  o  0 add a
+  
+  $ hg rebase -d . -b 2 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 2:6adcf8c12e7d "copy b->x"
+  saved backup bundle to $TESTTMP/copydisable/.hg/strip-backup/6adcf8c12e7d-ce4b3e75-backup.hg (glob)
+  $ hg up -q 3
+  $ hg log -f x -T '{rev} {desc}\n'
+  3 copy b->x
+  1 add b, c
+
+  $ cd ../
+
+Verify we duplicate existing copies, instead of detecting them
+
+  $ hg init copydisable3
+  $ cd copydisable3
+  $ touch a
+  $ hg ci -Aqm 'add a'
+  $ hg cp a b
+  $ hg ci -Aqm 'copy a->b'
+  $ hg mv b c
+  $ hg ci -Aqm 'move b->c'
+  $ hg up -q 0
+  $ hg cp a b
+  $ echo b >> b
+  $ hg ci -Aqm 'copy a->b (2)'
+  $ hg log -G -T '{rev} {desc}\n'
+  @  3 copy a->b (2)
+  |
+  | o  2 move b->c
+  | |
+  | o  1 copy a->b
+  |/
+  o  0 add a
+  
+  $ hg rebase -d 2 -s 3 --config extensions.rebase= --config experimental.disablecopytrace=True
+  rebasing 3:47e1a9e6273b "copy a->b (2)" (tip)
+  saved backup bundle to $TESTTMP/copydisable3/.hg/strip-backup/47e1a9e6273b-2d099c59-backup.hg (glob)
+
+  $ hg log -G -f b
+  @  changeset:   3:76024fb4b05b
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     copy a->b (2)
+  |
+  o  changeset:   0:ac82d8b1f7c4
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     add a
+