Patchwork [V2] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date Feb. 4, 2015, 3:13 a.m.
Message ID <94d5d878251d92e97b47.1423019585@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/7646/
State Superseded
Commit 38f92d12357cb8d93576c52d5ed93e771e06ef03
Headers show

Comments

Durham Goode - Feb. 4, 2015, 3:13 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Node ID 94d5d878251d92e97b47339110848205e54eb960
# Parent  e1dbe0b215ae137eec53ceb12440536d570a83d2
copy: add flag for disabling copy tracing

Copy tracing can be up to 50% of rebase time when rebasing medium sized stacks
of commits (~12 commits) in large repos (hundreds of thousands of files).
This provides the option of turning off the majority of copy tracing. It
does not turn off _forwardcopies() since that is used to carry copy
information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue.

Once the lazymanifest work goes in, we can make this smarter by only disabling
copy detection when we know all the target files exist in the destination (since
we will be able to cheaply detect this with lazymanifest.diff).

This takes a 12 commit rebase from 70s to 45s. Future patches will bring this
down further to 27s.
Matt Mackall - Feb. 4, 2015, 9:45 p.m.
On Tue, 2015-02-03 at 19:13 -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Node ID 94d5d878251d92e97b47339110848205e54eb960
> # Parent  e1dbe0b215ae137eec53ceb12440536d570a83d2
> copy: add flag for disabling copy tracing
> 
> Copy tracing can be up to 50% of rebase time when rebasing medium sized stacks
> of commits (~12 commits) in large repos (hundreds of thousands of files).
> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
> 
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue.
> 
> Once the lazymanifest work goes in, we can make this smarter by only disabling
> copy detection when we know all the target files exist in the destination (since
> we will be able to cheaply detect this with lazymanifest.diff).
> 
> This takes a 12 commit rebase from 70s to 45s. Future patches will bring this
> down further to 27s.

This is really similar to the existing merge.followcopies, which has
probably bit-rotted. It was put in when copy support was added to bypass
bugs/performance issues, but no one's said anything about copy
performance until quite recently.

Can you take a look at that? Also note its usage of configbool.
Durham Goode - Feb. 6, 2015, 12:44 a.m.
On 2/4/15 1:45 PM, Matt Mackall wrote:
> On Tue, 2015-02-03 at 19:13 -0800, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1422386787 28800
>> #      Tue Jan 27 11:26:27 2015 -0800
>> # Node ID 94d5d878251d92e97b47339110848205e54eb960
>> # Parent  e1dbe0b215ae137eec53ceb12440536d570a83d2
>> copy: add flag for disabling copy tracing
>>
>> Copy tracing can be up to 50% of rebase time when rebasing medium sized stacks
>> of commits (~12 commits) in large repos (hundreds of thousands of files).
>> This provides the option of turning off the majority of copy tracing. It
>> does not turn off _forwardcopies() since that is used to carry copy
>> information inside a commit across a rebase.
>>
>> This will affect the situation where a user edits a file, then rebases on top of
>> commits that have moved that file. The move will not be detected and the user
>> will have to manually resolve the issue.
>>
>> Once the lazymanifest work goes in, we can make this smarter by only disabling
>> copy detection when we know all the target files exist in the destination (since
>> we will be able to cheaply detect this with lazymanifest.diff).
>>
>> This takes a 12 commit rebase from 70s to 45s. Future patches will bring this
>> down further to 27s.
> This is really similar to the existing merge.followcopies, which has
> probably bit-rotted. It was put in when copy support was added to bypass
> bugs/performance issues, but no one's said anything about copy
> performance until quite recently.
>
> Can you take a look at that? Also note its usage of configbool.
>
merge.followcopies has two problems:

1) it only disables it in copies.mergecopies, which doesn't handle the 
case of copies.pathcopies
2) it disables it completely in mergecopies, which means copy info gets 
lost when you do a rebase

I chose a string instead of a boolean so we can have different levels of 
copy tracing.  'disabled' is the most limited mode, then we could have 
other modes like 'ifnecessary' which could run copy tracing only if 
files changed in the source aren't present in the destination (once we 
have lazy manifest).

I can change my flag to be merge.followcopymode if you want.  Though 
keeping it in 'experimental' is probably better until we flush out what 
it should look like.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -192,6 +192,9 @@  def _forwardcopies(a, b):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.config('experimental', 'copytracemode') == 'disabled':
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -261,6 +264,13 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    if repo.ui.config('experimental', 'copytracemode') == 'disabled':
+        # If we're just tracking against the parent, do a simple check.
+        # This is necessary to ensure copies survive rebasing.
+        if ca in c2.parents():
+            return pathcopies(ca, c2), {}, {}, {}
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -61,4 +61,45 @@  file c
   1
   2
 
+Verify disabling copy tracing loses copy info
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ test -f a
+  [1]
+
+  $ cat b
+  0
+  1
+  2
+
+  $ cat c
+  0
+  1
+  2
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config experimental.copytracemode=disabled
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+  $ cat a
+  0
+  1
+
+  $ cat b
+  1
+  2
+
+  $ cat c
+  1
+  2
+
   $ cd ..