Patchwork [4,of,4] copy: add flag for disabling copy tracing

login
register
mail settings
Submitter Durham Goode
Date Feb. 3, 2015, 3:20 a.m.
Message ID <39427c27d17d5d13bbe2.1422933602@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/7619/
State Superseded
Commit 38f92d12357cb8d93576c52d5ed93e771e06ef03
Headers show

Comments

Durham Goode - Feb. 3, 2015, 3:20 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1422386787 28800
#      Tue Jan 27 11:26:27 2015 -0800
# Node ID 39427c27d17d5d13bbe24367d458705dd9a6d64a
# Parent  7c2e0203750f9372d77c374ce931e0bb87a54f0f
copy: add flag for disabling copy tracing

Copy tracing can be up to 50% of rebase time when rebasing medium sized stacks
of commits (~12 commits) in large repos (hundreds of thousands of files).
This provides the option of turning off the majority of copy tracing. It
does not turn off _forwardcopies() since that is used to carry copy
information inside a commit across a rebase.

This will affect the situation where a user edits a file, then rebases on top of
commits that have moved that file. The move will not be detected and the user
will have to manually resolve the issue.

Once the lazymanifest work goes in, we can make this smarter by only disabling
copy detection when we know all the target files exist in the destination (since
we will be able to cheaply detect this with lazymanifest.diff).

This takes a 12 commit rebase from 70s to 45s. Future patches will bring this
down further to 27s.
Augie Fackler - Feb. 3, 2015, 3:21 p.m.
On Mon, Feb 02, 2015 at 07:20:02PM -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Node ID 39427c27d17d5d13bbe24367d458705dd9a6d64a
> # Parent  7c2e0203750f9372d77c374ce931e0bb87a54f0f
> copy: add flag for disabling copy tracing

I like the spirit of the series. First three are queued.

>
> Copy tracing can be up to 50% of rebase time when rebasing medium sized stacks
> of commits (~12 commits) in large repos (hundreds of thousands of files).
> This provides the option of turning off the majority of copy tracing. It
> does not turn off _forwardcopies() since that is used to carry copy
> information inside a commit across a rebase.
>
> This will affect the situation where a user edits a file, then rebases on top of
> commits that have moved that file. The move will not be detected and the user
> will have to manually resolve the issue.
>
> Once the lazymanifest work goes in, we can make this smarter by only disabling
> copy detection when we know all the target files exist in the destination (since
> we will be able to cheaply detect this with lazymanifest.diff).
>
> This takes a 12 commit rebase from 70s to 45s. Future patches will bring this
> down further to 27s.
>
> diff --git a/mercurial/copies.py b/mercurial/copies.py
> --- a/mercurial/copies.py
> +++ b/mercurial/copies.py
> @@ -192,6 +192,9 @@ def _forwardcopies(a, b):
>      return cm
>
>  def _backwardrenames(a, b):
> +    if a._repo.ui.config('ui', 'copytracemode') == 'disabled':

I don't have a constructive suggestion, but [ui] is the wrong section
for this. I know marmoute has strong opinions about config sections,
maybe he'll have a better idea?

> +        return {}
> +
>      # Even though we're not taking copies into account, 1:n rename situations
>      # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
>      # arbitrarily pick one of the renames.
> @@ -261,6 +264,13 @@ def mergecopies(repo, c1, c2, ca):
>      if c2.node() is None and c1.node() == repo.dirstate.p1():
>          return repo.dirstate.copies(), {}, {}, {}
>
> +    if repo.ui.config('ui', 'copytracemode') == 'disabled':
> +        # If we're just tracking against the parent, do a simple check.
> +        # This is necessary to ensure copies survive rebasing.
> +        if ca in c2.parents():
> +            return pathcopies(ca, c2), {}, {}, {}
> +        return {}, {}, {}, {}
> +
>      limit = _findlimit(repo, c1.rev(), c2.rev())
>      if limit is None:
>          # no common ancestor, no copies
> diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
> --- a/tests/test-copy-move-merge.t
> +++ b/tests/test-copy-move-merge.t
> @@ -61,4 +61,45 @@ file c
>    1
>    2
>
> +Verify disabling copy tracing loses copy info
> +
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
> +  rebasing 2:add3f11052fa "other" (tip)
> +  merging b and a to b
> +  merging c and a to c
> +
> +  $ test -f a
> +  [1]
> +
> +  $ cat b
> +  0
> +  1
> +  2
> +
> +  $ cat c
> +  0
> +  1
> +  2
> +
> +  $ hg strip -r . --config extensions.strip=
> +  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
> +  $ hg up -qC 2
> +  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config ui.copytracemode=disabled
> +  rebasing 2:add3f11052fa "other" (tip)
> +  remote changed a which local deleted
> +  use (c)hanged version or leave (d)eleted? c
> +  $ cat a
> +  0
> +  1
> +
> +  $ cat b
> +  1
> +  2
> +
> +  $ cat c
> +  1
> +  2
> +
>    $ cd ..
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Pierre-Yves David - Feb. 3, 2015, 4:50 p.m.
On 02/03/2015 03:20 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1422386787 28800
> #      Tue Jan 27 11:26:27 2015 -0800
> # Node ID 39427c27d17d5d13bbe24367d458705dd9a6d64a
> # Parent  7c2e0203750f9372d77c374ce931e0bb87a54f0f
> copy: add flag for disabling copy tracing

I believe the main motivation for this patch was extremely inflated by a 
performance regression in 3.3-rev. Now that the regression is fixed I 
think it would be wiser to not disable this important Mercurial feature.
Augie Fackler - Feb. 3, 2015, 7:21 p.m.
On Tue, Feb 3, 2015 at 11:50 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> I believe the main motivation for this patch was extremely inflated by a
> performance regression in 3.3-rev. Now that the regression is fixed I think
> it would be wiser to not disable this important Mercurial feature.

Sure, but it probably makes sense to skip this step when we know it's
irrelevant, which will be cheap RSN.
Durham Goode - Feb. 4, 2015, 3:07 a.m.
On 2/3/15 8:50 AM, Pierre-Yves David wrote:
>
>
> On 02/03/2015 03:20 AM, Durham Goode wrote:
>> # HG changeset patch
>> # User Durham Goode <durham@fb.com>
>> # Date 1422386787 28800
>> #      Tue Jan 27 11:26:27 2015 -0800
>> # Node ID 39427c27d17d5d13bbe24367d458705dd9a6d64a
>> # Parent  7c2e0203750f9372d77c374ce931e0bb87a54f0f
>> copy: add flag for disabling copy tracing
>
> I believe the main motivation for this patch was extremely inflated by 
> a performance regression in 3.3-rev. Now that the regression is fixed 
> I think it would be wiser to not disable this important Mercurial 
> feature.
>
I just tried it with the latest @ build and the perf has not gotten any 
better (https://bpaste.net/show/b5cbdcab2d47).  The perf win is so big 
that I'd like to experiment with disabling it and see how many users 
complain.  Then we can iterate to find a nice balance (better detection 
with lazymanifest + maybe issue a warning if we think we're losing copy 
information).

The trace copies algorithm is fundamentally not scalable for a large 
working copy (it traces the history of every file that has changed since 
the common ancestor).  It simply cannot operate on a large repo without 
either having forward pointers (i.e. tracing copies forward from the 
common ancestor up to the rebase destination), or a remote server that 
can answer copy questions quickly.  Since neither solution is easy, and 
the perf win so significant, I'd rather disable the feature, and allow 
users to reenable it when they need it.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -192,6 +192,9 @@  def _forwardcopies(a, b):
     return cm
 
 def _backwardrenames(a, b):
+    if a._repo.ui.config('ui', 'copytracemode') == 'disabled':
+        return {}
+
     # Even though we're not taking copies into account, 1:n rename situations
     # can still exist (e.g. hg cp a b; hg mv a c). In those cases we
     # arbitrarily pick one of the renames.
@@ -261,6 +264,13 @@  def mergecopies(repo, c1, c2, ca):
     if c2.node() is None and c1.node() == repo.dirstate.p1():
         return repo.dirstate.copies(), {}, {}, {}
 
+    if repo.ui.config('ui', 'copytracemode') == 'disabled':
+        # If we're just tracking against the parent, do a simple check.
+        # This is necessary to ensure copies survive rebasing.
+        if ca in c2.parents():
+            return pathcopies(ca, c2), {}, {}, {}
+        return {}, {}, {}, {}
+
     limit = _findlimit(repo, c1.rev(), c2.rev())
     if limit is None:
         # no common ancestor, no copies
diff --git a/tests/test-copy-move-merge.t b/tests/test-copy-move-merge.t
--- a/tests/test-copy-move-merge.t
+++ b/tests/test-copy-move-merge.t
@@ -61,4 +61,45 @@  file c
   1
   2
 
+Verify disabling copy tracing loses copy info
+
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase=
+  rebasing 2:add3f11052fa "other" (tip)
+  merging b and a to b
+  merging c and a to c
+
+  $ test -f a
+  [1]
+
+  $ cat b
+  0
+  1
+  2
+
+  $ cat c
+  0
+  1
+  2
+
+  $ hg strip -r . --config extensions.strip=
+  2 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/t/.hg/strip-backup/550bd84c0cd3-fc575957-backup.hg (glob)
+  $ hg up -qC 2
+  $ hg rebase --keep -d 1 -b 2 --config extensions.rebase= --config ui.copytracemode=disabled
+  rebasing 2:add3f11052fa "other" (tip)
+  remote changed a which local deleted
+  use (c)hanged version or leave (d)eleted? c
+  $ cat a
+  0
+  1
+
+  $ cat b
+  1
+  2
+
+  $ cat c
+  1
+  2
+
   $ cd ..