Patchwork commit: add config option to limit copy search distance

login
register
mail settings
Submitter Ryan McElroy
Date Dec. 18, 2014, 1:11 a.m.
Message ID <769728b7e5f243b0527d.1418865062@devbig105.prn2.facebook.com>
Download mbox | patch
Permalink /patch/7159/
State Changes Requested
Headers show

Comments

Ryan McElroy - Dec. 18, 2014, 1:11 a.m.
# HG changeset patch
# User Ryan McElroy <rmcelroy@fb.com>
# Date 1418862573 28800
#      Wed Dec 17 16:29:33 2014 -0800
# Node ID 769728b7e5f243b0527d6f374098c537f9ddd2d6
# Parent  39cead85fd58ae6693592074656b284ed736d9bc
commit: add config option to limit copy search distance

Editing history, among other things, can result in a copy source 'going missing'
which causes Mercurial to search back through history for the original of the
file. This search may be in vain, though, because the source of the file is no
longer in the history. In large repositories, this can take hours or days.

This patch adds an optional configuration option (limit.copysearchrevs) to limit
how far back Mercurial will search for the file. A value of None (default) will
maintain the current behavior of searching all the way back while a value of 0
will disable search through any revs. Moderate values should enable the benefits
of this behavior in most cases while avoiding long delays searching through the
history.
Matt Mackall - Dec. 18, 2014, 1:29 a.m.
On Wed, 2014-12-17 at 17:11 -0800, Ryan McElroy wrote:
> # HG changeset patch
> # User Ryan McElroy <rmcelroy@fb.com>
> # Date 1418862573 28800
> #      Wed Dec 17 16:29:33 2014 -0800
> # Node ID 769728b7e5f243b0527d6f374098c537f9ddd2d6
> # Parent  39cead85fd58ae6693592074656b284ed736d9bc
> commit: add config option to limit copy search distance
> 
> Editing history, among other things, can result in a copy source 'going missing'
> which causes Mercurial to search back through history for the original of the
> file. This search may be in vain, though, because the source of the file is no
> longer in the history. In large repositories, this can take hours or days.
> 
> This patch adds an optional configuration option (limit.copysearchrevs) to limit
> how far back Mercurial will search for the file. A value of None (default) will
> maintain the current behavior of searching all the way back while a value of 0
> will disable search through any revs. Moderate values should enable the benefits
> of this behavior in most cases while avoiding long delays searching through the
> history.

Interesting that at the same time we have a patch on the list that
propose removing the limit entirely.
Durham Goode - Dec. 18, 2014, 5:29 p.m.
On 12/17/14 5:29 PM, Matt Mackall wrote:
> On Wed, 2014-12-17 at 17:11 -0800, Ryan McElroy wrote:
>> # HG changeset patch
>> # User Ryan McElroy <rmcelroy@fb.com>
>> # Date 1418862573 28800
>> #      Wed Dec 17 16:29:33 2014 -0800
>> # Node ID 769728b7e5f243b0527d6f374098c537f9ddd2d6
>> # Parent  39cead85fd58ae6693592074656b284ed736d9bc
>> commit: add config option to limit copy search distance
>>
>> Editing history, among other things, can result in a copy source 'going missing'
>> which causes Mercurial to search back through history for the original of the
>> file. This search may be in vain, though, because the source of the file is no
>> longer in the history. In large repositories, this can take hours or days.
>>
>> This patch adds an optional configuration option (limit.copysearchrevs) to limit
>> how far back Mercurial will search for the file. A value of None (default) will
>> maintain the current behavior of searching all the way back while a value of 0
>> will disable search through any revs. Moderate values should enable the benefits
>> of this behavior in most cases while avoiding long delays searching through the
>> history.
> Interesting that at the same time we have a patch on the list that
> propose removing the limit entirely.
>
I think this is a different issue.  The _tracelimit discussion in the 
other thread is about traversing the filelog, while the stuff Ryan is 
changing is actually traversing the entire changelog.

I don't understand the copy tracking code even before Ryan's change.  
It's basically saying "if the copy source isn't in either of the 
dirstate parent manifests, go looking for it".  But if the file being 
copied isn't in either of the parents, why are we treating it as a copy?

Further more, why are we even searching for the copy node?  Doesn't the 
copy data returned by fctx.renamed() give us the exact copy source path 
and copy source file node?  It looks like memfilectx doesn't obey this, 
but that's probably fixable.

I'll play around with this a bit to see if my theory holds out.
Durham Goode - Dec. 18, 2014, 7:21 p.m.
On 12/18/14 9:29 AM, Durham Goode wrote:
>
> On 12/17/14 5:29 PM, Matt Mackall wrote:
>> On Wed, 2014-12-17 at 17:11 -0800, Ryan McElroy wrote:
>>> # HG changeset patch
>>> # User Ryan McElroy <rmcelroy@fb.com>
>>> # Date 1418862573 28800
>>> #      Wed Dec 17 16:29:33 2014 -0800
>>> # Node ID 769728b7e5f243b0527d6f374098c537f9ddd2d6
>>> # Parent  39cead85fd58ae6693592074656b284ed736d9bc
>>> commit: add config option to limit copy search distance
>>>
>>> Editing history, among other things, can result in a copy source 
>>> 'going missing'
>>> which causes Mercurial to search back through history for the 
>>> original of the
>>> file. This search may be in vain, though, because the source of the 
>>> file is no
>>> longer in the history. In large repositories, this can take hours or 
>>> days.
>>>
>>> This patch adds an optional configuration option 
>>> (limit.copysearchrevs) to limit
>>> how far back Mercurial will search for the file. A value of None 
>>> (default) will
>>> maintain the current behavior of searching all the way back while a 
>>> value of 0
>>> will disable search through any revs. Moderate values should enable 
>>> the benefits
>>> of this behavior in most cases while avoiding long delays searching 
>>> through the
>>> history.
>> Interesting that at the same time we have a patch on the list that
>> propose removing the limit entirely.
>>
> I think this is a different issue.  The _tracelimit discussion in the 
> other thread is about traversing the filelog, while the stuff Ryan is 
> changing is actually traversing the entire changelog.
>
> I don't understand the copy tracking code even before Ryan's change.  
> It's basically saying "if the copy source isn't in either of the 
> dirstate parent manifests, go looking for it".  But if the file being 
> copied isn't in either of the parents, why are we treating it as a copy?
>
> Further more, why are we even searching for the copy node? Doesn't the 
> copy data returned by fctx.renamed() give us the exact copy source 
> path and copy source file node?  It looks like memfilectx doesn't obey 
> this, but that's probably fixable.
>
> I'll play around with this a bit to see if my theory holds out.
Ug, it gets even worse.  The look up is 'self[None].ancestors()' but 
localrepo._filecommit is called from commitctx which is totally 
independent of the working copy.  So arbitrary memctx commits are doing 
copy tracing relative to the working copy position.

I tried deleting this copy tracing from localrepo._filecommit, and the 
only thing it seems to break is a special case in graft where you graft 
a copy and it results in an empty commit.  But the existing copy tracing 
is breaks that case in other ways already!  If you have two branches 
that copy foo-to-bar and edit bar in different ways, it completely 
breaks hg log -f because it grafts in the bar copy as a copy from the 
original foo source, instead of an edit to the existing bar copy.

I guess I'll log a task to redo this code.  In the meantime Ryan's 
change will at least let us avoid the massive perf hit.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1188,9 +1188,16 @@  class localrepository(object):
 
             # find source in nearest ancestor if we've lost track
             if not crev:
+                limit = self.ui.configint('limit', 'copysearchrevs')
                 self.ui.debug(" %s: searching for copy revision for %s\n" %
                               (fname, cfname))
+                revs = 0
                 for ancestor in self[None].ancestors():
+                    revs += 1
+                    if limit is not None and revs >= limit:
+                        self.ui.debug(" giving up search after %d revs\n" %
+                                      limit)
+                        break
                     if cfname in ancestor:
                         crev = ancestor[cfname].filenode()
                         break
diff --git a/tests/test-issue1175.t b/tests/test-issue1175.t
--- a/tests/test-issue1175.t
+++ b/tests/test-issue1175.t
@@ -52,3 +52,134 @@  http://mercurial.selenic.com/bts/issue11
   diff --git a/b b/b
   new file mode 100644
 
+verify that we don't search back forever when limit.copysearchrevs is set
+
+  $ cat << EOF >> $HGRCPATH
+  > [extensions]
+  > histedit=
+  > strip=
+  > EOF
+  $ hg strip -r 0
+  0 files updated, 0 files merged, 3 files removed, 0 files unresolved
+  saved backup bundle to $TESTTMP/.hg/strip-backup/487a0a245cea-backup.hg
+  $ echo a > a
+  $ hg ci -Am "a"
+  adding a
+  $ hg rm a
+  $ hg ci -Am "rm a"
+  $ echo x > x
+  $ hg ci -Am "x"
+  adding x
+  $ echo b > a
+  $ hg ci -Am "a again"
+  adding a
+  $ hg cp a b
+  $ hg commit -Am "cp a b"
+  $ hg log -T "{node|short} {desc}\n"
+  a069f6414ea6 cp a b
+  61202067ef74 a again
+  08eb7efa8483 x
+  61a87486a998 rm a
+  cb9a9f314b8b a
+
+  $ cat << EOF >> $HGRCPATH
+  > [limit]
+  > copysearchrevs = 2
+  > EOF
+
+  $ cat << EOF >> $TESTTMP/.hg/histedit-commands
+  > pick a069f6414ea6
+  > pick 61202067ef74
+  > EOF
+
+  $ hg histedit --debug --commands .hg/histedit-commands 61202067ef74
+  histedit: processing pick a069f6414ea6
+  resolving manifests
+   branchmerge: False, force: False, partial: False
+   ancestor: a069f6414ea6, local: a069f6414ea6+, remote: 08eb7efa8483
+   a: other deleted -> r
+  removing a
+   b: other deleted -> r
+  removing b
+  updating: b 2/2 files (100.00%)
+  0 files updated, 0 files merged, 2 files removed, 0 files unresolved
+    searching for copies back to rev 3
+    unmatched files in other:
+     b
+    all copies found (* = to merge, ! = divergent, % = renamed and deleted):
+     src: 'a' -> dst: 'b' 
+    checking for directory renames
+  resolving manifests
+   branchmerge: True, force: True, partial: False
+   ancestor: 61202067ef74, local: 08eb7efa8483+, remote: a069f6414ea6
+   b: remote created -> g
+  getting b
+  updating: b 1/1 files (100.00%)
+  b
+   b: searching for copy revision for a
+   giving up search after 2 revs
+  warning: can't find ancestor for 'b' copied from 'a'!
+  histedit: processing pick 61202067ef74
+  0 files updated, 0 files merged, 0 files removed, 0 files unresolved
+    searching for copies back to rev 3
+    unmatched files in local:
+     b
+    unmatched files in other:
+     a
+  resolving manifests
+   branchmerge: True, force: True, partial: False
+   ancestor: 08eb7efa8483, local: 2378b72859b9+, remote: 61202067ef74
+   a: remote created -> g
+  getting a
+  updating: a 1/1 files (100.00%)
+  a
+  0 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  histedit: 61202067ef74 is replaced by 9d3c2e4f2a3e
+  histedit: a069f6414ea6 is replaced by 2378b72859b9
+  should strip replaced nodes 61202067ef74, a069f6414ea6
+  2 changesets found
+  list of changesets:
+  61202067ef74879e99408a860ce642f74222fa9b
+  a069f6414ea638a83e399fa76619e45af2821cff
+  bundling: 1/2 changesets (50.00%)
+  bundling: 2/2 changesets (100.00%)
+  bundling: 1/2 manifests (50.00%)
+  bundling: 2/2 manifests (100.00%)
+  bundling: a 1/2 files (50.00%)
+  bundling: b 2/2 files (100.00%)
+  saved backup bundle to $TESTTMP/.hg/strip-backup/61202067ef74-backup.hg
+  2 changesets found
+  list of changesets:
+  2378b72859b95830964246f68a86872c1c6ad20c
+  9d3c2e4f2a3e1a832d09a6bd8899e16711966b44
+  bundling: 1/2 changesets (50.00%)
+  bundling: 2/2 changesets (100.00%)
+  bundling: 1/2 manifests (50.00%)
+  bundling: 2/2 manifests (100.00%)
+  bundling: a 1/2 files (50.00%)
+  bundling: b 2/2 files (100.00%)
+  adding branch
+  adding changesets
+  changesets: 1 chunks
+  add changeset 2378b72859b9
+  changesets: 2 chunks
+  add changeset 9d3c2e4f2a3e
+  adding manifests
+  manifests: 1/2 chunks (50.00%)
+  manifests: 2/2 chunks (100.00%)
+  adding file changes
+  adding a revisions
+  files: 1/2 chunks (50.00%)
+  adding b revisions
+  files: 2/2 chunks (100.00%)
+  added 2 changesets with 2 changes to 2 files
+  invalid branchheads cache (served): tip differs
+  should strip temp nodes 
+  updating the branch cache
+  $ hg log -T "{node|short} {desc}\n"
+  9d3c2e4f2a3e a again
+  2378b72859b9 cp a b
+  08eb7efa8483 x
+  61a87486a998 rm a
+  cb9a9f314b8b a
+