Patchwork [v3] diff: search beyond ancestor when detecting renames

login
register
mail settings
Submitter Mads Kiilerich
Date Jan. 12, 2014, 7:01 p.m.
Message ID <41372107d64bc9071b4f.1389553269@localhost.localdomain>
Download mbox | patch
Permalink /patch/3307/
State Accepted
Commit 243ea5ffdf3111c784ea83ffe5b65c5246eab3ef
Headers show

Comments

Mads Kiilerich - Jan. 12, 2014, 7:01 p.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1384634789 18000
#      Sat Nov 16 15:46:29 2013 -0500
# Node ID 41372107d64bc9071b4ff24046965c3c59d54c38
# Parent  57abc08c399b28aa86248f6db2cc63a7cf0b0f13
diff: search beyond ancestor when detecting renames

This removes an optimization that was introduced in 91eb4512edd0 but was too
aggressive - as indicated by how it changed test-mq-merge.t .

We are walking filelogs to find copy sources and we can thus not be sure to hit
the base revision and find the renamed file there - it could also be in the
first ancestor of the base ... in the filelog.

We are walking the filelog and can thus not easily know when we hit the first
ancestor of the base revision and which filename to look for there. Instead, we
use _findlimit like mergecopies do: The lower bound for how far we have to go
is found from the lowest changelog revision that is an ancestor of only one of
the compared revisions. Any filelog ancestor with a revision number lower than
that revision will be the ancestor of both compared revisions, and there is
thus no reason to go further back than that.
Pierre-Yves David - Jan. 13, 2014, 7:01 a.m.
On Sun, Jan 12, 2014 at 08:01:09PM +0100, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1384634789 18000
> #      Sat Nov 16 15:46:29 2013 -0500
> # Node ID 41372107d64bc9071b4ff24046965c3c59d54c38
> # Parent  57abc08c399b28aa86248f6db2cc63a7cf0b0f13
> diff: search beyond ancestor when detecting renames
> 
> This removes an optimization that was introduced in 91eb4512edd0 but was too
> aggressive - as indicated by how it changed test-mq-merge.t .

Patch looks good to me. A bit worried about the performance hit. Did you
gathered some number?
Mads Kiilerich - Jan. 13, 2014, 1:34 p.m.
On 01/13/2014 08:01 AM, Pierre-Yves David wrote:
> On Sun, Jan 12, 2014 at 08:01:09PM +0100, Mads Kiilerich wrote:
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1384634789 18000
>> #      Sat Nov 16 15:46:29 2013 -0500
>> # Node ID 41372107d64bc9071b4ff24046965c3c59d54c38
>> # Parent  57abc08c399b28aa86248f6db2cc63a7cf0b0f13
>> diff: search beyond ancestor when detecting renames
>>
>> This removes an optimization that was introduced in 91eb4512edd0 but was too
>> aggressive - as indicated by how it changed test-mq-merge.t .
> Patch looks good to me. A bit worried about the performance hit. Did you
> gathered some number?

No. But this is a matter of correctness so a performance hit would be 
ok. Incorrect diffs has in some cases made a big impact on our team 
performance when reviewing moved files ;-).

The code already walks the filelogs back to the ancestor for all missing 
files.
The change adds a single call to _findlimit. It will efficiently walk 
the changeset DAG back to the ancestor (and a bit further) ... and we 
will in worst case walk the filelogs one extra step back.
I don't expect that to be a problem.

Some quick benchmarks shows something like 2% extra for a huge diff --stat.

This code could be optimized for some trivial cases (where it probably 
doesn't matter anyway) by only computing the limit if there actually is 
missing files ... but I doubt that would be worth it.

/Mads
Matt Mackall - Jan. 18, 2014, 4:06 a.m.
On Sun, 2014-01-12 at 20:01 +0100, Mads Kiilerich wrote:
> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1384634789 18000
> #      Sat Nov 16 15:46:29 2013 -0500
> # Node ID 41372107d64bc9071b4ff24046965c3c59d54c38
> # Parent  57abc08c399b28aa86248f6db2cc63a7cf0b0f13
> diff: search beyond ancestor when detecting renames

Queued for default, thanks.

Patch

diff --git a/mercurial/copies.py b/mercurial/copies.py
--- a/mercurial/copies.py
+++ b/mercurial/copies.py
@@ -98,15 +98,14 @@  def _chain(src, dst, a, b):
 
     return t
 
-def _tracefile(fctx, actx):
-    '''return file context that is the ancestor of fctx present in actx'''
-    stop = actx.rev()
-    am = actx.manifest()
+def _tracefile(fctx, am, limit=-1):
+    '''return file context that is the ancestor of fctx present in ancestor
+    manifest am, stopping after the first ancestor lower than limit'''
 
     for f in fctx.ancestors():
         if am.get(f.path(), None) == f.filenode():
             return f
-        if f.rev() < stop:
+        if f.rev() < limit:
             return None
 
 def _dirstatecopies(d):
@@ -129,6 +128,13 @@  def _forwardcopies(a, b):
             # short-circuit to avoid issues with merge states
             return _dirstatecopies(w)
 
+    # files might have to be traced back to the fctx parent of the last
+    # one-side-only changeset, but not further back than that
+    limit = _findlimit(a._repo, a.rev(), b.rev())
+    if limit is None:
+        limit = -1
+    am = a.manifest()
+
     # find where new files came from
     # we currently don't try to find where old files went, too expensive
     # this means we can miss a case like 'hg rm b; hg cp a b'
@@ -137,7 +143,7 @@  def _forwardcopies(a, b):
     missing.difference_update(a.manifest().iterkeys())
 
     for f in missing:
-        ofctx = _tracefile(b[f], a)
+        ofctx = _tracefile(b[f], am, limit)
         if ofctx:
             cm[f] = ofctx.path()
 
diff --git a/tests/test-mq-merge.t b/tests/test-mq-merge.t
--- a/tests/test-mq-merge.t
+++ b/tests/test-mq-merge.t
@@ -147,11 +147,13 @@  Check patcha is still a git patch:
   -b
   +a
   +c
-  diff --git a/aa b/aa
-  new file mode 100644
-  --- /dev/null
+  diff --git a/a b/aa
+  copy from a
+  copy to aa
+  --- a/a
   +++ b/aa
-  @@ -0,0 +1,1 @@
+  @@ -1,1 +1,1 @@
+  -b
   +a
 
 Check patcha2 is still a regular patch:
diff --git a/tests/test-mv-cp-st-diff.t b/tests/test-mv-cp-st-diff.t
--- a/tests/test-mv-cp-st-diff.t
+++ b/tests/test-mv-cp-st-diff.t
@@ -1567,3 +1567,41 @@  unrelated branch diff
   @@ -0,0 +1,1 @@
   +a
   $ cd ..
+
+
+test for case where we didn't look sufficiently far back to find rename ancestor
+
+  $ hg init diffstop
+  $ cd diffstop
+  $ echo > f
+  $ hg ci -qAmf
+  $ hg mv f g
+  $ hg ci -m'f->g'
+  $ hg up -qr0
+  $ touch x
+  $ hg ci -qAmx
+  $ echo f > f
+  $ hg ci -qmf=f
+  $ hg merge -q
+  $ hg ci -mmerge
+  $ hg log -G --template '{rev}  {desc}'
+  @    4  merge
+  |\
+  | o  3  f=f
+  | |
+  | o  2  x
+  | |
+  o |  1  f->g
+  |/
+  o  0  f
+  
+  $ hg diff --git -r 2
+  diff --git a/f b/g
+  rename from f
+  rename to g
+  --- a/f
+  +++ b/g
+  @@ -1,1 +1,1 @@
+  -
+  +f
+  $ cd ..