Patchwork [2,of,5] filectx.parents: enforce changeid of parent to be in own changectx ancestors

login
register
mail settings
Submitter Pierre-Yves David
Date Dec. 30, 2014, 9:37 p.m.
Message ID <d514ca98fbf1a73adc72.1419975429@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/7265/
State Accepted
Commit c48924787eaa8d43744fcffaf42727b217a1dd76
Headers show

Comments

Pierre-Yves David - Dec. 30, 2014, 9:37 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1419377438 28800
#      Tue Dec 23 15:30:38 2014 -0800
# Node ID d514ca98fbf1a73adc725d48dfe0a8a49b7fd82f
# Parent  88ec1cc5c6f5b3281d46e440ba1e70dbf14d18eb
filectx.parents: enforce changeid of parent to be in own changectx ancestors

Because of the way filenode are computed, you can have multiple changesets
"introducing" the same file revision. For example in the changeset graph
below, changeset 2 and 3 both change a file -to- and -from- the same content.

  o 3: content = new
  |
  | o 2: content = new
  |/
  o 1: content = old

In such case, the file revision is create once, when 2 is added, and just reused
for 3. So the file change in '3' (from "old" to "new)" have no linkrev pointing
to it).  We'll call this situation "linkrev-shadowing". As linkrev is used for
optimisation purpose when walking a file history, the linkrev-shadowing
results in unexpected jump to another branch during such walk.. This lead to
multiple bugs with log, annotate and rename detection.

One element to fix such bug is to ensure that walking the file history stick on
the same topology as the changesets history. For this purpose, we extend the
logic in 'basefilectx.parents' so that it always defines the proper changeset
to associate the parent file revision with. This "proper" changeset has to be an
ancestor of the changeset associated to the child file revision.

This logic is performed in the '_adjustlinkrev' function. This function is
given the starting changeset and all the information regarding the parent file
revision. If the linkrev for the file revision is an ancestor of the  starting
changeset, the linkrev is valid and will be used. If it is not, we detected a
topological jump caused by linkrev shadowing, we are going to walk the
ancestors of the starting changeset until we find one setting the file to the
revision we are trying to create.

The performance impact appears acceptable:

- We are walking the changelog once for each filelog traversal (as there should
  be no overlap between searches),

- changelog traversal itself is fairly cheap, compared to what is likely going
  to be perform on the result on the filelog traversal,

- We only touch the manifest for ancestors touching the file, And such
  changeset are likely to be the one introducing the file. (except in
  pathological case  involving merge),

- We use manifest diff instead of full manifest unpacking to check manifest
  content, so it does not involve applying multiple diff in most case.

- linkrev shadowing is not the common case.

Tests for fixed issue in log, annotate and rename detection have been
added.

But this changeset does not solve all problems. It fixes -ancestry-
computation, but if the linkrev-shadows changesets is the starting one, we'll
still get things wrong. We'll have to fix the bootstrapping of such operation
in later changeset. Also The usage of `hg log FILE`  without --follow still
have issue with linkrev pointing to hidden changesets, because it rely on the
`filelog` revset which implement its own traversal logic that is still to be
fixed.

Thanks goes to:
- Matt Mackall: for nudging me in the right direction
- Julien Cristau and RĂ©mi Cardona: for keep telling me linkrev bug were an
  evolution show stopper for 3 years.
- Durham Goode: for finding a new linkrev issue every few weeks
- Mads Kiilerich: for that last rename bug who raise this topic over my
  anoyance limit.
Matt Mackall - Dec. 30, 2014, 11:43 p.m.
On Tue, 2014-12-30 at 13:37 -0800, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1419377438 28800
> #      Tue Dec 23 15:30:38 2014 -0800
> # Node ID d514ca98fbf1a73adc725d48dfe0a8a49b7fd82f
> # Parent  88ec1cc5c6f5b3281d46e440ba1e70dbf14d18eb
> filectx.parents: enforce changeid of parent to be in own changectx ancestors

> +def _adjustlinkrev(repo, path, filelog, fnode, srcrev):
> +    """return the first ancestor of <srcrev> introducting <fnode>
> +
> +    If the linkrev of the file revision does not point to an ancestor of
> +    srcrev, we'll walk down the ancestors until we found one introducing this
> +    file revision.
> +
> +    :repo: a localrepository object (used to access changelog and manifest)
> +    :path: the file path
> +    :fnode: the nodeid of the file revision
> +    :filelog: the filelog of this path
> +    :srcrev: the changeset revision we search ancestors from
> +    """

This looks like a less advanced version of this code:

http://markmail.org/message/epp4ac5zr26beyq6

For starters, yours is O(all ancestors), even in the trivial case of
ctx[foo].linkrev == ctx.rev().
Pierre-Yves David - Dec. 31, 2014, 7 p.m.
On 12/30/2014 03:43 PM, Matt Mackall wrote:
> On Tue, 2014-12-30 at 13:37 -0800, Pierre-Yves David wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1419377438 28800
>> #      Tue Dec 23 15:30:38 2014 -0800
>> # Node ID d514ca98fbf1a73adc725d48dfe0a8a49b7fd82f
>> # Parent  88ec1cc5c6f5b3281d46e440ba1e70dbf14d18eb
>> filectx.parents: enforce changeid of parent to be in own changectx ancestors
>
>> +def _adjustlinkrev(repo, path, filelog, fnode, srcrev):
>> +    """return the first ancestor of <srcrev> introducting <fnode>
>> +
>> +    If the linkrev of the file revision does not point to an ancestor of
>> +    srcrev, we'll walk down the ancestors until we found one introducing this
>> +    file revision.
>> +
>> +    :repo: a localrepository object (used to access changelog and manifest)
>> +    :path: the file path
>> +    :fnode: the nodeid of the file revision
>> +    :filelog: the filelog of this path
>> +    :srcrev: the changeset revision we search ancestors from
>> +    """
>
> This looks like a less advanced version of this code:
>
> http://markmail.org/message/epp4ac5zr26beyq6

We went over the Matt concerns over voice. Here is a quick summary

> For starters, yours is O(all ancestors),

we use a lazyancestors object and we floor it to the linkrev value. So 
the complexity should be fine.

> even in the trivial case of
> ctx[foo].linkrev == ctx.rev().

The case is not relevant until patches 3. And the trivial case is fast 
pathed in this patch.

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -20,10 +20,44 @@  propertycache = util.propertycache
 # Phony node value to stand-in for new files in some uses of
 # manifests. Manifests support 21-byte hashes for nodes which are
 # dirty in the working copy.
 _newnode = '!' * 21
 
+def _adjustlinkrev(repo, path, filelog, fnode, srcrev):
+    """return the first ancestor of <srcrev> introducting <fnode>
+
+    If the linkrev of the file revision does not point to an ancestor of
+    srcrev, we'll walk down the ancestors until we found one introducing this
+    file revision.
+
+    :repo: a localrepository object (used to access changelog and manifest)
+    :path: the file path
+    :fnode: the nodeid of the file revision
+    :filelog: the filelog of this path
+    :srcrev: the changeset revision we search ancestors from
+    """
+    cl = repo.unfiltered().changelog
+    ma = repo.manifest
+    # fetch the linkrev
+    fr = filelog.rev(fnode)
+    lkr = filelog.linkrev(fr)
+    # check if this linkrev is an ancestor of srcrev
+    anc = cl.ancestors([srcrev], lkr)
+    if lkr not in anc:
+        for a in anc:
+            ac = cl.read(a) # get changeset data (we avoid object creation).
+            if path in ac[3]: # checking the 'files' field.
+                # The file have been touched, check if the content is similar
+                # to the one we search for.
+                if fnode == ma.readdelta(ac[0]).get(path):
+                    return a
+        # In Theory we should never get out of that loop without a result. But
+        # if manifest use a buggy file revision (not children of the one it
+        # replaces) we could. Such buggy situation will likely result is crash
+        # somewhere else at to some point.
+    return lkr
+
 class basectx(object):
     """A basectx object represents the common logic for its children:
     changectx: read-only context that is already present in the repo,
     workingctx: a context that represents the working directory and can
                 be committed,
@@ -737,11 +771,11 @@  class basefilectx(object):
         _path = self._path
         fl = self._filelog
         parents = self._filelog.parents(self._filenode)
         pl = [(_path, node, fl) for node in parents if node != nullid]
 
-        r = self._filelog.renamed(self._filenode)
+        r = fl.renamed(self._filenode)
         if r:
             # - In the simple rename case, both parent are nullid, pl is empty.
             # - In case of merge, only one of the parent is null id and should
             # be replaced with the rename information. This parent is -always-
             # the first one.
@@ -749,11 +783,23 @@  class basefilectx(object):
             # As null id have alway been filtered out in the previous list
             # comprehension, inserting to 0 will always result in "replacing
             # first nullid parent with rename information.
             pl.insert(0, (r[0], r[1], self._repo.file(r[0])))
 
-        return [filectx(self._repo, p, fileid=n, filelog=l) for p, n, l in pl]
+        ret = []
+        for path, fnode, l in pl:
+            if '_changeid' in vars(self) or '_changectx' in vars(self):
+                # If self is associated with a changeset (probably explicitly
+                # fed), ensures the created filectx is associated to a
+                # changesets that is an ancestor of self.changectx.
+                rev = _adjustlinkrev(self._repo, path, l, fnode, self.rev())
+                fctx = filectx(self._repo, path, fileid=fnode, filelog=l,
+                               changeid=rev)
+            else:
+                fctx = filectx(self._repo, path, fileid=fnode, filelog=l)
+            ret.append(fctx)
+        return ret
 
     def p1(self):
         return self.parents()[0]
 
     def p2(self):
diff --git a/tests/test-annotate.t b/tests/test-annotate.t
--- a/tests/test-annotate.t
+++ b/tests/test-annotate.t
@@ -449,5 +449,65 @@  Annotate with --ignore-blank-lines (simi
   0: 
   1:  
   1: b  b
 
   $ cd ..
+
+Annotate with linkrev pointing to other another branch
+------------------------------------------------------
+
+create history with a filerev whose linkrev point to another branch
+
+  $ hg init branchedlinkrev
+  $ cd branchedlinkrev
+  $ echo A > a
+  $ hg commit -Am 'contentA'
+  adding a
+  $ echo B >> a
+  $ hg commit -m 'contentB'
+  $ hg up --rev 'desc(contentA)'
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ echo unrelated > unrelated
+  $ hg commit -Am 'unrelated'
+  adding unrelated
+  created new head
+  $ hg graft -r 'desc(contentB)'
+  grafting 1:fd27c222e3e6 "contentB"
+  $ echo C >> a
+  $ hg commit -m 'contentC'
+  $ hg log -G
+  @  changeset:   4:072f1e8df249
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     contentC
+  |
+  o  changeset:   3:ff38df03cc4b
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     contentB
+  |
+  o  changeset:   2:62aaf3f6fc06
+  |  parent:      0:f0932f74827e
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     unrelated
+  |
+  | o  changeset:   1:fd27c222e3e6
+  |/   user:        test
+  |    date:        Thu Jan 01 00:00:00 1970 +0000
+  |    summary:     contentB
+  |
+  o  changeset:   0:f0932f74827e
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     contentA
+  
+
+Annotate should list ancestor of starting revision only
+
+  $ hg annotate a
+  0: A
+  3: B
+  4: C
+
+  $ cd ..
diff --git a/tests/test-log.t b/tests/test-log.t
--- a/tests/test-log.t
+++ b/tests/test-log.t
@@ -1562,5 +1562,98 @@  hg log -f dir across branches
   o  b
   |
   o  a
   
   $ cd ..
+
+hg log -f with linkrev pointing to another branch
+-------------------------------------------------
+
+create history with a filerev whose linkrev point to another branch
+
+  $ hg init branchedlinkrev
+  $ cd branchedlinkrev
+  $ echo 1 > a
+  $ hg commit -Am 'content1'
+  adding a
+  $ echo 2 > a
+  $ hg commit -m 'content2'
+  $ hg up --rev 'desc(content1)'
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  $ echo unrelated > unrelated
+  $ hg commit -Am 'unrelated'
+  adding unrelated
+  created new head
+  $ hg graft -r 'desc(content2)'
+  grafting 1:2294ae80ad84 "content2"
+  $ echo 3 > a
+  $ hg commit -m 'content3'
+  $ hg log -G
+  @  changeset:   4:50b9b36e9c5d
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     content3
+  |
+  o  changeset:   3:15b2327059e5
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     content2
+  |
+  o  changeset:   2:2029acd1168c
+  |  parent:      0:ae0a3c9f9e95
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     unrelated
+  |
+  | o  changeset:   1:2294ae80ad84
+  |/   user:        test
+  |    date:        Thu Jan 01 00:00:00 1970 +0000
+  |    summary:     content2
+  |
+  o  changeset:   0:ae0a3c9f9e95
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     content1
+  
+
+log -f on the file should list the graft result.
+
+  $ hg log -Gf a
+  @  changeset:   4:50b9b36e9c5d
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     content3
+  |
+  o  changeset:   3:15b2327059e5
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     content2
+  |
+  o  changeset:   0:ae0a3c9f9e95
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     content1
+  
+
+plain log list the original version
+(XXX we should probably list both)
+
+  $ hg log -G a
+  @  changeset:   4:50b9b36e9c5d
+  |  tag:         tip
+  |  user:        test
+  |  date:        Thu Jan 01 00:00:00 1970 +0000
+  |  summary:     content3
+  |
+  | o  changeset:   1:2294ae80ad84
+  |/   user:        test
+  |    date:        Thu Jan 01 00:00:00 1970 +0000
+  |    summary:     content2
+  |
+  o  changeset:   0:ae0a3c9f9e95
+     user:        test
+     date:        Thu Jan 01 00:00:00 1970 +0000
+     summary:     content1
+  
+  $ cd ..
diff --git a/tests/test-mv-cp-st-diff.t b/tests/test-mv-cp-st-diff.t
--- a/tests/test-mv-cp-st-diff.t
+++ b/tests/test-mv-cp-st-diff.t
@@ -1603,5 +1603,67 @@  test for case where we didn't look suffi
   +++ b/g
   @@ -1,1 +1,1 @@
   -
   +f
   $ cd ..
+
+Additional tricky linkrev case
+------------------------------
+
+If the first file revision after the diff base have a link rev pointing to a
+changeset on another branch with a revision lower that the diff base, we can
+jump past that the copy detection limit and fail to detect the rename,
+
+  $ hg init diffstoplinkrev
+  $ cd diffstoplinkrev
+
+  $ touch f
+  $ hg ci -Aqm 'empty f'
+
+Make a simple change
+
+  $ echo change > f
+  $ hg ci -m 'change f'
+
+Make a second branch, we use named branch to create a simple commit
+that does not touch f.
+
+  $ hg up -qr 'desc(empty)'
+  $ hg branch -q dev
+  $ hg ci -Aqm dev
+
+Graft the initial change, as f was untouched, we reuse the same entry and the
+linkrev point to the older branch.
+
+  $ hg graft -q 'desc(change)'
+
+Make a rename because we want to track rename. It is also important that the
+faulty linkrev is not the "start" commit to ensure the linkrev will be used.
+
+  $ hg mv f renamed
+  $ hg ci -m renamed
+
+  $ hg log -G -T '{rev} {desc}'
+  @  4 renamed
+  |
+  o  3 change f
+  |
+  o  2 dev
+  |
+  | o  1 change f
+  |/
+  o  0 empty f
+  
+
+The copy tracking should still reach rev 2 (branch creation).
+accessing the parent of 4 (renamed) should not jump use to revision 1.
+
+  $ hg diff --git -r 'desc(dev)' -r .
+  diff --git a/f b/renamed
+  rename from f
+  rename to renamed
+  --- a/f
+  +++ b/renamed
+  @@ -0,0 +1,1 @@
+  +change
+
+  $ cd ..