Patchwork changegroup: fix file linkrevs during reorders

login
register
mail settings
Submitter Durham Goode
Date Nov. 21, 2014, 1:27 a.m.
Message ID <b4c33f5d506db0880a9b.1416533248@dev2000.prn2.facebook.com>
Download mbox | patch
Permalink /patch/6815/
State Superseded
Headers show

Comments

Durham Goode - Nov. 21, 2014, 1:27 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1416529857 28800
#      Thu Nov 20 16:30:57 2014 -0800
# Node ID b4c33f5d506db0880a9b02c63c633d216fdcd87c
# Parent  a179db3db9b96b38c10c491e6e7e7ad5f40a7787
changegroup: fix file linkrevs during reorders

Previously, if reorder was true during the creation of a changegroup bundle,
it was possible that the manifest and filelogs would be reordered such that the
resulting bundle filelog had a linkrev that pointed to a commit that was not
the earliest instance of the filelog revision. For example:

With commits:

0<-1<---3<-4
  \       /
   --2<---

if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
their order reversed, but the changelog did not, it could produce a filelog with
linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
delete that file data from the repository and commit 2 would be corrupt (as
would any future pulls that tried to build upon that version of the file).

The fix is to make the linkrev fixup smarter. Previously it considered the first
manifest that added a file to be the first commit that added that file, which is
not true. Now, for every file revision we add to the bundle we make sure we
attach it to the earliest applicable linkrev.

I can only repro this issue in a large internal repo, so I don't have a good
test case to add a test for (since it depends on the exact commit vs. manifest
vs. filelog orders).
Pierre-Yves David - Nov. 21, 2014, 1:57 a.m.
On 11/20/2014 05:27 PM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1416529857 28800
> #      Thu Nov 20 16:30:57 2014 -0800
> # Node ID b4c33f5d506db0880a9b02c63c633d216fdcd87c
> # Parent  a179db3db9b96b38c10c491e6e7e7ad5f40a7787
> changegroup: fix file linkrevs during reorders
>
> Previously, if reorder was true during the creation of a changegroup bundle,
> it was possible that the manifest and filelogs would be reordered such that the
> resulting bundle filelog had a linkrev that pointed to a commit that was not
> the earliest instance of the filelog revision. For example:
>
> With commits:
>
> 0<-1<---3<-4
>    \       /
>     --2<---
>
> if 2 and 3 added the same version of a file, if the manifests of 2 and 3 have
> their order reversed, but the changelog did not, it could produce a filelog with
> linkrevs 0<-3 instead of 0<-2, which meant if commit 3 was stripped, it would
> delete that file data from the repository and commit 2 would be corrupt (as
> would any future pulls that tried to build upon that version of the file).
>
> The fix is to make the linkrev fixup smarter. Previously it considered the first
> manifest that added a file to be the first commit that added that file, which is
> not true. Now, for every file revision we add to the bundle we make sure we
> attach it to the earliest applicable linkrev.
>
> I can only repro this issue in a large internal repo, so I don't have a good
> test case to add a test for (since it depends on the exact commit vs. manifest
> vs. filelog orders).

Any change we can get a test for it? Looks like it would not be too hard 
with some --debug output and others debug commands.

>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -325,6 +325,7 @@ class cg1packer(object):
>           # for progress output
>           msgbundling = _('bundling')
>
> +        clrevorder = {}
>           mfs = {} # needed manifests
>           fnodes = {} # needed file nodes
>           changedfiles = set()
> @@ -334,6 +335,7 @@ class cg1packer(object):
>           # Returns the linkrev node (identity in the changelog case).
>           def lookupcl(x):
>               c = cl.read(x)
> +            clrevorder[x] = len(clrevorder)
>               changedfiles.update(c[3])
>               # record the first changeset introducing this manifest version
>               mfs.setdefault(c[0], x)
> @@ -349,13 +351,16 @@ class cg1packer(object):
>           # Returns the linkrev node (collected in lookupcl).
>           def lookupmf(x):
>               clnode = mfs[x]
> -            if not fastpathlinkrev:
> +            if not fastpathlinkrev or reorder:
>                   mdata = mf.readfast(x)
>                   for f, n in mdata.iteritems():
>                       if f in changedfiles:
>                           # record the first changeset introducing this filelog
>                           # version
> -                        fnodes.setdefault(f, {}).setdefault(n, clnode)
> +                        fclnodes = fnodes.setdefault(f, {})
> +                        fclnode = fclnodes.setdefault(n, clnode)
> +                        if clrevorder[clnode] < clrevorder[fclnode]:
> +                            fclnodes[n] = clnode
>               return clnode
>
>           mfnodes = self.prune(mf, mfs, commonrevs, source)
> @@ -368,7 +373,7 @@ class cg1packer(object):
>           needed = set(cl.rev(x) for x in clnodes)
>
>           def linknodes(filerevlog, fname):
> -            if fastpathlinkrev:
> +            if fastpathlinkrev and not reorder:
>                   llr = filerevlog.linkrev
>                   def genfilenodes():
>                       for r in filerevlog:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -325,6 +325,7 @@  class cg1packer(object):
         # for progress output
         msgbundling = _('bundling')
 
+        clrevorder = {}
         mfs = {} # needed manifests
         fnodes = {} # needed file nodes
         changedfiles = set()
@@ -334,6 +335,7 @@  class cg1packer(object):
         # Returns the linkrev node (identity in the changelog case).
         def lookupcl(x):
             c = cl.read(x)
+            clrevorder[x] = len(clrevorder)
             changedfiles.update(c[3])
             # record the first changeset introducing this manifest version
             mfs.setdefault(c[0], x)
@@ -349,13 +351,16 @@  class cg1packer(object):
         # Returns the linkrev node (collected in lookupcl).
         def lookupmf(x):
             clnode = mfs[x]
-            if not fastpathlinkrev:
+            if not fastpathlinkrev or reorder:
                 mdata = mf.readfast(x)
                 for f, n in mdata.iteritems():
                     if f in changedfiles:
                         # record the first changeset introducing this filelog
                         # version
-                        fnodes.setdefault(f, {}).setdefault(n, clnode)
+                        fclnodes = fnodes.setdefault(f, {})
+                        fclnode = fclnodes.setdefault(n, clnode)
+                        if clrevorder[clnode] < clrevorder[fclnode]:
+                            fclnodes[n] = clnode
             return clnode
 
         mfnodes = self.prune(mf, mfs, commonrevs, source)
@@ -368,7 +373,7 @@  class cg1packer(object):
         needed = set(cl.rev(x) for x in clnodes)
 
         def linknodes(filerevlog, fname):
-            if fastpathlinkrev:
+            if fastpathlinkrev and not reorder:
                 llr = filerevlog.linkrev
                 def genfilenodes():
                     for r in filerevlog: