Patchwork [RFC,need,advice] changegroup: introduce cg3, which has support for exchanging treemanifests

login
register
mail settings
Submitter Augie Fackler
Date Oct. 15, 2015, 2:57 p.m.
Message ID <890de0a21cd7cfdceaef.1444921055@arthedain.pit.corp.google.com>
Download mbox | patch
Permalink /patch/11103/
State Superseded
Commit 77d25b913f80b57d54cc81eaa38259fdbb10c657
Headers show

Comments

Augie Fackler - Oct. 15, 2015, 2:57 p.m.
# HG changeset patch
# User Augie Fackler <augie@google.com>
# Date 1444327710 14400
#      Thu Oct 08 14:08:30 2015 -0400
# Node ID 890de0a21cd7cfdceaef55564d6ac010ad27c32a
# Parent  83b738235f1bfc33a943a63a5d0f5527ac2da736
changegroup: introduce cg3, which has support for exchanging treemanifests

Open questions:
  1) Are we happy with just a 't' or 'f' for flagging type of manifest?
  2) How can we not have the grotesque requirements hack during apply()?

Many thanks to adgar@google.com for helping me with various odd
corners of the API.
Gregory Szorc - Oct. 15, 2015, 6:07 p.m.
On Thu, Oct 15, 2015 at 7:57 AM, Augie Fackler <raf@durin42.com> wrote:

> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1444327710 14400
> #      Thu Oct 08 14:08:30 2015 -0400
> # Node ID 890de0a21cd7cfdceaef55564d6ac010ad27c32a
> # Parent  83b738235f1bfc33a943a63a5d0f5527ac2da736
> changegroup: introduce cg3, which has support for exchanging treemanifests
>
> Open questions:
>   1) Are we happy with just a 't' or 'f' for flagging type of manifest?
>

This is a binary format. So we have 256 values to play with here. I think
there is room for expansion. Only question would be whether you are using
't' and 'f' instead of say 0x01 and 0x02. But that's cosmetic.


>   2) How can we not have the grotesque requirements hack during apply()?
>

I have a similar issue for stream clones. Currently, the "perform a stream
clone" code updates the requirements when data comes in. Even in my RFC
stream clone bundle series, this isn't done at bundle application though:
only as part of consuming a stream clone from the wire protocol. It does,
however, verify the incoming requirements are supported.

Both of us seem to be grappling with the problem that "unbundle" can occur
both during clone and during an incremental "pull." It's almost like we
need an "intent" to the unbundler that refuses to perform clone-time
actions (like requirements adjusting) on post-clone unbundles.


>
> Many thanks to adgar@google.com for helping me with various odd
> corners of the API.
>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -8,6 +8,7 @@
>  from __future__ import absolute_import
>
>  import os
> +import posixpath
>  import struct
>  import tempfile
>  import weakref
> @@ -496,6 +497,53 @@ class cg2unpacker(cg1unpacker):
>          node, p1, p2, deltabase, cs = headertuple
>          return node, p1, p2, deltabase, cs
>
> +class cg3unpacker(cg2unpacker):
> +    deltaheader = _CHANGEGROUPV2_DELTA_HEADER
> +    deltaheadersize = struct.calcsize(deltaheader)
> +    version = '03'
> +
> +    def __init__(self, *args, **kwargs):
> +        super(cg3unpacker, self).__init__(*args, **kwargs)
> +        self._trees = False
> +
> +    def manifestheader(self):
> +        """Read the manifest header.
> +
> +        cg3 supports treemanifests as well as flatmanifests, so we
> +        need to read the header for manifests and find out which we
> +        should read.
> +        """
> +        flagsize = self._chunklength()
> +        assert flagsize == 1, 'flag size should be 1'
> +        flag = readexactly(self, 1)
>
+        if flag not in 'ft':
> +            raise error.Abort(
> +                'Unknown manifest format %r in cg3unpacker.' % flag)
> +        return flag
> +
> +    def _unpackmanifests(self, repo, revmap, trp, prog, numchanges):
> +        # TODO: include number of manifest nodes in header for better
> +        # progress?
> +        flag = self.manifestheader()
> +        if flag == 't':
> +            # TODO this is applying the requirements far too late :(
> +            repo.requirements.add('treemanifest')
> +            repo._applyopenerreqs()
> +            repo._writerequirements()
> +        self.callback = prog(_('manifests'), numchanges)
> +        repo.manifest.addgroup(self, revmap, trp)
> +        dirlog = repo.manifest.dirlog
> +        if flag == 't':
> +            while True:
> +                # TODO consider not (ab)using fileheader for
> +                # treemanifest headers
> +                hdr = self.filelogheader()
> +                if not hdr:
> +                    break
> +                dirname = hdr['filename']
> +                dl = dirlog(dirname)
> +                dl.addgroup(self, revmap, trp)
> +
>  class headerlessfixup(object):
>      def __init__(self, fh, h):
>          self._h = h
> @@ -593,8 +641,9 @@ class cg1packer(object):
>          rr, rl = revlog.rev, revlog.linkrev
>          return [n for n in missing if rl(rr(n)) not in commonrevs]
>
> -    def _packmanifests(self, mfnodes, lookuplinknode):
> +    def _packmanifests(self, mfnodes, tmfnodes, lookuplinknode):
>          """Pack flat manifests into a changegroup stream."""
> +        assert not tmfnodes
>          ml = self._repo.manifest
>          size = 0
>          for chunk in self.group(
> @@ -611,6 +660,7 @@ class cg1packer(object):
>
>          clrevorder = {}
>          mfs = {} # needed manifests
> +        tmfnodes = {}
>          fnodes = {} # needed file nodes
>          changedfiles = set()
>
> @@ -648,6 +698,11 @@ class cg1packer(object):
>          # simply take the slowpath, which already has the 'clrevorder'
> logic.
>          # This was also fixed in cc0ff93d0c0c.
>          fastpathlinkrev = fastpathlinkrev and not self._reorder
> +        # Treemanifests don't work correctly with fastpathlinkrev
> +        # either, because we don't discover which directory nodes to
> +        # send along with files. This could probably be fixed.
> +        fastpathlinkrev = fastpathlinkrev and (
> +            'treemanifest' not in self._repo.requirements)
>          # Callback for the manifest, used to collect linkrevs for filelog
>          # revisions.
>          # Returns the linkrev node (collected in lookupcl).
> @@ -663,10 +718,26 @@ class cg1packer(object):
>                          fclnode = fclnodes.setdefault(n, clnode)
>                          if clrevorder[clnode] < clrevorder[fclnode]:
>                              fclnodes[n] = clnode
> +                        # gather list of changed treemanifest nodes
> +                        if '/' in f and ('treemanifest' in
> +                                         self._repo.requirements):
> +                            dirs = f.split('/')[:-1]
> +                            submf = mdata
> +                            path = []
> +                            for sd in dirs:
> +                                submf = submf._dirs[sd + '/']
> +                                fullpath = posixpath.join(*(path + [sd]))
> +                                path.append(sd)
> +                                tmfclnodes = tmfnodes.setdefault(
> +                                    fullpath, {})
> +                                tmfclnodes.setdefault(submf.node(),
> clnode)
> +                                if clrevorder[clnode] <
> clrevorder[fclnode]:
> +                                    tmfclnodes[n] = clnode
>              return clnode
>
>          mfnodes = self.prune(ml, mfs, commonrevs)
> -        for x in self._packmanifests(mfnodes, lookupmflinknode):
> +        for x in self._packmanifests(
> +            mfnodes, tmfnodes, lookupmflinknode):
>              yield x
>
>          mfs.clear()
> @@ -784,9 +855,38 @@ class cg2packer(cg1packer):
>      def builddeltaheader(self, node, p1n, p2n, basenode, linknode):
>          return struct.pack(self.deltaheader, node, p1n, p2n, basenode,
> linknode)
>
> +class cg3packer(cg2packer):
> +    version = '03'
> +    deltaheader = _CHANGEGROUPV2_DELTA_HEADER
> +
> +    def _packmanifests(self, mfnodes, tmfnodes, lookuplinknode):
> +        yield chunkheader(1)
> +        if 'treemanifest' not in self._repo.requirements:
> +            yield 'f'
> +            assert not tmfnodes
> +            for x in super(cg3packer, self)._packmanifests(
> +                    mfnodes, {}, lookuplinknode):
> +                yield x
> +            return
> +        yield 't'
> +        for x in super(cg3packer, self)._packmanifests(
> +            mfnodes, {}, lookuplinknode):
> +            yield x
> +        dirlog = self._repo.manifest.dirlog
> +        for name, nodes in tmfnodes.iteritems():
> +            # TODO consider defining a separate method for this
> +            yield self.fileheader(name)
> +            dl = dirlog(name)
> +            for chunk in self.group(nodes, dl, nodes.get):
> +                yield chunk
> +        yield self.close()
> +
> +
>  packermap = {'01': (cg1packer, cg1unpacker),
>               # cg2 adds support for exchanging generaldelta
>               '02': (cg2packer, cg2unpacker),
> +             # cg3 adds support for exchanging treemanifests
> +             '03': (cg3packer, cg3unpacker),
>  }
>
>  def _changegroupinfo(repo, nodes, source):
> diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t
> --- a/tests/test-treemanifest.t
> +++ b/tests/test-treemanifest.t
> @@ -384,3 +384,45 @@ within that.
>    $ mv oldmf2 .hg/store/meta/b/foo
>    $ mv oldmf3 .hg/store/meta/b/bar/orange
>
> +Test cloning a treemanifest repo over http.
> +  $ hg serve -p $HGPORT -d --pid-file=hg.pid --errorlog=errors.log
> +  $ cat hg.pid >> $DAEMON_PIDS
> +  $ cd ..
> +  $ hg clone http://localhost:$HGPORT deepclone
> +  requesting all changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 1 changesets with 8 changes to 8 files
> +  updating to branch default
> +  8 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +No server errors.
> +  $ cat deeprepo/errors.log
> +Tree manifest revlogs exist.
> +  $ find deepclone/.hg/store/meta | sort
> +  deepclone/.hg/store/meta
> +  deepclone/.hg/store/meta/a
> +  deepclone/.hg/store/meta/a/00manifest.i
> +  deepclone/.hg/store/meta/b
> +  deepclone/.hg/store/meta/b/00manifest.i
> +  deepclone/.hg/store/meta/b/bar
> +  deepclone/.hg/store/meta/b/bar/00manifest.i
> +  deepclone/.hg/store/meta/b/bar/orange
> +  deepclone/.hg/store/meta/b/bar/orange/00manifest.i
> +  deepclone/.hg/store/meta/b/bar/orange/fly
> +  deepclone/.hg/store/meta/b/bar/orange/fly/00manifest.i
> +  deepclone/.hg/store/meta/b/foo
> +  deepclone/.hg/store/meta/b/foo/00manifest.i
> +  deepclone/.hg/store/meta/b/foo/apple
> +  deepclone/.hg/store/meta/b/foo/apple/00manifest.i
> +  deepclone/.hg/store/meta/b/foo/apple/bees
> +  deepclone/.hg/store/meta/b/foo/apple/bees/00manifest.i
> +Verify passes.
> +  $ cd deepclone
> +  $ hg verify
> +  checking changesets
> +  checking manifests
> +  crosschecking files in changesets and manifests
> +  checking files
> +  8 files, 1 changesets, 8 total revisions
> +  $ cd ..
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
>
Pierre-Yves David - Oct. 15, 2015, 6:44 p.m.
On 10/15/2015 07:07 PM, Gregory Szorc wrote:
> On Thu, Oct 15, 2015 at 7:57 AM, Augie Fackler <raf@durin42.com
> <mailto:raf@durin42.com>> wrote:
>
>     # HG changeset patch
>     # User Augie Fackler <augie@google.com <mailto:augie@google.com>>
>     # Date 1444327710 14400
>     #      Thu Oct 08 14:08:30 2015 -0400
>     # Node ID 890de0a21cd7cfdceaef55564d6ac010ad27c32a
>     # Parent  83b738235f1bfc33a943a63a5d0f5527ac2da736
>     changegroup: introduce cg3, which has support for exchanging
>     treemanifests
>
>     Open questions:
>        1) Are we happy with just a 't' or 'f' for flagging type of manifest?
>
>
> This is a binary format. So we have 256 values to play with here. I
> think there is room for expansion. Only question would be whether you
> are using 't' and 'f' instead of say 0x01 and 0x02. But that's cosmetic.
>
>        2) How can we not have the grotesque requirements hack during
>     apply()?
>
>
> I have a similar issue for stream clones. Currently, the "perform a
> stream clone" code updates the requirements when data comes in. Even in
> my RFC stream clone bundle series, this isn't done at bundle application
> though: only as part of consuming a stream clone from the wire protocol.
> It does, however, verify the incoming requirements are supported.
>
> Both of us seem to be grappling with the problem that "unbundle" can
> occur both during clone and during an incremental "pull." It's almost
> like we need an "intent" to the unbundler that refuses to perform
> clone-time actions (like requirements adjusting) on post-clone unbundles.

It looks like it should be something on the pulloperation object.

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -8,6 +8,7 @@ 
 from __future__ import absolute_import
 
 import os
+import posixpath
 import struct
 import tempfile
 import weakref
@@ -496,6 +497,53 @@  class cg2unpacker(cg1unpacker):
         node, p1, p2, deltabase, cs = headertuple
         return node, p1, p2, deltabase, cs
 
+class cg3unpacker(cg2unpacker):
+    deltaheader = _CHANGEGROUPV2_DELTA_HEADER
+    deltaheadersize = struct.calcsize(deltaheader)
+    version = '03'
+
+    def __init__(self, *args, **kwargs):
+        super(cg3unpacker, self).__init__(*args, **kwargs)
+        self._trees = False
+
+    def manifestheader(self):
+        """Read the manifest header.
+
+        cg3 supports treemanifests as well as flatmanifests, so we
+        need to read the header for manifests and find out which we
+        should read.
+        """
+        flagsize = self._chunklength()
+        assert flagsize == 1, 'flag size should be 1'
+        flag = readexactly(self, 1)
+        if flag not in 'ft':
+            raise error.Abort(
+                'Unknown manifest format %r in cg3unpacker.' % flag)
+        return flag
+
+    def _unpackmanifests(self, repo, revmap, trp, prog, numchanges):
+        # TODO: include number of manifest nodes in header for better
+        # progress?
+        flag = self.manifestheader()
+        if flag == 't':
+            # TODO this is applying the requirements far too late :(
+            repo.requirements.add('treemanifest')
+            repo._applyopenerreqs()
+            repo._writerequirements()
+        self.callback = prog(_('manifests'), numchanges)
+        repo.manifest.addgroup(self, revmap, trp)
+        dirlog = repo.manifest.dirlog
+        if flag == 't':
+            while True:
+                # TODO consider not (ab)using fileheader for
+                # treemanifest headers
+                hdr = self.filelogheader()
+                if not hdr:
+                    break
+                dirname = hdr['filename']
+                dl = dirlog(dirname)
+                dl.addgroup(self, revmap, trp)
+
 class headerlessfixup(object):
     def __init__(self, fh, h):
         self._h = h
@@ -593,8 +641,9 @@  class cg1packer(object):
         rr, rl = revlog.rev, revlog.linkrev
         return [n for n in missing if rl(rr(n)) not in commonrevs]
 
-    def _packmanifests(self, mfnodes, lookuplinknode):
+    def _packmanifests(self, mfnodes, tmfnodes, lookuplinknode):
         """Pack flat manifests into a changegroup stream."""
+        assert not tmfnodes
         ml = self._repo.manifest
         size = 0
         for chunk in self.group(
@@ -611,6 +660,7 @@  class cg1packer(object):
 
         clrevorder = {}
         mfs = {} # needed manifests
+        tmfnodes = {}
         fnodes = {} # needed file nodes
         changedfiles = set()
 
@@ -648,6 +698,11 @@  class cg1packer(object):
         # simply take the slowpath, which already has the 'clrevorder' logic.
         # This was also fixed in cc0ff93d0c0c.
         fastpathlinkrev = fastpathlinkrev and not self._reorder
+        # Treemanifests don't work correctly with fastpathlinkrev
+        # either, because we don't discover which directory nodes to
+        # send along with files. This could probably be fixed.
+        fastpathlinkrev = fastpathlinkrev and (
+            'treemanifest' not in self._repo.requirements)
         # Callback for the manifest, used to collect linkrevs for filelog
         # revisions.
         # Returns the linkrev node (collected in lookupcl).
@@ -663,10 +718,26 @@  class cg1packer(object):
                         fclnode = fclnodes.setdefault(n, clnode)
                         if clrevorder[clnode] < clrevorder[fclnode]:
                             fclnodes[n] = clnode
+                        # gather list of changed treemanifest nodes
+                        if '/' in f and ('treemanifest' in
+                                         self._repo.requirements):
+                            dirs = f.split('/')[:-1]
+                            submf = mdata
+                            path = []
+                            for sd in dirs:
+                                submf = submf._dirs[sd + '/']
+                                fullpath = posixpath.join(*(path + [sd]))
+                                path.append(sd)
+                                tmfclnodes = tmfnodes.setdefault(
+                                    fullpath, {})
+                                tmfclnodes.setdefault(submf.node(), clnode)
+                                if clrevorder[clnode] < clrevorder[fclnode]:
+                                    tmfclnodes[n] = clnode
             return clnode
 
         mfnodes = self.prune(ml, mfs, commonrevs)
-        for x in self._packmanifests(mfnodes, lookupmflinknode):
+        for x in self._packmanifests(
+            mfnodes, tmfnodes, lookupmflinknode):
             yield x
 
         mfs.clear()
@@ -784,9 +855,38 @@  class cg2packer(cg1packer):
     def builddeltaheader(self, node, p1n, p2n, basenode, linknode):
         return struct.pack(self.deltaheader, node, p1n, p2n, basenode, linknode)
 
+class cg3packer(cg2packer):
+    version = '03'
+    deltaheader = _CHANGEGROUPV2_DELTA_HEADER
+
+    def _packmanifests(self, mfnodes, tmfnodes, lookuplinknode):
+        yield chunkheader(1)
+        if 'treemanifest' not in self._repo.requirements:
+            yield 'f'
+            assert not tmfnodes
+            for x in super(cg3packer, self)._packmanifests(
+                    mfnodes, {}, lookuplinknode):
+                yield x
+            return
+        yield 't'
+        for x in super(cg3packer, self)._packmanifests(
+            mfnodes, {}, lookuplinknode):
+            yield x
+        dirlog = self._repo.manifest.dirlog
+        for name, nodes in tmfnodes.iteritems():
+            # TODO consider defining a separate method for this
+            yield self.fileheader(name)
+            dl = dirlog(name)
+            for chunk in self.group(nodes, dl, nodes.get):
+                yield chunk
+        yield self.close()
+
+
 packermap = {'01': (cg1packer, cg1unpacker),
              # cg2 adds support for exchanging generaldelta
              '02': (cg2packer, cg2unpacker),
+             # cg3 adds support for exchanging treemanifests
+             '03': (cg3packer, cg3unpacker),
 }
 
 def _changegroupinfo(repo, nodes, source):
diff --git a/tests/test-treemanifest.t b/tests/test-treemanifest.t
--- a/tests/test-treemanifest.t
+++ b/tests/test-treemanifest.t
@@ -384,3 +384,45 @@  within that.
   $ mv oldmf2 .hg/store/meta/b/foo
   $ mv oldmf3 .hg/store/meta/b/bar/orange
 
+Test cloning a treemanifest repo over http.
+  $ hg serve -p $HGPORT -d --pid-file=hg.pid --errorlog=errors.log
+  $ cat hg.pid >> $DAEMON_PIDS
+  $ cd ..
+  $ hg clone http://localhost:$HGPORT deepclone
+  requesting all changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 1 changesets with 8 changes to 8 files
+  updating to branch default
+  8 files updated, 0 files merged, 0 files removed, 0 files unresolved
+No server errors.
+  $ cat deeprepo/errors.log
+Tree manifest revlogs exist.
+  $ find deepclone/.hg/store/meta | sort
+  deepclone/.hg/store/meta
+  deepclone/.hg/store/meta/a
+  deepclone/.hg/store/meta/a/00manifest.i
+  deepclone/.hg/store/meta/b
+  deepclone/.hg/store/meta/b/00manifest.i
+  deepclone/.hg/store/meta/b/bar
+  deepclone/.hg/store/meta/b/bar/00manifest.i
+  deepclone/.hg/store/meta/b/bar/orange
+  deepclone/.hg/store/meta/b/bar/orange/00manifest.i
+  deepclone/.hg/store/meta/b/bar/orange/fly
+  deepclone/.hg/store/meta/b/bar/orange/fly/00manifest.i
+  deepclone/.hg/store/meta/b/foo
+  deepclone/.hg/store/meta/b/foo/00manifest.i
+  deepclone/.hg/store/meta/b/foo/apple
+  deepclone/.hg/store/meta/b/foo/apple/00manifest.i
+  deepclone/.hg/store/meta/b/foo/apple/bees
+  deepclone/.hg/store/meta/b/foo/apple/bees/00manifest.i
+Verify passes.
+  $ cd deepclone
+  $ hg verify
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  8 files, 1 changesets, 8 total revisions
+  $ cd ..