Patchwork [1,of,5,V2] bundle2: very first version of a bundle2 bundler

Submitter Pierre-Yves David
Date March 20, 2014, 6:39 p.m.
Message ID <14623bafac62835e44ee.1395340782@marginatus.alto.octopoid.net>
Permalink /patch/4005/
State Accepted
Commit 9c5183cb9bca46c95b0e8cd94e99cb168df0a057

Comments

Pierre-Yves David - March 20, 2014, 6:39 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1395176450 25200
#      Tue Mar 18 14:00:50 2014 -0700
# Node ID 14623bafac62835e44ee4dee806978e1fd50540b
# Parent  29b159bd71bc68e9868c5d2d748ab166dc7a5287
bundle2: very first version of a bundle2 bundler

This changeset is the very first of a long series. It creates a new bundle2
module and adds a simple class that generates an empty bundle2 container.

The module is documented with the current state of the implementation. For
information about the final goal you may want to consult the mercurial wiki
page:

    http://mercurial.selenic.com/wiki/BundleFormat2

The documentation of the module will be updated by later patches that add more
and more features to the format.

This patch also introduces a test case. The test case builds and uses its own
small extension that uses the new bundle2 module. Since the new format is unable
to do anything right now, we cannot use real mercurial code to test it.
Moreover, some advanced features of the bundle2 spec will not be used by core
mercurial at all, so we still need to have them tested.
Matt Mackall - March 20, 2014, 10:08 p.m.
On Thu, 2014-03-20 at 11:39 -0700, pierre-yves.david@ens-lyon.org wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1395176450 25200
> #      Tue Mar 18 14:00:50 2014 -0700
> # Node ID 14623bafac62835e44ee4dee806978e1fd50540b
> # Parent  29b159bd71bc68e9868c5d2d748ab166dc7a5287
> bundle2: very first version of a bundle2 bundler

The big question I have from looking at this is how does it fit into the
existing framework? There is basically no similarity between these
classes and the existing bundle10/unbundle10 classes (which really
strongly suggest a naming scheme you ought to be using for your new
classes, no?) that I would expect them to eventually be
duck-type-compatible with.

At the end of the day, we need to reach a state where most code in the
push/pull path doesn't need to know or care if it's working with a
bundle10 or later object.
Pierre-Yves David - March 20, 2014, 11:01 p.m.
On 03/20/2014 03:08 PM, Matt Mackall wrote:
> On Thu, 2014-03-20 at 11:39 -0700, pierre-yves.david@ens-lyon.org wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1395176450 25200
>> #      Tue Mar 18 14:00:50 2014 -0700
>> # Node ID 14623bafac62835e44ee4dee806978e1fd50540b
>> # Parent  29b159bd71bc68e9868c5d2d748ab166dc7a5287
>> bundle2: very first version of a bundle2 bundler
>
> The big question I have from looking at this is how does it fit into the
> existing framework? There is basically no similarity between these
> classes and the existing bundle10/unbundle10 classes (which really
> strongly suggest a naming scheme you ought to be using for your new
> classes, no?) that I would expect them to eventually be
> duck-type-compatible with.

(Naming hint taken)

There are few similarities between the two because they differ a lot in
their usage: bundle10 focuses on exchanging changesets, whereas bundle20 will
be used by many more different actors over a much more varied life span.

The bundle2 class may eventually grow bundle10-compatible methods to
automatically create parts containing the pieces of changesets we need.
That would be a nice thing for it to do.

However, those bundle10 compatibility methods will be of no use to the
generic users of bundle2 (phases, bookmarks, obsmarkers). So my current
focus is to build a minimalistic API that demonstrates the core
bundle2 feature: providing an extensible format containing parts (and,
after that, generic infrastructure to process it).


> At the end of the day, we need to reach a state where most code in the
> push/pull path doesn't need to know or care if it's working with a
> bundle10 or later object.

Keep in mind that the part of push that does not deal with changesets is
growing.

The current push path is:

A.1. build changeset bundle
A.2. push that bundle through the wire protocol
A.3. analyse success
B.1. figure out phase changes
B.2. push them with pushkey
B.3. analyse success
C.1. pick obsmarkers to send
C.2. push them with pushkey/evolve's stuff
C.3. analyse success
D.1. figure out bookmarks to move
D.2. move them with pushkey
D.3. analyse success


This -cannot- be seamlessly switched to bundle2 with magic duck typing.

The introduction of bundle2 will change this toward:
(disclaimer: this will not be the actual end result)

A.1. figure out what changesets to push
A.2. stick them into the bundle
B.1. figure out phase changes
B.2. try to stick them into the bundle
C.1. pick obsmarkers to send
C.2. try to stick them into the bundle
D.1. figure out bookmarks to move
D.2. try to stick them into the bundle
==== sending bundle2 over the wire
A.3. analyse changeset pushing success
B.3. analyse phase pushing success
C.3. analyse obsmarker pushing success
D.3. analyse bookmark move success

B.4. old-style phase push if bundle2 did not support them
C.4. old-style obsmarker push if bundle2 did not support them
D.4. old-style bookmark move if bundle2 did not support them

So having the duck typing in bundle2 will help for the changesets but not
for everything else.

In the example above we'll be helped by the fact that phases,
obsmarkers and bookmarks all use pushkey (for now). But you can then add
extensions like largefiles into the mix and get the same kind of result.
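
[Editor's illustration, not part of the original message] The "try to
stick them into the bundle, fall back to the old style otherwise" flow
listed above can be sketched roughly as follows. Every name here
(FakeBundle, tryadd, plan_push, the part type strings) is hypothetical
and made up for the illustration; none of it is Mercurial API, and the
idea that the sender knows which part types the remote supports is an
assumption.

```python
# Hypothetical sketch: none of these names exist in Mercurial.


class FakeBundle:
    """Stand-in for a bundle2 container that collects named parts."""

    def __init__(self, supported):
        # Part types the remote side is assumed to understand.
        self.supported = set(supported)
        self.parts = []

    def tryadd(self, parttype, payload):
        """Add a part if it is supported; report whether it was added."""
        if parttype not in self.supported:
            return False
        self.parts.append((parttype, payload))
        return True


def plan_push(bundle, items):
    """Split items into those bundled and those needing the old-style push.

    Corresponds to steps *.1/*.2 (stick into the bundle) and *.4
    (old-style fallback) in the list above.
    """
    bundled, fallback = [], []
    for parttype, payload in items:
        if bundle.tryadd(parttype, payload):
            bundled.append(parttype)
        else:
            fallback.append(parttype)
    return bundled, fallback


items = [
    ("changesets", b"..."),
    ("phases", b"..."),
    ("obsmarkers", b"..."),
    ("bookmarks", b"..."),
]
bundle = FakeBundle(supported={"changesets", "phases"})
bundled, fallback = plan_push(bundle, items)
print("sent in bundle:", bundled)
print("old-style push:", fallback)
```

The point of the sketch is only that the per-item fallback decision lives
outside the container, which is why duck typing on the container alone
does not cover it.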


My current plan is:

1. produce/consume a bundle2 container that contains parts
2. build the infrastructure to apply the effects of parts
3. make it available through the wire protocol
4. use it in the push code
5. use it in the pull code
6. teach `hg bundle` about it

Each step should bring significant improvements to the API.

The bundle10 duck typing will most probably happen during step "(4) use
it in the push code".

This step is unfortunately still a few weeks away.
Matt Mackall - March 21, 2014, 7:33 p.m.
On Thu, 2014-03-20 at 16:01 -0700, Pierre-Yves David wrote:
> 
> On 03/20/2014 03:08 PM, Matt Mackall wrote:
> > On Thu, 2014-03-20 at 11:39 -0700, pierre-yves.david@ens-lyon.org wrote:
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david@fb.com>
> >> # Date 1395176450 25200
> >> #      Tue Mar 18 14:00:50 2014 -0700
> >> # Node ID 14623bafac62835e44ee4dee806978e1fd50540b
> >> # Parent  29b159bd71bc68e9868c5d2d748ab166dc7a5287
> >> bundle2: very first version of a bundle2 bundler
> >
> > The big question I have from looking at this is how does it fit into the
> > existing framework? There is basically no similarity between these
> > classes and the existing bundle10/unbundle10 classes (which really
> > strongly suggest a naming scheme you ought to be using for your new
> > classes, no?) that I would expect them to eventually be
> > duck-type-compatible with.
> 
> (Naming hint taken)
> 
> There are few similarities between the two because they differ a lot in
> their usage: bundle10 focuses on exchanging changesets, whereas bundle20 will
> be used by many more different actors over a much more varied life span.
> 
> The bundle2 class may eventually grow bundle10-compatible methods to
> automatically create parts containing the pieces of changesets we need.
> That would be a nice thing for it to do.

In my view, this is definitely 'the hard part', and there's already
been substantial work put into factoring out bundle10 precisely to make
this even possible, which is why I'm drawing it to your attention.

Patch

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
new file mode 100644
--- /dev/null
+++ b/mercurial/bundle2.py
@@ -0,0 +1,84 @@ 
+# bundle2.py - generic container format to transmit arbitrary data.
+#
+# Copyright 2013 Facebook, Inc.
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+"""Handling of the new bundle2 format
+
+The goal of bundle2 is to act as an atomic packet to transmit a set of
+payloads in an application agnostic way. It consists of a sequence of "parts"
+that will be handed to and processed by the application layer.
+
+
+General format architecture
+===========================
+
+The format is structured as follows
+
+ - magic string
+ - stream level parameters
+ - payload parts (any number)
+ - end of stream marker.
+
+The current implementation is limited to empty bundles.
+
+Details on the Binary format
+============================
+
+All numbers are unsigned and big endian.
+
+stream level parameters
+------------------------
+
+Binary format is as follows
+
+:params size: (16 bits integer)
+
+  The total number of Bytes used by the parameters
+
+  Currently forced to 0.
+
+:params value: arbitrary number of Bytes
+
+  A blob of `params size` containing the serialized version of all stream level
+  parameters.
+
+  Currently always empty.
+
+
+Payload part
+------------------------
+
+Binary format is as follows
+
+:header size: (16 bits integer)
+
+  The total number of Bytes used by the part headers. When the header is empty
+  (size = 0) this is interpreted as the end of stream marker.
+
+  Currently forced to 0 in the current state of the implementation
+"""
+
+_magicstring = 'HG20'
+
+class bundler(object):
+    """represent an outgoing bundle2 container
+
+    People will eventually be able to add params and parts to this object and
+    generate a stream from it."""
+
+    def __init__(self):
+        self._params = []
+        self._parts = []
+
+    def getchunks(self):
+        yield _magicstring
+        # no support for any param yet
+        # to be obviously fixed soon.
+        assert not self._params
+        yield '\0\0'
+        # no support for parts
+        # to be obviously fixed soon.
+        assert not self._parts
+        yield '\0\0'
diff --git a/tests/test-bundle2.t b/tests/test-bundle2.t
new file mode 100644
--- /dev/null
+++ b/tests/test-bundle2.t
@@ -0,0 +1,36 @@ 
+
+Create an extension to test bundle2 API
+
+  $ cat > bundle2.py << EOF
+  > """A small extension to test bundle2 implementation
+  > 
+  > Current bundle2 implementation is far too limited to be used in any core
+  > code. We still need to be able to test it while it grows up.
+  > """
+  > 
+  > from mercurial import cmdutil
+  > from mercurial import bundle2
+  > cmdtable = {}
+  > command = cmdutil.command(cmdtable)
+  > 
+  > @command('bundle2', [], '')
+  > def cmdbundle2(ui, repo):
+  >     """write a bundle2 container on standard output"""
+  >     bundler = bundle2.bundler()
+  >     for chunk in bundler.getchunks():
+  >         ui.write(chunk)
+  > EOF
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > bundle2=$TESTTMP/bundle2.py
+  > EOF
+
+The extension requires a repo (currently unused)
+
+  $ hg init main
+  $ cd main
+
+Test simple generation of empty bundle
+
+  $ hg bundle2
+  HG20\x00\x00\x00\x00 (no-eol) (esc)
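
[Editor's illustration, not part of the patch] The binary layout
documented in the module docstring (magic string, 16-bit big-endian
params size, params blob, then a 16-bit part header size where 0 marks
the end of the stream) can be checked with a small standalone reader. This
is a sketch written for Python 3 with only the standard library; the
function names are made up for the example and are not the API that the
series eventually introduced.

```python
import io
import struct

MAGIC = b"HG20"  # same value as _magicstring in the patch


def build_empty_bundle():
    """Return the bytes the patch's bundler yields for an empty bundle."""
    params_size = struct.pack(">H", 0)   # no stream level parameters
    end_marker = struct.pack(">H", 0)    # empty part header = end of stream
    return MAGIC + params_size + end_marker


def read_empty_bundle(stream):
    """Parse an empty bundle2 stream and return its raw params blob."""
    magic = stream.read(len(MAGIC))
    if magic != MAGIC:
        raise ValueError("not a bundle2 stream: %r" % magic)
    # All numbers are unsigned and big endian, hence ">H".
    (params_size,) = struct.unpack(">H", stream.read(2))
    params = stream.read(params_size)
    (header_size,) = struct.unpack(">H", stream.read(2))
    if header_size != 0:
        # Real parts are outside what this first patch describes.
        raise NotImplementedError("payload parts are not handled here")
    return params


data = build_empty_bundle()
print(repr(data))
print(repr(read_empty_bundle(io.BytesIO(data))))
```

The eight bytes built here match the expected test output above: the
four-byte magic string followed by two zero-valued 16-bit fields.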