Patchwork [2,of,4,V2] bundle2: support unbundling empty part

login
register
mail settings
Submitter Pierre-Yves David
Date March 28, 2014, 2:14 a.m.
Message ID <e7c977d822a8c7921510.1395972881@marginatus.alto.octopoid.net>
Download mbox | patch
Permalink /patch/4096/
State Superseded
Commit 9a75d2559cff01ed68c30720a72f6faf7c65d4aa
Headers show

Comments

Pierre-Yves David - March 28, 2014, 2:14 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1395295443 25200
#      Wed Mar 19 23:04:03 2014 -0700
# Node ID e7c977d822a8c7921510c034e250ed52233fff41
# Parent  f5be6f95d76a9369b441da680cf2a3dd959da1f7
bundle2: support unbundling empty part

We augment the unbundler to make it able to unbundle the empty part we are now
able to bundle.
David Soria Parra - March 28, 2014, 6:10 p.m.
pierre-yves.david@ens-lyon.org writes:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1395295443 25200
> #      Wed Mar 19 23:04:03 2014 -0700
> # Node ID e7c977d822a8c7921510c034e250ed52233fff41
> # Parent  f5be6f95d76a9369b441da680cf2a3dd959da1f7
> bundle2: support unbundling empty part
>
> We augment the unbundler to make it able to unbundle the empty part we are now
> able to bundle.
>
> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> --- a/mercurial/bundle2.py
> +++ b/mercurial/bundle2.py
> @@ -238,23 +238,44 @@ class unbundle20(object):
>              part = self._readpart()
>          self.ui.debug('end of bundle2 stream\n')
>  
>      def _readpart(self):
>          """return None when an end of stream markers is reach"""
> -        headersize = self._readexact(2)
> -        assert headersize == '\0\0'
> -        return None
> +
> +        headersize = self._unpack(_fpartheadersize)[0]
> +        self.ui.debug('part header size: %i\n' % headersize)
> +        if not headersize:
> +            return None
> +        headerblock = self._readexact(headersize)
> +        # some utility to help reading from the header bloc

bloc -> block

> +        def fromheader(size, _offset=[0]):
> +            """return the next <size> byte from the header"""
> +            offset = _offset[0] #small hack to make this mutable
> +            data = headerblock[offset:(offset + size)]
> +            _offset[0] = offset + size
> +            return data

It might be just me, but I think this is a bit hacky and hard to follow.
Can we either return the offset with data and just pass it around like

  return data, offset+size
  
and

  typesize, offset = fromheader(1)
  next, offset = fromheader(1, offset)

or use a variable defined just before fromheader? Maybe yoru approach is
a python standard and we should keep it, but i had to read twice to
follow what it's actually doing.

> +        typesize = _unpack(_fparttypesize, fromheader(1))[0]
> +        parttype = fromheader(typesize)
> +        self.ui.debug('part type: "%s"\n' % parttype)
> +        current = part(parttype)
> +        assert fromheader(2) == '\0\0' # no option for now
> +        self.ui.debug('part parameters: 0\n')
> +        assert self._readexact(4) == '\0\0\0\0' #empty payload
> +        self.ui.debug('payload chunk size: 0\n')
> +        return current
> +
Pierre-Yves David - March 28, 2014, 6:20 p.m.
On 03/28/2014 11:10 AM, David Soria Parra wrote:
> pierre-yves.david@ens-lyon.org writes:
>
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1395295443 25200
>> #      Wed Mar 19 23:04:03 2014 -0700
>> # Node ID e7c977d822a8c7921510c034e250ed52233fff41
>> # Parent  f5be6f95d76a9369b441da680cf2a3dd959da1f7
>> bundle2: support unbundling empty part
>>
>> We augment the unbundler to make it able to unbundle the empty part we are now
>> able to bundle.
>>
>> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
>> --- a/mercurial/bundle2.py
>> +++ b/mercurial/bundle2.py
>> @@ -238,23 +238,44 @@ class unbundle20(object):
>>               part = self._readpart()
>>           self.ui.debug('end of bundle2 stream\n')
>>
>>       def _readpart(self):
>>           """return None when an end of stream markers is reach"""
>> -        headersize = self._readexact(2)
>> -        assert headersize == '\0\0'
>> -        return None
>> +
>> +        headersize = self._unpack(_fpartheadersize)[0]
>> +        self.ui.debug('part header size: %i\n' % headersize)
>> +        if not headersize:
>> +            return None
>> +        headerblock = self._readexact(headersize)
>> +        # some utility to help reading from the header bloc
>
> bloc -> block
>
>> +        def fromheader(size, _offset=[0]):
>> +            """return the next <size> byte from the header"""
>> +            offset = _offset[0] #small hack to make this mutable
>> +            data = headerblock[offset:(offset + size)]
>> +            _offset[0] = offset + size
>> +            return data
>
> It might be just me, but I think this is a bit hacky and hard to follow.
> Can we either return the offset with data and just pass it around like
>
>    return data, offset+size
>
> and
>
>    typesize, offset = fromheader(1)
>    next, offset = fromheader(1, offset)
>
> or use a variable defined just before fromheader? Maybe yoru approach is
> a python standard and we should keep it, but i had to read twice to
> follow what it's actually doing.


In a prior version, _offset was defined right before fromheader. I 
decided to move it in the function parameter to make it clear is was 
used only here. (basically a static variable with for this unbundling.)

The result of from header is other directly fed to unpack so I'm not 
keen to use the first proposal (return data, newoffset).
Matt Mackall - March 28, 2014, 8:50 p.m.
On Fri, 2014-03-28 at 11:20 -0700, Pierre-Yves David wrote:
> 
> On 03/28/2014 11:10 AM, David Soria Parra wrote:
> > pierre-yves.david@ens-lyon.org writes:
> >
> >> # HG changeset patch
> >> # User Pierre-Yves David <pierre-yves.david@fb.com>
> >> # Date 1395295443 25200
> >> #      Wed Mar 19 23:04:03 2014 -0700
> >> # Node ID e7c977d822a8c7921510c034e250ed52233fff41
> >> # Parent  f5be6f95d76a9369b441da680cf2a3dd959da1f7
> >> bundle2: support unbundling empty part
> >>
> >> We augment the unbundler to make it able to unbundle the empty part we are now
> >> able to bundle.
> >>
> >> diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
> >> --- a/mercurial/bundle2.py
> >> +++ b/mercurial/bundle2.py
> >> @@ -238,23 +238,44 @@ class unbundle20(object):
> >>               part = self._readpart()
> >>           self.ui.debug('end of bundle2 stream\n')
> >>
> >>       def _readpart(self):
> >>           """return None when an end of stream markers is reach"""
> >> -        headersize = self._readexact(2)
> >> -        assert headersize == '\0\0'
> >> -        return None
> >> +
> >> +        headersize = self._unpack(_fpartheadersize)[0]
> >> +        self.ui.debug('part header size: %i\n' % headersize)
> >> +        if not headersize:
> >> +            return None
> >> +        headerblock = self._readexact(headersize)
> >> +        # some utility to help reading from the header bloc
> >
> > bloc -> block
> >
> >> +        def fromheader(size, _offset=[0]):
> >> +            """return the next <size> byte from the header"""
> >> +            offset = _offset[0] #small hack to make this mutable
> >> +            data = headerblock[offset:(offset + size)]
> >> +            _offset[0] = offset + size
> >> +            return data
> >
> > It might be just me, but I think this is a bit hacky and hard to follow.
> > Can we either return the offset with data and just pass it around like
> >
> >    return data, offset+size
> >
> > and
> >
> >    typesize, offset = fromheader(1)
> >    next, offset = fromheader(1, offset)
> >
> > or use a variable defined just before fromheader? Maybe yoru approach is
> > a python standard and we should keep it, but i had to read twice to
> > follow what it's actually doing.
> 
> 
> In a prior version, _offset was defined right before fromheader. I 
> decided to move it in the function parameter to make it clear is was 
> used only here. (basically a static variable with for this unbundling.)

Using "make it clear" to describe whatever's happening here is...
entertaining.

I've queued patch 1. I've queued patch one, thanks.

Patch

diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -238,23 +238,44 @@  class unbundle20(object):
             part = self._readpart()
         self.ui.debug('end of bundle2 stream\n')
 
     def _readpart(self):
         """return None when an end of stream markers is reach"""
-        headersize = self._readexact(2)
-        assert headersize == '\0\0'
-        return None
+
+        headersize = self._unpack(_fpartheadersize)[0]
+        self.ui.debug('part header size: %i\n' % headersize)
+        if not headersize:
+            return None
+        headerblock = self._readexact(headersize)
+        # some utility to help reading from the header bloc
+        def fromheader(size, _offset=[0]):
+            """return the next <size> byte from the header"""
+            offset = _offset[0] #small hack to make this mutable
+            data = headerblock[offset:(offset + size)]
+            _offset[0] = offset + size
+            return data
+        typesize = _unpack(_fparttypesize, fromheader(1))[0]
+        parttype = fromheader(typesize)
+        self.ui.debug('part type: "%s"\n' % parttype)
+        current = part(parttype)
+        assert fromheader(2) == '\0\0' # no option for now
+        self.ui.debug('part parameters: 0\n')
+        assert self._readexact(4) == '\0\0\0\0' #empty payload
+        self.ui.debug('payload chunk size: 0\n')
+        return current
+
 
 class part(object):
     """A bundle2 part contains application level payload
 
     The part `type` is used to route the part to the application level
     handler.
     """
 
-    def __init__(self, parttype):
+    def __init__(self, parttype, data=''):
         self.type = parttype
+        self.data = data
 
     def getchunks(self):
         ### header
         header = [_pack(_fparttypesize, len(self.type)),
                   self.type,
diff --git a/tests/test-bundle2.t b/tests/test-bundle2.t
--- a/tests/test-bundle2.t
+++ b/tests/test-bundle2.t
@@ -58,10 +58,12 @@  Create an extension to test bundle2 API
   >         value = params[key]
   >         if value is not None:
   >             ui.write('    %s\n' % value)
   >     parts = list(unbundler)
   >     ui.write('parts count:   %i\n' % len(parts))
+  >     for p in parts:
+  >         ui.write('  :%s:\n' % p.type)
   > EOF
   $ cat >> $HGRCPATH << EOF
   > [extensions]
   > bundle2=$TESTTMP/bundle2.py
   > EOF
@@ -204,10 +206,11 @@  unbundling debug
   options count: 2
   - e|! 7/
       babar%#==tutu
   - simple
   start extraction of bundle2 parts
+  part header size: 0
   end of bundle2 stream
   parts count:   0
 
 
 Test buggy input
@@ -241,5 +244,29 @@  Test part
   HG20\x00\x00\x00\r (esc)
   test:empty\x00\x00\x00\x00\x00\x00\x00\r (esc)
   test:empty\x00\x00\x00\x00\x00\x00\x00\x00 (no-eol) (esc)
 
 
+  $ hg unbundle2 < ../parts.hg2
+  options count: 0
+  parts count:   2
+    :test:empty:
+    :test:empty:
+
+  $ hg unbundle2 --debug < ../parts.hg2
+  start processing of HG20 stream
+  reading bundle2 stream parameters
+  options count: 0
+  start extraction of bundle2 parts
+  part header size: 13
+  part type: "test:empty"
+  part parameters: 0
+  payload chunk size: 0
+  part header size: 13
+  part type: "test:empty"
+  part parameters: 0
+  payload chunk size: 0
+  part header size: 0
+  end of bundle2 stream
+  parts count:   2
+    :test:empty:
+    :test:empty: