Patchwork [1,of,6] changegroup: use a different compression key for BZ in HG10

Submitter Pierre-Yves David
Date Sept. 24, 2015, 1:21 a.m.
Message ID <0e8207361f5a0eb27fd9.1443057680@marginatus.alto.octopoid.net>
Permalink /patch/10603/
State Superseded

Comments

Pierre-Yves David - Sept. 24, 2015, 1:21 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@fb.com>
# Date 1443033210 25200
#      Wed Sep 23 11:33:30 2015 -0700
# Node ID 0e8207361f5a0eb27fd92288c34c5bb3d1d3eb53
# Parent  f946c1260035f96aa30052c28e6c68c559677059
changegroup: use a different compression key for BZ in HG10

To save space, bundle1 strips the first two bytes of the BZ stream, since they
are always 'BZ', so the current code bootstraps the decompressor with 'BZ'.
This hack is impractical in the more generic case, so we move it to a
dedicated decompression key, '_truncatedBZ'.
Gregory Szorc - Sept. 24, 2015, 2:20 a.m.
On Wed, Sep 23, 2015 at 9:21 PM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1443033210 25200
> #      Wed Sep 23 11:33:30 2015 -0700
> # Node ID 0e8207361f5a0eb27fd92288c34c5bb3d1d3eb53
> # Parent  f946c1260035f96aa30052c28e6c68c559677059
> changegroup: use a different compression key for BZ in HG10
>
>
Related to this series, at some point I'd like to formalize the `hg clone
--uncompressed` "bundle" format as something that can be saved to a file
and handled by `hg unbundle` as well as `hg clone`. As the work I've done
at Mozilla has shown, "stream bundles," while larger, are insanely fast to
apply (they are basically tar archives of revlogs), making them ideal for
fast network environments such as datacenters. They are the difference
between 5-minute and 1-minute clones. Also, I think moving "stream bundles" to the
"getbundle" wire protocol command (instead of maintaining a one-off) makes
some sense.

I'm not sure how this should be done. New magic value in the bundle header?
Separate bundle2 part?

I just wanted to let you know about this in case it impacts your work here.
Pierre-Yves David - Sept. 24, 2015, 2:44 a.m.
On 09/23/2015 07:20 PM, Gregory Szorc wrote:
> On Wed, Sep 23, 2015 at 9:21 PM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>
>     # HG changeset patch
>     # User Pierre-Yves David <pierre-yves.david@fb.com>
>     # Date 1443033210 25200
>     #      Wed Sep 23 11:33:30 2015 -0700
>     # Node ID 0e8207361f5a0eb27fd92288c34c5bb3d1d3eb53
>     # Parent  f946c1260035f96aa30052c28e6c68c559677059
>     changegroup: use a different compression key for BZ in HG10
>
>
> Related to this series, at some point I'd like to formalize the `hg
> clone --uncompressed` "bundle" format as something that can be saved to
> a file and handled by `hg unbundle` as well as `hg clone`. As the work
> I've done at Mozilla has shown, "stream bundles," while larger, are
> insanely fast to apply (they are basically tar archives of revlogs),
> making them ideal for fast network environments such as datacenters.
> They are the difference between 5-minute and 1-minute clones. Also, I think
> moving "stream bundles" to the "getbundle" wire protocol command
> (instead of maintaining a one-off) makes some sense.

Sounds like a great plan.

> I'm not sure how this should be done. New magic value in the bundle
> header? Separate bundle2 part?

A specific bundle2 part that refuses to apply if the destination repository
is not empty?

> I just wanted to let you know about this in case it impacts your work here.

Not really, as far as I understand…
Simon King - Sept. 24, 2015, 8:43 a.m.
On Thu, Sep 24, 2015 at 2:21 AM, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@fb.com>
> # Date 1443033210 25200
> #      Wed Sep 23 11:33:30 2015 -0700
> # Node ID 0e8207361f5a0eb27fd92288c34c5bb3d1d3eb53
> # Parent  f946c1260035f96aa30052c28e6c68c559677059
> changegroup: use a different compression key for BZ in HG10
>
> To save space, bundle1 strips the first two bytes of the BZ stream, since they
> are always 'BZ', so the current code bootstraps the decompressor with 'BZ'.
> This hack is impractical in the more generic case, so we move it to a
> dedicated decompression key, '_truncatedBZ'.
>
> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
> --- a/mercurial/changegroup.py
> +++ b/mercurial/changegroup.py
> @@ -161,10 +161,12 @@ class cg1unpacker(object):
>          if alg == 'UN':
>              alg = None # get more modern without breaking too much
>          if not alg in util.decompressors:
>              raise util.Abort(_('unknown stream compression type: %s')
>                               % alg)
> +        if alg == 'BZ':
> +            alg = '_truncatedBZ'
>          self._stream = util.decompressors[alg](fh)
>          self._type = alg
>          self.callback = None
>      def compressed(self):
>          return self._type is not None
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -2373,10 +2373,12 @@ def _bz2():
>      d.decompress('BZ')
>      return d
>
>  decompressors = {None: lambda fh: fh,
>                   'BZ': _makedecompressor(_bz2),
> +                 '_truncatedBZ': _makedecompressor(_bz2),
> +                 'BZ': _makedecompressor(lambda: bz2.BZ2Decompressor()),
>                   'GZ': _makedecompressor(lambda: zlib.decompressobj()),
>                   }

This looks wrong - the 'BZ' key appears twice. Should the first one
have been removed?

Simon
Pierre-Yves David - Sept. 24, 2015, 1:05 p.m.
On 09/24/2015 01:43 AM, Simon King wrote:
> On Thu, Sep 24, 2015 at 2:21 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>> # HG changeset patch
>> # User Pierre-Yves David <pierre-yves.david@fb.com>
>> # Date 1443033210 25200
>> #      Wed Sep 23 11:33:30 2015 -0700
>> # Node ID 0e8207361f5a0eb27fd92288c34c5bb3d1d3eb53
>> # Parent  f946c1260035f96aa30052c28e6c68c559677059
>> changegroup: use a different compression key for BZ in HG10
>>
>> To save space, bundle1 strips the first two bytes of the BZ stream, since they
>> are always 'BZ', so the current code bootstraps the decompressor with 'BZ'.
>> This hack is impractical in the more generic case, so we move it to a
>> dedicated decompression key, '_truncatedBZ'.
>>
>> diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
>> --- a/mercurial/changegroup.py
>> +++ b/mercurial/changegroup.py
>> @@ -161,10 +161,12 @@ class cg1unpacker(object):
>>           if alg == 'UN':
>>               alg = None # get more modern without breaking too much
>>           if not alg in util.decompressors:
>>               raise util.Abort(_('unknown stream compression type: %s')
>>                                % alg)
>> +        if alg == 'BZ':
>> +            alg = '_truncatedBZ'
>>           self._stream = util.decompressors[alg](fh)
>>           self._type = alg
>>           self.callback = None
>>       def compressed(self):
>>           return self._type is not None
>> diff --git a/mercurial/util.py b/mercurial/util.py
>> --- a/mercurial/util.py
>> +++ b/mercurial/util.py
>> @@ -2373,10 +2373,12 @@ def _bz2():
>>       d.decompress('BZ')
>>       return d
>>
>>   decompressors = {None: lambda fh: fh,
>>                    'BZ': _makedecompressor(_bz2),
>> +                 '_truncatedBZ': _makedecompressor(_bz2),
>> +                 'BZ': _makedecompressor(lambda: bz2.BZ2Decompressor()),
>>                    'GZ': _makedecompressor(lambda: zlib.decompressobj()),
>>                    }
>
> This looks wrong - the 'BZ' key appears twice. Should the first one
> have been removed?

Good catch, it should be removed.
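
For illustration only (not part of the patch): Python accepts a repeated key in a dict literal and silently keeps the last value, which is why the duplicate 'BZ' entry does not raise an error and is easy to miss. In the minimal sketch below, `bootstrapped` and `plain` are hypothetical stand-ins for the two `_makedecompressor(...)` values in the hunk.

```python
# Hypothetical stand-ins for the two decompressor factories in the hunk.
def bootstrapped():
    return "bz2 decompressor pre-fed with 'BZ'"

def plain():
    return "plain bz2 decompressor"

# Same key layout as the patched table: 'BZ' is listed twice.
decompressors = {
    None: lambda: "identity",
    'BZ': bootstrapped,            # dead entry: overwritten two lines below
    '_truncatedBZ': bootstrapped,
    'BZ': plain,                   # the last duplicate wins
}

# The lookup resolves to the later value and only three keys exist.
winner = decompressors['BZ']
key_count = len(decompressors)
```

Under these assumptions the table still behaves as the patch intends, since 'BZ' ends up mapped to the plain decompressor, but the first 'BZ' line has no effect and should be dropped.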

Patch

diff --git a/mercurial/changegroup.py b/mercurial/changegroup.py
--- a/mercurial/changegroup.py
+++ b/mercurial/changegroup.py
@@ -161,10 +161,12 @@  class cg1unpacker(object):
         if alg == 'UN':
             alg = None # get more modern without breaking too much
         if not alg in util.decompressors:
             raise util.Abort(_('unknown stream compression type: %s')
                              % alg)
+        if alg == 'BZ':
+            alg = '_truncatedBZ'
         self._stream = util.decompressors[alg](fh)
         self._type = alg
         self.callback = None
     def compressed(self):
         return self._type is not None
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2373,10 +2373,12 @@  def _bz2():
     d.decompress('BZ')
     return d
 
 decompressors = {None: lambda fh: fh,
                  'BZ': _makedecompressor(_bz2),
+                 '_truncatedBZ': _makedecompressor(_bz2),
+                 'BZ': _makedecompressor(lambda: bz2.BZ2Decompressor()),
                  'GZ': _makedecompressor(lambda: zlib.decompressobj()),
                  }
 # also support the old form by courtesies
 decompressors['UN'] = decompressors[None]
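
The following is a minimal, self-contained sketch of the idea behind the two hunks above, written for Python 3 rather than the Python 2 code in the patch. It assumes the duplicate 'BZ' entry has been removed as agreed in the review. The registry is simplified to map names to decompressor factories, and `decode_hg10` is a hypothetical helper that mirrors the key remapping in `cg1unpacker`; neither is Mercurial's actual API.

```python
import bz2
import zlib

def _bz2_truncated():
    # Pre-feed the two magic bytes that bundle1 strips from the stream.
    d = bz2.BZ2Decompressor()
    d.decompress(b"BZ")
    return d

# Simplified registry: name -> factory (None means data is stored as-is).
decompressors = {
    None: None,
    "BZ": bz2.BZ2Decompressor,        # full bz2 stream, header included
    "_truncatedBZ": _bz2_truncated,   # bundle1 stream without the 'BZ' bytes
    "GZ": zlib.decompressobj,
}

def decode_hg10(alg, data):
    """Decode an HG10 payload (hypothetical helper, for illustration)."""
    if alg == "UN":
        alg = None
    if alg not in decompressors:
        raise ValueError("unknown stream compression type: %s" % alg)
    if alg == "BZ":
        # HG10 bundles omit the leading 'BZ', so use the dedicated key.
        alg = "_truncatedBZ"
    factory = decompressors[alg]
    if factory is None:
        return data
    return factory().decompress(data)

# Simulate what bundle1 writes: a bz2 stream minus its first two bytes.
payload = b"changegroup data " * 64
full = bz2.compress(payload)
truncated = full[2:]
restored = decode_hg10("BZ", truncated)
```

Keeping the bootstrap behind its own key leaves 'BZ' free to mean an ordinary bz2 stream for any other caller of the table.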