Patchwork [1,of,2,"V3] compression: introduce an official `zstd-revlog` requirement

login
register
mail settings
Submitter Pierre-Yves David
Date April 9, 2019, 10:06 a.m.
Message ID <8ce788576265eb8b84cb.1554804360@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/39546/
State New
Headers show

Comments

Pierre-Yves David - April 9, 2019, 10:06 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1553707623 -3600
#      Wed Mar 27 18:27:03 2019 +0100
# Node ID 8ce788576265eb8b84cb697ea2e09b2246fb7b81
# Parent  456c37433c433ee59d321315da24eb944e495f08
# EXP-Topic zstd-revlog
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 8ce788576265
compression: introduce an official `zstd-revlog` requirement

This requirement supersedes `exp-compression-zstd`. However, we keep support for
the old requirement.

Strictly speaking, we do not need to add a new requirement, there are no logic
change making "new" repo incompatible with mercurial client using a version
that support `exp-compression-zstd`.

The choice to introduce a new requirement is motivated by the following:

* The previous requirement was explicitly "experimental". Using it by default
  could confuse users.

* adding support for a hypothetical third compression engine will requires new
  code, and will comes with its own requirement tool.

* We won't use it as the default for a while since I do not think we support
  zstd on all platform. I can imagine we'll gain another (unrelated but on my
  default) requirement by the time we turn this zstd by default.
Augie Fackler - April 15, 2019, 2:34 p.m.
Greg: ping on these? I'm aware you had some unease on this topic, and
I'd like to figure out if we should ship this. Thanks!

On Tue, Apr 09, 2019 at 12:06:00PM +0200, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1553707623 -3600
> #      Wed Mar 27 18:27:03 2019 +0100
> # Node ID 8ce788576265eb8b84cb697ea2e09b2246fb7b81
> # Parent  456c37433c433ee59d321315da24eb944e495f08
> # EXP-Topic zstd-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 8ce788576265
> compression: introduce an official `zstd-revlog` requirement
>
> This requirement supersedes `exp-compression-zstd`. However, we keep support for
> the old requirement.
>
> Strictly speaking, we do not need to add a new requirement, there are no logic
> change making "new" repo incompatible with mercurial client using a version
> that support `exp-compression-zstd`.
>
> The choice to introduce a new requirement is motivated by the following:
>
> * The previous requirement was explicitly "experimental". Using it by default
>   could confuse users.
>
> * adding support for a hypothetical third compression engine will requires new
>   code, and will comes with its own requirement tool.
>
> * We won't use it as the default for a while since I do not think we support
>   zstd on all platform. I can imagine we'll gain another (unrelated but on my
>   default) requirement by the time we turn this zstd by default.
>
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -645,6 +645,8 @@ def gathersupportedrequirements(ui):
>          engine = util.compengines[name]
>          if engine.revlogheader():
>              supported.add(b'exp-compression-%s' % name)
> +            if engine.name() == 'zstd':
> +                supported.add(b'revlog-compression-zstd')
>
>      return supported
>
> @@ -794,8 +796,13 @@ def resolverevlogstorevfsoptions(ui, req
>          options[b'maxchainlen'] = maxchainlen
>
>      for r in requirements:
> -        if r.startswith(b'exp-compression-'):
> -            options[b'compengine'] = r[len(b'exp-compression-'):]
> +        # we allow multiple compression engine requirement to co-exist because
> +        # strickly speaking, revlog seems to support mixed compression style.
> +        #
> +        # The compression used for new entries will be "the last one"
> +        prefix = r.startswith
> +        if prefix('revlog-compression-') or prefix('exp-compression-'):
> +            options[b'compengine'] = r.split('-', 2)[2]
>
>      options[b'zlib.level'] = ui.configint(b'storage', b'revlog.zlib.level')
>      if options[b'zlib.level'] is not None:
> @@ -2929,7 +2936,9 @@ def newreporequirements(ui, createopts):
>                                   'compression engines'))
>
>      # zlib is the historical default and doesn't need an explicit requirement.
> -    if compengine != 'zlib':
> +    elif compengine == 'zstd':
> +        requirements.add('revlog-compression-zstd')
> +    elif compengine != 'zlib':
>          requirements.add('exp-compression-%s' % compengine)
>
>      if scmutil.gdinitconfig(ui):
> diff --git a/mercurial/upgrade.py b/mercurial/upgrade.py
> --- a/mercurial/upgrade.py
> +++ b/mercurial/upgrade.py
> @@ -325,10 +325,16 @@ class compressionengine(formatvariant):
>
>      @classmethod
>      def fromrepo(cls, repo):
> +        # we allow multiple compression engine requirement to co-exist because
> +        # strickly speaking, revlog seems to support mixed compression style.
> +        #
> +        # The compression used for new entries will be "the last one"
> +        compression = 'zlib'
>          for req in repo.requirements:
> -            if req.startswith('exp-compression-'):
> -                return req.split('-', 2)[2]
> -        return 'zlib'
> +            prefix = req.startswith
> +            if prefix('revlog-compression-') or prefix('exp-compression-'):
> +                compression = req.split('-', 2)[2]
> +        return compression
>
>      @classmethod
>      def fromconfig(cls, repo):
> diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t
> --- a/tests/test-repo-compengines.t
> +++ b/tests/test-repo-compengines.t
> @@ -44,9 +44,9 @@ A requirement specifying an unknown comp
>    $ cd zstd
>    $ cat .hg/requires
>    dotencode
> -  exp-compression-zstd
>    fncache
>    generaldelta
> +  revlog-compression-zstd
>    revlogv1
>    sparserevlog
>    store
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Gregory Szorc - April 16, 2019, 10:28 a.m.
On Mon, Apr 15, 2019 at 8:34 AM Augie Fackler <raf@durin42.com> wrote:

> Greg: ping on these? I'm aware you had some unease on this topic, and
> I'd like to figure out if we should ship this. Thanks!
>

I've been enjoying my holiday away from the computer. I met with
Pierre-Yves and Georges yesterday and agreed to review these before the
freeze...


>
> On Tue, Apr 09, 2019 at 12:06:00PM +0200, Pierre-Yves David wrote:
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david@octobus.net>
> > # Date 1553707623 -3600
> > #      Wed Mar 27 18:27:03 2019 +0100
> > # Node ID 8ce788576265eb8b84cb697ea2e09b2246fb7b81
> > # Parent  456c37433c433ee59d321315da24eb944e495f08
> > # EXP-Topic zstd-revlog
> > # Available At https://bitbucket.org/octobus/mercurial-devel/
> > #              hg pull https://bitbucket.org/octobus/mercurial-devel/
> -r 8ce788576265
> > compression: introduce an official `zstd-revlog` requirement
> >
> > This requirement supersedes `exp-compression-zstd`. However, we keep
> support for
> > the old requirement.
> >
> > Strictly speaking, we do not need to add a new requirement, there are no
> logic
> > change making "new" repo incompatible with mercurial client using a
> version
> > that support `exp-compression-zstd`.
> >
> > The choice to introduce a new requirement is motivated by the following:
> >
> > * The previous requirement was explicitly "experimental". Using it by
> default
> >   could confuse users.
> >
> > * adding support for a hypothetical third compression engine will
> requires new
> >   code, and will comes with its own requirement tool.
> >
> > * We won't use it as the default for a while since I do not think we
> support
> >   zstd on all platform. I can imagine we'll gain another (unrelated but
> on my
> >   default) requirement by the time we turn this zstd by default.
>

So, the reason why I named things "compression-<format>" was so we could
have a per-compressor requirement that could potentially allow *anything*
to use that compressor. For example, "compression-zstd" could allow random
on-disk files to be compressed with zstd.

Of course, *any* time you make a BC change to how the repository is read
from the filesystem, you need a new requirement, otherwise old clients may
attempt to open the repo and get confused or cause corruption. So even if
we have an e.g. "compression-zstd" requirement, we would still need
additional requirements to denote when zstd support is added to individual
components. e.g. we'd need a dedicated requirement for "zstd in revlogs."

I think I'm OK with the approach in this patch.



> >
> > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> > --- a/mercurial/localrepo.py
> > +++ b/mercurial/localrepo.py
> > @@ -645,6 +645,8 @@ def gathersupportedrequirements(ui):
> >          engine = util.compengines[name]
> >          if engine.revlogheader():
> >              supported.add(b'exp-compression-%s' % name)
> > +            if engine.name() == 'zstd':
> > +                supported.add(b'revlog-compression-zstd')
>

I'm still not a fan of special casing zstd: there should be something on
the compression engine API doing this. But I'm willing to look the other
way for now because doing it as part of the compression engine API means
stabilizing the "exp-compression-<format>" bit and I don't want to do that
just yet.

There is also an existing bug where we fail to check engine.available().
This will result in Mercurial thinking it can open zstd repositories and
then failing when it tries to use zstd when it isn't available.
(Compression engines are always present/registered but they may not be
available at run-time due to missing dependencies.)

Could you please send a patch (either standalone or a v4 of this one) that
adds a check for engine.available() and skips engines that aren't usable?


> >
> >      return supported
> >
> > @@ -794,8 +796,13 @@ def resolverevlogstorevfsoptions(ui, req
> >          options[b'maxchainlen'] = maxchainlen
> >
> >      for r in requirements:
> > -        if r.startswith(b'exp-compression-'):
> > -            options[b'compengine'] = r[len(b'exp-compression-'):]
> > +        # we allow multiple compression engine requirement to co-exist
> because
> > +        # strickly speaking, revlog seems to support mixed compression
> style.
> > +        #
> > +        # The compression used for new entries will be "the last one"
> > +        prefix = r.startswith
> > +        if prefix('revlog-compression-') or prefix('exp-compression-'):
> > +            options[b'compengine'] = r.split('-', 2)[2]
>

An existing problem with this code is that `requirements` is a set and
iteration order will be undefined. So if we have multiple compressors in
play, we could inconsistently compress with different compressors for
different transactions! We should have some way to sort the priorities. But
this is scope bloat and an existing issue, so I'm inclined to ignore it for
now. An inline comment would be useful, however.


> >
> >      options[b'zlib.level'] = ui.configint(b'storage',
> b'revlog.zlib.level')
> >      if options[b'zlib.level'] is not None:
> > @@ -2929,7 +2936,9 @@ def newreporequirements(ui, createopts):
> >                                   'compression engines'))
> >
> >      # zlib is the historical default and doesn't need an explicit
> requirement.
> > -    if compengine != 'zlib':
> > +    elif compengine == 'zstd':
> > +        requirements.add('revlog-compression-zstd')
> > +    elif compengine != 'zlib':
> >          requirements.add('exp-compression-%s' % compengine)
> >
> >      if scmutil.gdinitconfig(ui):
> > diff --git a/mercurial/upgrade.py b/mercurial/upgrade.py
> > --- a/mercurial/upgrade.py
> > +++ b/mercurial/upgrade.py
> > @@ -325,10 +325,16 @@ class compressionengine(formatvariant):
> >
> >      @classmethod
> >      def fromrepo(cls, repo):
> > +        # we allow multiple compression engine requirement to co-exist
> because
> > +        # strickly speaking, revlog seems to support mixed compression
> style.
> > +        #
> > +        # The compression used for new entries will be "the last one"
> > +        compression = 'zlib'
> >          for req in repo.requirements:
> > -            if req.startswith('exp-compression-'):
> > -                return req.split('-', 2)[2]
> > -        return 'zlib'
> > +            prefix = req.startswith
> > +            if prefix('revlog-compression-') or
> prefix('exp-compression-'):
> > +                compression = req.split('-', 2)[2]
> > +        return compression
> >
> >      @classmethod
> >      def fromconfig(cls, repo):
> > diff --git a/tests/test-repo-compengines.t
> b/tests/test-repo-compengines.t
> > --- a/tests/test-repo-compengines.t
> > +++ b/tests/test-repo-compengines.t
> > @@ -44,9 +44,9 @@ A requirement specifying an unknown comp
> >    $ cd zstd
> >    $ cat .hg/requires
> >    dotencode
> > -  exp-compression-zstd
> >    fncache
> >    generaldelta
> > +  revlog-compression-zstd
> >    revlogv1
> >    sparserevlog
> >    store
>
Pierre-Yves David - April 16, 2019, 1:15 p.m.
On 4/16/19 12:28 PM, Gregory Szorc wrote:
> On Mon, Apr 15, 2019 at 8:34 AM Augie Fackler <raf@durin42.com 
> <mailto:raf@durin42.com>> wrote:
> 
>     Greg: ping on these? I'm aware you had some unease on this topic, and
>     I'd like to figure out if we should ship this. Thanks!
> 
> 
> I've been enjoying my holiday away from the computer. I met with 
> Pierre-Yves and Georges yesterday and agreed to review these before the 
> freeze...
> 
> 
>     On Tue, Apr 09, 2019 at 12:06:00PM +0200, Pierre-Yves David wrote:
>      > # HG changeset patch
>      > # User Pierre-Yves David <pierre-yves.david@octobus.net
>     <mailto:pierre-yves.david@octobus.net>>
>      > # Date 1553707623 -3600
>      > #      Wed Mar 27 18:27:03 2019 +0100
>      > # Node ID 8ce788576265eb8b84cb697ea2e09b2246fb7b81
>      > # Parent  456c37433c433ee59d321315da24eb944e495f08
>      > # EXP-Topic zstd-revlog
>      > # Available At https://bitbucket.org/octobus/mercurial-devel/
>      > #              hg pull
>     https://bitbucket.org/octobus/mercurial-devel/ -r 8ce788576265
>      > compression: introduce an official `zstd-revlog` requirement
>      >
>      > This requirement supersedes `exp-compression-zstd`. However, we
>     keep support for
>      > the old requirement.
>      >
>      > Strictly speaking, we do not need to add a new requirement, there
>     are no logic
>      > change making "new" repo incompatible with mercurial client using
>     a version
>      > that support `exp-compression-zstd`.
>      >
>      > The choice to introduce a new requirement is motivated by the
>     following:
>      >
>      > * The previous requirement was explicitly "experimental". Using
>     it by default
>      >   could confuse users.
>      >
>      > * adding support for a hypothetical third compression engine will
>     requires new
>      >   code, and will comes with its own requirement tool.
>      >
>      > * We won't use it as the default for a while since I do not think
>     we support
>      >   zstd on all platform. I can imagine we'll gain another
>     (unrelated but on my
>      >   default) requirement by the time we turn this zstd by default.
> 
> 
> So, the reason why I named things "compression-<format>" was so we could 
> have a per-compressor requirement that could potentially allow 
> *anything* to use that compressor. For example, "compression-zstd" could 
> allow random on-disk files to be compressed with zstd.
> 
> Of course, *any* time you make a BC change to how the repository is read 
> from the filesystem, you need a new requirement, otherwise old clients 
> may attempt to open the repo and get confused or cause corruption. So 
> even if we have an e.g. "compression-zstd" requirement, we would still 
> need additional requirements to denote when zstd support is added to 
> individual components. e.g. we'd need a dedicated requirement for "zstd 
> in revlogs."
> 
> I think I'm OK with the approach in this patch.
> 
>      >
>      > diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
>      > --- a/mercurial/localrepo.py
>      > +++ b/mercurial/localrepo.py
>      > @@ -645,6 +645,8 @@ def gathersupportedrequirements(ui):
>      >          engine = util.compengines[name]
>      >          if engine.revlogheader():
>      >              supported.add(b'exp-compression-%s' % name)
>      > +            if engine.name <http://engine.name>() == 'zstd':
>      > +                supported.add(b'revlog-compression-zstd')
> 
> 
> I'm still not a fan of special casing zstd: there should be something on 
> the compression engine API doing this. But I'm willing to look the other 
> way for now because doing it as part of the compression engine API means 
> stabilizing the "exp-compression-<format>" bit and I don't want to do 
> that just yet.
> 
> There is also an existing bug where we fail to check engine.available(). 
> This will result in Mercurial thinking it can open zstd repositories and 
> then failing when it tries to use zstd when it isn't available. 
> (Compression engines are always present/registered but they may not be 
> available at run-time due to missing dependencies.)
> 
> Could you please send a patch (either standalone or a v4 of this one) 
> that adds a check for engine.available() and skips engines that aren't 
> usable?

I am on it. It looks like only one line needs to be updated. I am 
sending a V4 with this extra patch + update to the last patch.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -645,6 +645,8 @@  def gathersupportedrequirements(ui):
         engine = util.compengines[name]
         if engine.revlogheader():
             supported.add(b'exp-compression-%s' % name)
+            if engine.name() == 'zstd':
+                supported.add(b'revlog-compression-zstd')
 
     return supported
 
@@ -794,8 +796,13 @@  def resolverevlogstorevfsoptions(ui, req
         options[b'maxchainlen'] = maxchainlen
 
     for r in requirements:
-        if r.startswith(b'exp-compression-'):
-            options[b'compengine'] = r[len(b'exp-compression-'):]
+        # we allow multiple compression engine requirement to co-exist because
+        # strickly speaking, revlog seems to support mixed compression style.
+        #
+        # The compression used for new entries will be "the last one"
+        prefix = r.startswith
+        if prefix('revlog-compression-') or prefix('exp-compression-'):
+            options[b'compengine'] = r.split('-', 2)[2]
 
     options[b'zlib.level'] = ui.configint(b'storage', b'revlog.zlib.level')
     if options[b'zlib.level'] is not None:
@@ -2929,7 +2936,9 @@  def newreporequirements(ui, createopts):
                                  'compression engines'))
 
     # zlib is the historical default and doesn't need an explicit requirement.
-    if compengine != 'zlib':
+    elif compengine == 'zstd':
+        requirements.add('revlog-compression-zstd')
+    elif compengine != 'zlib':
         requirements.add('exp-compression-%s' % compengine)
 
     if scmutil.gdinitconfig(ui):
diff --git a/mercurial/upgrade.py b/mercurial/upgrade.py
--- a/mercurial/upgrade.py
+++ b/mercurial/upgrade.py
@@ -325,10 +325,16 @@  class compressionengine(formatvariant):
 
     @classmethod
     def fromrepo(cls, repo):
+        # we allow multiple compression engine requirement to co-exist because
+        # strickly speaking, revlog seems to support mixed compression style.
+        #
+        # The compression used for new entries will be "the last one"
+        compression = 'zlib'
         for req in repo.requirements:
-            if req.startswith('exp-compression-'):
-                return req.split('-', 2)[2]
-        return 'zlib'
+            prefix = req.startswith
+            if prefix('revlog-compression-') or prefix('exp-compression-'):
+                compression = req.split('-', 2)[2]
+        return compression
 
     @classmethod
     def fromconfig(cls, repo):
diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t
--- a/tests/test-repo-compengines.t
+++ b/tests/test-repo-compengines.t
@@ -44,9 +44,9 @@  A requirement specifying an unknown comp
   $ cd zstd
   $ cat .hg/requires
   dotencode
-  exp-compression-zstd
   fncache
   generaldelta
+  revlog-compression-zstd
   revlogv1
   sparserevlog
   store