Patchwork [4,of,8] compression: introduce a `storage.revlog.zstd.level` configuration

Submitter Pierre-Yves David
Date March 31, 2019, 3:36 p.m.
Message ID <bcc4ba4c53b44dc6013b.1554046580@nodosa.octopoid.net>
Permalink /patch/39422/
State Accepted

Comments

Pierre-Yves David - March 31, 2019, 3:36 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1553708159 -3600
#      Wed Mar 27 18:35:59 2019 +0100
# Node ID bcc4ba4c53b44dc6013b89f8c85b0f1967dfaebb
# Parent  df7c537a8d07d6c1d4e7aa7604af30a57717bcf6
# EXP-Topic zstd-revlog
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r bcc4ba4c53b4
compression: introduce a `storage.revlog.zstd.level` configuration

This option controls the zstd compression level used when compressing revlog
chunks. The usage of zstd for revlog compression has not graduated from
experimental yet, but we intend to fix that soon.

The option name for the compression level is more straightforward to pick, so
this changeset comes first. Having a dedicated option for each compression
engine is useful because they don't support the same range of values.
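
As a minimal sketch of why one option per engine helps: the bounds below are the ones checked in this series (`0 <= level <= 9` for zlib, `0 <= level <= 22` for zstd). The names `LEVEL_RANGES`, `ConfigError`, and `validate_level` are hypothetical and are not part of Mercurial.

```python
# Hypothetical sketch: per-engine level validation. The bounds mirror the
# checks in this series; the names are illustrative, not Mercurial API.
LEVEL_RANGES = {
    "zlib": (0, 9),
    "zstd": (0, 22),
}


class ConfigError(Exception):
    """Stand-in for Mercurial's error.Abort."""


def validate_level(engine, level):
    """Return `level` if it is unset or inside the engine's accepted range."""
    if level is None:
        # Unset means "use the engine default".
        return None
    low, high = LEVEL_RANGES[engine]
    if not (low <= level <= high):
        raise ConfigError(
            "invalid value for `storage.revlog.%s.level` config: %d"
            % (engine, level)
        )
    return level
```

With this sketch, `validate_level("zstd", 15)` is accepted while `validate_level("zlib", 15)` raises, which a single shared level option could not express.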

I ran the same measurements as for the zlib compression level (in the parent
changesets). The variation in repository size stays mostly in the same (small)
range. The "read/write" performance sees smallish variations, but is overall much
better than with zlib. Write performance shows the same trend of getting better
when reaching high-end compression levels.

Again, we don't intend to change the default zstd compression level (currently:
3) in this series. However, this is worth investigating in the future.

The performance comparison of zlib vs zstd is quite impressive. The repository
size stays in the same range, but the performance is much better in all
situations.

Comparison summary
Gregory Szorc - April 2, 2019, 5:56 p.m.
On Sun, Mar 31, 2019 at 8:39 AM Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@octobus.net>
> # Date 1553708159 -3600
> #      Wed Mar 27 18:35:59 2019 +0100
> # Node ID bcc4ba4c53b44dc6013b89f8c85b0f1967dfaebb
> # Parent  df7c537a8d07d6c1d4e7aa7604af30a57717bcf6
> # EXP-Topic zstd-revlog
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r bcc4ba4c53b4
> compression: introduce a `storage.revlog.zstd.level` configuration
>

> This option controls the zstd compression level used when compressing revlog
> chunks. The usage of zstd for revlog compression has not graduated from
> experimental yet, but we intend to fix that soon.
>
> The option name for the compression level is more straightforward to pick, so
> this changeset comes first. Having a dedicated option for each compression
> engine is useful because they don't support the same range of values.
>
> I ran the same measurements as for the zlib compression level (in the parent
> changesets). The variation in repository size stays mostly in the same (small)
> range. The "read/write" performance sees smallish variations, but is overall much
> better than with zlib. Write performance shows the same trend of getting better
> when reaching high-end compression levels.
>
> Again, we don't intend to change the default zstd compression level (currently:
> 3) in this series. However, this is worth investigating in the future.
>
> The performance comparison of zlib vs zstd is quite impressive. The repository
> size stays in the same range, but the performance is much better in all
> situations.
>
> Comparison summary
> ==================
>
> We are looking at:
> - performance range for zlib
> - performance range for zstd
> - comparison of default zstd (level-3) to default zlib (level 6)
> - comparison of the slowest zstd time to the fastest zlib time
>
> Read performance:
> -----------------
>           |           zlib          |           zstd          | cmp | f2s
> mercurial |   0.170159 -   0.189219 |   0.144127 -   0.149624 | 80% | 88%
> pypy      |   2.679217 -   2.768691 |   1.532317 -   1.705044 | 60% | 63%
> netbeans  | 122.477027 - 141.620281 |  72.996346 -  89.731560 | 58% | 73%
> mozilla   | 147.867662 - 170.572118 |  91.700995 - 105.853099 | 56% | 71%
>
> Write performance:
> ------------------
>           |           zlib          |           zstd          | cmp | f2s
> mercurial |  53.250304 -  56.293612 |  40.877025 -  45.677286 | 75% | 86%
> pypy      | 460.721984 - 476.589918 | 270.545409 - 301.002219 | 63% | 65%
> netbeans  | 520.560316 - 715.930400 | 370.356311 - 428.329652 | 55% | 82%
> mozilla   | 739.803002 - 987.056093 | 505.152906 - 591.930683 | 57% | 80%
>
> Raw data
> --------
>
> repo      alg lvl  .hg/store size  00manifest.d read       write
>
> mercurial zlib  1      49,402,813     5,963,475   0.170159  53.250304
> mercurial zlib  6      47,197,397     5,875,730   0.182820  56.264320
> mercurial zlib  9      47,121,596     5,849,781   0.189219  56.293612
>
> mercurial zstd  1      49,737,084     5,966,355   0.144127  40.877025
> mercurial zstd  3      48,961,867     5,895,208   0.146376  42.268142
> mercurial zstd  5      48,200,592     5,938,676   0.149624  43.162875
> mercurial zstd 10      47,833,520     5,913,353   0.145185  44.012489
> mercurial zstd 15      47,314,604     5,728,679   0.147686  45.677286
> mercurial zstd 20      47,330,502     5,830,539   0.145789  45.025407
> mercurial zstd 22      47,330,076     5,830,539   0.143996  44.690460
>
>
> pypy      zlib  1     370,830,572    28,462,425   2.679217 460.721984
> pypy      zlib  6     340,112,317    27,648,747   2.768691 467.537158
> pypy      zlib  9     338,360,736    27,639,003   2.763495 476.589918
>
> pypy      zstd  1     362,377,479    27,916,214   1.532317 270.545409
> pypy      zstd  3     354,137,693    27,905,988   1.686718 294.951509
> pypy      zstd  5     342,640,043    27,655,774   1.705044 301.002219
> pypy      zstd 10     334,224,327    27,164,493   1.567287 285.186239
> pypy      zstd 15     329,000,363    26,645,965   1.637729 299.561332
> pypy      zstd 20     324,534,039    26,199,547   1.526813 302.149827
> pypy      zstd 22     324,530,595    26,198,932   1.525718 307.821218
>
>
> netbeans  zlib  1   1,281,847,810   165,495,457 122.477027 520.560316
> netbeans  zlib  6   1,205,284,353   159,161,207 139.876147 715.930400
> netbeans  zlib  9   1,197,135,671   155,034,586 141.620281 678.297064
>
> netbeans  zstd  1   1,259,581,737   160,840,613  72.996346 370.356311
> netbeans  zstd  3   1,232,978,122   157,691,551  81.622317 396.733087
> netbeans  zstd  5   1,208,034,075   160,246,880  83.080549 364.342626
> netbeans  zstd 10   1,188,624,176   156,083,417  79.323935 403.594602
> netbeans  zstd 15   1,176,973,589   153,859,477  89.731560 428.329652
> netbeans  zstd 20   1,162,958,258   151,147,535  82.842667 392.335349
> netbeans  zstd 22   1,162,707,029   151,150,220  82.565695 402.840655
>
>
> mozilla   zlib  1   2,775,497,186   298,527,987 147.867662 751.263721
> mozilla   zlib  6   2,596,856,420   286,597,671 170.572118 987.056093
> mozilla   zlib  9   2,587,542,494   287,018,264 163.622338 739.803002
>
> mozilla   zstd  1   2,723,159,348   286,617,532  91.700995 570.042751
> mozilla   zstd  3   2,665,055,001   286,152,013  95.240155 561.412805
> mozilla   zstd  5   2,607,819,817   288,060,030 101.978048 505.152906
> mozilla   zstd 10   2,558,761,085   283,967,648 104.113481 497.771202
> mozilla   zstd 15   2,526,216,060   275,581,300 105.853099 591.930683
> mozilla   zstd 20   2,485,114,806   266,478,859  95.268795 576.515389
> mozilla   zstd 22   2,484,869,080   266,456,505  94.429282 572.785537
>
> diff --git a/mercurial/configitems.py b/mercurial/configitems.py
> --- a/mercurial/configitems.py
> +++ b/mercurial/configitems.py
> @@ -995,6 +995,9 @@ coreconfigitem('storage', 'revlog.reuse-
>  coreconfigitem('storage', 'revlog.zlib.level',
>      default=None,
>  )
> +coreconfigitem('storage', 'revlog.zstd.level',
> +    default=None,
> +)
>  coreconfigitem('server', 'bookmarks-pushkey-compat',
>      default=True,
>  )
> diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
> --- a/mercurial/help/config.txt
> +++ b/mercurial/help/config.txt
> @@ -1886,6 +1886,12 @@ category impact performance and reposito
>      Value range from 1 (lowest compression) to 9 (highest compression). Zlib
>      default value is 6.
>
> +
> +``revlog.zstd.level``
> +    zstd compression level used when storing data into the repository. Accepted
> +    Value range from 1 (lowest compression) to 22 (highest compression).
> +    (default 3)
> +
>  ``server``
>  ----------
>
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -802,6 +802,11 @@ def resolverevlogstorevfsoptions(ui, req
>          if not (0 <= options[b'zlib.level'] <= 9):
>              msg = _('invalid value for `storage.revlog.zlib.level` config: %d')
>              raise error.Abort(msg % options[b'zlib.level'])
> +    options[b'zstd.level'] = ui.configint(b'storage', b'revlog.zstd.level')
> +    if options[b'zstd.level'] is not None:
> +        if not (0 <= options[b'zstd.level'] <= 22):
> +            msg = _('invalid value for `storage.revlog.zstd.level` config: %d')
> +            raise error.Abort(msg % options[b'zstd.level'])
>

I'm probably going to queue this. However, zstd supports negative
compression levels. When you go negative, zstd approaches lz4's performance.

Instead of trying to screen for the allowed levels here, I would catch the
ValueError raised when constructing the ZstdCompressor and turn it into an
error.Abort. This can be done as a follow-up.
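
A rough sketch of that follow-up, assuming the compressor constructor raises `ValueError` for an unsupported level as described above. `Abort`, `fake_compressor`, and `make_compressor` are hypothetical stand-ins so the example runs without Mercurial or python-zstandard installed; the stand-in only rejects levels above 22 and its error text is made up for illustration.

```python
class Abort(Exception):
    """Stand-in for mercurial.error.Abort."""


def fake_compressor(level):
    """Stand-in for a ZstdCompressor-like constructor (hypothetical)."""
    if level > 22:
        raise ValueError("unsupported compression level")
    return {"level": level}


def make_compressor(level, factory=fake_compressor):
    """Build a compressor, letting the library decide which levels are valid."""
    try:
        return factory(level)
    except ValueError as exc:
        # Report a user-facing configuration error instead of a traceback.
        raise Abort(
            "invalid value for `storage.revlog.zstd.level` config: %d (%s)"
            % (level, exc)
        )
```

The point of this shape is that negative levels pass through untouched, and the accepted range never has to be duplicated in the configuration code.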



>      if repository.NARROW_REQUIREMENT in requirements:
>          options[b'enableellipsis'] = True
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -419,6 +419,8 @@ class revlog(object):
>              self._compengine = opts['compengine']
>          if 'zlib.level' in opts:
>              self._compengineopts['zlib.level'] = opts['zlib.level']
> +        if 'zstd.level' in opts:
> +            self._compengineopts['zstd.level'] = opts['zstd.level']
>          if 'maxdeltachainspan' in opts:
>              self._maxdeltachainspan = opts['maxdeltachainspan']
>          if self._mmaplargeindex and 'mmapindexthreshold' in opts:
> diff --git a/mercurial/utils/compression.py b/mercurial/utils/compression.py
> --- a/mercurial/utils/compression.py
> +++ b/mercurial/utils/compression.py
> @@ -721,8 +721,12 @@ class _zstdengine(compressionengine):
>
>      def revlogcompressor(self, opts=None):
>          opts = opts or {}
> -        return self.zstdrevlogcompressor(self._module,
> -                                         level=opts.get('level', 3))
> +        level = opts.get('zstd.level')
> +        if level is None:
> +            level = opts.get('level')
> +        if level is None:
> +            level = 3
> +        return self.zstdrevlogcompressor(self._module, level=level)
>
>  compengines.register(_zstdengine())
>
> diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t
> --- a/tests/test-repo-compengines.t
> +++ b/tests/test-repo-compengines.t
> @@ -138,3 +138,58 @@ Test error cases
>    abort: invalid value for `storage.revlog.zlib.level` config: 42
>    [255]
>
> +checking zstd options
> +=====================
> +
> +  $ hg init zstd-level-default --config experimental.format.compression=zstd
> +  $ hg init zstd-level-1 --config experimental.format.compression=zstd
> +  $ cat << EOF >> zstd-level-1/.hg/hgrc
> +  > [storage]
> +  > revlog.zstd.level=1
> +  > EOF
> +  $ hg init zstd-level-22 --config experimental.format.compression=zstd
> +  $ cat << EOF >> zstd-level-22/.hg/hgrc
> +  > [storage]
> +  > revlog.zstd.level=22
> +  > EOF
> +
> +
> +  $ commitone() {
> +  >    repo=$1
> +  >    cp $RUNTESTDIR/bundles/issue4438-r1.hg $repo/a
> +  >    hg -R $repo add $repo/a
> +  >    hg -R $repo commit -m some-commit
> +  > }
> +
> +  $ for repo in zstd-level-default zstd-level-1 zstd-level-22; do
> +  >     commitone $repo
> +  > done
> +
> +  $ $RUNTESTDIR/f -s zstd-*/.hg/store/data/*
> +  zstd-level-1/.hg/store/data/a.i: size=4097
> +  zstd-level-22/.hg/store/data/a.i: size=4091
> +  zstd-level-default/.hg/store/data/a.i: size=4094
> +
> +Test error cases
> +
> +  $ hg init zstd-level-invalid --config experimental.format.compression=zstd
> +  $ cat << EOF >> zstd-level-invalid/.hg/hgrc
> +  > [storage]
> +  > revlog.zstd.level=foobar
> +  > EOF
> +  $ commitone zstd-level-invalid
> +  abort: storage.revlog.zstd.level is not a valid integer ('foobar')
> +  abort: storage.revlog.zstd.level is not a valid integer ('foobar')
> +  [255]
> +
> +  $ hg init zstd-level-out-of-range --config experimental.format.compression=zstd
> +  $ cat << EOF >> zstd-level-out-of-range/.hg/hgrc
> +  > [storage]
> +  > revlog.zstd.level=42
> +  > EOF
> +
> +  $ commitone zstd-level-out-of-range
> +  abort: invalid value for `storage.revlog.zstd.level` config: 42
> +  abort: invalid value for `storage.revlog.zstd.level` config: 42
> +  [255]
> +
>

Patch

Comparison summary
==================

We are looking at:
- performance range for zlib
- performance range for zstd
- comparison of default zstd (level-3) to default zlib (level 6)
- comparison of the slowest zstd time to the fastest zlib time

Read performance:
-----------------
          |           zlib          |           zstd          | cmp | f2s
mercurial |   0.170159 -   0.189219 |   0.144127 -   0.149624 | 80% | 88%
pypy      |   2.679217 -   2.768691 |   1.532317 -   1.705044 | 60% | 63%
netbeans  | 122.477027 - 141.620281 |  72.996346 -  89.731560 | 58% | 73%
mozilla   | 147.867662 - 170.572118 |  91.700995 - 105.853099 | 56% | 71%

Write performance:
------------------
          |           zlib          |           zstd          | cmp | f2s
mercurial |  53.250304 -  56.293612 |  40.877025 -  45.677286 | 75% | 86%
pypy      | 460.721984 - 476.589918 | 270.545409 - 301.002219 | 63% | 65%
netbeans  | 520.560316 - 715.930400 | 370.356311 - 428.329652 | 55% | 82%
mozilla   | 739.803002 - 987.056093 | 505.152906 - 591.930683 | 57% | 80%
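
To make the `cmp` and `f2s` columns concrete, here is a small sketch that recomputes them for the mercurial read row from the raw timings in this message. Reading `cmp` as "default zstd time over default zlib time" and `f2s` as "slowest zstd time over fastest zlib time" follows the list above; rounding to a whole percentage is an assumption, and the helper name `percent` is hypothetical.

```python
# Read timings for the mercurial repository, copied from the raw data,
# keyed by compression level.
zlib_read = {1: 0.170159, 6: 0.182820, 9: 0.189219}
zstd_read = {
    1: 0.144127, 3: 0.146376, 5: 0.149624, 10: 0.145185,
    15: 0.147686, 20: 0.145789, 22: 0.143996,
}


def percent(numerator, denominator):
    """Express a ratio as a whole percentage (rounding is an assumption)."""
    return round(100 * numerator / denominator)


# cmp: default zstd (level 3) relative to default zlib (level 6).
cmp_pct = percent(zstd_read[3], zlib_read[6])

# f2s: slowest zstd time relative to fastest zlib time.
f2s_pct = percent(max(zstd_read.values()), min(zlib_read.values()))

print("cmp: %d%%, f2s: %d%%" % (cmp_pct, f2s_pct))  # cmp: 80%, f2s: 88%
```

A value below 100% in `f2s` means that even the slowest zstd level measured was faster than the fastest zlib level.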

Raw data
--------

repo      alg lvl  .hg/store size  00manifest.d read       write

mercurial zlib  1      49,402,813     5,963,475   0.170159  53.250304
mercurial zlib  6      47,197,397     5,875,730   0.182820  56.264320
mercurial zlib  9      47,121,596     5,849,781   0.189219  56.293612

mercurial zstd  1      49,737,084     5,966,355   0.144127  40.877025
mercurial zstd  3      48,961,867     5,895,208   0.146376  42.268142
mercurial zstd  5      48,200,592     5,938,676   0.149624  43.162875
mercurial zstd 10      47,833,520     5,913,353   0.145185  44.012489
mercurial zstd 15      47,314,604     5,728,679   0.147686  45.677286
mercurial zstd 20      47,330,502     5,830,539   0.145789  45.025407
mercurial zstd 22      47,330,076     5,830,539   0.143996  44.690460


pypy      zlib  1     370,830,572    28,462,425   2.679217 460.721984
pypy      zlib  6     340,112,317    27,648,747   2.768691 467.537158
pypy      zlib  9     338,360,736    27,639,003   2.763495 476.589918

pypy      zstd  1     362,377,479    27,916,214   1.532317 270.545409
pypy      zstd  3     354,137,693    27,905,988   1.686718 294.951509
pypy      zstd  5     342,640,043    27,655,774   1.705044 301.002219
pypy      zstd 10     334,224,327    27,164,493   1.567287 285.186239
pypy      zstd 15     329,000,363    26,645,965   1.637729 299.561332
pypy      zstd 20     324,534,039    26,199,547   1.526813 302.149827
pypy      zstd 22     324,530,595    26,198,932   1.525718 307.821218


netbeans  zlib  1   1,281,847,810   165,495,457 122.477027 520.560316
netbeans  zlib  6   1,205,284,353   159,161,207 139.876147 715.930400
netbeans  zlib  9   1,197,135,671   155,034,586 141.620281 678.297064

netbeans  zstd  1   1,259,581,737   160,840,613  72.996346 370.356311
netbeans  zstd  3   1,232,978,122   157,691,551  81.622317 396.733087
netbeans  zstd  5   1,208,034,075   160,246,880  83.080549 364.342626
netbeans  zstd 10   1,188,624,176   156,083,417  79.323935 403.594602
netbeans  zstd 15   1,176,973,589   153,859,477  89.731560 428.329652
netbeans  zstd 20   1,162,958,258   151,147,535  82.842667 392.335349
netbeans  zstd 22   1,162,707,029   151,150,220  82.565695 402.840655


mozilla   zlib  1   2,775,497,186   298,527,987 147.867662 751.263721
mozilla   zlib  6   2,596,856,420   286,597,671 170.572118 987.056093
mozilla   zlib  9   2,587,542,494   287,018,264 163.622338 739.803002

mozilla   zstd  1   2,723,159,348   286,617,532  91.700995 570.042751
mozilla   zstd  3   2,665,055,001   286,152,013  95.240155 561.412805
mozilla   zstd  5   2,607,819,817   288,060,030 101.978048 505.152906
mozilla   zstd 10   2,558,761,085   283,967,648 104.113481 497.771202
mozilla   zstd 15   2,526,216,060   275,581,300 105.853099 591.930683
mozilla   zstd 20   2,485,114,806   266,478,859  95.268795 576.515389
mozilla   zstd 22   2,484,869,080   266,456,505  94.429282 572.785537

diff --git a/mercurial/configitems.py b/mercurial/configitems.py
--- a/mercurial/configitems.py
+++ b/mercurial/configitems.py
@@ -995,6 +995,9 @@  coreconfigitem('storage', 'revlog.reuse-
 coreconfigitem('storage', 'revlog.zlib.level',
     default=None,
 )
+coreconfigitem('storage', 'revlog.zstd.level',
+    default=None,
+)
 coreconfigitem('server', 'bookmarks-pushkey-compat',
     default=True,
 )
diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
--- a/mercurial/help/config.txt
+++ b/mercurial/help/config.txt
@@ -1886,6 +1886,12 @@  category impact performance and reposito
     Value range from 1 (lowest compression) to 9 (highest compression). Zlib
     default value is 6.
 
+
+``revlog.zstd.level``
+    zstd compression level used when storing data into the repository. Accepted
+    Value range from 1 (lowest compression) to 22 (highest compression).
+    (default 3)
+
 ``server``
 ----------
 
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -802,6 +802,11 @@  def resolverevlogstorevfsoptions(ui, req
         if not (0 <= options[b'zlib.level'] <= 9):
             msg = _('invalid value for `storage.revlog.zlib.level` config: %d')
             raise error.Abort(msg % options[b'zlib.level'])
+    options[b'zstd.level'] = ui.configint(b'storage', b'revlog.zstd.level')
+    if options[b'zstd.level'] is not None:
+        if not (0 <= options[b'zstd.level'] <= 22):
+            msg = _('invalid value for `storage.revlog.zstd.level` config: %d')
+            raise error.Abort(msg % options[b'zstd.level'])
 
     if repository.NARROW_REQUIREMENT in requirements:
         options[b'enableellipsis'] = True
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -419,6 +419,8 @@  class revlog(object):
             self._compengine = opts['compengine']
         if 'zlib.level' in opts:
             self._compengineopts['zlib.level'] = opts['zlib.level']
+        if 'zstd.level' in opts:
+            self._compengineopts['zstd.level'] = opts['zstd.level']
         if 'maxdeltachainspan' in opts:
             self._maxdeltachainspan = opts['maxdeltachainspan']
         if self._mmaplargeindex and 'mmapindexthreshold' in opts:
diff --git a/mercurial/utils/compression.py b/mercurial/utils/compression.py
--- a/mercurial/utils/compression.py
+++ b/mercurial/utils/compression.py
@@ -721,8 +721,12 @@  class _zstdengine(compressionengine):
 
     def revlogcompressor(self, opts=None):
         opts = opts or {}
-        return self.zstdrevlogcompressor(self._module,
-                                         level=opts.get('level', 3))
+        level = opts.get('zstd.level')
+        if level is None:
+            level = opts.get('level')
+        if level is None:
+            level = 3
+        return self.zstdrevlogcompressor(self._module, level=level)
 
 compengines.register(_zstdengine())
 
diff --git a/tests/test-repo-compengines.t b/tests/test-repo-compengines.t
--- a/tests/test-repo-compengines.t
+++ b/tests/test-repo-compengines.t
@@ -138,3 +138,58 @@  Test error cases
   abort: invalid value for `storage.revlog.zlib.level` config: 42
   [255]
 
+checking zstd options
+=====================
+
+  $ hg init zstd-level-default --config experimental.format.compression=zstd
+  $ hg init zstd-level-1 --config experimental.format.compression=zstd
+  $ cat << EOF >> zstd-level-1/.hg/hgrc
+  > [storage]
+  > revlog.zstd.level=1
+  > EOF
+  $ hg init zstd-level-22 --config experimental.format.compression=zstd
+  $ cat << EOF >> zstd-level-22/.hg/hgrc
+  > [storage]
+  > revlog.zstd.level=22
+  > EOF
+
+
+  $ commitone() {
+  >    repo=$1
+  >    cp $RUNTESTDIR/bundles/issue4438-r1.hg $repo/a
+  >    hg -R $repo add $repo/a
+  >    hg -R $repo commit -m some-commit
+  > }
+
+  $ for repo in zstd-level-default zstd-level-1 zstd-level-22; do
+  >     commitone $repo
+  > done
+
+  $ $RUNTESTDIR/f -s zstd-*/.hg/store/data/*
+  zstd-level-1/.hg/store/data/a.i: size=4097
+  zstd-level-22/.hg/store/data/a.i: size=4091
+  zstd-level-default/.hg/store/data/a.i: size=4094
+
+Test error cases
+
+  $ hg init zstd-level-invalid --config experimental.format.compression=zstd
+  $ cat << EOF >> zstd-level-invalid/.hg/hgrc
+  > [storage]
+  > revlog.zstd.level=foobar
+  > EOF
+  $ commitone zstd-level-invalid
+  abort: storage.revlog.zstd.level is not a valid integer ('foobar')
+  abort: storage.revlog.zstd.level is not a valid integer ('foobar')
+  [255]
+
+  $ hg init zstd-level-out-of-range --config experimental.format.compression=zstd
+  $ cat << EOF >> zstd-level-out-of-range/.hg/hgrc
+  > [storage]
+  > revlog.zstd.level=42
+  > EOF
+
+  $ commitone zstd-level-out-of-range
+  abort: invalid value for `storage.revlog.zstd.level` config: 42
+  abort: invalid value for `storage.revlog.zstd.level` config: 42
+  [255]
+
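
The lookup order in the `revlogcompressor` hunk above can be restated as a standalone function: the engine-specific `zstd.level` wins, then the generic `level`, then the default of 3. `resolve_zstd_level` and `DEFAULT_ZSTD_LEVEL` are hypothetical names used only for this sketch.

```python
DEFAULT_ZSTD_LEVEL = 3


def resolve_zstd_level(opts=None):
    """Pick the zstd level: 'zstd.level', then 'level', then the default."""
    opts = opts or {}
    level = opts.get("zstd.level")
    if level is None:
        # Fall back to the generic option when the dedicated one is unset.
        level = opts.get("level")
    if level is None:
        level = DEFAULT_ZSTD_LEVEL
    return level
```

Testing against `None` rather than truthiness matters here: an explicit level of 0 is kept instead of being silently replaced by the default.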