Patchwork hgweb: config option to control zlib compression level

login
register
mail settings
Submitter Gregory Szorc
Date Aug. 8, 2016, 1:12 a.m.
Message ID <d5e9dedbc7f912ff2dc2.1470618757@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/16188/
State Accepted
Headers show

Comments

Gregory Szorc - Aug. 8, 2016, 1:12 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1470618598 25200
#      Sun Aug 07 18:09:58 2016 -0700
# Node ID d5e9dedbc7f912ff2dc2e92586347250aee11ac2
# Parent  3ef9aa7ad1fc4c43b92d48e4bb1f4e3de68b6910
hgweb: config option to control zlib compression level

Before this patch, the HTTP transport protocol would always zlib
compress certain responses (notably "getbundle" wire protocol commands)
at zlib compression level 6.

zlib can be a massive CPU resource sink for servers. Some server
operators may wish to reduce server-side CPU requirements while
requiring more bandwidth. This is common on corporate intranets, for
example. Others may wish to use more CPU but reduce bandwidth.

This patch introduces a config option to allow server operators
to control the zlib compression level.

On the "mozilla-unified" generaldelta repository, setting this
value to "0" (disable compression) results in server-side CPU
utilization for a `hg clone` going from ~180s to ~124s CPU time on
my i7-6700K.  A level of "1" (which increases the transfer size from
~1,074 MB at level 6 to ~1,222 MB) utilizes ~132s CPU time.
Anton Shestakov - Aug. 8, 2016, 1:31 a.m.
08.08.2016, 09:12, "Gregory Szorc" <gregory.szorc@gmail.com>:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1470618598 25200
> # Sun Aug 07 18:09:58 2016 -0700
> # Node ID d5e9dedbc7f912ff2dc2e92586347250aee11ac2
> # Parent 3ef9aa7ad1fc4c43b92d48e4bb1f4e3de68b6910
> hgweb: config option to control zlib compression level
>
> Before this patch, the HTTP transport protocol would always zlib
> compress certain responses (notably "getbundle" wire protocol commands)
> at zlib compression level 6.
>
> zlib can be a massive CPU resource sink for servers. Some server
> operators may wish to reduce server-side CPU requirements while
> requiring more bandwidth. This is common on corporate intranets, for
> example. Others may wish to use more CPU but reduce bandwidth.
>
> This patch introduces a config option to allow server operators
> to control the zlib compression level.
>
> On the "mozilla-unified" generaldelta repository, setting this
> value to "0" (disable compression) results in server-side CPU
> utilization for a `hg clone` going from ~180s to ~124s CPU time on
> my i7-6700K. A level of "1" (which increases the transfer size from
> ~1,074 MB at level 6 to ~1,222 MB) utilizes ~132s CPU time.

Looks good to me.

Does memlevel argument to zlib.compressobj look interesting? The docs say "higher values use more memory, but are faster and produce smaller output", maybe bumping it to 9 from the default 8 would also lower CPU time.
Gregory Szorc - Aug. 8, 2016, 2:04 a.m.
On Sun, Aug 7, 2016 at 6:31 PM, Anton Shestakov <engored@ya.ru> wrote:

> 08.08.2016, 09:12, "Gregory Szorc" <gregory.szorc@gmail.com>:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1470618598 25200
> > # Sun Aug 07 18:09:58 2016 -0700
> > # Node ID d5e9dedbc7f912ff2dc2e92586347250aee11ac2
> > # Parent 3ef9aa7ad1fc4c43b92d48e4bb1f4e3de68b6910
> > hgweb: config option to control zlib compression level
> >
> > Before this patch, the HTTP transport protocol would always zlib
> > compress certain responses (notably "getbundle" wire protocol commands)
> > at zlib compression level 6.
> >
> > zlib can be a massive CPU resource sink for servers. Some server
> > operators may wish to reduce server-side CPU requirements while
> > requiring more bandwidth. This is common on corporate intranets, for
> > example. Others may wish to use more CPU but reduce bandwidth.
> >
> > This patch introduces a config option to allow server operators
> > to control the zlib compression level.
> >
> > On the "mozilla-unified" generaldelta repository, setting this
> > value to "0" (disable compression) results in server-side CPU
> > utilization for a `hg clone` going from ~180s to ~124s CPU time on
> > my i7-6700K. A level of "1" (which increases the transfer size from
> > ~1,074 MB at level 6 to ~1,222 MB) utilizes ~132s CPU time.
>
> Looks good to me.
>
> Does memlevel argument to zlib.compressobj look interesting? The docs say
> "higher values use more memory, but are faster and produce smaller output",
> maybe bumping it to 9 from the default 8 would also lower CPU time.
>

From zconf.h:

/* The memory requirements for deflate are (in bytes): (1 <<
(windowBits+2)) + (1 << (memLevel+9))

windowBits is 15 by default. So the memory requirements for 8 vs 9 are 256k
vs 384k. There is no impact on memory requirements for inflate. That memory
use is insignificant compared to what Python is doing, so increasing
shouldn't be a problem.

However, on an uncompressed mozilla-unified generaldelta bundle, memlevel 9
is surprisingly both slower and yields worse compression than memlevel 8!
On my machine, memlevel 9 uses ~5s more CPU time and produces an archive
~1.6 MB larger (from 1,072 MB). memlevel 6 is ~25s slower and produces an
archive ~2.7 MB larger. That's definitely unexpected given zlib's docs.
YMMV, of course, but it's not looking good for changing memlevel.
Yuya Nishihara - Aug. 8, 2016, 1:45 p.m.
On Sun, 07 Aug 2016 18:12:37 -0700, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1470618598 25200
> #      Sun Aug 07 18:09:58 2016 -0700
> # Node ID d5e9dedbc7f912ff2dc2e92586347250aee11ac2
> # Parent  3ef9aa7ad1fc4c43b92d48e4bb1f4e3de68b6910
> hgweb: config option to control zlib compression level
> 
> Before this patch, the HTTP transport protocol would always zlib
> compress certain responses (notably "getbundle" wire protocol commands)
> at zlib compression level 6.
> 
> zlib can be a massive CPU resource sink for servers. Some server
> operators may wish to reduce server-side CPU requirements while
> requiring more bandwidth. This is common on corporate intranets, for
> example. Others may wish to use more CPU but reduce bandwidth.
> 
> This patch introduces a config option to allow server operators
> to control the zlib compression level.

Sounds good, queued, thanks.

> --- a/mercurial/hgweb/protocol.py
> +++ b/mercurial/hgweb/protocol.py
> @@ -69,17 +69,17 @@ class webproto(wireproto.abstractserverp
>      def redirect(self):
>          self.oldio = self.ui.fout, self.ui.ferr
>          self.ui.ferr = self.ui.fout = stringio()
>      def restore(self):
>          val = self.ui.fout.getvalue()
>          self.ui.ferr, self.ui.fout = self.oldio
>          return val
>      def groupchunks(self, cg):
> -        z = zlib.compressobj()
> +        z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))

Any idea if untrusted hgrc should be allowed?
Gregory Szorc - Aug. 14, 2016, 9:36 p.m.
On Mon, Aug 8, 2016 at 6:45 AM, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 07 Aug 2016 18:12:37 -0700, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1470618598 25200
> > #      Sun Aug 07 18:09:58 2016 -0700
> > # Node ID d5e9dedbc7f912ff2dc2e92586347250aee11ac2
> > # Parent  3ef9aa7ad1fc4c43b92d48e4bb1f4e3de68b6910
> > hgweb: config option to control zlib compression level
> >
> > Before this patch, the HTTP transport protocol would always zlib
> > compress certain responses (notably "getbundle" wire protocol commands)
> > at zlib compression level 6.
> >
> > zlib can be a massive CPU resource sink for servers. Some server
> > operators may wish to reduce server-side CPU requirements while
> > requiring more bandwidth. This is common on corporate intranets, for
> > example. Others may wish to use more CPU but reduce bandwidth.
> >
> > This patch introduces a config option to allow server operators
> > to control the zlib compression level.
>
> Sounds good, queued, thanks.
>
> > --- a/mercurial/hgweb/protocol.py
> > +++ b/mercurial/hgweb/protocol.py
> > @@ -69,17 +69,17 @@ class webproto(wireproto.abstractserverp
> >      def redirect(self):
> >          self.oldio = self.ui.fout, self.ui.ferr
> >          self.ui.ferr = self.ui.fout = stringio()
> >      def restore(self):
> >          val = self.ui.fout.getvalue()
> >          self.ui.ferr, self.ui.fout = self.oldio
> >          return val
> >      def groupchunks(self, cg):
> > -        z = zlib.compressobj()
> > +        z = zlib.compressobj(self.ui.configint('server', 'zliblevel',
> -1))
>
> Any idea if untrusted hgrc should be allowed?
>

I can go both ways on this. However, I'm inclined to say "no" because
setting the compression level very high or disabling it could make a CPU or
network usage increase significantly. I think server operators would want
to retain control over this setting for that reason.
Yuya Nishihara - Aug. 15, 2016, 9:45 a.m.
On Sun, 14 Aug 2016 14:36:03 -0700, Gregory Szorc wrote:
> On Mon, Aug 8, 2016 at 6:45 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sun, 07 Aug 2016 18:12:37 -0700, Gregory Szorc wrote:
> > > --- a/mercurial/hgweb/protocol.py
> > > +++ b/mercurial/hgweb/protocol.py
> > > @@ -69,17 +69,17 @@ class webproto(wireproto.abstractserverp
> > >      def redirect(self):
> > >          self.oldio = self.ui.fout, self.ui.ferr
> > >          self.ui.ferr = self.ui.fout = stringio()
> > >      def restore(self):
> > >          val = self.ui.fout.getvalue()
> > >          self.ui.ferr, self.ui.fout = self.oldio
> > >          return val
> > >      def groupchunks(self, cg):
> > > -        z = zlib.compressobj()
> > > +        z = zlib.compressobj(self.ui.configint('server', 'zliblevel',
> > -1))
> >
> > Any idea if untrusted hgrc should be allowed?
>
> I can go both ways on this. However, I'm inclined to say "no" because
> setting the compression level very high or disabling it could make a CPU or
> network usage increase significantly. I think server operators would want
> to retain control over this setting for that reason.

Okay, let's add a comment because hgweb generally reads config with
untrusted=True.

Patch

diff --git a/mercurial/help/config.txt b/mercurial/help/config.txt
--- a/mercurial/help/config.txt
+++ b/mercurial/help/config.txt
@@ -1552,16 +1552,31 @@  Controls generic server settings.
     Like ``bundle1.pull`` but only used if the repository is using the
     *generaldelta* storage format. (default: True)
 
     Large repositories using the *generaldelta* storage format should
     consider setting this option because converting *generaldelta*
     repositories to the exchange format required by the bundle1 data
     format can consume a lot of CPU.
 
+``zliblevel``
+    Integer between ``-1`` and ``9`` that controls the zlib compression level
+    for wire protocol commands that send zlib compressed output (notably the
+    commands that send repository history data).
+
+    The default (``-1``) uses the default zlib compression level, which is
+    likely equivalent to ``6``. ``0`` means no compression. ``9`` means
+    maximum compression.
+
+    Setting this option allows server operators to make trade-offs between
+    bandwidth and CPU used. Lowering the compression lowers CPU utilization
+    but sends more bytes to clients.
+
+    This option only impacts the HTTP server.
+
 ``smtp``
 --------
 
 Configuration for extensions that need to send email messages.
 
 ``host``
     Host name of mail server, e.g. "mail.example.com".
 
diff --git a/mercurial/hgweb/protocol.py b/mercurial/hgweb/protocol.py
--- a/mercurial/hgweb/protocol.py
+++ b/mercurial/hgweb/protocol.py
@@ -69,17 +69,17 @@  class webproto(wireproto.abstractserverp
     def redirect(self):
         self.oldio = self.ui.fout, self.ui.ferr
         self.ui.ferr = self.ui.fout = stringio()
     def restore(self):
         val = self.ui.fout.getvalue()
         self.ui.ferr, self.ui.fout = self.oldio
         return val
     def groupchunks(self, cg):
-        z = zlib.compressobj()
+        z = zlib.compressobj(self.ui.configint('server', 'zliblevel', -1))
         while True:
             chunk = cg.read(4096)
             if not chunk:
                 break
             yield z.compress(chunk)
         yield z.flush()
     def _client(self):
         return 'remote:%s:%s:%s' % (