Patchwork [10,of,10] deltas: set estimated compression upper bound to "3x" instead of "10x"

Submitter Pierre-Yves David
Date June 13, 2019, 1:23 p.m.
Message ID <620e4fca59ae3b5b0276.1560432185@nodosa.octopoid.net>
Permalink /patch/40478/
State Accepted

Comments

Pierre-Yves David - June 13, 2019, 1:23 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1556232492 -7200
#      Fri Apr 26 00:48:12 2019 +0200
# Node ID 620e4fca59ae3b5b02763c0fac952693159d0014
# Parent  2cf494071b512e69877589bf740117a642d26330
# EXP-Topic delta-extra
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 620e4fca59ae
deltas: set estimated compression upper bound to "3x" instead of "10x"

In practice, we very rarely observe compression better than "3x" on manifest
deltas. Having a more aggressive estimate significantly helps our pathological
use case on a private repository. Here is a comparison of timings using
different upper bounds.

Estimated compression |    ø   |  ×10 |  ×5  |  ×3  |
timing                |  14.11 | 2.61 | 1.96 | 1.53 |
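
To make the gain easier to read, here is a small sketch that turns the timings
above into speedup ratios. It assumes the "ø" column means "no upper bound";
the `timings` dict and the `speedup` helper are names made up for this
illustration, not part of the patch.

```python
# Timings copied from the table above; the unit is not stated in the message.
# Assumption: the "ø" column is the run without any upper bound ("none").
timings = {"none": 14.11, "x10": 2.61, "x5": 1.96, "x3": 1.53}


def speedup(before, after):
    """Return how many times faster `after` is compared to `before`."""
    return before / after


for bound, value in timings.items():
    ratio = speedup(timings["none"], value)
    print("%-4s %5.2f  %.1fx vs no bound" % (bound, value, ratio))
```

Under that reading, the "3x" bound is about 9.2 times faster than no bound,
and about 1.7 times faster than the previous "10x" bound.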


We also tested the impact of this series on an array of public repositories.
This showed no impact on either size or timing.

Full data set below for those interested.

Size
----

Regarding size, no significant impact has been noticed on either public or
private repositories. Here are the numbers we gathered:

zlib/upperbound | no            | 10x           | 5x            | 3x
mercurial       |     5 875 730 |     5 875 730 |     5 875 730 |     5 875 730
pypy            |    27 782 913 |    27 782 913 |    27 782 913 |    27 782 913
netbeans        |   159 161 207 |   159 161 207 |   159 161 207 |   159 959 879 (+0.5%)
mozilla-central |   323 841 642 |   323 841 642 |   323 841 642 |   319 867 519 (-2.5%)
mozilla-try     |   746 649 123 |   746 649 123 |   746 649 123 |   741 155 568 (-0.7%)
private-repo    | 1 485 287 294 | 1 485 287 294 | 1 485 287 294 | 1 409 248 382 (-5.1%)

zstd/upperbound | no            | 10x           | 5x            | 3x
mercurial       |     5 895 206 |     5 895 206 |     5 895 206 |     5 895 206
pypy            |    28 689 230 |    28 689 230 |    28 689 230 |    28 689 230
netbeans        |   157 636 387 |   157 636 387 |   157 636 387 |   159 692 678 (+1.3%)
mozilla-central |   317 650 281 |   317 650 281 |   317 650 281 |   319 613 603 (+0.6%)
mozilla-try     |   737 555 275 |   737 555 275 |   737 555 275 |   738 079 473 (+0.1%)
private-repo    | 1 352 362 982 | 1 352 362 982 | 1 346 961 880 | 1 361 327 384 (+0.7%)
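
For readers who want to recompute the relative changes shown next to the "3x"
column, here is a minimal sketch. The `relative_change` helper is a
hypothetical name used only for this illustration; the sizes are copied from
the tables above.

```python
def relative_change(before, after):
    """Return the size change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def fmt(before, after):
    """Format the change the way the tables do, e.g. '+0.5%'."""
    return "%+.1f%%" % relative_change(before, after)


# zlib, netbeans: "no" column versus "3x" column
print(fmt(159161207, 159959879))
# zlib, private-repo: "no" column versus "3x" column
print(fmt(1485287294, 1409248382))
# zstd, private-repo: "no" column versus "5x" column (not annotated above)
print(fmt(1352362982, 1346961880))
```

The first two calls reproduce the "+0.5%" and "-5.1%" annotations; the third
one gives the change for the only "5x" cell that differs from its baseline.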


Speed
-----

Timings gathered using `hg perfrevlogwrite -m`. Values are in seconds.

mercurial

zlib   | no        | 10x       | 5x        | 3x        |
total  | 65.551783 | 65.388887 | 65.260658 | 65.321199 |
max    |  0.034544 |  0.034571 |  0.034659 |  0.034521 |
99.99% |  0.034544 |  0.034571 |  0.034659 |  0.034521 |

zstd   | no        | 10x       | 5x        | 3x        |
total  | 49.118449 | 49.054062 | 48.753588 | 48.740230 |
max    |  0.009338 |  0.009239 |  0.009202 |  0.009178 |
99.99% |  0.007618 |  0.007639 |  0.007626 |  0.007621 |

pypy

zlib   | no         | 10x        | 5x         | 3x         |
total  | 560.865984 | 558.983817 | 559.083815 | 559.349152 |
max    |   0.219614 |   0.215922 |   0.218112 |   0.218107 |
99.99% |   0.219614 |   0.215922 |   0.218112 |   0.218107 |

zstd   | no         | 10x        | 5x         | 3x         |
total  | 349.393280 | 347.395819 | 347.185407 | 345.643985 |
max    |   0.084143 |   0.083536 |   0.081834 |   0.082178 |
99.99% |   0.039445 |   0.039639 |   0.039612 |   0.039175 |

netbeans

zlib   | no           | 10x          | 5x           | 3x           |
total  | 33103.327727 | 33314.932260 | 33211.745233 | 33345.891778 |
max    |     2.666852 |     2.672059 |     2.662453 |     2.662936 |
99.99% |     2.058772 |     2.070429 |     2.069569 |     2.064653 |

zstd   | no           | 10x          | 5x           | 3x           |
total  | 20112.102708 | 20095.879719 | 20083.390300 | 20123.221859 |
max    |     2.063482 |     2.062851 |     2.065229 |     2.060147 |
99.99% |     1.146647 |     1.143794 |     1.142933 |     1.146529 |

mozilla

zlib   | no           | 10x          | 5x           | 3x           |
total  | 41374.102138 | 41418.816773 | 41381.956370 | 41334.280732 |
max    |     3.383474 |     3.387400 |     3.405711 |     3.387316 |
99.99% |     1.006755 |     1.005954 |     1.007700 |     1.007373 |

zstd   | no           | 10x          | 5x           | 3x           |
total  | 24689.691520 | 24643.939662 | 24664.630027 | 24664.512714 |
max    |     1.460822 |     1.449640 |     1.439747 |     1.465304 |
99.99% |     0.527111 |     0.527377 |     0.527807 |     0.527226 |

Augie Fackler - June 14, 2019, 2:46 p.m.
queued the series, very impressive work!


Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1419,7 +1419,7 @@  class manifestfulltextcache(util.lrucach
 
 # and upper bound of what we expect from compression
 # (real live value seems to be "3")
-MAXCOMPRESSION = 10
+MAXCOMPRESSION = 3
 
 @interfaceutil.implementer(repository.imanifeststorage)
 class manifestrevlog(object):
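
To illustrate why a smaller upper bound can save time, here is a simplified
sketch of how such a constant might be used to discard delta candidates before
doing any expensive work. This is not the actual Mercurial code: the function
names, the parameters and the `size_limit` notion are assumptions made for the
example. Only the `MAXCOMPRESSION` name and its values come from the patch.

```python
MAXCOMPRESSION = 3  # value set by this patch (it was 10 before)


def lowest_realistic_delta_len(text_len, base_len,
                               max_compression=MAXCOMPRESSION):
    """Optimistic lower bound on the compressed size of a delta.

    Assumption: if the new text is larger than the base, the delta has to
    carry at least the size difference, and compression cannot shrink it
    by more than `max_compression`.
    """
    raw_size_distance = max(text_len - base_len, 0)
    return raw_size_distance // max_compression


def can_skip_candidate(text_len, base_len, size_limit,
                       max_compression=MAXCOMPRESSION):
    """Return True if even the most optimistic delta exceeds `size_limit`.

    In that case the candidate base can be rejected without computing
    and compressing the real delta.
    """
    estimate = lowest_realistic_delta_len(text_len, base_len, max_compression)
    return estimate > size_limit


# Hypothetical numbers: 1 MB text, 400 kB base, 100 kB accepted delta size.
print(can_skip_candidate(1000000, 400000, 100000, max_compression=10))
print(can_skip_candidate(1000000, 400000, 100000, max_compression=3))
```

With these made-up numbers the "10x" bound still has to build the delta, while
the "3x" bound rejects the candidate right away. A tighter estimate prunes
more candidates early, at the risk of skipping one that would have compressed
unusually well.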