Patchwork [03,of,10] revlog: add the option to track the expected compression upper bound

Submitter Pierre-Yves David
Date June 13, 2019, 1:22 p.m.
Message ID <057f04f3d9aee2818b41.1560432178@nodosa.octopoid.net>
Permalink /patch/40475/
State Accepted

Comments

Pierre-Yves David - June 13, 2019, 1:22 p.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@octobus.net>
# Date 1556231302 -7200
#      Fri Apr 26 00:28:22 2019 +0200
# Node ID 057f04f3d9aee2818b41a540152e48cbd60a34ec
# Parent  cbcf0facaa1f7db528e648f6aee1b8df5b7fe6c4
# EXP-Topic delta-extra
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 057f04f3d9ae
revlog: add the option to track the expected compression upper bound

There are various optimizations we can do if we can estimate the size of a delta
before actually spending CPU compressing it. So we add an attribute dedicated
to tracking that.

We only use it on manifests because (1) their structure is quite stable across
all Mercurial repositories, so their compression ratio is fairly universal, and
(2) this is the revlog with the most extreme deltas (cf. the sparse-revlog
optimization).

This will be put to use in later changesets.

Right now the compression upper bound is set to 10. This is a fairly
conservative value (the observed value is closer to 3), but I prefer to be safe
while introducing the optimization principles. We can tune the optimization
threshold later.
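
To illustrate the kind of optimization such a bound could enable, here is a
minimal sketch. It is not code from this series: the helper names
`lowest_realistic_size` and `worth_compressing` and the `max_acceptable_len`
parameter are hypothetical. The idea is that if compression gains at most a
factor of `upperboundcomp`, a raw delta of N bytes cannot shrink below roughly
N / upperboundcomp, so a delta whose best case is still too large can be
rejected without compressing it.

```python
def lowest_realistic_size(raw_delta_len, upperboundcomp):
    """Best-case compressed size, assuming compression gains at most
    a factor of `upperboundcomp` (hypothetical helper)."""
    if upperboundcomp is None or upperboundcomp <= 0:
        # No bound is known, so nothing can be ruled out.
        return 0
    return raw_delta_len // upperboundcomp


def worth_compressing(raw_delta_len, max_acceptable_len, upperboundcomp):
    """Return False when even the best-case compressed delta would exceed
    the size we are willing to store (hypothetical helper)."""
    best_case = lowest_realistic_size(raw_delta_len, upperboundcomp)
    return best_case <= max_acceptable_len


MAXCOMPRESSION = 10  # same value this patch uses for manifests

# A 1000-byte delta cannot realistically shrink below 100 bytes.
print(worth_compressing(1000, 50, MAXCOMPRESSION))   # skip compression
print(worth_compressing(1000, 200, MAXCOMPRESSION))  # still worth trying
print(worth_compressing(1000, 50, None))             # no bound: always try
```

A conservative (high) bound only makes the check reject fewer candidates, which
is why starting at 10 rather than the observed 3 is the safe choice.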

Patch

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -2277,6 +2277,10 @@  def _temprevlog(ui, orig, truncaterev):
 
     if orig._inline:
         raise error.Abort('not supporting inline revlog (yet)')
+    revlogkwargs = {}
+    k = 'upperboundcomp'
+    if util.safehasattr(orig, k):
+        revlogkwargs[k] = getattr(orig, k)
 
     origindexpath = orig.opener.join(orig.indexfile)
     origdatapath = orig.opener.join(orig.datafile)
@@ -2308,7 +2312,7 @@  def _temprevlog(ui, orig, truncaterev):
 
         dest = revlog.revlog(vfs,
                              indexfile=indexname,
-                             datafile=dataname)
+                             datafile=dataname, **revlogkwargs)
         if dest._inline:
             raise error.Abort('not supporting inline revlog (yet)')
         # make sure internals are initialized
diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -1417,6 +1417,10 @@  class manifestfulltextcache(util.lrucach
             self.write()
         self._read = False
 
+# an upper bound of what we expect from compression
+# (real life value seems to be "3")
+MAXCOMPRESSION = 10
+
 @interfaceutil.implementer(repository.imanifeststorage)
 class manifestrevlog(object):
     '''A revlog that stores manifest texts. This is responsible for caching the
@@ -1467,7 +1471,8 @@  class manifestrevlog(object):
         self._revlog = revlog.revlog(opener, indexfile,
                                      # only root indexfile is cached
                                      checkambig=not bool(tree),
-                                     mmaplargeindex=True)
+                                     mmaplargeindex=True,
+                                     upperboundcomp=MAXCOMPRESSION)
 
         self.index = self._revlog.index
         self.version = self._revlog.version
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -337,15 +337,21 @@  class revlog(object):
     configured threshold.
 
     If censorable is True, the revlog can have censored revisions.
+
+    If `upperboundcomp` is not None, this is the expected maximal gain from
+    compression for the data content.
     """
     def __init__(self, opener, indexfile, datafile=None, checkambig=False,
-                 mmaplargeindex=False, censorable=False):
+                 mmaplargeindex=False, censorable=False,
+                 upperboundcomp=None):
         """
         create a revlog object
 
         opener is a function that abstracts the file opening operation
         and can be used to implement COW semantics or the like.
+
         """
+        self.upperboundcomp = upperboundcomp
         self.indexfile = indexfile
         self.datafile = datafile or (indexfile[:-2] + ".d")
         self.opener = opener