Patchwork [13,of,13] revlog: enforce chunk slicing down to a certain size

Submitter Boris Feld
Date July 10, 2018, 1:27 p.m.
Message ID <be01ab00bed851ca5c79.1531229242@FB-lair>
Permalink /patch/32752/
State Accepted

Comments

Boris Feld - July 10, 2018, 1:27 p.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1531218057 -7200
#      Tue Jul 10 12:20:57 2018 +0200
# Node ID be01ab00bed851ca5c7961fe176eaae1f16645ff
# Parent  3e174c2b2b66e92e75e57fe7286fce63fa1e39ad
# EXP-Topic write-for-sparse-read
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r be01ab00bed8
revlog: enforce chunk slicing down to a certain size

Limit maximum chunk size to 4x final size when reading a revision from a
revlog. We only apply this logic when the target size is known from the
revlog.

Ideally, revlog's delta chains would be written in a way that does not trigger
this extra slicing often. However, having this second guarantee that we won't
read unexpectedly large amounts of data into memory in all cases is important
for the future. Future delta chain building algorithms might have good reason
to create delta chains with such characteristics.

Including this code in core as soon as possible will make Mercurial 4.7
forward-compatible with such improvements.
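
The 4x limit described above can be illustrated with a small standalone
sketch. It is not Mercurial's `_slicechunk` implementation: the helper names
`target_size` and `slice_to_size` are hypothetical, and the greedy grouping by
summed chunk size is a simplification chosen only to show how a known raw size
turns into a cap and how an unknown (negative) raw size disables it.

```python
def target_size(rawsize, factor=4):
    """Return the read cap for a revision, or None when its size is unknown.

    Mirrors the check in the patch: a negative raw size means "unknown",
    so no cap is applied. `factor` defaults to the 4x from the patch.
    """
    if rawsize < 0:
        return None
    return factor * rawsize


def slice_to_size(sizes, targetsize):
    """Group consecutive chunk indices so each group sums to <= targetsize.

    Hypothetical, simplified slicer: a single chunk larger than the cap
    still gets a group of its own. With targetsize=None nothing is split.
    """
    if targetsize is None:
        yield list(range(len(sizes)))
        return
    group, total = [], 0
    for i, size in enumerate(sizes):
        # Close the current group before it would exceed the cap.
        if group and total + size > targetsize:
            yield group
            group, total = [], 0
        group.append(i)
        total += size
    if group:
        yield group


cap = target_size(100)  # revision whose full text is 100 bytes
groups = list(slice_to_size([100, 200, 300, 50], cap))
```

With a known raw size, the chain is read in several bounded groups; with an
unknown size, `slice_to_size` falls back to a single group, matching the
"only when the target size is known" condition in the commit message.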

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -1949,7 +1949,7 @@  class revlog(object):
         """
         return self.decompress(self._getsegmentforrevs(rev, rev, df=df)[1])
 
-    def _chunks(self, revs, df=None):
+    def _chunks(self, revs, df=None, targetsize=None):
         """Obtain decompressed chunks for the specified revisions.
 
         Accepts an iterable of numeric revisions that are assumed to be in
@@ -1976,7 +1976,7 @@  class revlog(object):
         if not self._withsparseread:
             slicedchunks = (revs,)
         else:
-            slicedchunks = _slicechunk(self, revs)
+            slicedchunks = _slicechunk(self, revs, targetsize)
 
         for revschunk in slicedchunks:
             firstrev = revschunk[0]
@@ -2079,7 +2079,12 @@  class revlog(object):
             # drop cache to save memory
             self._cache = None
 
-            bins = self._chunks(chain, df=_df)
+            targetsize = None
+            rawsize = self.index[rev][2]
+            if 0 <= rawsize:
+                targetsize = 4 * rawsize
+
+            bins = self._chunks(chain, df=_df, targetsize=targetsize)
             if rawtext is None:
                 rawtext = bytes(bins[0])
                 bins = bins[1:]