Patchwork D1390: bundle2: don't use seekable bundle2 parts by default (issue5691)

login
register
mail settings
Submitter phabricator
Date Nov. 14, 2017, 4:59 a.m.
Message ID <differential-rev-PHID-DREV-qjhkffrxoxmcfi3uj6cr-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/25549/
State Superseded
Headers show

Comments

phabricator - Nov. 14, 2017, 4:59 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  The last commit removed the last use of the bundle2 part seek() API
  in the generic bundle2 part iteration code. This means we can now
  switch to using unseekable bundle2 parts by default and have the
  special consumers that actually need the behavior request it.
  
  This commit changes unbundle20.iterparts() to expose non-seekable
  unbundlepart instances by default. If seekable parts are needed,
  callers can pass "seekable=True." The bundlerepo class needs
  seekable parts, so it does this.
  
  The interrupt handler is also changed to use a regular unbundlepart.
  So, by default, all consumers except bundlerepo will see unseekable
  parts.
  
  Because the behavior of the iterparts() benchmark changed, we add
  a variation to test seekable parts vs unseekable parts. And because
  parts no longer have seek() unless "seekable=True," we update the
  "part seek" benchmark.
  
  Speaking of benchmarks, this change has the following impact to
  `hg perfbundleread` on an uncompressed bundle of the Firefox repo
  (6,070,036,163 bytes):
  
  ! read(8k)
  ! wall 0.722709 comb 0.720000 user 0.150000 sys 0.570000 (best of 14)
  ! read(16k)
  ! wall 0.602208 comb 0.590000 user 0.080000 sys 0.510000 (best of 17)
  ! read(32k)
  ! wall 0.554018 comb 0.560000 user 0.050000 sys 0.510000 (best of 18)
  ! read(128k)
  ! wall 0.520086 comb 0.530000 user 0.020000 sys 0.510000 (best of 20)
  ! bundle2 forwardchunks()
  ! wall 2.996329 comb 3.000000 user 2.300000 sys 0.700000 (best of 4)
  ! bundle2 iterparts()
  ! wall 8.070791 comb 8.060000 user 7.180000 sys 0.880000 (best of 3)
  ! wall 6.983756 comb 6.980000 user 6.220000 sys 0.760000 (best of 3)
  ! bundle2 iterparts() seekable
  ! wall 8.132131 comb 8.110000 user 7.160000 sys 0.950000 (best of 3)
  ! bundle2 part seek()
  ! wall 10.370142 comb 10.350000 user 7.430000 sys 2.920000 (best of 3)
  ! wall 10.860942 comb 10.840000 user 7.790000 sys 3.050000 (best of 3)
  ! bundle2 part read(8k)
  ! wall 8.599892 comb 8.580000 user 7.720000 sys 0.860000 (best of 3)
  ! wall 7.258035 comb 7.260000 user 6.470000 sys 0.790000 (best of 3)
  ! bundle2 part read(16k)
  ! wall 8.265361 comb 8.250000 user 7.360000 sys 0.890000 (best of 3)
  ! wall 7.099891 comb 7.080000 user 6.310000 sys 0.770000 (best of 3)
  ! bundle2 part read(32k)
  ! wall 8.290308 comb 8.280000 user 7.330000 sys 0.950000 (best of 3)
  ! wall 6.964685 comb 6.950000 user 6.130000 sys 0.820000 (best of 3)
  ! bundle2 part read(128k)
  ! wall 8.204900 comb 8.150000 user 7.210000 sys 0.940000 (best of 3)
  ! wall 6.852867 comb 6.850000 user 6.060000 sys 0.790000 (best of 3)
  
  The significant speedup is due to not incurring the overhead to track
  payload offset data. Of course, this overhead is proportional to
  bundle2 part size. So a multiple gigabyte changegroup part is on the
  extreme side of the spectrum for real-world impact.
  
  In addition to the CPU efficiency wins, not tracking offset data
  also means not using memory to hold that data. Using a bundle based on
  the example BSD repository in issue 5691, this change has a drastic
  impact to memory usage during `hg unbundle` (`hg clone` would behave
  similarly). Before, memory usage incrementally increased for the
  duration of bundle processing. In other words, as we advanced through
  the changegroup and bundle2 part, we kept allocating more memory to
  hold offset data. After this change, we still increase memory during
  changegroup application. But the rate of increase is significantly
  slower. (I suspect there is a memory leak somewhere, but that's for
  another commit.)
  
  The RSS at the end of filelog application is as follows:
  
  Before: ~752 MB
  After:  ~567 MB
  
  So, we were storing ~185 MB of offset data that we never even used.
  Talk about wasteful!
  
  .. api::
  
    bundle2 parts are no longer seekable by default.
  
  .. perf::
  
    bundle2 read I/O throughput significantly increased.
  
  .. perf::
  
    Significant memory use reductions when reading from bundle2 bundles.
    
    On the BSD repository, peak RSS during changegroup application
    decreased by ~185 MB from ~752 MB to ~567 MB.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D1390

AFFECTED FILES
  contrib/perf.py
  mercurial/bundle2.py
  mercurial/bundlerepo.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/bundlerepo.py b/mercurial/bundlerepo.py
--- a/mercurial/bundlerepo.py
+++ b/mercurial/bundlerepo.py
@@ -289,7 +289,7 @@ 
             self._cgunpacker = None
 
             cgpart = None
-            for part in bundle.iterparts():
+            for part in bundle.iterparts(seekable=True):
                 if part.type == 'changegroup':
                     if cgpart:
                         raise NotImplementedError("can't process "
diff --git a/mercurial/bundle2.py b/mercurial/bundle2.py
--- a/mercurial/bundle2.py
+++ b/mercurial/bundle2.py
@@ -845,16 +845,17 @@ 
             yield self._readexact(size)
 
 
-    def iterparts(self):
+    def iterparts(self, seekable=False):
         """yield all parts contained in the stream"""
+        cls = seekableunbundlepart if seekable else unbundlepart
         # make sure param have been loaded
         self.params
         # From there, payload need to be decompressed
         self._fp = self._compengine.decompressorreader(self._fp)
         indebug(self.ui, 'start extraction of bundle2 parts')
         headerblock = self._readpartheader()
         while headerblock is not None:
-            part = seekableunbundlepart(self.ui, headerblock, self._fp)
+            part = cls(self.ui, headerblock, self._fp)
             yield part
             # Ensure part is fully consumed so we can start reading the next
             # part.
@@ -1154,7 +1155,7 @@ 
         if headerblock is None:
             indebug(self.ui, 'no part found during interruption.')
             return
-        part = seekableunbundlepart(self.ui, headerblock, self._fp)
+        part = unbundlepart(self.ui, headerblock, self._fp)
         op = interruptoperation(self.ui)
         hardabort = False
         try:
diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -546,8 +546,12 @@ 
         for part in bundle.iterparts():
             pass
 
+    def iterpartsseekable(bundle):
+        for part in bundle.iterparts(seekable=True):
+            pass
+
     def seek(bundle):
-        for part in bundle.iterparts():
+        for part in bundle.iterparts(seekable=True):
             part.seek(0, os.SEEK_END)
 
     def makepartreadnbytes(size):
@@ -583,6 +587,7 @@ 
             benches.extend([
                 (makebench(forwardchunks), 'bundle2 forwardchunks()'),
                 (makebench(iterparts), 'bundle2 iterparts()'),
+                (makebench(iterpartsseekable), 'bundle2 iterparts() seekable'),
                 (makebench(seek), 'bundle2 part seek()'),
                 (makepartreadnbytes(8192), 'bundle2 part read(8k)'),
                 (makepartreadnbytes(16384), 'bundle2 part read(16k)'),