Patchwork [2 of 3] revlog: read/cache chunks in fixed windows of 64 KB

Submitter Brodie Rao
Date Nov. 17, 2013, 7:51 p.m.
Message ID <5d08fc0f4d14e4165392.1384717904@hit-nxdomain.opendns.com>
Permalink /patch/3027/
State Superseded

Comments

Brodie Rao - Nov. 17, 2013, 7:51 p.m.
# HG changeset patch
# User Brodie Rao <brodie@sf.io>
# Date 1384717820 18000
#      Sun Nov 17 14:50:20 2013 -0500
# Node ID 5d08fc0f4d14e41653927ce729eef0303c8f1786
# Parent  cd18bc43b89a33f321b2025b4e7e32c8f33f4614
revlog: read/cache chunks in fixed windows of 64 KB

When reading a revlog chunk, the old code read 64 KB (or the requested
length, if larger) starting at the requested offset and cached that. This
change instead reads and caches a window that starts and ends on 64 KB
boundaries and encloses the requested data. Because that window also covers
data before the requested offset, it increases cache hits when reading
revlogs backwards.
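The window arithmetic and its effect on backward reads can be illustrated with a small standalone model. This is a sketch, not Mercurial code: `old_window`, `aligned_window`, and `simulate` are hypothetical helpers, and the cache is simplified to hold only the most recently read window (the real revlog chunk cache is managed by `_addchunk` and may behave differently). The bit masks mirror the `realoffset`/`reallength` computation in the patch.

```python
WINDOW = 65536        # 64 KB, the window size used by the patch
MASK = ~(WINDOW - 1)  # same value as ~65535: clears the low 16 bits


def old_window(offset, length):
    """Pre-patch read: start at the requested offset, read at least 64 KB."""
    return offset, max(WINDOW, length)


def aligned_window(offset, length):
    """Post-patch read: same arithmetic as realoffset/reallength in the diff."""
    start = offset & MASK
    end = (offset + length + WINDOW) & MASK
    return start, end - start


def simulate(requests, window_fn):
    """Toy model: a cache holding only the last window read.

    Returns (hits, misses) for the given (offset, length) requests.
    """
    cached_start = cached_end = 0  # empty cache
    hits = misses = 0
    for offset, length in requests:
        if cached_start <= offset and offset + length <= cached_end:
            hits += 1
        else:
            misses += 1
            start, size = window_fn(offset, length)
            cached_start, cached_end = start, start + size
    return hits, misses


# 200 contiguous 1,000-byte chunks, read from the last one back to the first.
backwards = [(i * 1000, 1000) for i in reversed(range(200))]

print(aligned_window(100000, 1000))         # (65536, 65536)
print(simulate(backwards, old_window))      # (0, 200)
print(simulate(backwards, aligned_window))  # (196, 4)
```

In this toy model the forward-only window never hits on strictly backward reads, since each new request starts just before the cached range; the aligned window also covers bytes before the request, so most backward reads hit. These counts come from the model only and are not comparable to the perfmoonwalk measurements. Note also that the `+ WINDOW` term rounds the end up by a full extra window when a request already ends on a 64 KB boundary: `aligned_window(0, 65536)` returns `(0, 131072)`.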

Running perfmoonwalk on the Mercurial repo (with almost 20,000 changesets) on
Mac OS X with an SSD, before this change:

$ hg perfmoonwalk
! wall 2.307994 comb 2.310000 user 2.120000 sys 0.190000 (best of 5)

(Each run has 10,668 cache hits and 9,304 misses.)

After this change:

$ hg perfmoonwalk
! wall 1.814117 comb 1.810000 user 1.810000 sys 0.000000 (best of 6)

(19,931 cache hits, 62 misses.)

On a busy NFS share, before this change:

$ hg perfmoonwalk
! wall 17.000034 comb 4.100000 user 3.270000 sys 0.830000 (best of 3)

After:

$ hg perfmoonwalk
! wall 1.746115 comb 1.670000 user 1.660000 sys 0.010000 (best of 5)

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -820,13 +820,14 @@  class revlog(object):
         else:
             df = self.opener(self.datafile)
 
-        readahead = max(65536, length)
-        df.seek(offset)
-        d = df.read(readahead)
+        realoffset = offset & ~65535
+        reallength = ((offset + length + 65536) & ~65535) - realoffset
+        df.seek(realoffset)
+        d = df.read(reallength)
         df.close()
-        self._addchunk(offset, d)
-        if readahead > length:
-            return util.buffer(d, 0, length)
+        self._addchunk(realoffset, d)
+        if offset != realoffset or reallength != length:
+            return util.buffer(d, offset - realoffset, length)
         return d
 
     def _getchunk(self, offset, length):
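The patched read path can be sketched outside Mercurial as follows. In this hypothetical `read_aligned` helper, `io.BytesIO` stands in for the revlog data file and Python 3's `memoryview` stands in for `util.buffer`, so the requested bytes are returned as a zero-copy view into the aligned read rather than a copy. This is an illustration under those assumptions, not the revlog implementation.

```python
import io

WINDOW = 65536        # 64 KB window, as in the patch
MASK = ~(WINDOW - 1)  # same value as ~65535


def read_aligned(fp, offset, length):
    """Read the 64 KB-aligned window around [offset, offset + length).

    Returns (realoffset, window_bytes, view), where view is a zero-copy
    memoryview over just the requested bytes.
    """
    realoffset = offset & MASK
    reallength = ((offset + length + WINDOW) & MASK) - realoffset
    fp.seek(realoffset)
    d = fp.read(reallength)  # may come back short near end of file
    start = offset - realoffset
    # The patch returns util.buffer(d, offset - realoffset, length) when the
    # window is larger than the request; a memoryview slice plays that role.
    return realoffset, d, memoryview(d)[start:start + length]


# A fake 200,000-byte data file whose byte at position i is i % 256.
data = bytes(i % 256 for i in range(200000))

realoffset, window, view = read_aligned(io.BytesIO(data), 100000, 1000)
print(realoffset, len(window))             # 65536 65536
print(bytes(view) == data[100000:101000])  # True
```

Returning a view avoids copying the requested chunk out of the aligned window, which is generally larger than the chunk itself.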