Patchwork D2885: RFC: use Redis to cache file data

Submitter phabricator
Date March 16, 2018, 11:07 p.m.
Message ID <differential-rev-PHID-DREV-seaytu4zegvcpkn7vodp-req@phab.mercurial-scm.org>
Permalink /patch/29574/
State Superseded

Comments

phabricator - March 16, 2018, 11:07 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This [very hacky] commit implements a new files store that uses
  Redis to cache file data. It services requests from Redis (if
  available) and falls back to a base store / interface (revlogs in
  our case) on misses.
  
  The purpose of this commit is first to demonstrate the value of having
  interfaces for storage. If we code to the interface, another
  implementation can come along and do useful things - like caching.
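The interface-wrapping idea can be sketched without Redis at all: any object exposing the same `resolvefilesdata()` generator can wrap another one and memoize results. This is only an illustrative sketch; the class names and the in-memory dict are hypothetical, not from the patch (the real patch wraps a revlog-backed store with a Redis connection).

```python
class basefilesstore(object):
    """Hypothetical base store: resolves (path, node) pairs to file data."""

    def __init__(self, data):
        self._data = data  # dict mapping (path, node) -> bytes

    def resolvefilesdata(self, entries):
        for path, node in entries:
            yield 'ok', path, node, self._data[(path, node)]

class cachingfilesstore(object):
    """Wraps any store exposing resolvefilesdata() and memoizes results."""

    def __init__(self, basestore):
        self._basestore = basestore
        self._cache = {}

    def resolvefilesdata(self, entries):
        missing = []
        for path, node in entries:
            if (path, node) in self._cache:
                # Cache hit: serve without touching the base store.
                yield 'ok', path, node, self._cache[(path, node)]
            else:
                missing.append((path, node))

        # Misses fall through to the base store; successful reads are cached.
        for res, path, node, data in self._basestore.resolvefilesdata(missing):
            if res == 'ok':
                self._cache[(path, node)] = data
            yield res, path, node, data
```

Because the wrapper only depends on the generator contract, the caller (e.g. `localrepo`) never needs to know whether it is talking to revlogs directly or to a cache in front of them.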
  
  The other purpose was to investigate performance. Would a memory-backed
  key-value store have a significant impact on the performance of our
  experimental wire protocol command that serves file data fulltexts for
  specific revisions? The answer is a very resounding yes!
  
  Using the same mozilla-unified revision from the previous commit:
  
  - no compression: 1478MB; ~94s wall; ~56s CPU
                    w/ hot redis: 1478MB; ~9.6s wall; ~8.6s CPU
  - zstd level 3:   343MB; ~97s wall; ~57s CPU
                    w/ hot redis: 343MB; ~8.5s wall; ~8.3s CPU
  - zstd level 1:   w/ hot redis: 377MB; ~6.8s wall; ~6.6s CPU
  - zlib level 6:   367MB; ~116s wall; ~74s CPU
                    w/ hot redis: 367MB; ~36.7s wall; ~36s CPU
  
  For the curious, the lsprof profiler says that our hotspot without
  compression is in socket I/O. With zstd compression, the hotspot is
  the compression itself.
  
  I reckon the reason for the socket I/O overhead is that we end up
  writing many more chunks to the wire when uncompressed (compression
  effectively ensures each output chunk is a similar, largish size).
  All those extra Python function calls and system calls add up!
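The chunk-count effect described above can be sketched directly: coalescing many small chunks into a few large buffers before writing cuts the number of Python-level iterations and `write()` system calls. The helper below is hypothetical (not part of the patch); it just demonstrates the buffering idea under an assumed 64KB target size.

```python
def coalesce(chunks, target=65536):
    """Combine an iterable of byte chunks into roughly target-sized buffers.

    Hypothetical helper: emitting fewer, larger chunks means fewer
    socket write() calls, which is where the uncompressed hotspot was.
    """
    buf = []
    size = 0
    for chunk in chunks:
        buf.append(chunk)
        size += len(chunk)
        if size >= target:
            yield b''.join(buf)
            buf = []
            size = 0
    if buf:
        # Flush whatever remains below the target size.
        yield b''.join(buf)
```

With 1000 chunks of 100 bytes each, this emits just two buffers instead of a thousand writes, which is roughly the coalescing effect compression gives us for free.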
  
  Anyway, I'm definitely happy with the performance improvements. I'd
  say this was a useful experiment!

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2885

AFFECTED FILES
  mercurial/localrepo.py
  mercurial/revlogstore.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 22, 2018, 3:33 p.m.
indygreg abandoned this revision.
indygreg added a comment.


  This was just an RFC. It doesn't need to be an open review.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2885

To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/revlogstore.py b/mercurial/revlogstore.py
--- a/mercurial/revlogstore.py
+++ b/mercurial/revlogstore.py
@@ -35,3 +35,51 @@ 
 
             data = fl.read(node)
             yield 'ok', path, node, data
+
+def redisfiledatakey(path, node):
+    return b'filedata:%s:%s' % (path, node)
+
+class redisacceleratedrevlogfilesstore(repository.basefilesstore):
+    """"A filesstore that can use a redis server to speed up operations."""
+    def __init__(self, redis, basestore):
+        self._redis = redis
+        self._basestore = basestore
+
+    def resolvefilesdata(self, entries):
+        # Our strategy is to batch requests to redis because this is faster
+        # than a command for every entry.
+
+        batch = []
+        for entry in entries:
+            batch.append(entry)
+
+            if len(batch) == 1000:
+                for res in self._processfiledatabatch(batch):
+                    yield res
+
+                batch = []
+
+        if batch:
+            for res in self._processfiledatabatch(batch):
+                yield res
+
+    def _processfiledatabatch(self, batch):
+        keys = [redisfiledatakey(path, node) for path, node in batch]
+
+        missing = []
+
+        for i, redisdata in enumerate(self._redis.mget(keys)):
+            path, node = batch[i]
+
+            if redisdata is None:
+                missing.append((path, node))
+            else:
+                yield 'ok', path, node, redisdata
+
+        # Now resolve all the missing data from the base store.
+        for res, path, node, data in self._basestore.resolvefilesdata(missing):
+            yield res, path, node, data
+
+            # Don't forget to cache it!
+            if res == 'ok':
+                self._redis.set(redisfiledatakey(path, node), data)
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -480,7 +480,11 @@ 
             else: # standard vfs
                 self.svfs.audit = self._getsvfsward(self.svfs.audit)
         self._applyopenerreqs()
-        self.filesstore = revlogstore.revlogfilesstore(self.svfs)
+        import redis
+        basefilesstore = revlogstore.revlogfilesstore(self.svfs)
+        redisconn = redis.StrictRedis(host='localhost', port=6379, db=0)
+        self.filesstore = revlogstore.redisacceleratedrevlogfilesstore(
+            redisconn, basefilesstore)
 
         if create:
             self._writerequirements()