Patchwork D5296: store: don't read the whole fncache in memory

login
register
mail settings
Submitter phabricator
Date March 17, 2019, 12:33 a.m.
Message ID <bb1b4833c784d6219b9c2f923226a0df@localhost.localdomain>
Download mbox | patch
Permalink /patch/39300/
State Not Applicable
Headers show

Comments

phabricator - March 17, 2019, 12:33 a.m.
pulkit updated this revision to Diff 14529.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D5296?vs=14256&id=14529

REVISION DETAIL
  https://phab.mercurial-scm.org/D5296

AFFECTED FILES
  mercurial/store.py

CHANGE DETAILS




To: pulkit, #hg-reviewers
Cc: indygreg, yuja, mjpieters, mercurial-devel
Yuya Nishihara - March 17, 2019, 1:44 a.m.
> +        for c in iter(functools.partial(fp.read, chunksize), b''):
> +            chunk += c
> +            try:
> +                p = chunk.rindex(b'\n')
> +                self.entries |= set(decodedir(chunk[:p + 1]).splitlines())

Nit: you can `entries.update(any_iterable)` which would be slightly faster
when bucket size changes.

>   Right now this patch does not have any tests. How should I add them?
>   
>   1. add a debug config option and pass that to fncachestore and then to fncache
>   2. have a function which returns the chunk_size, write an extenion in tests which wrap that function and enable that extension in tests

Something like 2. `chunksize` can be a module constant, which can later be
updated by an extension. That's probably the easiest option.
phabricator - March 17, 2019, 1:51 a.m.
yuja added a comment.


  > +        for c in iter(functools.partial(fp.read, chunksize), b''):
  >  +            chunk += c
  >  +            try:
  >  +                p = chunk.rindex(b'\n')
  >  +                self.entries |= set(decodedir(chunk[:p + 1]).splitlines())
  
  Nit: you can `entries.update(any_iterable)` which would be slightly faster
  when bucket size changes.
  
  >   Right now this patch does not have any tests. How should I add them?
  >   
  >   1. add a debug config option and pass that to fncachestore and then to fncache
  >   2. have a function which returns the chunk_size, write an extenion in tests which wrap that function and enable that extension in tests
  
  Something like 2. `chunksize` can be a module constant, which can later be
  updated by an extension. That's probably the easiest option.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5296

To: pulkit, #hg-reviewers
Cc: indygreg, yuja, mjpieters, mercurial-devel

Patch

diff --git a/mercurial/store.py b/mercurial/store.py
--- a/mercurial/store.py
+++ b/mercurial/store.py
@@ -8,6 +8,7 @@ 
 from __future__ import absolute_import
 
 import errno
+import functools
 import hashlib
 import os
 import stat
@@ -463,7 +464,21 @@ 
             # skip nonexistent file
             self.entries = set()
             return
-        self.entries = set(decodedir(fp.read()).splitlines())
+
+        self.entries = set()
+        chunksize = (10 ** 6) # 1 MB
+        chunk = b''
+        for c in iter(functools.partial(fp.read, chunksize), b''):
+            chunk += c
+            try:
+                p = chunk.rindex(b'\n')
+                self.entries |= set(decodedir(chunk[:p + 1]).splitlines())
+                chunk = chunk[p + 1:]
+            except ValueError:
+                # substring '\n' not found, maybe the entry is bigger than the
+                # chunksize, so let's keep iterating
+                pass
+
         self._checkentries(fp)
         fp.close()