Patchwork D2883: revlogstore: create and implement an interface for repo files storage

Submitter phabricator
Date March 16, 2018, 11:07 p.m.
Message ID <differential-rev-PHID-DREV-rzfgjpbmlqce6luik6ro-req@phab.mercurial-scm.org>
Permalink /patch/29572/
State New

Comments

phabricator - March 16, 2018, 11:07 p.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  In order to better support partial clones, we will need to overhaul
  local repository storage. This will be a major effort, as many parts
  of the code assume things like the existence of revlogs for storing
  data.
  
  To help support alternate storage implementations, we will create
  interfaces for accessing storage. The idea is that all consumers
  will code against an interface, so any new interface-conforming
  implementation can be swapped in to provide new and novel storage
  mechanisms.
  
  This commit starts the process of defining those interfaces.
  
  We define an interface for accessing files data. It has a single
  method for resolving the fulltext of an iterable of inputs.
  
  The interface is specifically defined to allow out-of-order responses.
  It also provides a mechanism for declaring that files data is censored.
  We *may* also want a mechanism to declare LFS or largefiles data.
  But I'm not sure how that mechanism works or what the best way to
  handle that would be, if any.
  
  We introduce a new "revlogstore" module to hold the implementations
  of these interfaces that use our existing revlog-based storage
  mechanism.
  
  An attribute pointing to the "files store" has been added to
  localrepository.
  
  No consumers of the new interface have been added. The interface
  should still be considered highly experimental and details are
  expected to change.
  
  It was tempting to define the interface as one level higher than
  file storage - in such a way as to facilitate accessing changeset
  and manifest data as well. However, I believe these 3 primitives -
  changesets, manifests, and files - each have unique requirements
  that will dictate special, one-off methods on their storage
  interfaces. I'd rather we define our interfaces so they are
  tailored to each type initially. If an implementation wants to
  shoehorn all data into a generic key-value blob store, they can
  still do that. And we also reserve the right to combine interfaces
  in the future. I just think that attempting to have the initial
  versions of the interfaces deviate too far from current reality will
  make it very challenging to define and implement them.
  
  The reason I'm defining and implementing this interface now is to
  support new (experimental) wire protocol commands to be used to
  support partial clone. Some of these commands will benefit from
  aggressive caching. I want to prove out the efficacy of the interfaces
  approach by implementing cache-based speedups in the interface layer.
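
As an illustration of the caching idea described in the summary, here is a minimal sketch of a wrapper that sits in the interface layer. It is not code from this patch: `dictfilesstore` and `cachingfilesstore` are hypothetical names, the in-memory dict stands in for a real backend, and only the `resolvefilesdata()` contract (4-tuples of result, path, node, data) is taken from the patch.

```python
class dictfilesstore(object):
    """Hypothetical in-memory backend standing in for a real store."""

    def __init__(self, data):
        # data maps (path, node) -> fulltext bytes.
        self._data = data
        self.lookups = 0

    def resolvefilesdata(self, entries):
        for path, node in entries:
            self.lookups += 1
            data = self._data.get((path, node))
            if data is None:
                yield 'missing', path, node, None
            else:
                yield 'ok', path, node, data


class cachingfilesstore(object):
    """Hypothetical wrapper caching 'ok' results from another store."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    def resolvefilesdata(self, entries):
        pending = []
        for path, node in entries:
            key = (path, node)
            if key in self._cache:
                # Cache hits are emitted ahead of any misses. This is
                # allowed because responses may arrive out of order.
                yield 'ok', path, node, self._cache[key]
            else:
                pending.append(key)

        # Only the entries not served from cache reach the backend.
        for result, path, node, data in self._inner.resolvefilesdata(pending):
            if result == 'ok':
                self._cache[(path, node)] = data
            yield result, path, node, data


backend = dictfilesstore({(b'a.txt', b'n1'): b'hello\n'})
store = cachingfilesstore(backend)
first = list(store.resolvefilesdata([(b'a.txt', b'n1'), (b'b.txt', b'n2')]))
second = list(store.resolvefilesdata([(b'a.txt', b'n1')]))
```

Because consumers only see the generator, the wrapper can be swapped in without changing them. A real cache would also need an eviction policy and a decision on whether 'missing' or 'censored' results are worth caching, which this sketch leaves open.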

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2883

AFFECTED FILES
  mercurial/localrepo.py
  mercurial/repository.py
  mercurial/revlogstore.py

To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - March 21, 2018, 2:30 a.m.
mharbison72 added a comment.


  It's probably too early to worry about for the experimenting that you're doing, but at some point, maybe this should also allow yielding the full text in chunks?  As it stands now, there are a couple places where LFS has to read in the full file, and one of those places is the filelog/revlog.  IIRC, largefiles manages to avoid that completely.
  
  This dated page is the only thing that I could find talking about the issues with that approach:
  
  https://www.mercurial-scm.org/wiki/HandlingLargeFiles
  
  I'm not sure what this should look like either, but it seemed worthwhile to point out that page, with the accompanying discussion of revlog limitations.

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - March 22, 2018, 3:21 p.m.
indygreg planned changes to this revision.
indygreg added a comment.


  I'll rebase this on top of `zope.interface` (https://phab.mercurial-scm.org/D2928 and friends). Please defer reviewing for now.

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel
phabricator - March 22, 2018, 3:24 p.m.
indygreg added a comment.


  In https://phab.mercurial-scm.org/D2883#46712, @mharbison72 wrote:
  
  > It's probably too early to worry about for the experimenting that you're doing, but at some point, maybe this should also allow yielding the full text in chunks?  As it stands now, there are a couple places where LFS has to read in the full file, and one of those places is the filelog/revlog.  IIRC, largefiles manages to avoid that completely.
  
  
  Yes, we should definitely design the interface such that file fulltexts can be expressed as chunks. That doesn't mean we have to implement things to actually use chunks. But it will at least give us an escape hatch so we can do more reasonable things for large files in the future.
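
One possible shape for chunked fulltexts is sketched below. Everything here is an assumption made for illustration: `resolvefilesdatachunked`, `iterchunks`, and the `chunksize` parameter do not exist in the patch, and the in-memory `blobs` dict only stands in for a backend that could stream data without loading the whole file.

```python
import io


def iterchunks(fh, chunksize=4):
    """Yield successive chunks read from a file-like object."""
    while True:
        chunk = fh.read(chunksize)
        if not chunk:
            break
        yield chunk


def resolvefilesdatachunked(blobs, entries, chunksize=4):
    """Hypothetical chunked variant of resolvefilesdata().

    Emits (result, path, node, chunks). For 'ok' results, chunks is an
    iterator of bytes objects; otherwise it is None.
    """
    for path, node in entries:
        data = blobs.get((path, node))
        if data is None:
            yield 'missing', path, node, None
            continue
        yield 'ok', path, node, iterchunks(io.BytesIO(data), chunksize)


blobs = {(b'big.bin', b'n1'): b'0123456789'}
results = []
for result, path, node, chunks in resolvefilesdatachunked(
        blobs, [(b'big.bin', b'n1'), (b'gone', b'n2')]):
    if result == 'ok':
        pieces = list(chunks)
        # Consumers that need the fulltext can still join the chunks.
        results.append((result, path, pieces, b''.join(pieces)))
    else:
        results.append((result, path, None, None))
```

A consumer that only needs the fulltext pays one `b''.join()`, while a streaming consumer could write each chunk out as it arrives, which is the escape hatch for large files.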

To: indygreg, #hg-reviewers
Cc: mharbison72, mercurial-devel

Patch

diff --git a/mercurial/revlogstore.py b/mercurial/revlogstore.py
new file mode 100644
--- /dev/null
+++ b/mercurial/revlogstore.py
@@ -0,0 +1,37 @@ 
+# revlogstore.py - storage interface for repositories using revlog storage
+#
+# Copyright 2018 Gregory Szorc <gregory.szorc@gmail.com>
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+from __future__ import absolute_import
+
+from . import (
+    error,
+    filelog,
+    repository,
+)
+
+class revlogfilesstore(repository.basefilesstore):
+    """Files storage layer using revlogs for files storage."""
+
+    def __init__(self, svfs):
+        self._svfs = svfs
+
+    def resolvefilesdata(self, entries):
+        for path, node in entries:
+            fl = filelog.filelog(self._svfs, path)
+
+            try:
+                rev = fl.rev(node)
+            except error.LookupError:
+                yield 'missing', path, node, None
+                continue
+
+            if fl.iscensored(rev):
+                yield 'censored', path, node, None
+                continue
+
+            data = fl.read(node)
+            yield 'ok', path, node, data
diff --git a/mercurial/repository.py b/mercurial/repository.py
--- a/mercurial/repository.py
+++ b/mercurial/repository.py
@@ -266,3 +266,33 @@ 
 
 class legacypeer(peer, _baselegacywirecommands):
     """peer but with support for legacy wire protocol commands."""
+
+class basefilesstore(object):
+    """Storage interface for repository files data.
+
+    This interface defines mechanisms to access repository files data in a
+    storage agnostic manner. The goal of this interface is to abstract storage
+    implementations so implementation details of storage don't leak into
+    higher-level repository consumers.
+    """
+
+    __metaclass__ = abc.ABCMeta
+
+    def resolvefilesdata(self, entries):
+        """Resolve the fulltext data for an iterable of files.
+
+        Each entry is defined by a 2-tuple of (path, node).
+
+        The method is a generator that emits results as they become available.
+        Each emitted item is a 4-tuple of (result, path, node, data), where
+        the first element can be one of the following to represent the operation
+        result for this request:
+
+        ok
+           Successfully resolved fulltext data. Data field is a bytes-like
+           object.
+        missing
+           Data for this item not found. Data field is ``None``.
+        censored
+           Data for this revision is censored. Data field is ``None``.
+        """
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -52,6 +52,7 @@ 
     pycompat,
     repository,
     repoview,
+    revlogstore,
     revset,
     revsetlang,
     scmutil,
@@ -479,6 +480,8 @@ 
             else: # standard vfs
                 self.svfs.audit = self._getsvfsward(self.svfs.audit)
         self._applyopenerreqs()
+        self.filesstore = revlogstore.revlogfilesstore(self.svfs)
+
         if create:
             self._writerequirements()
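
The patch adds no consumers, so the following is only a sketch of how one might be written against the `resolvefilesdata()` docstring above. `collectfulltexts` and `fakefilesstore` are hypothetical names; the fake store deliberately answers in reverse request order to exercise the out-of-order allowance.

```python
def collectfulltexts(store, entries):
    """Resolve entries and sort the results by outcome.

    Returns (fulltexts, missing, censored). Results are keyed by
    (path, node) because the store may answer in any order.
    """
    fulltexts = {}
    missing = set()
    censored = set()
    for result, path, node, data in store.resolvefilesdata(entries):
        if result == 'ok':
            fulltexts[(path, node)] = data
        elif result == 'missing':
            missing.add((path, node))
        elif result == 'censored':
            censored.add((path, node))
        else:
            raise ValueError('unknown result: %r' % (result,))
    return fulltexts, missing, censored


class fakefilesstore(object):
    """Hypothetical stand-in that answers in reverse request order."""

    def __init__(self, data, censored=()):
        self._data = data
        self._censored = set(censored)

    def resolvefilesdata(self, entries):
        for path, node in reversed(list(entries)):
            key = (path, node)
            if key in self._censored:
                yield 'censored', path, node, None
            elif key in self._data:
                yield 'ok', path, node, self._data[key]
            else:
                yield 'missing', path, node, None


entries = [(b'a', b'n1'), (b'b', b'n2'), (b'c', b'n3')]
store = fakefilesstore({(b'a', b'n1'): b'A'}, censored=[(b'b', b'n2')])
order = [r[0] for r in store.resolvefilesdata(entries)]
fulltexts, missing, censored = collectfulltexts(store, entries)
```

Keying on (path, node) rather than on position is what lets the consumer stay correct regardless of the order a given store chooses to respond in.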