Patchwork D5565: revlog: add version 2 format flag to control sparse

login
register
mail settings
Submitter phabricator
Date Jan. 11, 2019, 12:34 a.m.
Message ID <differential-rev-PHID-DREV-kgfc5cthcd7mh6z3mj5p-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/37651/
State New
Headers show

Comments

phabricator - Jan. 11, 2019, 12:34 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Revlog behavior should live as close to the revlog as possible.
  In version 1 revlogs, we engaged sparse semantics by using a
  repository requirement. This constituted action at a distance
  and wasn't optimal.
  
  This commit introduces a feature flag on version 2 revlogs that
  explicitly denotes whether sparse reading semantics are enabled.
  
  The flag is enabled by default for version 2 revlogs.
  
  In order to get the flag enabled by default, we had to tweak
  localrepo's logic for populating opener options to only set
  "sparse-revlog" if the requirement was present. This lets the
  revlog initializer use sane defaults depending on the revlog
  version. (The interaction between opener options and the revlog
  class is kinda wonky and could use a re-think. But my little
  brain isn't prepared to handle that at this time.)
  
  I'm also a bit unsure why we have 2 attributes controlling
  sparse behavior (revlog._sparserevlog and revlog._withsparseread).
  I *think* they can be consolidated. But I'm not sure. The new
  revlog feature flag implies both, which seems reasonable.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5565

AFFECTED FILES
  mercurial/help/internals/revlogs.txt
  mercurial/localrepo.py
  mercurial/revlog.py
  mercurial/revlogutils/constants.py
  tests/test-revlog-v2.t

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Jan. 11, 2019, 10:06 a.m.
lothiraldan added a comment.


  > Revlog behavior should live as close to the revlog as possible.
  >  In version 1 revlogs, we engaged sparse semantics by using a
  >  repository requirement. This constituted action at a distance
  >  and wasn't optimal.
  > 
  > This commit introduces a feature flag on version 2 revlogs that
  >  explicitly denotes whether sparse reading semantics are enabled.
  
  Why have a flag for revlog v2 at all? Since v2 have not been "released"
  yet, why not use sparse-revlog logic in all case when using revlog v2?
  
  > The flag is enabled by default for version 2 revlogs.
  > 
  > In order to get the flag enabled by default, we had to tweak
  >  localrepo's logic for populating opener options to only set
  >  "sparse-revlog" if the requirement was present. This lets the
  >  revlog initializer use sane defaults depending on the revlog
  >  version. (The interaction between opener options and the revlog
  >  class is kinda wonky and could use a re-think. But my little
  >  brain isn't prepared to handle that at this time.)
  > 
  > I'm also a bit unsure why we have 2 attributes controlling
  >  sparse behavior (revlog._sparserevlog and revlog._withsparseread).
  >  I *think* they can be consolidated. But I'm not sure. The new
  >  revlog feature flag implies both, which seems reasonable.
  
  The revlog._sparserevlog flag enables sparse writing (and reading),
  writing sparse revlog requires readers to use sparse-reading and is
  gated behind a new requirement (introduced in hg 4.7).
  
  Sparse-reading can be used for non-sparse repositories, it is just a
  reader thing.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5565

To: indygreg, #hg-reviewers
Cc: lothiraldan, mercurial-devel
phabricator - Jan. 11, 2019, 11:11 p.m.
indygreg planned changes to this revision.
indygreg added a comment.


  In https://phab.mercurial-scm.org/D5565#82224, @lothiraldan wrote:
  
  > Why have a flag for revlog v2 at all? Since v2 have not been "released"
  >  yet, why not use sparse-revlog logic in all case when using revlog v2?
  
  
  That's a good point and I'm glad you weighed in on this. I meant to ask that question in the commit message!
  
  If we don't have a good reason to disable sparse revlogs, I'd be OK removing the feature flag and implying the feature by default. I'll update this commit accordingly unless others object to making is always on.
  
  > The revlog._sparserevlog flag enables sparse writing (and reading),
  >  writing sparse revlog requires readers to use sparse-reading and is
  >  gated behind a new requirement (introduced in hg 4.7).
  > 
  > Sparse-reading can be used for non-sparse repositories, it is just a
  >  reader thing.
  
  Thank you for clarifying this! It might be useful to have this better documented in the code...

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5565

To: indygreg, #hg-reviewers
Cc: lothiraldan, mercurial-devel

Patch

diff --git a/tests/test-revlog-v2.t b/tests/test-revlog-v2.t
--- a/tests/test-revlog-v2.t
+++ b/tests/test-revlog-v2.t
@@ -22,7 +22,7 @@ 
   $ cd empty-repo
   $ cat .hg/requires
   dotencode
-  exp-revlogv2.1
+  exp-revlogv2.2
   fncache
   sparserevlog
   store
@@ -58,8 +58,8 @@ 
 
   $ f --hexdump --bytes 4 .hg/store/00changelog.i
   .hg/store/00changelog.i:
-  0000: 00 01 de ad                                     |....|
+  0000: 00 03 de ad                                     |....|
 
   $ f --hexdump --bytes 4 .hg/store/data/foo.i
   .hg/store/data/foo.i:
-  0000: 00 01 de ad                                     |....|
+  0000: 00 03 de ad                                     |....|
diff --git a/mercurial/revlogutils/constants.py b/mercurial/revlogutils/constants.py
--- a/mercurial/revlogutils/constants.py
+++ b/mercurial/revlogutils/constants.py
@@ -24,11 +24,13 @@ 
 FLAG_INLINE_DATA = (1 << 16)
 # Only used by v1, implied by v2.
 FLAG_GENERALDELTA = (1 << 17)
+# Only used by v2.
+FLAG_SPARSE_DELTA_CHAINS = (1 << 17)
 REVLOG_DEFAULT_FLAGS = FLAG_INLINE_DATA
 REVLOG_DEFAULT_FORMAT = REVLOGV1
 REVLOG_DEFAULT_VERSION = REVLOG_DEFAULT_FORMAT | REVLOG_DEFAULT_FLAGS
 REVLOGV1_FLAGS = FLAG_INLINE_DATA | FLAG_GENERALDELTA
-REVLOGV2_FLAGS = FLAG_INLINE_DATA
+REVLOGV2_FLAGS = FLAG_INLINE_DATA | FLAG_SPARSE_DELTA_CHAINS
 
 # revlog index flags
 
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -37,6 +37,7 @@ 
 from .revlogutils.constants import (
     FLAG_GENERALDELTA,
     FLAG_INLINE_DATA,
+    FLAG_SPARSE_DELTA_CHAINS,
     REVIDX_DEFAULT_FLAGS,
     REVIDX_ELLIPSIS,
     REVIDX_EXTSTORED,
@@ -83,6 +84,7 @@ 
 REVLOGV2
 FLAG_INLINE_DATA
 FLAG_GENERALDELTA
+FLAG_SPARSE_DELTA_CHAINS
 REVLOG_DEFAULT_FLAGS
 REVLOG_DEFAULT_FORMAT
 REVLOG_DEFAULT_VERSION
@@ -392,6 +394,10 @@ 
 
         if 'revlogv2' in opts:
             newversionflags = REVLOGV2 | FLAG_INLINE_DATA
+
+            if 'sparse-revlog' not in opts or opts.get('sparse-revlog'):
+                newversionflags |= FLAG_SPARSE_DELTA_CHAINS
+
         elif 'revlogv1' in opts:
             newversionflags = REVLOGV1 | FLAG_INLINE_DATA
             if 'generaldelta' in opts:
@@ -412,10 +418,6 @@ 
             self._maxdeltachainspan = opts['maxdeltachainspan']
         if self._mmaplargeindex and 'mmapindexthreshold' in opts:
             mmapindexthreshold = opts['mmapindexthreshold']
-        self._sparserevlog = bool(opts.get('sparse-revlog', False))
-        withsparseread = bool(opts.get('with-sparse-read', False))
-        # sparse-revlog forces sparse-read
-        self._withsparseread = self._sparserevlog or withsparseread
         if 'sparse-read-density-threshold' in opts:
             self._srdensitythreshold = opts['sparse-read-density-threshold']
         if 'sparse-read-min-gap-size' in opts:
@@ -467,6 +469,8 @@ 
 
             self._inline = False
             self._generaldelta = False
+            self._sparserevlog = False
+            self._withsparseread = False
 
         elif fmt == REVLOGV1:
             if flags & ~REVLOGV1_FLAGS:
@@ -477,6 +481,11 @@ 
             self._inline = versionflags & FLAG_INLINE_DATA
             self._generaldelta = versionflags & FLAG_GENERALDELTA
 
+            self._sparserevlog = bool(opts.get('sparse-revlog', False))
+            withsparseread = bool(opts.get('with-sparse-read', False))
+            # sparse-revlog forces sparse-read
+            self._withsparseread = self._sparserevlog or withsparseread
+
         elif fmt == REVLOGV2:
             if flags & ~REVLOGV2_FLAGS:
                 raise error.RevlogError(_('unknown flags (%#04x) in version %d '
@@ -487,6 +496,9 @@ 
             # generaldelta implied by version 2 revlogs.
             self._generaldelta = True
 
+            self._sparserevlog = versionflags & FLAG_SPARSE_DELTA_CHAINS
+            self._withsparseread = versionflags & FLAG_SPARSE_DELTA_CHAINS
+
         else:
             raise error.RevlogError(_('unknown version (%d) in revlog %s') %
                                     (fmt, self.indexfile))
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -363,7 +363,7 @@ 
 
 # Increment the sub-version when the revlog v2 format changes to lock out old
 # clients.
-REVLOGV2_REQUIREMENT = 'exp-revlogv2.1'
+REVLOGV2_REQUIREMENT = 'exp-revlogv2.2'
 
 # A repository with the sparserevlog feature will have delta chains that
 # can spread over a larger span. Sparse reading cuts these large spans into
@@ -772,8 +772,8 @@ 
     options[b'sparse-read-min-gap-size'] = srmingapsize
 
     sparserevlog = SPARSEREVLOG_REQUIREMENT in requirements
-    options[b'sparse-revlog'] = sparserevlog
     if sparserevlog:
+        options[b'sparse-revlog'] = True
         options[b'generaldelta'] = True
 
     maxchainlen = None
diff --git a/mercurial/help/internals/revlogs.txt b/mercurial/help/internals/revlogs.txt
--- a/mercurial/help/internals/revlogs.txt
+++ b/mercurial/help/internals/revlogs.txt
@@ -66,6 +66,13 @@ 
 
 0
    Store revision data inline.
+1
+   Sparse delta chains. When set, delta chains (see below) may cover
+   long spans in the revlog and readers should perform sparse reads to
+   minimize the number of bytes that must be read to resolve a revision.
+   Historically, revlogs limited the total linear span between a revision's
+   first and last delta and readers performed a read of all the bytes
+   in between.
 
 The following header values are common:
 
@@ -162,6 +169,9 @@ 
 There is no dedicated *generaldelta* revlog format flag. Instead,
 the feature is implied enabled by default.
 
+There is a revlog format flag that indicates whether sparse delta chains
+and reading should be used.
+
 Delta Chains
 ============