Patchwork D7313: revlog: move the nodemap into the index object (for pure)

login
register
mail settings
Submitter phabricator
Date Nov. 8, 2019, 9:30 a.m.
Message ID <differential-rev-PHID-DREV-mqf2rqj7duuhldesknes-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/42910/
State Superseded
Headers show

Comments

phabricator - Nov. 8, 2019, 9:30 a.m.
marmoute created this revision.
Herald added a reviewer: indygreg.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This make the pure code closer to the C extension one. The ultimate goal is to
  merge the two into a single object and offer a unified API. This changeset
  focus on gathering the data on the same object.
  
  For now the code for `revlogoldindex` and `BaseIndexObject` index object are
  quite similar. However, there will be larger divergence later on, so I don't
  think is worth doing a base case.
  
  This work is part of a refactoring to unify the revlog index and the nodemap.
  This unification prepare the use of a persistent nodemap.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7313

AFFECTED FILES
  mercurial/pure/parsers.py
  mercurial/revlog.py

CHANGE DETAILS




To: marmoute, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 7:31 p.m.
indygreg added a comment.


  This seems OK to me aside from the use of `range()`, which we don't want since `len()` can be massive and this could create perf problems on Python 2.

INLINE COMMENTS

> parsers.py:53
> +        nodemap = revlogutils.NodeMap({nullid: nullrev})
> +        for r in range(0, len(self)):
> +            n = self[r][7]

We want this to be `pycompat.xrange`.

> revlog.py:211
> +        nodemap = revlogutils.NodeMap({nullid: nullrev})
> +        for r in range(0, len(self)):
> +            n = self[r][7]

`pycompat.xrange`.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7313/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7313

To: marmoute, indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 9, 2019, 12:22 a.m.
marmoute added a comment.


  In D7313#107559 <https://phab.mercurial-scm.org/D7313#107559>, @indygreg wrote:
  
  > This seems OK to me aside from the use of `range()`, which we don't want since `len()` can be massive and this could create perf problems on Python 2.
  
  I am not too convinced by a switch to `range()`. I've two main reason:
  
  1. this is the "pure" version, so people are unlikely to use it on large repository, they would get in other large performance issue before that.
  2. we are about to allocate a full nodemap, this will be significantly slower than the associated `range()` call.
  
  Do you still cant the pycompat.xrange change ?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7313/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7313

To: marmoute, indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -205,6 +205,17 @@ 
 
 
 class revlogoldindex(list):
+    @util.propertycache
+    def nodemap(self):
+        nodemap = revlogutils.NodeMap({nullid: nullrev})
+        for r in range(0, len(self)):
+            n = self[r][7]
+            nodemap[n] = r
+        return nodemap
+
+    def clearcaches(self):
+        self.__dict__.pop('nodemap', None)
+
     def __getitem__(self, i):
         if i == -1:
             return (0, 0, 0, -1, -1, -1, -1, nullid)
@@ -240,7 +251,8 @@ 
             nodemap[e[6]] = n
             n += 1
 
-        return revlogoldindex(index), nodemap, None
+        index = revlogoldindex(index)
+        return index, index.nodemap, None
 
     def packentry(self, entry, node, version, rev):
         if gettype(entry[0]):
@@ -287,7 +299,7 @@ 
     def parseindex(self, data, inline):
         # call the C implementation to parse the index data
         index, cache = parsers.parse_index2(data, inline)
-        return index, getattr(index, 'nodemap', None), cache
+        return index, index.nodemap, cache
 
     def packentry(self, entry, node, version, rev):
         p = indexformatng_pack(*entry)
@@ -372,11 +384,11 @@ 
         self._chunkcachesize = 65536
         self._maxchainlen = None
         self._deltabothparents = True
-        self.index = []
+        self.index = None
         # Mapping of partial identifiers to full nodes.
         self._pcache = {}
         # Mapping of revision integer to full node.
-        self._nodecache = revlogutils.NodeMap({nullid: nullrev})
+        self._nodecache = None
         self._nodepos = None
         self._compengine = b'zlib'
         self._compengineopts = {}
@@ -541,8 +553,7 @@ 
                 _(b"index %s is corrupted") % self.indexfile
             )
         self.index, nodemap, self._chunkcache = d
-        if nodemap is not None:
-            self.nodemap = self._nodecache = nodemap
+        self.nodemap = self._nodecache = nodemap
         if not self._chunkcache:
             self._chunkclear()
         # revnum -> (chain-length, sum-delta-length)
@@ -646,15 +657,7 @@ 
         self._chainbasecache.clear()
         self._chunkcache = (0, b'')
         self._pcache = {}
-
-        try:
-            # If we are using the native C version, you are in a fun case
-            # where self.index, self.nodemap and self._nodecaches is the same
-            # object.
-            self._nodecache.clearcaches()
-        except AttributeError:
-            self._nodecache = revlogutils.NodeMap({nullid: nullrev})
-            self._nodepos = None
+        self.index.clearcaches()
 
     def rev(self, node):
         try:
@@ -662,29 +665,10 @@ 
         except TypeError:
             raise
         except error.RevlogError:
-            if not isinstance(self._nodecache, revlogutils.NodeMap):
-                # parsers.c radix tree lookup failed
-                if node == wdirid or node in wdirfilenodeids:
-                    raise error.WdirUnsupported
-                raise error.LookupError(node, self.indexfile, _(b'no node'))
-            else:
-                # pure python cache lookup failed
-                n = self._nodecache
-                i = self.index
-                p = self._nodepos
-                if p is None:
-                    p = len(i) - 1
-                else:
-                    assert p < len(i)
-                for r in pycompat.xrange(p, -1, -1):
-                    v = i[r][7]
-                    n[v] = r
-                    if v == node:
-                        self._nodepos = r - 1
-                        return r
-                if node == wdirid or node in wdirfilenodeids:
-                    raise error.WdirUnsupported
-                raise error.LookupError(node, self.indexfile, _(b'no node'))
+            # parsers.c radix tree lookup failed
+            if node == wdirid or node in wdirfilenodeids:
+                raise error.WdirUnsupported
+            raise error.LookupError(node, self.indexfile, _(b'no node'))
 
     # Accessors for index entries.
 
diff --git a/mercurial/pure/parsers.py b/mercurial/pure/parsers.py
--- a/mercurial/pure/parsers.py
+++ b/mercurial/pure/parsers.py
@@ -10,8 +10,12 @@ 
 import struct
 import zlib
 
-from ..node import nullid
-from .. import pycompat
+from ..node import nullid, nullrev
+from .. import (
+    pycompat,
+    revlogutils,
+    util,
+)
 
 stringio = pycompat.bytesio
 
@@ -43,6 +47,17 @@ 
 
 
 class BaseIndexObject(object):
+    @util.propertycache
+    def nodemap(self):
+        nodemap = revlogutils.NodeMap({nullid: nullrev})
+        for r in range(0, len(self)):
+            n = self[r][7]
+            nodemap[n] = r
+        return nodemap
+
+    def clearcaches(self):
+        self.__dict__.pop('nodemap', None)
+
     def __len__(self):
         return self._lgt + len(self._extra)