Patchwork revlog: allow custom nodeids in subclasses

login
register
mail settings
Submitter Wojciech Lopata
Date Aug. 14, 2013, 4:53 p.m.
Message ID <1e1e57b10f2a19277814.1376499200@dev1179.prn1.facebook.com>
Download mbox | patch
Permalink /patch/2175/
State Changes Requested
Headers show

Comments

Wojciech Lopata - Aug. 14, 2013, 4:53 p.m.
# HG changeset patch
# User Wojciech Lopata <lopek@fb.com>
# Date 1376497815 25200
#      Wed Aug 14 09:30:15 2013 -0700
# Node ID 1e1e57b10f2a1927781408b8c01ab84dc84e8d3f
# Parent  df54cc67b91bea97c4fc299241a67aeecf099d00
revlog: allow custom nodeids in subclasses

Modify revlog class to allow its subclasses to use custom method of computing
nodeids. In particular this change is necesary to implement manifest
compression.
Matt Mackall - Aug. 15, 2013, 8:29 p.m.
On Wed, 2013-08-14 at 09:53 -0700, Wojciech Lopata wrote:
> # HG changeset patch
> # User Wojciech Lopata <lopek@fb.com>
> # Date 1376497815 25200
> #      Wed Aug 14 09:30:15 2013 -0700
> # Node ID 1e1e57b10f2a1927781408b8c01ab84dc84e8d3f
> # Parent  df54cc67b91bea97c4fc299241a67aeecf099d00
> revlog: allow custom nodeids in subclasses
> 
> Modify revlog class to allow its subclasses to use custom method of computing
> nodeids. In particular this change is necesary to implement manifest
> compression.
> 
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> old mode 100644
> new mode 100755
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -943,11 +943,15 @@
>  
>      def _checkhash(self, text, node, rev):
>          p1, p2 = self.parents(node)
> -        if node != hash(text, p1, p2):
> +        if not self._ishashcorrect(text, p1, p2, node):
>              raise RevlogError(_("integrity check failed on %s:%d")
>                                % (self.indexfile, rev))

Let's move the raise inside the method. Then the method can be renamed
to checkhash, with no return code.

> -    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None):
> +    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
> +                    node=None):

Separate patch, please.

> +            if not self._ishashcorrect(btext[0], p1, p2, node):
>                  raise RevlogError(_("consistency error in delta"))

You'll note this message is less informative than the earlier one.
Wojciech Lopata - Aug. 15, 2013, 11:29 p.m.
> From: Matt Mackall [mailto:mpm@selenic.com]
> Sent: Thursday, August 15, 2013 1:29 PM
> To: Wojciech Lopata
> Cc: mercurial-devel@selenic.com
> Subject: Re: [PATCH] revlog: allow custom nodeids in subclasses
> 
> On Wed, 2013-08-14 at 09:53 -0700, Wojciech Lopata wrote:
> > # HG changeset patch
> > # User Wojciech Lopata <lopek@fb.com>
> > # Date 1376497815 25200
> > #      Wed Aug 14 09:30:15 2013 -0700
> > # Node ID 1e1e57b10f2a1927781408b8c01ab84dc84e8d3f
> > # Parent  df54cc67b91bea97c4fc299241a67aeecf099d00
> > revlog: allow custom nodeids in subclasses
> >
> > Modify revlog class to allow its subclasses to use custom method of
> computing
> > nodeids. In particular this change is necesary to implement manifest
> > compression.
> >
> > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > old mode 100644
> > new mode 100755
> > --- a/mercurial/revlog.py
> > +++ b/mercurial/revlog.py
> > @@ -943,11 +943,15 @@
> >
> >      def _checkhash(self, text, node, rev):
> >          p1, p2 = self.parents(node)
> > -        if node != hash(text, p1, p2):
> > +        if not self._ishashcorrect(text, p1, p2, node):
> >              raise RevlogError(_("integrity check failed on %s:%d")
> >                                % (self.indexfile, rev))
> 
> Let's move the raise inside the method. Then the method can be renamed
> to checkhash, with no return code.
> 
> > -    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None):
> > +    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
> > +                    node=None):
> 
> Separate patch, please.
> 
> > +            if not self._ishashcorrect(btext[0], p1, p2, node):
> >                  raise RevlogError(_("consistency error in delta"))
> 
> You'll note this message is less informative than the earlier one.

Earlier message, that contains filename and rev, doesn't fit this situation - here we assert that nodeid and delta passed to _addrevision match (there is no concept of indexfile or rev yet). So I can't extract one 'checkhash' that would provide accurate error message in both situations. Unless I pass error message as argument to checkhash, what somewhat ruins idea of atomic oracle of hash correctness. Please advise.

> 
> --
> Mathematics is the supreme nostalgia of our time.
>
Matt Mackall - Aug. 19, 2013, 1:08 a.m.
On Thu, 2013-08-15 at 23:29 +0000, Wojciech Lopata wrote:
> > From: Matt Mackall [mailto:mpm@selenic.com]
> > Sent: Thursday, August 15, 2013 1:29 PM
> > To: Wojciech Lopata
> > Cc: mercurial-devel@selenic.com
> > Subject: Re: [PATCH] revlog: allow custom nodeids in subclasses
> > 
> > On Wed, 2013-08-14 at 09:53 -0700, Wojciech Lopata wrote:
> > > # HG changeset patch
> > > # User Wojciech Lopata <lopek@fb.com>
> > > # Date 1376497815 25200
> > > #      Wed Aug 14 09:30:15 2013 -0700
> > > # Node ID 1e1e57b10f2a1927781408b8c01ab84dc84e8d3f
> > > # Parent  df54cc67b91bea97c4fc299241a67aeecf099d00
> > > revlog: allow custom nodeids in subclasses
> > >
> > > Modify revlog class to allow its subclasses to use custom method of
> > computing
> > > nodeids. In particular this change is necesary to implement manifest
> > > compression.
> > >
> > > diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> > > old mode 100644
> > > new mode 100755
> > > --- a/mercurial/revlog.py
> > > +++ b/mercurial/revlog.py
> > > @@ -943,11 +943,15 @@
> > >
> > >      def _checkhash(self, text, node, rev):
> > >          p1, p2 = self.parents(node)
> > > -        if node != hash(text, p1, p2):
> > > +        if not self._ishashcorrect(text, p1, p2, node):
> > >              raise RevlogError(_("integrity check failed on %s:%d")
> > >                                % (self.indexfile, rev))
> > 
> > Let's move the raise inside the method. Then the method can be renamed
> > to checkhash, with no return code.
> > 
> > > -    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None):
> > > +    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
> > > +                    node=None):
> > 
> > Separate patch, please.
> > 
> > > +            if not self._ishashcorrect(btext[0], p1, p2, node):
> > >                  raise RevlogError(_("consistency error in delta"))
> > 
> > You'll note this message is less informative than the earlier one.
> 
> Earlier message, that contains filename and rev, doesn't fit this
> situation - here we assert that nodeid and delta passed to
> _addrevision match (there is no concept of indexfile or rev yet).

There definitely is an indexfile, it's given to revlog at init time. And
we don't really care about rev if we have a node.

Patch

diff --git a/mercurial/revlog.py b/mercurial/revlog.py
old mode 100644
new mode 100755
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -943,11 +943,15 @@ 
 
     def _checkhash(self, text, node, rev):
         p1, p2 = self.parents(node)
-        if node != hash(text, p1, p2):
+        if not self._ishashcorrect(text, p1, p2, node):
             raise RevlogError(_("integrity check failed on %s:%d")
                               % (self.indexfile, rev))
         return text
 
+    def _ishashcorrect(self, text, p1, p2, node):
+        # subclass will override this method if it uses custom hashing strategy
+        return node == hash(text, p1, p2)
+
     def checkinlinesize(self, tr, fp=None):
         if not self._inline or (self.start(-2) + self.length(-2)) < _maxinline:
             return
@@ -987,7 +991,8 @@ 
         tr.replace(self.indexfile, trindex * self._io.size)
         self._chunkclear()
 
-    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None):
+    def addrevision(self, text, transaction, link, p1, p2, cachedelta=None,
+                    node=None):
         """add a revision to the log
 
         text - the revision data to add
@@ -995,11 +1000,14 @@ 
         link - the linkrev data to add
         p1, p2 - the parent nodeids of the revision
         cachedelta - an optional precomputed delta
+        node - nodeid of revision; typically node is not specified, and it is
+            computed by default as hash(text, p1, p2), however subclasses might
+            use different hashing method (they need to override _ishashcorrect)
         """
         if link == nullrev:
             raise RevlogError(_("attempted to add linkrev -1 to %s")
                               % self.indexfile)
-        node = hash(text, p1, p2)
+        node = node or hash(text, p1, p2)
         if node in self.nodemap:
             return node
 
@@ -1063,8 +1071,7 @@ 
             ifh.flush()
             basetext = self.revision(self.node(cachedelta[0]))
             btext[0] = mdiff.patch(basetext, cachedelta[1])
-            chk = hash(btext[0], p1, p2)
-            if chk != node:
+            if not self._ishashcorrect(btext[0], p1, p2, node):
                 raise RevlogError(_("consistency error in delta"))
             return btext[0]