Patchwork [2,of,6] revlog: add flagprocessor

login
register
mail settings
Submitter Remi Chaintron
Date Oct. 27, 2016, 3:04 p.m.
Message ID <c16fc21427f7d3db2adc.1477580640@remi-mbp2.dhcp.thefacebook.com>
Download mbox | patch
Permalink /patch/17203/
State Deferred
Headers show

Comments

Remi Chaintron - Oct. 27, 2016, 3:04 p.m.
# HG changeset patch
# User Remi Chaintron <remi@fb.com>
# Date 1477579974 -3600
#      Thu Oct 27 15:52:54 2016 +0100
# Branch stable
# Node ID c16fc21427f7d3db2adce1d34b6a8f7fb04f9ca7
# Parent  7423a5fcbc05f69c84a9e0eb4d4b4a94444d83f0
revlog: add flagprocessor

This adds a mechanism for extensions/subclasses of revlog to register transforms
on revlog operations and manage flags priority to handle non-reflexive
operations and lays the groundwork for the lfs extension's implementation.
Augie Fackler - Nov. 9, 2016, 5:17 p.m.
On Thu, Oct 27, 2016 at 04:04:00PM +0100, Remi Chaintron wrote:
> # HG changeset patch
> # User Remi Chaintron <remi@fb.com>
> # Date 1477579974 -3600
> #      Thu Oct 27 15:52:54 2016 +0100
> # Branch stable
> # Node ID c16fc21427f7d3db2adce1d34b6a8f7fb04f9ca7
> # Parent  7423a5fcbc05f69c84a9e0eb4d4b4a94444d83f0
> revlog: add flagprocessor
>
> This adds a mechanism for extensions/subclasses of revlog to register transforms
> on revlog operations and manage flags priority to handle non-reflexive
> operations and lays the groundwork for the lfs extension's implementation.

[...]

>
> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
> --- a/mercurial/revlog.py
> +++ b/mercurial/revlog.py
> @@ -76,6 +76,76 @@
>
>  _nullhash = hashlib.sha1(nullid)
>
> +class flagprocessor(object):
> +    """process revlog objects contents based on flags
> +
> +    flagprocessor objects are the interface between revlog objects and
> +    subclasses/extensions needing to change or update the contents of revlog
> +    objects, based on flags.
> +
> +    The role of the flagprocessor is to allow defining priorities between known
> +    extensions and register transforms to be applied on detecting such flags on
> +    revlog objects.
> +
> +    Defining priorities allows for non-reflexive operations to be composed and
> +    gracefully handled in reverse order depending on the operation on the revlog
> +    object (read or write).
> +    """
> +
> +    def __init__(self, revlogobject):
> +        self.flagsbypriority = [
> +            REVIDX_ISCENSORED,
> +        ]
> +        self.transformmap = {}
> +        self.revlogobject = revlogobject
> +
> +    def register(self, transformmap):
> +        """Register transforms to be applied for flags.
> +
> +        ``transformmap`` - a map of flag to transform
> +        """
> +        for flag in transformmap:
> +            if flag in self.flagsbypriority:
> +                self.transformmap[flag] = transformmap[flag]

It weirds me out a little to have a registrar function, but then to
have a flagsbypriority attribute that I guess clients just twiddle
manually. Could we come up with a better registrar mechanism here? I
guess we're trying to allow for a stable total ordering of the flag
processors? Can you elaborate why we'd need that?
Remi Chaintron - Nov. 16, 2016, 1:59 p.m.
On 11/9/16, 5:17 PM, "Augie Fackler" <raf@durin42.com> wrote:



    On Thu, Oct 27, 2016 at 04:04:00PM +0100, Remi Chaintron wrote:

    > # HG changeset patch


    > # User Remi Chaintron <remi@fb.com>


    > # Date 1477579974 -3600


    > #      Thu Oct 27 15:52:54 2016 +0100


    > # Branch stable


    > # Node ID c16fc21427f7d3db2adce1d34b6a8f7fb04f9ca7


    > # Parent  7423a5fcbc05f69c84a9e0eb4d4b4a94444d83f0


    > revlog: add flagprocessor


    >


    > This adds a mechanism for extensions/subclasses of revlog to register transforms


    > on revlog operations and manage flags priority to handle non-reflexive


    > operations and lays the groundwork for the lfs extension's implementation.




    [...]



    >


    > diff --git a/mercurial/revlog.py b/mercurial/revlog.py


    > --- a/mercurial/revlog.py


    > +++ b/mercurial/revlog.py


    > @@ -76,6 +76,76 @@


    >


    >  _nullhash = hashlib.sha1(nullid)


    >


    > +class flagprocessor(object):


    > +    """process revlog objects contents based on flags


    > +


    > +    flagprocessor objects are the interface between revlog objects and


    > +    subclasses/extensions needing to change or update the contents of revlog


    > +    objects, based on flags.


    > +


    > +    The role of the flagprocessor is to allow defining priorities between known


    > +    extensions and register transforms to be applied on detecting such flags on


    > +    revlog objects.


    > +


    > +    Defining priorities allows for non-reflexive operations to be composed and


    > +    gracefully handled in reverse order depending on the operation on the revlog


    > +    object (read or write).


    > +    """


    > +


    > +    def __init__(self, revlogobject):


    > +        self.flagsbypriority = [


    > +            REVIDX_ISCENSORED,


    > +        ]


    > +        self.transformmap = {}


    > +        self.revlogobject = revlogobject


    > +


    > +    def register(self, transformmap):


   > +        """Register transforms to be applied for flags.


    > +


    > +        ``transformmap`` - a map of flag to transform


    > +        """


    > +        for flag in transformmap:


    > +            if flag in self.flagsbypriority:


    > +                self.transformmap[flag] = transformmap[flag]




    It weirds me out a little to have a registrar function, but then to

    have a flagsbypriority attribute that I guess clients just twiddle

    manually. Could we come up with a better registrar mechanism here? I

    guess we're trying to allow for a stable total ordering of the flag

    processors? Can you elaborate why we'd need that?



The need for keeping the ordering of the flags when applying transforms on the contents of the filelog objects comes from the need to handle non-commutative operations (contrary to the commented erroneous non-reflexive).

In such cases, the operations need to be applied in the reverse order when read, compared to when written: A(B(text) -> newtext vs. B(A(newtext) -> text.

It is therefore important the ordering is kept, which indeed might create problems when opening the flagsbypriority attribute to modification by clients. Would you feel better about moving the ordering of such flags out of the flag processor in this case?
Augie Fackler - Nov. 16, 2016, 3:15 p.m.
(side note: whatever mail client you're using is really quite awful at handling quoting, and reading inline responses is somewhat confusing :/ )

On Nov 16, 2016, at 08:59, RĂ©mi Chaintron <remi@fb.com> wrote:
>>> 
>>>     > +        for flag in transformmap:
>>>     > +            if flag in self.flagsbypriority:
>>>     > +                self.transformmap[flag] = transformmap[flag]
>>     
>>     It weirds me out a little to have a registrar function, but then to
>>     have a flagsbypriority attribute that I guess clients just twiddle
>>     manually. Could we come up with a better registrar mechanism here? I
>>     guess we're trying to allow for a stable total ordering of the flag
>>     processors? Can you elaborate why we'd need that?
>>  
> The need for keeping the ordering of the flags when applying transforms on the contents of the filelog objects comes from the need to handle non-commutative operations (contrary to the commented erroneous non-reflexive).
> In such cases, the operations need to be applied in the reverse order when read, compared to when written: A(B(text) -> newtext vs. B(A(newtext) -> text.
> 
> It is therefore important the ordering is kept, which indeed might create problems when opening the flagsbypriority attribute to modification by clients. Would you feel better about moving the ordering of such flags out of the flag processor in this case?

I'm not yet sure I understand how we can fix the problem. Would it be sufficient to have the registrar take something like

register(name, function, priority)

where priority is an int or an entry in some enumeration we keep in hg? Then we'd do any tie-breaking using name?

(I guess the problem is if you've got a censored node of a lfs-managed file, you need to be sure the functions apply in the same order all the time - if not please correct that understanding.)

Patch

diff --git a/mercurial/bundlerepo.py b/mercurial/bundlerepo.py
--- a/mercurial/bundlerepo.py
+++ b/mercurial/bundlerepo.py
@@ -147,7 +147,9 @@ 
             delta = self._chunk(chain.pop())
             text = mdiff.patches(text, [delta])
 
-        self.checkhash(text, node, rev=rev)
+        text, validatehash = self.processflags(node, text, self.flags(rev))
+        if validatehash is True:
+            self.checkhash(text, node, rev=rev)
         self._cache = (node, rev, text)
         return text
 
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -76,6 +76,76 @@ 
 
 _nullhash = hashlib.sha1(nullid)
 
+class flagprocessor(object):
+    """process revlog objects contents based on flags
+
+    flagprocessor objects are the interface between revlog objects and
+    subclasses/extensions needing to change or update the contents of revlog
+    objects, based on flags.
+
+    The role of the flagprocessor is to allow defining priorities between known
+    extensions and register transforms to be applied on detecting such flags on
+    revlog objects.
+
+    Defining priorities allows for non-reflexive operations to be composed and
+    gracefully handled in reverse order depending on the operation on the revlog
+    object (read or write).
+    """
+
+    def __init__(self, revlogobject):
+        self.flagsbypriority = [
+            REVIDX_ISCENSORED,
+        ]
+        self.transformmap = {}
+        self.revlogobject = revlogobject
+
+    def register(self, transformmap):
+        """Register transforms to be applied for flags.
+
+        ``transformmap`` - a map of flag to transform
+        """
+        for flag in transformmap:
+            if flag in self.flagsbypriority:
+                self.transformmap[flag] = transformmap[flag]
+
+    def unregister(self, transformmap):
+        """Unregister transforms for flags."""
+        for flag in transformmap:
+            if flag in self.flagsbypriority:
+                self.transformmap[flag] = None
+
+    def processflags(self, node, text, flags, reverse=False):
+        """Process flags and apply registered transforms.
+
+        ``node`` - the noideid of revision
+        ``text`` - the revision data to process
+        ``flags`` - the flags applied to the revision
+        ``reverse`` - an optional argument describing whether the flags should
+          be processed in order according to the ``flagprocessor`` flag
+          priority. The flags should be processed in order of priority when
+          reading revisions, and in reverse order when writing revisions.
+          This allows for non-reflexive operations in extensions.
+
+        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
+        processed text and ``validatehash`` is a bool indicating whether the
+        returned text should be checked for hash integrity.
+        """
+        validatehash = True
+
+        # Get flags by priority. Depending on the operation (read of write), the
+        # order might be reversed due to non-reflexive transforms.
+        orderedflags = self.flagsbypriority
+        if reverse is True:
+            orderedflags = reversed(orderedflags)
+
+        for flag in orderedflags:
+            # If there is a transform registered for the flag, apply and update
+            # return tuple
+            f = self.transformmap.get(flag & flags, None)
+            if f is not None:
+                text, validatehash = f(self.revlogobject, text)
+        return text, validatehash
+
 def hash(text, p1, p2):
     """generate a hash from the given text and its parent hashes
 
@@ -285,6 +355,7 @@ 
         self.version = v
         self._inline = v & REVLOGNGINLINEDATA
         self._generaldelta = v & REVLOGGENERALDELTA
+        self.flagprocessor = flagprocessor(self)
         flags = v & ~0xFFFF
         fmt = v & 0xFFFF
         if fmt == REVLOGV0 and flags:
@@ -1221,7 +1292,11 @@ 
             bins = bins[1:]
 
         text = mdiff.patches(text, bins)
-        self.checkhash(text, node, rev=rev)
+
+        text, validatehash = self.processflags(node, text, self.flags(rev))
+        if validatehash is True:
+            self.checkhash(text, node, rev=rev)
+
         self._cache = (node, rev, text)
         return text
 
@@ -1233,6 +1308,28 @@ 
         """
         return hash(text, p1, p2)
 
+    def processflags(self, node, text, flags, reverse=False):
+        """Process flags and apply registered transforms.
+
+        cf. `flagprocessor.processflags`` method for argument and return values
+        description.
+
+        ``node`` - the noideid of revision
+        ``text`` - the revision data to process
+        ``flags`` - the flags applied to the revision
+        ``reverse`` - an optional argument describing whether the flags should
+          be processed in order according to the ``flagprocessor`` flag
+          priority. The flags should be processed in order of priority when
+          reading revisions, and in reverse order when writing revisions.
+          This allows for non-reflexive operations in extensions.
+
+        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
+        processed text and ``validatehash`` is a bool indicating whether the
+        returned text should be checked for hash integrity.
+        """
+        return self.flagprocessor.processflags(node, text, flags,
+                                               reverse=reverse)
+
     def checkhash(self, text, node, p1=None, p2=None, rev=None):
         """Check node hash integrity.
 
@@ -1414,8 +1511,12 @@ 
                     fh = dfh
                 basetext = self.revision(self.node(baserev), _df=fh)
                 btext[0] = mdiff.patch(basetext, delta)
+
             try:
-                self.checkhash(btext[0], node, p1=p1, p2=p2)
+                btext[0], validatehash = self.processflags(node, btext[0],
+                                                           flags, reverse=True)
+                if validatehash is True:
+                    self.checkhash(btext[0], node, p1=p1, p2=p2)
                 if flags & REVIDX_ISCENSORED:
                     raise RevlogError(_('node %s is not censored') % node)
             except CensoredNodeError: