Patchwork [2,of,5,v4] revlog: add flagprocessor

login
register
mail settings
Submitter Remi Chaintron
Date Nov. 23, 2016, 5:39 p.m.
Message ID <eb24cc60a279d614b318.1479922777@iphonesleepsort.dhcp.thefacebook.com>
Download mbox | patch
Permalink /patch/17726/
State Changes Requested
Headers show

Comments

Remi Chaintron - Nov. 23, 2016, 5:39 p.m.
# HG changeset patch
# User Remi Chaintron <remi@fb.com>
# Date 1479916365 0
#      Wed Nov 23 15:52:45 2016 +0000
# Branch stable
# Node ID eb24cc60a279d614b3181a84464200bbcf5f6bb4
# Parent  e908dd63d485424df4c4a4482b742d82652e2893
revlog: add flagprocessor

Add a mechanism for extensions/subclasses of revlog to register transforms on
revlog operations, based on the revision flags.
Due to some operations being non-commutative, transforms applied in a specific
order when writing the contents of the revlog need to be applied in the reverse
order when reading.
In order to allow the composition of such operations, the flagprocessor applies
flags in the stable order defined by REVIDX_FLAGS_PROCESSING_ORDER constant.

Due to the fact storing flags on the server side might not be enabled, the
flagprocessor class also handles the case when no flags are set. In such a case,
it defers to enabled extensions to register default transformations and use
heuristics to determine what to do with the observed contents.
Rémi Chaintron - Nov. 29, 2016, 5:16 p.m.
Thanks for the awesome review.
I included the changes in my current version and will update the stack once
I'm done with all required changes.

The approach I'm currently following relies on getting rid of the
flagprocessor object, instead relying on a single revlog.processflags()
method and an OrderedDict containing the flags and their associated
transforms.
The hybrid "no flags" design is going away too, which makes the code really
simpler.

On Tue, 29 Nov 2016 at 06:59 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>  # max size of revlog with inline data
>  _maxinline = 131072
> @@ -76,6 +80,76 @@
>
>  _nullhash = hashlib.sha1(nullid)
>
> +class flagprocessor(object):
> +    """process revlog objects contents based on flags
> +
> +    flagprocessor objects are the interface between revlog objects and
> +    subclasses/extensions needing to change or update the contents of
revlog
> +    objects, based on flags.
> +
> +    The role of the flagprocessor is to apply transforms registered by
> +    extensions and revlog subclasses in the order defined by
> +    REVIDX_FLAGS_PROCESSING_ORDER and the type (read or write) of
operation.
> +    This allows the flag processor to modify the contents of revlog
objects and
> +    handle the composition of non-commutative operations.
> +    """
> +
> +    def __init__(self, revlogobject):
> +        self.transformmap = {}
> +        self.revlogobject = revlogobject

I'm not sure why we need a full object here, it seems like we just need
a dictionary mapping 'flag' to 'processors' and a function running them.
In the Mercurial code base we try to avoid creating too much new class.
It seems like we could do without the class here (unless I'm missing
something).

In addition to this local custom:

* an individual flagprocessor object is created for each revlog. I'm not
sure why? Can't we just use one mapping (as we do for ordering?)

* having a the revlog referencing the flagprocessor which itself
reference the revlog create a reference cycle. We avoid creating
reference cycle in Mercurial (both for design purpose and because python
is not great with them)


Good point, I have now removed the flagprocessor altogether, which looks
much nicer!


> +
> +    def register(self, transformmap):
> +        """Register transforms to be applied for flags.
> +
> +        ``transformmap`` - a map of flag to transform
> +        """
> +        for flag in transformmap:
> +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> +                self.transformmap[flag] = transformmap[flag]

* unknown flag seems to be silently ignored. It would be better to just
fail here.

* we have a mechanism to load extra elements (like flag processors are).
Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
API, but that one is probably overkill here)

* we do not handle collision at all, if multiple extensions try to
register a process for the same flag, we should not let the last one
just overwrite the other one. (Abort is the simplest handling).


As part of moving the flagprocessor out, I now have a single
'processflags(self, text, flags, reverse=True)' method in revlog, which I
updated to take care of unknown flags (which was previously done in
'revlog.revision()').

Collisions were explicitly not handled, as we were until now relying on a
flag per extension, but this is something I'm happy to discuss.

The current updated version I'm working on simply relies on the following
constant:
REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
    REVIDX_ISCENSORED: None,
    REVIDX_ISLFS: None,
})
This takes care of both the ordering and a way to register transforms, but
this enforces using only one transform per flag.




> +
> +    def unregister(self, transformmap):
> +        """Unregister transforms for flags."""
> +        for flag in transformmap:
> +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> +                self.transformmap[flag] = None

What is the usecase for unregistering? Since we do not call the process
if the flag is not set, I can't think of a case were we need to
unregister. Am I missing something?


I will get rid of this, as we're moving away from non-flag based transforms
(cf. below).


> +    def processflags(self, node, text, revisionflags, reverse=False):
> +        """Process flags and apply registered transforms.
> +
> +        ``node`` - the noideid of revision
> +        ``text`` - the revision data to process
> +        ``revisionflags`` - the flags applied to the revision
> +        ``reverse`` - an optional argument describing whether the flags
should
> +          be processed in order according to the ``flagprocessor`` flag
> +          priority. The flags should be processed in order of priority
when
> +          reading revisions, and in reverse order when writing revisions.
> +
> +        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is
the
> +        processed text and ``validatehash`` is a bool indicating whether
the
> +        returned text should be checked for hash integrity.
> +        """
> +        validatehash = True
> +
> +        # Depending on the operation (read or write), the order might be
> +        # reversed due to non-commutative transforms.
> +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
> +        if reverse is True:

You want 'if reverse' here.


(Y)



> +            orderedflags = reversed(orderedflags)
> +
> +        for flag in orderedflags:
> +            # If a transform has been registered for a known flag, apply
it and
> +            # update the result tuple.
> +            # If no flag is set but transforms have been registered, it
is
> +            # assumed that the repository does not send flags over the
wire and
> +            # the extensions implement heuristics to determine what to
do based
> +            # on the contents of the filelog.

This special case on no flag set seems very suspicious to me.

* first it seems like there is two different case here (one were we just
use the flag and one were we check if we need to install a flag). Having
seem sharing the same code path and rely on heuristic to trigger
themselves seems sloppy, we should be more explicit about the case we
are in.

* This part about "does not send flags over the wire" is a bit strange
to me, can you elaborate ? We should enforce that peer involved support
exchanging the flag before exchanging content where the flag matters.


This is something Durham proposed a while ago, and we re-discussed with
Ryan McElroy today. The original idea was to allow repositories to not save
flags in the backend, by relying on heuristics instead to determine what to
do with filelog contents. We however do not like having to rely on
heuristics and would rather enforce sending the flags over the wire, to the
cost of updating backend dbs schemas to store the flags (defaulting to
zero, which would give REVIDX_DEFAULT_FLAGS as a default value) before
enabling lfs.

We're going to move away from this hybrid design and rely on flags only in
the update.

> +            if flag & revisionflags or revisionflags ==
REVIDX_DEFAULT_FLAGS:
> +                f = self.transformmap.get(flag, None)
> +                if f is not None:
> +                    text, validatehash = f(self.revlogobject, text)

If I read it correctly, this code is not checking for unknown flag (a
flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not processing
a flag seems very serious and we should not let it pass silently.


handled correctly in the new design.


> +
> +        return text, validatehash
> +
>  def hash(text, p1, p2):
>      """generate a hash from the given text and its parent hashes
>
> @@ -285,6 +359,7 @@
>          self.version = v
>          self._inline = v & REVLOGNGINLINEDATA
>          self._generaldelta = v & REVLOGGENERALDELTA
> +        self.flagprocessor = flagprocessor(self)

(note: illustration the cycle we are creating)


Will update with removed flagprocessor, removing cycling references.



>          flags = v & ~0xFFFF
>          fmt = v & 0xFFFF
>          if fmt == REVLOGV0 and flags:
> @@ -1221,7 +1296,11 @@
>              bins = bins[1:]
>
>          text = mdiff.patches(text, bins)
> -        self.checkhash(text, node, rev=rev)
> +
> +        text, validatehash = self.processflags(node, text,
self.flags(rev))
> +        if validatehash is True:

'if validatehash'


(Y)



> +            self.checkhash(text, node, rev=rev)
> +
>          self._cache = (node, rev, text)
>          return text
>
> @@ -1233,6 +1312,28 @@
>          """
>          return hash(text, p1, p2)
>
> +    def processflags(self, node, text, flags, reverse=False):
> +        """Process flags and apply registered transforms.
> +
> +        cf. `flagprocessor.processflags`` method for argument and return
values
> +        description.
> +
> +        ``node`` - the noideid of revision
> +        ``text`` - the revision data to process
> +        ``flags`` - the flags applied to the revision
> +        ``reverse`` - an optional argument describing whether the flags
should
> +          be processed in order according to the ``flagprocessor`` flag
> +          priority. The flags should be processed in order of priority
when
> +          reading revisions, and in reverse order when writing revisions.
> +          This allows for non-reflexive operations in extensions.
> +
> +        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is
the
> +        processed text and ``validatehash`` is a bool indicating whether
the
> +        returned text should be checked for hash integrity.
> +        """
> +        return self.flagprocessor.processflags(node, text, flags,
> +                                               reverse=reverse)

That function seems to be a plain forward to flagprocessors. Why not
calling the flag processors method directly

(well as pointed above, I think we should entirely skip the object)


(Y)
Pierre-Yves David - Dec. 3, 2016, 2:49 a.m.
On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
> Thanks for the awesome review.
> I included the changes in my current version and will update the stack
> once I'm done with all required changes.
>
> The approach I'm currently following relies on getting rid of the
> flagprocessor object, instead relying on a single revlog.processflags()
> method and an OrderedDict containing the flags and their associated
> transforms.

I don't think OrderedDict is suitable here. OrderedDict remember order 
it which value have been added. Inserting something in that order (or 
manipulating the order in genera) is a bit impractical. Your dict + list 
approach was good in my opinion.

> The hybrid "no flags" design is going away too, which makes the code
> really simpler.
>
> On Tue, 29 Nov 2016 at 06:59 Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>     >
>     >  # max size of revlog with inline data
>     >  _maxinline = 131072
>     > @@ -76,6 +80,76 @@
>     >
>     >  _nullhash = hashlib.sha1(nullid)
>     >
>     > +class flagprocessor(object):
>     > +    """process revlog objects contents based on flags
>     > +
>     > +    flagprocessor objects are the interface between revlog
>     objects and
>     > +    subclasses/extensions needing to change or update the
>     contents of revlog
>     > +    objects, based on flags.
>     > +
>     > +    The role of the flagprocessor is to apply transforms
>     registered by
>     > +    extensions and revlog subclasses in the order defined by
>     > +    REVIDX_FLAGS_PROCESSING_ORDER and the type (read or write) of
>     operation.
>     > +    This allows the flag processor to modify the contents of
>     revlog objects and
>     > +    handle the composition of non-commutative operations.
>     > +    """
>     > +
>     > +    def __init__(self, revlogobject):
>     > +        self.transformmap = {}
>     > +        self.revlogobject = revlogobject
>
>     I'm not sure why we need a full object here, it seems like we just need
>     a dictionary mapping 'flag' to 'processors' and a function running them.
>     In the Mercurial code base we try to avoid creating too much new class.
>     It seems like we could do without the class here (unless I'm missing
>     something).
>
>     In addition to this local custom:
>
>     * an individual flagprocessor object is created for each revlog. I'm not
>     sure why? Can't we just use one mapping (as we do for ordering?)
>
>     * having a the revlog referencing the flagprocessor which itself
>     reference the revlog create a reference cycle. We avoid creating
>     reference cycle in Mercurial (both for design purpose and because python
>     is not great with them)
>
>
> Good point, I have now removed the flagprocessor altogether, which looks
> much nicer!
>
>
>     > +
>     > +    def register(self, transformmap):
>     > +        """Register transforms to be applied for flags.
>     > +
>     > +        ``transformmap`` - a map of flag to transform
>     > +        """
>     > +        for flag in transformmap:
>     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
>     > +                self.transformmap[flag] = transformmap[flag]
>
>     * unknown flag seems to be silently ignored. It would be better to just
>     fail here.
>
>     * we have a mechanism to load extra elements (like flag processors are).
>     Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
>     API, but that one is probably overkill here)
>
>     * we do not handle collision at all, if multiple extensions try to
>     register a process for the same flag, we should not let the last one
>     just overwrite the other one. (Abort is the simplest handling).
>
>
> As part of moving the flagprocessor out, I now have a single
> 'processflags(self, text, flags, reverse=True)' method in revlog, which
> I updated to take care of unknown flags (which was previously done in
> 'revlog.revision()').
>
> Collisions were explicitly not handled, as we were until now relying on
> a flag per extension, but this is something I'm happy to discuss.

We cannot exclude the change for a collision, especially because 
multiple extension could use the same flag without having registered it 
with Mercurial ever. In addition, as discussed in Patch4 it would make 
sense to have multiple extensions implement different variant of the 
same flag/behavior.


> The current updated version I'm working on simply relies on the
> following constant:
> REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
>     REVIDX_ISCENSORED: None,
>     REVIDX_ISLFS: None,
> })
> This takes care of both the ordering and a way to register transforms,
> but this enforces using only one transform per flag.

Not that in the example you provide, You are feeding a dict to 
OrderedDict, so the ordering is lost before reaching the OrderedDict.

In addition, using an ordered dict makes is hard to third party 
extension to add and use any non-declared flag.

>     > +
>     > +    def unregister(self, transformmap):
>     > +        """Unregister transforms for flags."""
>     > +        for flag in transformmap:
>     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
>     > +                self.transformmap[flag] = None
>
>     What is the usecase for unregistering? Since we do not call the process
>     if the flag is not set, I can't think of a case were we need to
>     unregister. Am I missing something?
>
>
> I will get rid of this, as we're moving away from non-flag based
> transforms (cf. below).
>
>
>     > +    def processflags(self, node, text, revisionflags, reverse=False):
>     > +        """Process flags and apply registered transforms.
>     > +
>     > +        ``node`` - the noideid of revision
>     > +        ``text`` - the revision data to process
>     > +        ``revisionflags`` - the flags applied to the revision
>     > +        ``reverse`` - an optional argument describing whether the
>     flags should
>     > +          be processed in order according to the
>     ``flagprocessor`` flag
>     > +          priority. The flags should be processed in order of
>     priority when
>     > +          reading revisions, and in reverse order when writing
>     revisions.
>     > +
>     > +        Returns a 2-tuple of ``(text, validatehash)`` where
>     ``text`` is the
>     > +        processed text and ``validatehash`` is a bool indicating
>     whether the
>     > +        returned text should be checked for hash integrity.
>     > +        """
>     > +        validatehash = True
>     > +
>     > +        # Depending on the operation (read or write), the order
>     might be
>     > +        # reversed due to non-commutative transforms.
>     > +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
>     > +        if reverse is True:
>
>     You want 'if reverse' here.
>
>
> (Y)
>
>
>
>     > +            orderedflags = reversed(orderedflags)
>     > +
>     > +        for flag in orderedflags:
>     > +            # If a transform has been registered for a known
>     flag, apply it and
>     > +            # update the result tuple.
>     > +            # If no flag is set but transforms have been
>     registered, it is
>     > +            # assumed that the repository does not send flags
>     over the wire and
>     > +            # the extensions implement heuristics to determine
>     what to do based
>     > +            # on the contents of the filelog.
>
>     This special case on no flag set seems very suspicious to me.
>
>     * first it seems like there is two different case here (one were we just
>     use the flag and one were we check if we need to install a flag). Having
>     seem sharing the same code path and rely on heuristic to trigger
>     themselves seems sloppy, we should be more explicit about the case we
>     are in.
>
>     * This part about "does not send flags over the wire" is a bit strange
>     to me, can you elaborate ? We should enforce that peer involved support
>     exchanging the flag before exchanging content where the flag matters.
>
>
> This is something Durham proposed a while ago, and we re-discussed with
> Ryan McElroy today. The original idea was to allow repositories to not
> save flags in the backend, by relying on heuristics instead to determine
> what to do with filelog contents. We however do not like having to rely
> on heuristics and would rather enforce sending the flags over the wire,
> to the cost of updating backend dbs schemas to store the flags
> (defaulting to zero, which would give REVIDX_DEFAULT_FLAGS as a default
> value) before enabling lfs.
>
> We're going to move away from this hybrid design and rely on flags only
> in the update.
>
>     > +            if flag & revisionflags or revisionflags ==
>     REVIDX_DEFAULT_FLAGS:
>     > +                f = self.transformmap.get(flag, None)
>     > +                if f is not None:
>     > +                    text, validatehash = f(self.revlogobject, text)
>
>     If I read it correctly, this code is not checking for unknown flag (a
>     flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not processing
>     a flag seems very serious and we should not let it pass silently.
>
>
> handled correctly in the new design.
>
>
>     > +
>     > +        return text, validatehash
>     > +
>     >  def hash(text, p1, p2):
>     >      """generate a hash from the given text and its parent hashes
>     >
>     > @@ -285,6 +359,7 @@
>     >          self.version = v
>     >          self._inline = v & REVLOGNGINLINEDATA
>     >          self._generaldelta = v & REVLOGGENERALDELTA
>     > +        self.flagprocessor = flagprocessor(self)
>
>     (note: illustration the cycle we are creating)
>
>
> Will update with removed flagprocessor, removing cycling references.
>
>
>
>     >          flags = v & ~0xFFFF
>     >          fmt = v & 0xFFFF
>     >          if fmt == REVLOGV0 and flags:
>     > @@ -1221,7 +1296,11 @@
>     >              bins = bins[1:]
>     >
>     >          text = mdiff.patches(text, bins)
>     > -        self.checkhash(text, node, rev=rev)
>     > +
>     > +        text, validatehash = self.processflags(node, text,
>     self.flags(rev))
>     > +        if validatehash is True:
>
>     'if validatehash'
>
>
> (Y)
>
>
>
>     > +            self.checkhash(text, node, rev=rev)
>     > +
>     >          self._cache = (node, rev, text)
>     >          return text
>     >
>     > @@ -1233,6 +1312,28 @@
>     >          """
>     >          return hash(text, p1, p2)
>     >
>     > +    def processflags(self, node, text, flags, reverse=False):
>     > +        """Process flags and apply registered transforms.
>     > +
>     > +        cf. `flagprocessor.processflags`` method for argument and
>     return values
>     > +        description.
>     > +
>     > +        ``node`` - the noideid of revision
>     > +        ``text`` - the revision data to process
>     > +        ``flags`` - the flags applied to the revision
>     > +        ``reverse`` - an optional argument describing whether the
>     flags should
>     > +          be processed in order according to the
>     ``flagprocessor`` flag
>     > +          priority. The flags should be processed in order of
>     priority when
>     > +          reading revisions, and in reverse order when writing
>     revisions.
>     > +          This allows for non-reflexive operations in extensions.
>     > +
>     > +        Returns a 2-tuple of ``(text, validatehash)`` where
>     ``text`` is the
>     > +        processed text and ``validatehash`` is a bool indicating
>     whether the
>     > +        returned text should be checked for hash integrity.
>     > +        """
>     > +        return self.flagprocessor.processflags(node, text, flags,
>     > +                                               reverse=reverse)
>
>     That function seems to be a plain forward to flagprocessors. Why not
>     calling the flag processors method directly
>
>     (well as pointed above, I think we should entirely skip the object)
>
>
> (Y)
>
>
> --
> Rémi
> --
> Rémi
Rémi Chaintron - Dec. 12, 2016, 10 a.m.
On Sat, 3 Dec 2016 at 02:49 Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

> On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
> > Thanks for the awesome review.
> >
> >     > +
> >     > +    def register(self, transformmap):
> >     > +        """Register transforms to be applied for flags.
> >     > +
> >     > +        ``transformmap`` - a map of flag to transform
> >     > +        """
> >     > +        for flag in transformmap:
> >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> >     > +                self.transformmap[flag] = transformmap[flag]
> >
> >     * unknown flag seems to be silently ignored. It would be better to
> just
> >     fail here.
> >
> >     * we have a mechanism to load extra elements (like flag processors
> are).
> >     Have a look at 'mercurial.dispatch.extraloaders' (or use a registrar
> >     API, but that one is probably overkill here)
> >
> >     * we do not handle collision at all, if multiple extensions try to
> >     register a process for the same flag, we should not let the last one
> >     just overwrite the other one. (Abort is the simplest handling).
> >
> >
> > As part of moving the flagprocessor out, I now have a single
> > 'processflags(self, text, flags, reverse=True)' method in revlog, which
> > I updated to take care of unknown flags (which was previously done in
> > 'revlog.revision()').
> >
> > Collisions were explicitly not handled, as we were until now relying on
> > a flag per extension, but this is something I'm happy to discuss.
>
> We cannot exclude the change for a collision, especially because
> multiple extension could use the same flag without having registered it
> with Mercurial ever. In addition, as discussed in Patch4 it would make
> sense to have multiple extensions implement different variant of the
> same flag/behavior.
>

If I understand correctly, what you would like to see is moving towards a
generic flag (for example REVIDX_ISSTOREDEXTERNALLY), that several
extensions can register transforms for.

* My assumption is that we want only one extension to be able to register
transforms on a specific flag. (cf. last point on challenging this).

* An extension that registered a transform needs to be able to change the
transform it registered in order to cover all workflows. Some code paths
require both a read AND write transforms (e.g 'commit' calls revision()
followed by addrevision() and these need different transforms).

* Notwithstanding the fact any extension could force an update of the
transforms dictionary anyway, I'm not sure how to prevent collisions here.
Wouldn't it make sense to let users be responsible for which extensions
they enable?

* If not, I'm a bit unsure as how to enable extension A to register its
transforms while preventing extension B from doing so. I can think of ways
of how to detect genuine errors (i.e a developer enabling two of these
extensions by mistake) but I'm not sure how to keep the whitelisting of a
single extension going as from the point of view of the  flag transforms
dictionary, extensions are anonymous. Note that this would cover only the
aspect of genuine mistakes, not intentional tempering with the dictionary.

* We could also enable several extensions to register different transforms
on the same flag (which, ultimately we might have to in order to avoid
running out of flag bits) but that would require "de-anonymizing"
extensions from the point of view of REVIDX_FLAGS_TRANSFORMS and keeping an
ordering of such extensions to ensure non-commutative operations are
handled gracefully (relying on a double ordering of which flags handle
first and which extensions apply in which order for each flag).

Could you elaborate on what you'd like to see here?




> > The current updated version I'm working on simply relies on the
> > following constant:
> > REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
> >     REVIDX_ISCENSORED: None,
> >     REVIDX_ISLFS: None,
> > })
> > This takes care of both the ordering and a way to register transforms,
> > but this enforces using only one transform per flag.
>
> Not that in the example you provide, You are feeding a dict to
> OrderedDict, so the ordering is lost before reaching the OrderedDict.
>
> In addition, using an ordered dict makes is hard to third party
> extension to add and use any non-declared flag.
>
> >     > +
> >     > +    def unregister(self, transformmap):
> >     > +        """Unregister transforms for flags."""
> >     > +        for flag in transformmap:
> >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
> >     > +                self.transformmap[flag] = None
> >
> >     What is the usecase for unregistering? Since we do not call the
> process
> >     if the flag is not set, I can't think of a case were we need to
> >     unregister. Am I missing something?
> >
> >
> > I will get rid of this, as we're moving away from non-flag based
> > transforms (cf. below).
> >
> >
> >     > +    def processflags(self, node, text, revisionflags,
> reverse=False):
> >     > +        """Process flags and apply registered transforms.
> >     > +
> >     > +        ``node`` - the noideid of revision
> >     > +        ``text`` - the revision data to process
> >     > +        ``revisionflags`` - the flags applied to the revision
> >     > +        ``reverse`` - an optional argument describing whether the
> >     flags should
> >     > +          be processed in order according to the
> >     ``flagprocessor`` flag
> >     > +          priority. The flags should be processed in order of
> >     priority when
> >     > +          reading revisions, and in reverse order when writing
> >     revisions.
> >     > +
> >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
> >     ``text`` is the
> >     > +        processed text and ``validatehash`` is a bool indicating
> >     whether the
> >     > +        returned text should be checked for hash integrity.
> >     > +        """
> >     > +        validatehash = True
> >     > +
> >     > +        # Depending on the operation (read or write), the order
> >     might be
> >     > +        # reversed due to non-commutative transforms.
> >     > +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
> >     > +        if reverse is True:
> >
> >     You want 'if reverse' here.
> >
> >
> > (Y)
> >
> >
> >
> >     > +            orderedflags = reversed(orderedflags)
> >     > +
> >     > +        for flag in orderedflags:
> >     > +            # If a transform has been registered for a known
> >     flag, apply it and
> >     > +            # update the result tuple.
> >     > +            # If no flag is set but transforms have been
> >     registered, it is
> >     > +            # assumed that the repository does not send flags
> >     over the wire and
> >     > +            # the extensions implement heuristics to determine
> >     what to do based
> >     > +            # on the contents of the filelog.
> >
> >     This special case on no flag set seems very suspicious to me.
> >
> >     * first it seems like there is two different case here (one were we
> just
> >     use the flag and one were we check if we need to install a flag).
> Having
> >     seem sharing the same code path and rely on heuristic to trigger
> >     themselves seems sloppy, we should be more explicit about the case we
> >     are in.
> >
> >     * This part about "does not send flags over the wire" is a bit
> strange
> >     to me, can you elaborate ? We should enforce that peer involved
> support
> >     exchanging the flag before exchanging content where the flag matters.
> >
> >
> > This is something Durham proposed a while ago, and we re-discussed with
> > Ryan McElroy today. The original idea was to allow repositories to not
> > save flags in the backend, by relying on heuristics instead to determine
> > what to do with filelog contents. We however do not like having to rely
> > on heuristics and would rather enforce sending the flags over the wire,
> > to the cost of updating backend dbs schemas to store the flags
> > (defaulting to zero, which would give REVIDX_DEFAULT_FLAGS as a default
> > value) before enabling lfs.
> >
> > We're going to move away from this hybrid design and rely on flags only
> > in the update.
> >
> >     > +            if flag & revisionflags or revisionflags ==
> >     REVIDX_DEFAULT_FLAGS:
> >     > +                f = self.transformmap.get(flag, None)
> >     > +                if f is not None:
> >     > +                    text, validatehash = f(self.revlogobject,
> text)
> >
> >     If I read it correctly, this code is not checking for unknown flag (a
> >     flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not
> processing
> >     a flag seems very serious and we should not let it pass silently.
> >
> >
> > handled correctly in the new design.
> >
> >
> >     > +
> >     > +        return text, validatehash
> >     > +
> >     >  def hash(text, p1, p2):
> >     >      """generate a hash from the given text and its parent hashes
> >     >
> >     > @@ -285,6 +359,7 @@
> >     >          self.version = v
> >     >          self._inline = v & REVLOGNGINLINEDATA
> >     >          self._generaldelta = v & REVLOGGENERALDELTA
> >     > +        self.flagprocessor = flagprocessor(self)
> >
> >     (note: illustration the cycle we are creating)
> >
> >
> > Will update with removed flagprocessor, removing cycling references.
> >
> >
> >
> >     >          flags = v & ~0xFFFF
> >     >          fmt = v & 0xFFFF
> >     >          if fmt == REVLOGV0 and flags:
> >     > @@ -1221,7 +1296,11 @@
> >     >              bins = bins[1:]
> >     >
> >     >          text = mdiff.patches(text, bins)
> >     > -        self.checkhash(text, node, rev=rev)
> >     > +
> >     > +        text, validatehash = self.processflags(node, text,
> >     self.flags(rev))
> >     > +        if validatehash is True:
> >
> >     'if validatehash'
> >
> >
> > (Y)
> >
> >
> >
> >     > +            self.checkhash(text, node, rev=rev)
> >     > +
> >     >          self._cache = (node, rev, text)
> >     >          return text
> >     >
> >     > @@ -1233,6 +1312,28 @@
> >     >          """
> >     >          return hash(text, p1, p2)
> >     >
> >     > +    def processflags(self, node, text, flags, reverse=False):
> >     > +        """Process flags and apply registered transforms.
> >     > +
> >     > +        cf. `flagprocessor.processflags`` method for argument and
> >     return values
> >     > +        description.
> >     > +
> >     > +        ``node`` - the noideid of revision
> >     > +        ``text`` - the revision data to process
> >     > +        ``flags`` - the flags applied to the revision
> >     > +        ``reverse`` - an optional argument describing whether the
> >     flags should
> >     > +          be processed in order according to the
> >     ``flagprocessor`` flag
> >     > +          priority. The flags should be processed in order of
> >     priority when
> >     > +          reading revisions, and in reverse order when writing
> >     revisions.
> >     > +          This allows for non-reflexive operations in extensions.
> >     > +
> >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
> >     ``text`` is the
> >     > +        processed text and ``validatehash`` is a bool indicating
> >     whether the
> >     > +        returned text should be checked for hash integrity.
> >     > +        """
> >     > +        return self.flagprocessor.processflags(node, text, flags,
> >     > +                                               reverse=reverse)
> >
> >     That function seems to be a plain forward to flagprocessors. Why not
> >     calling the flag processors method directly
> >
> >     (well as pointed above, I think we should entirely skip the object)
> >
> >
> > (Y)
> >
> >
> > --
> > Rémi
> > --
> > Rémi
>
> --
> Pierre-Yves David
>
Pierre-Yves David - Dec. 12, 2016, 1:22 p.m.
On 12/12/2016 11:00 AM, Rémi Chaintron wrote:
>
>
> On Sat, 3 Dec 2016 at 02:49 Pierre-Yves David
> <pierre-yves.david@ens-lyon.org <mailto:pierre-yves.david@ens-lyon.org>>
> wrote:
>
>     On 11/29/2016 06:16 PM, Rémi Chaintron wrote:
>     > Thanks for the awesome review.
>     >
>     >     > +
>     >     > +    def register(self, transformmap):
>     >     > +        """Register transforms to be applied for flags.
>     >     > +
>     >     > +        ``transformmap`` - a map of flag to transform
>     >     > +        """
>     >     > +        for flag in transformmap:
>     >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
>     >     > +                self.transformmap[flag] = transformmap[flag]
>     >
>     >     * unknown flag seems to be silently ignored. It would be
>     better to just
>     >     fail here.
>     >
>     >     * we have a mechanism to load extra elements (like flag
>     processors are).
>     >     Have a look at 'mercurial.dispatch.extraloaders' (or use a
>     registrar
>     >     API, but that one is probably overkill here)
>     >
>     >     * we do not handle collision at all, if multiple extensions try to
>     >     register a process for the same flag, we should not let the
>     last one
>     >     just overwrite the other one. (Abort is the simplest handling).
>     >
>     >
>     > As part of moving the flagprocessor out, I now have a single
>     > 'processflags(self, text, flags, reverse=True)' method in revlog,
>     which
>     > I updated to take care of unknown flags (which was previously done in
>     > 'revlog.revision()').
>     >
>     > Collisions were explicitly not handled, as we were until now
>     relying on
>     > a flag per extension, but this is something I'm happy to discuss.
>
>     We cannot exclude the change for a collision, especially because
>     multiple extension could use the same flag without having registered it
>     with Mercurial ever. In addition, as discussed in Patch4 it would make
>     sense to have multiple extensions implement different variant of the
>     same flag/behavior.
>
>
> If I understand correctly, what you would like to see is moving towards
> a generic flag (for example REVIDX_ISSTOREDEXTERNALLY), that several
> extensions can register transforms for.
>
> * My assumption is that we want only one extension to be able to
> register transforms on a specific flag. (cf. last point on challenging
> this).
>
> * An extension that registered a transform needs to be able to change
> the transform it registered in order to cover all workflows. Some code
> paths require both a read AND write transforms (e.g 'commit' calls
> revision() followed by addrevision() and these need different transforms).
>
> * Notwithstanding the fact any extension could force an update of the
> transforms dictionary anyway, I'm not sure how to prevent collisions
> here. Wouldn't it make sense to let users be responsible for which
> extensions they enable?
>
> * If not, I'm a bit unsure as how to enable extension A to register its
> transforms while preventing extension B from doing so. I can think of
> ways of how to detect genuine errors (i.e a developer enabling two of
> these extensions by mistake) but I'm not sure how to keep the
> whitelisting of a single extension going as from the point of view of
> the  flag transforms dictionary, extensions are anonymous. Note that
> this would cover only the aspect of genuine mistakes, not intentional
> tempering with the dictionary.
 >
> […]
 >
> Could you elaborate on what you'd like to see here?

(small initial note: we talk a lot about extension, but some flag will 
end up handled by core too (some of censors for example)).

If you provide a small function in charge of registering a new flag 
processors, you can add logic to thing function to make sure there is 
not an existing processors that we are about to silently overwrite.
That function could be a decorator. You can find example of this here: 
https://www.mercurial-scm.org/repo/hg/file/tip/mercurial/exchange.py#l697

Otherwise, you cannot really rely on the extension coder to do the right 
thing, if the obvious API to add a transformer is to directly add it to 
the dictionary you can expect them to not check for an existing one 
before adding their own.

Of course they could still decide to ignore the official API and 
directly temper with the dictionary and all other kind of bad things, 
but providing an official API to register something should be a good 
enough effort to catch most of it.

In the same way, I would really no rely on actual user knowing what they 
doing regarding extension. They will enable incompatible extension 
without checking.

 >
 > * We could also enable several extensions to register different
 > transforms on the same flag (which, ultimately we might have to in order
 > to avoid running out of flag bits) but that would require
 > "de-anonymizing" extensions from the point of view of
 > REVIDX_FLAGS_TRANSFORMS and keeping an ordering of such extensions to
 > ensure non-commutative operations are handled gracefully (relying on a
 > double ordering of which flags handle first and which extensions apply
 > in which order for each flag).
 >

So I think the basic rules should be: each flag correspond to one 
transformer. Anything related to order and commutativity is handled by 
the ordering of flag processing. So we should make sure that the default 
mechanism around flag processors enforce that.

The advanced version of this is that in some case, it will make sense 
for a flag to have multiple extensions handling that flag 
"transformation" in different way. I think that external storage is a 
good use case for this. In particular, your version of external storage 
use the LFS protocol, but other protocols could be used. For example, 
the existing largefile logic and protocol could be ported to use that 
new flag. Allowing people to keep there existing infrastructure while 
taking advantage of the better internal logic. Other example, Big 
company like Google might come along with there own efficient protocol 
to locate large asset and use it, etc… It seems likely that people 
implementing different protocol will use different extensions.

So in this case it make sense to have multiple extensions collaborating 
on a single flag. But there should still be one and only one actually 
processing the revision content (no order, no commutativity at that 
level). The important part here is to be -eventually- able to have 
multiple extension collaborating. So it seems fine to have a first 
version directly using one flag processor for that flag. However, the 
-format- that we expect behind that flag should allow for a later 
dispatch between various protocol/method. This will allow use to 
implement a dispatch mechanism on top of it later on.

If things are still unclear to you after this, we should probably VC.

>     > The current updated version I'm working on simply relies on the
>     > following constant:
>     > REVIDX_FLAGS_TRANSFORMS = collections.OrderedDict({
>     >     REVIDX_ISCENSORED: None,
>     >     REVIDX_ISLFS: None,
>     > })
>     > This takes care of both the ordering and a way to register transforms,
>     > but this enforces using only one transform per flag.
>
>     Not that in the example you provide, You are feeding a dict to
>     OrderedDict, so the ordering is lost before reaching the OrderedDict.
>
>     In addition, using an ordered dict makes is hard to third party
>     extension to add and use any non-declared flag.
>
>     >     > +
>     >     > +    def unregister(self, transformmap):
>     >     > +        """Unregister transforms for flags."""
>     >     > +        for flag in transformmap:
>     >     > +            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
>     >     > +                self.transformmap[flag] = None
>     >
>     >     What is the usecase for unregistering? Since we do not call
>     the process
>     >     if the flag is not set, I can't think of a case were we need to
>     >     unregister. Am I missing something?
>     >
>     >
>     > I will get rid of this, as we're moving away from non-flag based
>     > transforms (cf. below).
>     >
>     >
>     >     > +    def processflags(self, node, text, revisionflags,
>     reverse=False):
>     >     > +        """Process flags and apply registered transforms.
>     >     > +
>     >     > +        ``node`` - the noideid of revision
>     >     > +        ``text`` - the revision data to process
>     >     > +        ``revisionflags`` - the flags applied to the revision
>     >     > +        ``reverse`` - an optional argument describing
>     whether the
>     >     flags should
>     >     > +          be processed in order according to the
>     >     ``flagprocessor`` flag
>     >     > +          priority. The flags should be processed in order of
>     >     priority when
>     >     > +          reading revisions, and in reverse order when writing
>     >     revisions.
>     >     > +
>     >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
>     >     ``text`` is the
>     >     > +        processed text and ``validatehash`` is a bool
>     indicating
>     >     whether the
>     >     > +        returned text should be checked for hash integrity.
>     >     > +        """
>     >     > +        validatehash = True
>     >     > +
>     >     > +        # Depending on the operation (read or write), the order
>     >     might be
>     >     > +        # reversed due to non-commutative transforms.
>     >     > +        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
>     >     > +        if reverse is True:
>     >
>     >     You want 'if reverse' here.
>     >
>     >
>     > (Y)
>     >
>     >
>     >
>     >     > +            orderedflags = reversed(orderedflags)
>     >     > +
>     >     > +        for flag in orderedflags:
>     >     > +            # If a transform has been registered for a known
>     >     flag, apply it and
>     >     > +            # update the result tuple.
>     >     > +            # If no flag is set but transforms have been
>     >     registered, it is
>     >     > +            # assumed that the repository does not send flags
>     >     over the wire and
>     >     > +            # the extensions implement heuristics to determine
>     >     what to do based
>     >     > +            # on the contents of the filelog.
>     >
>     >     This special case on no flag set seems very suspicious to me.
>     >
>     >     * first it seems like there is two different case here (one
>     were we just
>     >     use the flag and one were we check if we need to install a
>     flag). Having
>     >     seem sharing the same code path and rely on heuristic to trigger
>     >     themselves seems sloppy, we should be more explicit about the
>     case we
>     >     are in.
>     >
>     >     * This part about "does not send flags over the wire" is a bit
>     strange
>     >     to me, can you elaborate ? We should enforce that peer
>     involved support
>     >     exchanging the flag before exchanging content where the flag
>     matters.
>     >
>     >
>     > This is something Durham proposed a while ago, and we re-discussed
>     with
>     > Ryan McElroy today. The original idea was to allow repositories to not
>     > save flags in the backend, by relying on heuristics instead to
>     determine
>     > what to do with filelog contents. We however do not like having to
>     rely
>     > on heuristics and would rather enforce sending the flags over the
>     wire,
>     > to the cost of updating backend dbs schemas to store the flags
>     > (defaulting to zero, which would give REVIDX_DEFAULT_FLAGS as a
>     default
>     > value) before enabling lfs.
>     >
>     > We're going to move away from this hybrid design and rely on flags
>     only
>     > in the update.
>     >
>     >     > +            if flag & revisionflags or revisionflags ==
>     >     REVIDX_DEFAULT_FLAGS:
>     >     > +                f = self.transformmap.get(flag, None)
>     >     > +                if f is not None:
>     >     > +                    text, validatehash =
>     f(self.revlogobject, text)
>     >
>     >     If I read it correctly, this code is not checking for unknown
>     flag (a
>     >     flag set but missing from REVIDX_FLAGS_PROCESSING_ORDER. Not
>     processing
>     >     a flag seems very serious and we should not let it pass silently.
>     >
>     >
>     > handled correctly in the new design.
>     >
>     >
>     >     > +
>     >     > +        return text, validatehash
>     >     > +
>     >     >  def hash(text, p1, p2):
>     >     >      """generate a hash from the given text and its parent
>     hashes
>     >     >
>     >     > @@ -285,6 +359,7 @@
>     >     >          self.version = v
>     >     >          self._inline = v & REVLOGNGINLINEDATA
>     >     >          self._generaldelta = v & REVLOGGENERALDELTA
>     >     > +        self.flagprocessor = flagprocessor(self)
>     >
>     >     (note: illustration the cycle we are creating)
>     >
>     >
>     > Will update with removed flagprocessor, removing cycling references.
>     >
>     >
>     >
>     >     >          flags = v & ~0xFFFF
>     >     >          fmt = v & 0xFFFF
>     >     >          if fmt == REVLOGV0 and flags:
>     >     > @@ -1221,7 +1296,11 @@
>     >     >              bins = bins[1:]
>     >     >
>     >     >          text = mdiff.patches(text, bins)
>     >     > -        self.checkhash(text, node, rev=rev)
>     >     > +
>     >     > +        text, validatehash = self.processflags(node, text,
>     >     self.flags(rev))
>     >     > +        if validatehash is True:
>     >
>     >     'if validatehash'
>     >
>     >
>     > (Y)
>     >
>     >
>     >
>     >     > +            self.checkhash(text, node, rev=rev)
>     >     > +
>     >     >          self._cache = (node, rev, text)
>     >     >          return text
>     >     >
>     >     > @@ -1233,6 +1312,28 @@
>     >     >          """
>     >     >          return hash(text, p1, p2)
>     >     >
>     >     > +    def processflags(self, node, text, flags, reverse=False):
>     >     > +        """Process flags and apply registered transforms.
>     >     > +
>     >     > +        cf. `flagprocessor.processflags`` method for
>     argument and
>     >     return values
>     >     > +        description.
>     >     > +
>     >     > +        ``node`` - the noideid of revision
>     >     > +        ``text`` - the revision data to process
>     >     > +        ``flags`` - the flags applied to the revision
>     >     > +        ``reverse`` - an optional argument describing
>     whether the
>     >     flags should
>     >     > +          be processed in order according to the
>     >     ``flagprocessor`` flag
>     >     > +          priority. The flags should be processed in order of
>     >     priority when
>     >     > +          reading revisions, and in reverse order when writing
>     >     revisions.
>     >     > +          This allows for non-reflexive operations in
>     extensions.
>     >     > +
>     >     > +        Returns a 2-tuple of ``(text, validatehash)`` where
>     >     ``text`` is the
>     >     > +        processed text and ``validatehash`` is a bool
>     indicating
>     >     whether the
>     >     > +        returned text should be checked for hash integrity.
>     >     > +        """
>     >     > +        return self.flagprocessor.processflags(node, text,
>     flags,
>     >     > +                                               reverse=reverse)
>     >
>     >     That function seems to be a plain forward to flagprocessors.
>     Why not
>     >     calling the flag processors method directly
>     >
>     >     (well as pointed above, I think we should entirely skip the
>     object)
>     >
>     >
>     > (Y)
>     >
>     >
>     > --
>     > Rémi
>     > --
>     > Rémi
>
>     --
>     Pierre-Yves David
>
> --
> Rémi

Patch

diff --git a/mercurial/bundlerepo.py b/mercurial/bundlerepo.py
--- a/mercurial/bundlerepo.py
+++ b/mercurial/bundlerepo.py
@@ -147,7 +147,9 @@ 
             delta = self._chunk(chain.pop())
             text = mdiff.patches(text, [delta])
 
-        self.checkhash(text, node, rev=rev)
+        text, validatehash = self.processflags(node, text, self.flags(rev))
+        if validatehash is True:
+            self.checkhash(text, node, rev=rev)
         self._cache = (node, rev, text)
         return text
 
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -56,6 +56,10 @@ 
 REVIDX_ISCENSORED = (1 << 15) # revision has censor metadata, must be verified
 REVIDX_DEFAULT_FLAGS = 0
 REVIDX_KNOWN_FLAGS = REVIDX_ISCENSORED
+# stable order in which flags need to be processed by the flagprocessor
+REVIDX_FLAGS_PROCESSING_ORDER = [
+    REVIDX_ISCENSORED,
+]
 
 # max size of revlog with inline data
 _maxinline = 131072
@@ -76,6 +80,76 @@ 
 
 _nullhash = hashlib.sha1(nullid)
 
+class flagprocessor(object):
+    """process revlog objects contents based on flags
+
+    flagprocessor objects are the interface between revlog objects and
+    subclasses/extensions needing to change or update the contents of revlog
+    objects, based on flags.
+
+    The role of the flagprocessor is to apply transforms registered by
+    extensions and revlog subclasses in the order defined by
+    REVIDX_FLAGS_PROCESSING_ORDER and the type (read or write) of operation.
+    This allows the flag processor to modify the contents of revlog objects and
+    handle the composition of non-commutative operations.
+    """
+
+    def __init__(self, revlogobject):
+        self.transformmap = {}
+        self.revlogobject = revlogobject
+
+    def register(self, transformmap):
+        """Register transforms to be applied for flags.
+
+        ``transformmap`` - a map of flag to transform
+        """
+        for flag in transformmap:
+            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
+                self.transformmap[flag] = transformmap[flag]
+
+    def unregister(self, transformmap):
+        """Unregister transforms for flags."""
+        for flag in transformmap:
+            if flag in REVIDX_FLAGS_PROCESSING_ORDER:
+                self.transformmap[flag] = None
+
+    def processflags(self, node, text, revisionflags, reverse=False):
+        """Process flags and apply registered transforms.
+
+        ``node`` - the noideid of revision
+        ``text`` - the revision data to process
+        ``revisionflags`` - the flags applied to the revision
+        ``reverse`` - an optional argument describing whether the flags should
+          be processed in order according to the ``flagprocessor`` flag
+          priority. The flags should be processed in order of priority when
+          reading revisions, and in reverse order when writing revisions.
+
+        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
+        processed text and ``validatehash`` is a bool indicating whether the
+        returned text should be checked for hash integrity.
+        """
+        validatehash = True
+
+        # Depending on the operation (read or write), the order might be
+        # reversed due to non-commutative transforms.
+        orderedflags = REVIDX_FLAGS_PROCESSING_ORDER
+        if reverse is True:
+            orderedflags = reversed(orderedflags)
+
+        for flag in orderedflags:
+            # If a transform has been registered for a known flag, apply it and
+            # update the result tuple.
+            # If no flag is set but transforms have been registered, it is
+            # assumed that the repository does not send flags over the wire and
+            # the extensions implement heuristics to determine what to do based
+            # on the contents of the filelog.
+            if flag & revisionflags or revisionflags == REVIDX_DEFAULT_FLAGS:
+                f = self.transformmap.get(flag, None)
+                if f is not None:
+                    text, validatehash = f(self.revlogobject, text)
+
+        return text, validatehash
+
 def hash(text, p1, p2):
     """generate a hash from the given text and its parent hashes
 
@@ -285,6 +359,7 @@ 
         self.version = v
         self._inline = v & REVLOGNGINLINEDATA
         self._generaldelta = v & REVLOGGENERALDELTA
+        self.flagprocessor = flagprocessor(self)
         flags = v & ~0xFFFF
         fmt = v & 0xFFFF
         if fmt == REVLOGV0 and flags:
@@ -1221,7 +1296,11 @@ 
             bins = bins[1:]
 
         text = mdiff.patches(text, bins)
-        self.checkhash(text, node, rev=rev)
+
+        text, validatehash = self.processflags(node, text, self.flags(rev))
+        if validatehash is True:
+            self.checkhash(text, node, rev=rev)
+
         self._cache = (node, rev, text)
         return text
 
@@ -1233,6 +1312,28 @@ 
         """
         return hash(text, p1, p2)
 
+    def processflags(self, node, text, flags, reverse=False):
+        """Process flags and apply registered transforms.
+
+        cf. `flagprocessor.processflags`` method for argument and return values
+        description.
+
+        ``node`` - the noideid of revision
+        ``text`` - the revision data to process
+        ``flags`` - the flags applied to the revision
+        ``reverse`` - an optional argument describing whether the flags should
+          be processed in order according to the ``flagprocessor`` flag
+          priority. The flags should be processed in order of priority when
+          reading revisions, and in reverse order when writing revisions.
+          This allows for non-reflexive operations in extensions.
+
+        Returns a 2-tuple of ``(text, validatehash)`` where ``text`` is the
+        processed text and ``validatehash`` is a bool indicating whether the
+        returned text should be checked for hash integrity.
+        """
+        return self.flagprocessor.processflags(node, text, flags,
+                                               reverse=reverse)
+
     def checkhash(self, text, node, p1=None, p2=None, rev=None):
         """Check node hash integrity.
 
@@ -1415,7 +1516,10 @@ 
                 basetext = self.revision(self.node(baserev), _df=fh)
                 btext[0] = mdiff.patch(basetext, delta)
             try:
-                self.checkhash(btext[0], node, p1=p1, p2=p2)
+                btext[0], validatehash = self.processflags(node, btext[0],
+                                                           flags, reverse=True)
+                if validatehash is True:
+                    self.checkhash(btext[0], node, p1=p1, p2=p2)
                 if flags & REVIDX_ISCENSORED:
                     raise RevlogError(_('node %s is not censored') % node)
             except CensoredNodeError: