Patchwork [12,of,14] vfs: add the possibility to have a "ward" to check vfs usage

login
register
mail settings
Submitter Pierre-Yves David
Date July 2, 2017, 2:56 a.m.
Message ID <ebf81efdd6d7ff15c646.1498964197@nodosa.octopoid.net>
Download mbox | patch
Permalink /patch/21885/
State Accepted
Headers show

Comments

Pierre-Yves David - July 2, 2017, 2:56 a.m.
# HG changeset patch
# User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
# Date 1470323266 -7200
#      Thu Aug 04 17:07:46 2016 +0200
# Node ID ebf81efdd6d7ff15c64683933635616c00475688
# Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
# EXP-Topic vfs.ward
# Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
#              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r ebf81efdd6d7
vfs: add the possibility to have a "ward" to check vfs usage

The function will be called anytime we open a file. The first usage of this
'ward' will be to check that lock are properly taken before accessing file.
Later we might use it to ensure we use the right vfs to access files, allowing
more vfs to be introduced.

The feature has some similarity with the 'pathauditor', but further digging
show it make more sense to keep them separated. The path auditor is fairly
autonomous (create from vfs.__init__ without any extra information), and check
very generic details. On the other hand, the wards will have deeper knownledge
of the Mercurial logic and is created at the local repository level. Mixing the
two would add much complexity for no real gains.

We currently only apply the ward on 'open' operation. We will extend this to
other operations like copy, creation and removal later. The current readonlyvfs
seems to have the same shortcoming.
Yuya Nishihara - July 5, 2017, 2:55 p.m.
On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> # HG changeset patch
> # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> # Date 1470323266 -7200
> #      Thu Aug 04 17:07:46 2016 +0200
> # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> # EXP-Topic vfs.ward
> # Available At https://www.mercurial-scm.org/repo/users/marmoute/mercurial/
> #              hg pull https://www.mercurial-scm.org/repo/users/marmoute/mercurial/ -r ebf81efdd6d7
> vfs: add the possibility to have a "ward" to check vfs usage

> The feature has some similarity with the 'pathauditor', but further digging
> show it make more sense to keep them separated. The path auditor is fairly
> autonomous (create from vfs.__init__ without any extra information), and check
> very generic details. On the other hand, the wards will have deeper knownledge
> of the Mercurial logic and is created at the local repository level. Mixing the
> two would add much complexity for no real gains.

My gut feeling is that vfs isn't the right place if they require deeper
knowledge of repository/dirstate logic. And the whole weakref business
floating around wards, hooks and transaction seems like just getting around
layering violation.

> We currently only apply the ward on 'open' operation. We will extend this to
> other operations like copy, creation and removal later. The current readonlyvfs
> seems to have the same shortcoming.
> 
> diff --git a/mercurial/vfs.py b/mercurial/vfs.py
> --- a/mercurial/vfs.py
> +++ b/mercurial/vfs.py
> @@ -284,7 +284,7 @@ class vfs(abstractvfs):
>      This class is used to hide the details of COW semantics and
>      remote file access from higher level code.
>      '''
> -    def __init__(self, base, expandpath=False, realpath=False):
> +    def __init__(self, base, expandpath=False, realpath=False, ward=None):
>          if expandpath:
>              base = util.expandpath(base)
>          if realpath:
> @@ -293,6 +293,11 @@ class vfs(abstractvfs):
>          self.audit = pathutil.pathauditor(self.base)
>          self.createmode = None
>          self._trustnlink = None
> +        # optional function to validate operation on file
> +        # intended to be user for developer checks.
> +        #
> +        # XXX should be call for other things than 'open'
> +        self._ward = ward
>  
>      @util.propertycache
>      def _cansymlink(self):
> @@ -343,6 +348,9 @@ class vfs(abstractvfs):
>          if not text and "b" not in mode:
>              mode += "b" # for that other OS
>  
> +        if self._ward is not None:
> +            self._ward(f, mode, atomictemp)

Do you have any idea how to expand this ward() callback to the other vfs
operations, which have different set of parameters?
Boris Feld - July 6, 2017, 6:53 p.m.
On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > # HG changeset patch
> > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > # Date 1470323266 -7200
> > #      Thu Aug 04 17:07:46 2016 +0200
> > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > # EXP-Topic vfs.ward
> > # Available At https://www.mercurial-scm.org/repo/users/marmoute/me
> > rcurial/
> > #              hg pull https://www.mercurial-scm.org/repo/users/mar
> > moute/mercurial/ -r ebf81efdd6d7
> > vfs: add the possibility to have a "ward" to check vfs usage
> > The feature has some similarity with the 'pathauditor', but further
> > digging
> > show it make more sense to keep them separated. The path auditor is
> > fairly
> > autonomous (create from vfs.__init__ without any extra
> > information), and check
> > very generic details. On the other hand, the wards will have deeper
> > knownledge
> > of the Mercurial logic and is created at the local repository
> > level. Mixing the
> > two would add much complexity for no real gains.
> 
> My gut feeling is that vfs isn't the right place if they require
> deeper
> knowledge of repository/dirstate logic. And the whole weakref
> business
> floating around wards, hooks and transaction seems like just getting
> around
> layering violation.

We pondered on this choice a long time and couldn't find any other
layer that is used by both Mercurial core and the extensions to access
the file system. It seems to be the highest abstraction layer we could
hook into without introducing a new one and updating all callers.

Our reflexion is, as the vfs classes comes from the scmutil file, it
seems okay to have scm related logic in that layer.

What do you think? Do you have another layer in mind where the callback
could be hooked to be able to catch any file-related operation?
> 
> > We currently only apply the ward on 'open' operation. We will
> > extend this to
> > other operations like copy, creation and removal later. The current
> > readonlyvfs
> > seems to have the same shortcoming.
> > 
> > diff --git a/mercurial/vfs.py b/mercurial/vfs.py
> > --- a/mercurial/vfs.py
> > +++ b/mercurial/vfs.py
> > @@ -284,7 +284,7 @@ class vfs(abstractvfs):
> >      This class is used to hide the details of COW semantics and
> >      remote file access from higher level code.
> >      '''
> > -    def __init__(self, base, expandpath=False, realpath=False):
> > +    def __init__(self, base, expandpath=False, realpath=False,
> > ward=None):
> >          if expandpath:
> >              base = util.expandpath(base)
> >          if realpath:
> > @@ -293,6 +293,11 @@ class vfs(abstractvfs):
> >          self.audit = pathutil.pathauditor(self.base)
> >          self.createmode = None
> >          self._trustnlink = None
> > +        # optional function to validate operation on file
> > +        # intended to be user for developer checks.
> > +        #
> > +        # XXX should be call for other things than 'open'
> > +        self._ward = ward
> >  
> >      @util.propertycache
> >      def _cansymlink(self):
> > @@ -343,6 +348,9 @@ class vfs(abstractvfs):
> >          if not text and "b" not in mode:
> >              mode += "b" # for that other OS
> >  
> > +        if self._ward is not None:
> > +            self._ward(f, mode, atomictemp)
> 
> Do you have any idea how to expand this ward() callback to the other
> vfs
> operations, which have different set of parameters?

All other operations (move, delete, copy, etc) could be "translated" to
'read' or 'write' mode. An alternative would be to abstract the mode
with a couple of symbolic constant "create, update, append, delete, …".
What do you think?

Cheers,
Boris
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - July 7, 2017, 12:48 p.m.
On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > # HG changeset patch
> > > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > > # Date 1470323266 -7200
> > > #      Thu Aug 04 17:07:46 2016 +0200
> > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > # EXP-Topic vfs.ward
> > > # Available At https://www.mercurial-scm.org/repo/users/marmoute/me
> > > rcurial/
> > > #              hg pull https://www.mercurial-scm.org/repo/users/mar
> > > moute/mercurial/ -r ebf81efdd6d7
> > > vfs: add the possibility to have a "ward" to check vfs usage
> > > The feature has some similarity with the 'pathauditor', but further
> > > digging
> > > show it make more sense to keep them separated. The path auditor is
> > > fairly
> > > autonomous (create from vfs.__init__ without any extra
> > > information), and check
> > > very generic details. On the other hand, the wards will have deeper
> > > knownledge
> > > of the Mercurial logic and is created at the local repository
> > > level. Mixing the
> > > two would add much complexity for no real gains.
> > 
> > My gut feeling is that vfs isn't the right place if they require
> > deeper
> > knowledge of repository/dirstate logic. And the whole weakref
> > business
> > floating around wards, hooks and transaction seems like just getting
> > around
> > layering violation.
> 
> We pondered on this choice a long time and couldn't find any other
> layer that is used by both Mercurial core and the extensions to access
> the file system. It seems to be the highest abstraction layer we could
> hook into without introducing a new one and updating all callers.
> 
> Our reflexion is, as the vfs classes comes from the scmutil file, it
> seems okay to have scm related logic in that layer.

IIRC, one possible alternative was to move lock itself to a vfs as the
whole vfs is theoretically covered by the lock.

BTW, the ward itself doesn't require deeper knowledge of repo object right
now. Neither does the auditor. How will the ward API be evolved to depend
on repository-level data?

> > Do you have any idea how to expand this ward() callback to the other
> > vfs
> > operations, which have different set of parameters?
> 
> All other operations (move, delete, copy, etc) could be "translated" to
> 'read' or 'write' mode. An alternative would be to abstract the mode
> with a couple of symbolic constant "create, update, append, delete, …".
> What do you think?

So it will be quite similar to audit(path) if it had a mode flag?
Boris Feld - July 7, 2017, 4:48 p.m.
On Fri, 2017-07-07 at 21:48 +0900, Yuya Nishihara wrote:
> On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> > On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > > # HG changeset patch
> > > > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > > > # Date 1470323266 -7200
> > > > #      Thu Aug 04 17:07:46 2016 +0200
> > > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > > # EXP-Topic vfs.ward
> > > > # Available At https://www.mercurial-scm.org/repo/users/marmout
> > > > e/me
> > > > rcurial/
> > > > #              hg pull https://www.mercurial-scm.org/repo/users
> > > > /mar
> > > > moute/mercurial/ -r ebf81efdd6d7
> > > > vfs: add the possibility to have a "ward" to check vfs usage
> > > > The feature has some similarity with the 'pathauditor', but
> > > > further
> > > > digging
> > > > show it make more sense to keep them separated. The path
> > > > auditor is
> > > > fairly
> > > > autonomous (create from vfs.__init__ without any extra
> > > > information), and check
> > > > very generic details. On the other hand, the wards will have
> > > > deeper
> > > > knownledge
> > > > of the Mercurial logic and is created at the local repository
> > > > level. Mixing the
> > > > two would add much complexity for no real gains.
> > > 
> > > My gut feeling is that vfs isn't the right place if they require
> > > deeper
> > > knowledge of repository/dirstate logic. And the whole weakref
> > > business
> > > floating around wards, hooks and transaction seems like just
> > > getting
> > > around
> > > layering violation.
> > 
> > We pondered on this choice a long time and couldn't find any other
> > layer that is used by both Mercurial core and the extensions to
> > access
> > the file system. It seems to be the highest abstraction layer we
> > could
> > hook into without introducing a new one and updating all callers.
> > 
> > Our reflexion is, as the vfs classes comes from the scmutil file,
> > it
> > seems okay to have scm related logic in that layer.
> 
> IIRC, one possible alternative was to move lock itself to a vfs as
> the
> whole vfs is theoretically covered by the lock.

The practice divert a bit from the theory. For example, the
".hg/bookmark" file is in theory covered by the 'wlock'. In practice it
is now covered by the transaction and therefor the 'lock'. On the other
hand there are multiple files handled by the vfs ('hgrc', 'dirstate',
'store/', etc) that are not covered by the wlock.

Right now, all these exceptions are stored on the repository, and
extension can update them. To us, the repository layers seems the right
layer for logic regarding individual file semantic. We also want to add
some logic regarding open transaction to the ward, something unrelated
to the vfs.

The initial motivation for this series (beside catching bug) is to make
it safer to introduce new vfs. For example, the share extensions
suffers from lack of cache sharing, each share has its own '.hg/cache/'
directory. Introducing a 'repo.cvfs' dedicated to cache could fix that,
but we wants to warn people still accessing 'cache/' through
'repo.vfs'.

To push that logic further, "share" actually needs a better distinction
between the working copy specific piece and the more generic pieces. An
extra split of "repo.vfs" could take care of that. However, some of the
file, like bookmark, can be shared or not depending on some config and
share option. So the ward needs access to that logic (already living on
the repository) too.

We need to eventually rework the repository format to a more race free
data structure. This will likely involve more vfs to access data
dispatched in various directory (eg: append-only vs full update). The
ward should help a lot here but it need to be aware of many high level
logic regarding file semantic.

We plan to add more vfs-s and the current locking design seems better
than having one lock per vfs, as multiple vfs will be covered by the
same lock. If we move the locks on the vfs layer, we either would need
to update it again later or introduce much more complex code to manage
the interaction between vfs-s locks.

> 
> BTW, the ward itself doesn't require deeper knowledge of repo object
> right
> now. Neither does the auditor. How will the ward API be evolved to
> depend
> on repository-level data?

In this series, the vfs ward use data from "repo._wlockfreeprefix" to
read its exception. Given the type of data and the need for extension
to wrap it. The repo seems a good place to hold this data.

> 
> > > Do you have any idea how to expand this ward() callback to the
> > > other
> > > vfs
> > > operations, which have different set of parameters?
> > 
> > All other operations (move, delete, copy, etc) could be
> > "translated" to
> > 'read' or 'write' mode. An alternative would be to abstract the
> > mode
> > with a couple of symbolic constant "create, update, append, delete,
> > …".
> > What do you think?
> 
> So it will be quite similar to audit(path) if it had a mode flag?

In the end likely, but updating vfs code to call audit for each
operation and the audit functions source code right now would requires
too much works for this series.

We plan to send a new series after this one that will refactor this
part and it's likely that pathauditor will call the ward with the same
flag as we converge toward something usable everywhere.
Yuya Nishihara - July 8, 2017, 7:36 a.m.
On Fri, 07 Jul 2017 18:48:44 +0200, Boris Feld wrote:
> On Fri, 2017-07-07 at 21:48 +0900, Yuya Nishihara wrote:
> > On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> > > On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > > > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > > > # HG changeset patch
> > > > > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > > > > # Date 1470323266 -7200
> > > > > #      Thu Aug 04 17:07:46 2016 +0200
> > > > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > > > # EXP-Topic vfs.ward
> > > > > # Available At https://www.mercurial-scm.org/repo/users/marmout
> > > > > e/me
> > > > > rcurial/
> > > > > #              hg pull https://www.mercurial-scm.org/repo/users
> > > > > /mar
> > > > > moute/mercurial/ -r ebf81efdd6d7
> > > > > vfs: add the possibility to have a "ward" to check vfs usage
> > > > > The feature has some similarity with the 'pathauditor', but
> > > > > further
> > > > > digging
> > > > > show it make more sense to keep them separated. The path
> > > > > auditor is
> > > > > fairly
> > > > > autonomous (create from vfs.__init__ without any extra
> > > > > information), and check
> > > > > very generic details. On the other hand, the wards will have
> > > > > deeper
> > > > > knownledge
> > > > > of the Mercurial logic and is created at the local repository
> > > > > level. Mixing the
> > > > > two would add much complexity for no real gains.
> > > > 
> > > > My gut feeling is that vfs isn't the right place if they require
> > > > deeper
> > > > knowledge of repository/dirstate logic. And the whole weakref
> > > > business
> > > > floating around wards, hooks and transaction seems like just
> > > > getting
> > > > around
> > > > layering violation.
> > > 
> > > We pondered on this choice a long time and couldn't find any other
> > > layer that is used by both Mercurial core and the extensions to
> > > access
> > > the file system. It seems to be the highest abstraction layer we
> > > could
> > > hook into without introducing a new one and updating all callers.
> > > 
> > > Our reflexion is, as the vfs classes comes from the scmutil file,
> > > it
> > > seems okay to have scm related logic in that layer.
> > 
> > IIRC, one possible alternative was to move lock itself to a vfs as
> > the
> > whole vfs is theoretically covered by the lock.
> 
> The practice divert a bit from the theory. For example, the
> ".hg/bookmark" file is in theory covered by the 'wlock'. In practice it
> is now covered by the transaction and therefor the 'lock'. On the other
> hand there are multiple files handled by the vfs ('hgrc', 'dirstate',
> 'store/', etc) that are not covered by the wlock.
> 
> Right now, all these exceptions are stored on the repository, and
> extension can update them. To us, the repository layers seems the right
> layer for logic regarding individual file semantic. We also want to add
> some logic regarding open transaction to the ward, something unrelated
> to the vfs.
> 
> The initial motivation for this series (beside catching bug) is to make
> it safer to introduce new vfs. For example, the share extensions
> suffers from lack of cache sharing, each share has its own '.hg/cache/'
> directory. Introducing a 'repo.cvfs' dedicated to cache could fix that,
> but we wants to warn people still accessing 'cache/' through
> 'repo.vfs'.
>
> To push that logic further, "share" actually needs a better distinction
> between the working copy specific piece and the more generic pieces. An
> extra split of "repo.vfs" could take care of that. However, some of the
> file, like bookmark, can be shared or not depending on some config and
> share option. So the ward needs access to that logic (already living on
> the repository) too.

It's probably okay to handle simple repo-related business by hooking vfs
calls (via auditor/ward), but the future story sounds pretty complicated.
I don't think it's great idea to introduce many-consumers <-> many-vfs
relation mediated by a low-level hook, "ward".

> We need to eventually rework the repository format to a more race free
> data structure. This will likely involve more vfs to access data
> dispatched in various directory (eg: append-only vs full update). The
> ward should help a lot here but it need to be aware of many high level
> logic regarding file semantic.

At this point, I think we'll need a proper storage layer which shadows
low-level vfs operation.

> We plan to add more vfs-s and the current locking design seems better
> than having one lock per vfs, as multiple vfs will be covered by the
> same lock. If we move the locks on the vfs layer, we either would need
> to update it again later or introduce much more complex code to manage
> the interaction between vfs-s locks.

Yeah, it's annoying vfs itself has physical layering violation.

> > > > Do you have any idea how to expand this ward() callback to the
> > > > other
> > > > vfs
> > > > operations, which have different set of parameters?
> > > 
> > > All other operations (move, delete, copy, etc) could be
> > > "translated" to
> > > 'read' or 'write' mode. An alternative would be to abstract the
> > > mode
> > > with a couple of symbolic constant "create, update, append, delete,
> > > …".
> > > What do you think?
> > 
> > So it will be quite similar to audit(path) if it had a mode flag?
> 
> In the end likely, but updating vfs code to call audit for each
> operation and the audit functions source code right now would requires
> too much works for this series.

Really? IIUC, it's just to add mode parameter to auditor and wrap vfs.audit
by a ward function.

FWIW, a ward object could manage lock, wlock and transaction to get rid of
the weakref.
Boris Feld - July 11, 2017, 2:32 p.m.
On Sat, 2017-07-08 at 16:36 +0900, Yuya Nishihara wrote:
> On Fri, 07 Jul 2017 18:48:44 +0200, Boris Feld wrote:
> > On Fri, 2017-07-07 at 21:48 +0900, Yuya Nishihara wrote:
> > > On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> > > > On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > > > > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > > > > # HG changeset patch
> > > > > > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > > > > > # Date 1470323266 -7200
> > > > > > #      Thu Aug 04 17:07:46 2016 +0200
> > > > > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > > > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > > > > # EXP-Topic vfs.ward
> > > > > > # Available At https://www.mercurial-scm.org/repo/users/mar
> > > > > > mout
> > > > > > e/me
> > > > > > rcurial/
> > > > > > #              hg pull https://www.mercurial-scm.org/repo/u
> > > > > > sers
> > > > > > /mar
> > > > > > moute/mercurial/ -r ebf81efdd6d7
> > > > > > vfs: add the possibility to have a "ward" to check vfs
> > > > > > usage
> > > > > > The feature has some similarity with the 'pathauditor', but
> > > > > > further
> > > > > > digging
> > > > > > show it make more sense to keep them separated. The path
> > > > > > auditor is
> > > > > > fairly
> > > > > > autonomous (create from vfs.__init__ without any extra
> > > > > > information), and check
> > > > > > very generic details. On the other hand, the wards will
> > > > > > have
> > > > > > deeper
> > > > > > knownledge
> > > > > > of the Mercurial logic and is created at the local
> > > > > > repository
> > > > > > level. Mixing the
> > > > > > two would add much complexity for no real gains.
> > > > > 
> > > > > My gut feeling is that vfs isn't the right place if they
> > > > > require
> > > > > deeper
> > > > > knowledge of repository/dirstate logic. And the whole weakref
> > > > > business
> > > > > floating around wards, hooks and transaction seems like just
> > > > > getting
> > > > > around
> > > > > layering violation.
> > > > 
> > > > We pondered on this choice a long time and couldn't find any
> > > > other
> > > > layer that is used by both Mercurial core and the extensions to
> > > > access
> > > > the file system. It seems to be the highest abstraction layer
> > > > we
> > > > could
> > > > hook into without introducing a new one and updating all
> > > > callers.
> > > > 
> > > > Our reflexion is, as the vfs classes comes from the scmutil
> > > > file,
> > > > it
> > > > seems okay to have scm related logic in that layer.
> > > 
> > > IIRC, one possible alternative was to move lock itself to a vfs
> > > as
> > > the
> > > whole vfs is theoretically covered by the lock.
> > 
> > The practice divert a bit from the theory. For example, the
> > ".hg/bookmark" file is in theory covered by the 'wlock'. In
> > practice it
> > is now covered by the transaction and therefor the 'lock'. On the
> > other
> > hand there are multiple files handled by the vfs ('hgrc',
> > 'dirstate',
> > 'store/', etc) that are not covered by the wlock.
> > 
> > Right now, all these exceptions are stored on the repository, and
> > extension can update them. To us, the repository layers seems the
> > right
> > layer for logic regarding individual file semantic. We also want to
> > add
> > some logic regarding open transaction to the ward, something
> > unrelated
> > to the vfs.
> > 
> > The initial motivation for this series (beside catching bug) is to
> > make
> > it safer to introduce new vfs. For example, the share extensions
> > suffers from lack of cache sharing, each share has its own
> > '.hg/cache/'
> > directory. Introducing a 'repo.cvfs' dedicated to cache could fix
> > that,
> > but we wants to warn people still accessing 'cache/' through
> > 'repo.vfs'.
> > 
> > To push that logic further, "share" actually needs a better
> > distinction
> > between the working copy specific piece and the more generic
> > pieces. An
> > extra split of "repo.vfs" could take care of that. However, some of
> > the
> > file, like bookmark, can be shared or not depending on some config
> > and
> > share option. So the ward needs access to that logic (already
> > living on
> > the repository) too.
> 
> It's probably okay to handle simple repo-related business by hooking
> vfs
> calls (via auditor/ward), but the future story sounds pretty
> complicated.
> I don't think it's great idea to introduce many-consumers <-> many-
> vfs
> relation mediated by a low-level hook, "ward".
> 
> > We need to eventually rework the repository format to a more race
> > free
> > data structure. This will likely involve more vfs to access data
> > dispatched in various directory (eg: append-only vs full update).
> > The
> > ward should help a lot here but it need to be aware of many high
> > level
> > logic regarding file semantic.
> 
> At this point, I think we'll need a proper storage layer which
> shadows
> low-level vfs operation.
> 
> > We plan to add more vfs-s and the current locking design seems
> > better
> > than having one lock per vfs, as multiple vfs will be covered by
> > the
> > same lock. If we move the locks on the vfs layer, we either would
> > need
> > to update it again later or introduce much more complex code to
> > manage
> > the interaction between vfs-s locks.
> 
> Yeah, it's annoying vfs itself has physical layering violation.
> 
> > > > > Do you have any idea how to expand this ward() callback to
> > > > > the
> > > > > other
> > > > > vfs
> > > > > operations, which have different set of parameters?
> > > > 
> > > > All other operations (move, delete, copy, etc) could be
> > > > "translated" to
> > > > 'read' or 'write' mode. An alternative would be to abstract the
> > > > mode
> > > > with a couple of symbolic constant "create, update, append,
> > > > delete,
> > > > …".
> > > > What do you think?
> > > 
> > > So it will be quite similar to audit(path) if it had a mode flag?
> > 
> > In the end likely, but updating vfs code to call audit for each
> > operation and the audit functions source code right now would
> > requires
> > too much works for this series.
> 
> Really? IIUC, it's just to add mode parameter to auditor and wrap
> vfs.audit
> by a ward function.
> 
> FWIW, a ward object could manage lock, wlock and transaction to get
> rid of
> the weakref.

We are not sure to follow you here. Introducing a proper storage layer
seems complicated and out-of-scope of this series. I cannot see how we
could make sure that both core and extensions are forced to use this
new storage layer instead of vfs directly. Could you clarify what do
you have in mind?

In the mean time of this refactoring, we're gonna send a V2 that follow
your advice of wrapping audit with the ward function, the code is
cleaner that we anticipated, so it seems fine. It's probably simpler
that the previous design.

We are not sure adding a full ward object responsible for the lock
makes sense. The check are only here for devel-warning so adding a new
object just for them seems a bit too much. What is your opinion about
that? Do you see other use-cases of having this ward object?

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - July 13, 2017, 3:52 p.m.
On Tue, 11 Jul 2017 16:32:46 +0200, Boris Feld wrote:
> On Sat, 2017-07-08 at 16:36 +0900, Yuya Nishihara wrote:
> > On Fri, 07 Jul 2017 18:48:44 +0200, Boris Feld wrote:
> > > On Fri, 2017-07-07 at 21:48 +0900, Yuya Nishihara wrote:
> > > > On Thu, 06 Jul 2017 20:53:50 +0200, Boris Feld wrote:
> > > > > On Wed, 2017-07-05 at 23:55 +0900, Yuya Nishihara wrote:
> > > > > > On Sun, 02 Jul 2017 04:56:37 +0200, Pierre-Yves David wrote:
> > > > > > > # HG changeset patch
> > > > > > > # User Pierre-Yves David <pierre-yves.david@ens-lyon.org>
> > > > > > > # Date 1470323266 -7200
> > > > > > > #      Thu Aug 04 17:07:46 2016 +0200
> > > > > > > # Node ID ebf81efdd6d7ff15c64683933635616c00475688
> > > > > > > # Parent  34b8be7f0420b07d0f7b71379c6055e5b26223d5
> > > > > > > # EXP-Topic vfs.ward
> > > > > > > # Available At https://www.mercurial-scm.org/repo/users/mar
> > > > > > > mout
> > > > > > > e/me
> > > > > > > rcurial/
> > > > > > > #              hg pull https://www.mercurial-scm.org/repo/u
> > > > > > > sers
> > > > > > > /mar
> > > > > > > moute/mercurial/ -r ebf81efdd6d7
> > > > > > > vfs: add the possibility to have a "ward" to check vfs
> > > > > > > usage
> > > > > > > The feature has some similarity with the 'pathauditor', but
> > > > > > > further
> > > > > > > digging
> > > > > > > show it make more sense to keep them separated. The path
> > > > > > > auditor is
> > > > > > > fairly
> > > > > > > autonomous (create from vfs.__init__ without any extra
> > > > > > > information), and check
> > > > > > > very generic details. On the other hand, the wards will
> > > > > > > have
> > > > > > > deeper
> > > > > > > knownledge
> > > > > > > of the Mercurial logic and is created at the local
> > > > > > > repository
> > > > > > > level. Mixing the
> > > > > > > two would add much complexity for no real gains.
> > > > > > 
> > > > > > My gut feeling is that vfs isn't the right place if they
> > > > > > require
> > > > > > deeper
> > > > > > knowledge of repository/dirstate logic. And the whole weakref
> > > > > > business
> > > > > > floating around wards, hooks and transaction seems like just
> > > > > > getting
> > > > > > around
> > > > > > layering violation.
> > > > > 
> > > > > We pondered on this choice a long time and couldn't find any
> > > > > other
> > > > > layer that is used by both Mercurial core and the extensions to
> > > > > access
> > > > > the file system. It seems to be the highest abstraction layer
> > > > > we
> > > > > could
> > > > > hook into without introducing a new one and updating all
> > > > > callers.
> > > > > 
> > > > > Our reflexion is, as the vfs classes comes from the scmutil
> > > > > file,
> > > > > it
> > > > > seems okay to have scm related logic in that layer.
> > > > 
> > > > IIRC, one possible alternative was to move lock itself to a vfs
> > > > as
> > > > the
> > > > whole vfs is theoretically covered by the lock.
> > > 
> > > The practice divert a bit from the theory. For example, the
> > > ".hg/bookmark" file is in theory covered by the 'wlock'. In
> > > practice it
> > > is now covered by the transaction and therefor the 'lock'. On the
> > > other
> > > hand there are multiple files handled by the vfs ('hgrc',
> > > 'dirstate',
> > > 'store/', etc) that are not covered by the wlock.
> > > 
> > > Right now, all these exceptions are stored on the repository, and
> > > extension can update them. To us, the repository layers seems the
> > > right
> > > layer for logic regarding individual file semantic. We also want to
> > > add
> > > some logic regarding open transaction to the ward, something
> > > unrelated
> > > to the vfs.
> > > 
> > > The initial motivation for this series (beside catching bug) is to
> > > make
> > > it safer to introduce new vfs. For example, the share extensions
> > > suffers from lack of cache sharing, each share has its own
> > > '.hg/cache/'
> > > directory. Introducing a 'repo.cvfs' dedicated to cache could fix
> > > that,
> > > but we wants to warn people still accessing 'cache/' through
> > > 'repo.vfs'.
> > > 
> > > To push that logic further, "share" actually needs a better
> > > distinction
> > > between the working copy specific piece and the more generic
> > > pieces. An
> > > extra split of "repo.vfs" could take care of that. However, some of
> > > the
> > > file, like bookmark, can be shared or not depending on some config
> > > and
> > > share option. So the ward needs access to that logic (already
> > > living on
> > > the repository) too.
> > 
> > It's probably okay to handle simple repo-related business by hooking
> > vfs
> > calls (via auditor/ward), but the future story sounds pretty
> > complicated.
> > I don't think it's great idea to introduce many-consumers <-> many-
> > vfs
> > relation mediated by a low-level hook, "ward".
> > 
> > > We need to eventually rework the repository format to a more race
> > > free
> > > data structure. This will likely involve more vfs to access data
> > > dispatched in various directory (eg: append-only vs full update).
> > > The
> > > ward should help a lot here but it need to be aware of many high
> > > level
> > > logic regarding file semantic.
> > 
> > At this point, I think we'll need a proper storage layer which
> > shadows
> > low-level vfs operation.

[...]

> We are not sure to follow you here. Introducing a proper storage layer
> seems complicated and out-of-scope of this series. I cannot see how we
> could make sure that both core and extensions are forced to use this
> new storage layer instead of vfs directly. Could you clarify what do
> you have in mind?

No idea how it will look like. Sometimes I heard read-write order of
bookmarks, changelog, etc. was wrong for example, so I don't think
inspecting vfs operations will prevent us from doing common bad things.
That's one reason why I had a gut feeling that we'll need a higher-level
object to manage repository data.

Patch

diff --git a/mercurial/vfs.py b/mercurial/vfs.py
--- a/mercurial/vfs.py
+++ b/mercurial/vfs.py
@@ -284,7 +284,7 @@  class vfs(abstractvfs):
     This class is used to hide the details of COW semantics and
     remote file access from higher level code.
     '''
-    def __init__(self, base, expandpath=False, realpath=False):
+    def __init__(self, base, expandpath=False, realpath=False, ward=None):
         if expandpath:
             base = util.expandpath(base)
         if realpath:
@@ -293,6 +293,11 @@  class vfs(abstractvfs):
         self.audit = pathutil.pathauditor(self.base)
         self.createmode = None
         self._trustnlink = None
+        # optional function to validate operation on file
+        # intended to be user for developer checks.
+        #
+        # XXX should be call for other things than 'open'
+        self._ward = ward
 
     @util.propertycache
     def _cansymlink(self):
@@ -343,6 +348,9 @@  class vfs(abstractvfs):
         if not text and "b" not in mode:
             mode += "b" # for that other OS
 
+        if self._ward is not None:
+            self._ward(f, mode, atomictemp)
+
         nlink = -1
         if mode not in ('r', 'rb'):
             dirname, basename = util.split(f)