Patchwork [3,of,5] workingctx: add a way for extensions to run code at status fixup time

login
register
mail settings
Submitter Siddharth Agarwal
Date June 12, 2017, 10:36 p.m.
Message ID <f4563ba01434aec8b238.1497306997@devvm31800.prn1.facebook.com>
Download mbox | patch
Permalink /patch/21352/
State Accepted
Headers show

Comments

Siddharth Agarwal - June 12, 2017, 10:36 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1497301010 25200
#      Mon Jun 12 13:56:50 2017 -0700
# Node ID f4563ba01434aec8b238693721c246fc86a40732
# Parent  3097d182d5b89ba643e834215dc41e7e34716857
workingctx: add a way for extensions to run code at status fixup time

Some extensions like fsmonitor need to run code after dirstate.status is
called, but while the wlock is held. The extensions could grab the wlock again,
but that has its own peculiar race issues. For example, fsmonitor would not
like its state to be written out if the dirstate has changed underneath (see
issue5581 for what can go wrong in that sort of case).

To protect against these sorts of issues, allow extensions to declare that they
would like to run some code to run at fixup time.

fsmonitor will switch to using this in the next patch in the series.
Sean Farley - June 12, 2017, 10:56 p.m.
Siddharth Agarwal <sid0@fb.com> writes:

> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1497301010 25200
> #      Mon Jun 12 13:56:50 2017 -0700
> # Node ID f4563ba01434aec8b238693721c246fc86a40732
> # Parent  3097d182d5b89ba643e834215dc41e7e34716857
> workingctx: add a way for extensions to run code at status fixup time
>
> Some extensions like fsmonitor need to run code after dirstate.status is
> called, but while the wlock is held. The extensions could grab the wlock again,
> but that has its own peculiar race issues. For example, fsmonitor would not
> like its state to be written out if the dirstate has changed underneath (see
> issue5581 for what can go wrong in that sort of case).
>
> To protect against these sorts of issues, allow extensions to declare that they
> would like to run some code to run at fixup time.
>
> fsmonitor will switch to using this in the next patch in the series.
>
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -1742,7 +1742,8 @@ class workingctx(committablectx):
>  
>      def _poststatusfixup(self, status, fixup):
>          """update dirstate for files that are actually clean"""
> -        if fixup:
> +        poststatus = self._repo.postdsstatus()
> +        if fixup or poststatus:
>              try:
>                  oldid = self._repo.dirstate.identity()
>  
> @@ -1752,15 +1753,20 @@ class workingctx(committablectx):
>                  # taking the lock
>                  with self._repo.wlock(False):
>                      if self._repo.dirstate.identity() == oldid:
> -                        normal = self._repo.dirstate.normal
> -                        for f in fixup:
> -                            normal(f)
> -                        # write changes out explicitly, because nesting
> -                        # wlock at runtime may prevent 'wlock.release()'
> -                        # after this block from doing so for subsequent
> -                        # changing files
> -                        tr = self._repo.currenttransaction()
> -                        self._repo.dirstate.write(tr)
> +                        if fixup:
> +                            normal = self._repo.dirstate.normal
> +                            for f in fixup:
> +                                normal(f)
> +                            # write changes out explicitly, because nesting
> +                            # wlock at runtime may prevent 'wlock.release()'
> +                            # after this block from doing so for subsequent
> +                            # changing files
> +                            tr = self._repo.currenttransaction()
> +                            self._repo.dirstate.write(tr)
> +
> +                        if poststatus:
> +                            for ps in poststatus:
> +                                ps(self, status)
>                      else:
>                          # in this case, writing changes out breaks
>                          # consistency, because .hg/dirstate was
> @@ -1770,6 +1776,9 @@ class workingctx(committablectx):
>                                              'identity mismatch\n')
>              except error.LockError:
>                  pass
> +            finally:
> +                # Even if the wlock couldn't be grabbed, clear out the list.
> +                self._repo.clearpostdsstatus()
>  
>      def _dirstatestatus(self, match=None, ignored=False, clean=False,
>                          unknown=False):
> diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
> --- a/mercurial/localrepo.py
> +++ b/mercurial/localrepo.py
> @@ -400,6 +400,9 @@ class localrepository(object):
>          # - bookmark changes
>          self.filteredrevcache = {}
>  
> +        # post-dirstate-status hooks
> +        self._postdsstatus = []
> +
>          # generic mapping between names and nodes
>          self.names = namespaces.namespaces()
>  
> @@ -1884,6 +1887,36 @@ class localrepository(object):
>          return self[node1].status(node2, match, ignored, clean, unknown,
>                                    listsubrepos)
>  
> +    def addpostdsstatus(self, ps):
> +        """Add a callback to run within the wlock, at the point at which status
> +        fixups happen.
> +
> +        On status completion, callback(wctx, status) will be called with the
> +        wlock held, unless the dirstate has changed from underneath or the wlock
> +        couldn't be grabbed.
> +
> +        Callbacks should not capture and use a cached copy of the dirstate --
> +        it might change in the meanwhile. Instead, they should access the
> +        dirstate via wctx.repo().dirstate.
> +
> +        This list is emptied out after each status run -- extensions should
> +        make sure it adds to this list each time dirstate.status is called.
> +        Extensions should also make sure they don't call this for statuses
> +        that don't involve the dirstate.
> +        """
> +
> +        # The list is located here for uniqueness reasons -- it is actually
> +        # managed by the workingctx, but that isn't unique per-repo.
> +        self._postdsstatus.append(ps)
> +
> +    def postdsstatus(self):
> +        """Used by workingctx to get the list of post-dirstate-status hooks."""
> +        return self._postdsstatus
> +
> +    def clearpostdsstatus(self):
> +        """Used by workingctx to clear post-dirstate-status hooks."""
> +        del self._postdsstatus[:]
> +

The series' direction (adding a callback within a lock) looks good to
me. My only feedback is that I'm currently ripping out dirstate from
localrepo and putting it into the workingctx itself. Can these callbacks
live in dirstate.py? Or maybe even workingctx?
Siddharth Agarwal - June 12, 2017, 10:57 p.m.
On 6/12/17 3:56 PM, Sean Farley wrote:
> The series' direction (adding a callback within a lock) looks good to
> me. My only feedback is that I'm currently ripping out dirstate from
> localrepo and putting it into the workingctx itself. Can these callbacks
> live in dirstate.py? Or maybe even workingctx?


I'm afraid not, because just like the workingctx, the dirstate isn't 
unique per-repo. The localrepo is the only place this state can live.
Sean Farley - June 12, 2017, 10:59 p.m.
Siddharth Agarwal <sid@less-broken.com> writes:

> On 6/12/17 3:56 PM, Sean Farley wrote:
>> The series' direction (adding a callback within a lock) looks good to
>> me. My only feedback is that I'm currently ripping out dirstate from
>> localrepo and putting it into the workingctx itself. Can these callbacks
>> live in dirstate.py? Or maybe even workingctx?
>
>
> I'm afraid not, because just like the workingctx, the dirstate isn't 
> unique per-repo. The localrepo is the only place this state can live.

The ownership could be in localrepo, for sure. But the methods could
still live in workingctx (and just call self._repo._dirstate). I'm also
ok with doing the refactor myself and can queue this as-is.
Siddharth Agarwal - June 12, 2017, 11:06 p.m.
On 6/12/17 3:59 PM, Sean Farley wrote:
> The ownership could be in localrepo, for sure. But the methods could
> still live in workingctx (and just call self._repo._dirstate). I'm also
> ok with doing the refactor myself and can queue this as-is.


Hmm, that might be inconvenient for extensions that need to use it, 
since (as in patch 4) they may plug in at the dirstate.status or 
localrepo.status level.

Seems to me that creating a wctx through repo[None] just to mutate some 
state that lives on the localrepo anyway is unnecessarily indirect.

This might make a bit more sense once dirstate is in workingctx, I 
guess. (I don't know what your plans are around dirstate uniqueness -- 
if the dirstate object isn't destroyed on invalidation like it is today, 
then this state can live there.)
Sean Farley - June 12, 2017, 11:14 p.m.
Siddharth Agarwal <sid@less-broken.com> writes:

> On 6/12/17 3:59 PM, Sean Farley wrote:
>> The ownership could be in localrepo, for sure. But the methods could
>> still live in workingctx (and just call self._repo._dirstate). I'm also
>> ok with doing the refactor myself and can queue this as-is.
>
>
> Hmm, that might be inconvenient for extensions that need to use it, 
> since (as in patch 4) they may plug in at the dirstate.status or 
> localrepo.status level.

localrepo.status is just a call (and I guess could be deprecated
now) for wctx.status. I think dirstate.status is fine where it is.

> Seems to me that creating a wctx through repo[None] just to mutate some 
> state that lives on the localrepo anyway is unnecessarily indirect.

I'm hoping it won't be but you might have a point here. My goal is to
more or less make interacting with a repo through context objects (this
could have the side-effect of making localrepo immutable). To that
extent, my current plan is to have anything touching the working
directory go through a workingctx (dirstate, wvfs, etc.)

> This might make a bit more sense once dirstate is in workingctx, I 
> guess. (I don't know what your plans are around dirstate uniqueness -- 
> if the dirstate object isn't destroyed on invalidation like it is today, 
> then this state can live there.)

For now, I'm leaving dirstate in the localrepo (and changing it to
repo._dirstate).

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -1742,7 +1742,8 @@  class workingctx(committablectx):
 
     def _poststatusfixup(self, status, fixup):
         """update dirstate for files that are actually clean"""
-        if fixup:
+        poststatus = self._repo.postdsstatus()
+        if fixup or poststatus:
             try:
                 oldid = self._repo.dirstate.identity()
 
@@ -1752,15 +1753,20 @@  class workingctx(committablectx):
                 # taking the lock
                 with self._repo.wlock(False):
                     if self._repo.dirstate.identity() == oldid:
-                        normal = self._repo.dirstate.normal
-                        for f in fixup:
-                            normal(f)
-                        # write changes out explicitly, because nesting
-                        # wlock at runtime may prevent 'wlock.release()'
-                        # after this block from doing so for subsequent
-                        # changing files
-                        tr = self._repo.currenttransaction()
-                        self._repo.dirstate.write(tr)
+                        if fixup:
+                            normal = self._repo.dirstate.normal
+                            for f in fixup:
+                                normal(f)
+                            # write changes out explicitly, because nesting
+                            # wlock at runtime may prevent 'wlock.release()'
+                            # after this block from doing so for subsequent
+                            # changing files
+                            tr = self._repo.currenttransaction()
+                            self._repo.dirstate.write(tr)
+
+                        if poststatus:
+                            for ps in poststatus:
+                                ps(self, status)
                     else:
                         # in this case, writing changes out breaks
                         # consistency, because .hg/dirstate was
@@ -1770,6 +1776,9 @@  class workingctx(committablectx):
                                             'identity mismatch\n')
             except error.LockError:
                 pass
+            finally:
+                # Even if the wlock couldn't be grabbed, clear out the list.
+                self._repo.clearpostdsstatus()
 
     def _dirstatestatus(self, match=None, ignored=False, clean=False,
                         unknown=False):
diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -400,6 +400,9 @@  class localrepository(object):
         # - bookmark changes
         self.filteredrevcache = {}
 
+        # post-dirstate-status hooks
+        self._postdsstatus = []
+
         # generic mapping between names and nodes
         self.names = namespaces.namespaces()
 
@@ -1884,6 +1887,36 @@  class localrepository(object):
         return self[node1].status(node2, match, ignored, clean, unknown,
                                   listsubrepos)
 
+    def addpostdsstatus(self, ps):
+        """Add a callback to run within the wlock, at the point at which status
+        fixups happen.
+
+        On status completion, callback(wctx, status) will be called with the
+        wlock held, unless the dirstate has changed from underneath or the wlock
+        couldn't be grabbed.
+
+        Callbacks should not capture and use a cached copy of the dirstate --
+        it might change in the meanwhile. Instead, they should access the
+        dirstate via wctx.repo().dirstate.
+
+        This list is emptied out after each status run -- extensions should
+        make sure it adds to this list each time dirstate.status is called.
+        Extensions should also make sure they don't call this for statuses
+        that don't involve the dirstate.
+        """
+
+        # The list is located here for uniqueness reasons -- it is actually
+        # managed by the workingctx, but that isn't unique per-repo.
+        self._postdsstatus.append(ps)
+
+    def postdsstatus(self):
+        """Used by workingctx to get the list of post-dirstate-status hooks."""
+        return self._postdsstatus
+
+    def clearpostdsstatus(self):
+        """Used by workingctx to clear post-dirstate-status hooks."""
+        del self._postdsstatus[:]
+
     def heads(self, start=None):
         if start is None:
             cl = self.changelog