Patchwork [1,of,2] fsmonitor: acquire localrepo.wlock prior to emitting hg.update state

login
register
mail settings
Submitter Wez Furlong
Date May 18, 2017, 9:22 p.m.
Message ID <f4b76c106f3af915667d.1495142522@devbig310.prn1.facebook.com>
Download mbox | patch
Permalink /patch/20690/
State Accepted
Headers show

Comments

Wez Furlong - May 18, 2017, 9:22 p.m.
# HG changeset patch
# User Wez Furlong <wez@fb.com>
# Date 1495136887 25200
#      Thu May 18 12:48:07 2017 -0700
# Node ID f4b76c106f3af915667db7696e27780601e93074
# Parent  8a87bfc5bebbbe0ac996ac8e047a029eb931af45
fsmonitor: acquire localrepo.wlock prior to emitting hg.update state

we see some weird things in the watchman logs where the mercurial
process is seemingly confused about which hg.update state it is publishing
through watchman.

On closer examination, we're seeing conflicting pids for the clients involved
and this implies a race.

To resolve this, we extend the wlock around the state-enter/state-leave
events that are emitted to watchman.

Test Plan:
Some manual testing:

In one window, run this, and then checkout a different rev:

```
$ watchman -p -j <<<'["subscribe", "/data/users/wez/fbsource", "wez", {"expression": ["name", ".hg/updatestate"]}]'
{
    "version": "4.9.0",
    "subscribe": "wez",
    "clock": "c:1495034090:814028:1:312576"
}
{
    "state-enter": "hg.update",
    "version": "4.9.0",
    "clock": "c:1495034090:814028:1:312596",
    "unilateral": true,
    "subscription": "wez",
    "metadata": {
        "status": "ok",
        "distance": 125,
        "rev": "a1275d79ffa6c58b53116c8ec401c275ca6c1e2a",
        "partial": false
    },
    "root": "/data/users/wez/fbsource"
}
{
    "root": "/data/users/wez/fbsource",
    "metadata": {
        "status": "ok",
        "distance": 125,
        "rev": "a1275d79ffa6c58b53116c8ec401c275ca6c1e2a",
        "partial": false
    },
    "subscription": "wez",
    "unilateral": true,
    "version": "4.9.0",
    "clock": "c:1495034090:814028:1:312627",
    "state-leave": "hg.update"
}
```

Tailed the watchman log file and looked for invalid state assertion errors,
then ran my `rebase-all` script to update/rebase all of my heads.

Didn't trigger the error condition (but couldn't reliably trigger it previously
anyway), and the output captured above shows that the states are being emitted
correctly.

Patch

diff --git a/hgext/fsmonitor/__init__.py b/hgext/fsmonitor/__init__.py
--- a/hgext/fsmonitor/__init__.py
+++ b/hgext/fsmonitor/__init__.py
@@ -600,14 +600,25 @@ 
         self.node = node
         self.distance = distance
         self.partial = partial
+        self._lock = None
 
     def __enter__(self):
+        # We explicitly need to take a lock here, before we proceed to update
+        # watchman about the update operation, so that we don't race with
+        # some other actor.  merge.update is going to take the wlock almost
+        # immediately anyway, so this is effectively extending the lock
+        # around a couple of short sanity checks.
+        self._lock = self.repo.wlock()
         self._state('state-enter')
         return self
 
     def __exit__(self, type_, value, tb):
-        status = 'ok' if type_ is None else 'failed'
-        self._state('state-leave', status=status)
+        try:
+            status = 'ok' if type_ is None else 'failed'
+            self._state('state-leave', status=status)
+        finally:
+            if self._lock:
+                self._lock.release()
 
     def _state(self, cmd, status='ok'):
         if not util.safehasattr(self.repo, '_watchmanclient'):