Patchwork [4,of,4,STABLE] lock: show about possibility of lock corruption for empty locker

login
register
mail settings
Submitter Katsunori FUJIWARA
Date April 29, 2017, 8:38 a.m.
Message ID <650d9f6bf1e7c1f6dee9.1493455090@juju>
Download mbox | patch
Permalink /patch/20312/
State Superseded
Headers show

Comments

Katsunori FUJIWARA - April 29, 2017, 8:38 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1493454424 -32400
#      Sat Apr 29 17:27:04 2017 +0900
# Branch stable
# Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
# Parent  b64713f57a22301b6f481cd8e401542f46356c94
lock: show about possibility of lock corruption for empty locker

Before this patch, if corrupted lock file makes vfs.readlink() return
empty string, lock acquisition waits for a while, even though lock is
never acquired in such case (this issue was reported by a user using
Mercurial on Windows).

This issue occurs in steps below:

  1. vfs.readlink() returns empty string as "locker" for corrupted
     lock file

  2. lock._testlock() returns empty string, because split(":", 1) on
     "locker" fails by ValueError

  3. lock._trylock() raises LockHeld(EAGAIN), because empty string
     "locker" isn't None

  4. lock.lock() sleeps for next trial

  5. steps above are repeated while timeout, because corrupted lock
     file isn't removed by lock._testlock() at (2)

  6. lock acquisition (and "hg" command, maybe) is aborted by timeout

Then, lock acquisition fails until removal of corrupted lock file by
hand ("hg debuglocks" or so).

On the other hand, it isn't safe to assume that lock was corrupted at
unlocking. There is no mechanical way to determine whether "locker"
process of corrupted lock is still alive or not exactly.

In addition to it, empty "locker" might mean "locking in progress" on
no-symlink platform, because vfs.makelock() isn't atomic action on
such platform (consisting of "open", "write', and "close").

Therefore, it should be responsibility of end users to determine
whether such corrupted lock can be safely removed or not.

This patch shows about possibility of lock corruption for empty
"locker", and suggests use of "hg debuglocks" in "hint" message to
resolve an issue, even though debug commands should be hidden from
ordinary users.

BTW, there is no client code path, which causes "inst.errno !=
errno.ETIMEDOUT" route in callcatch() of scmutil.py, in Mercurial
source tree, AFAIK. lock.py implementation always re-raise LockHeld
with errno.ETIMEDOUT for internal LockHeld. This is reason why this
patch doesn't have test for this route.
Matt Harbison - April 29, 2017, 3:28 p.m.
On Sat, 29 Apr 2017 04:38:10 -0400, FUJIWARA Katsunori  
<foozy@lares.dti.ne.jp> wrote:

> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1493454424 -32400
> #      Sat Apr 29 17:27:04 2017 +0900
> # Branch stable
> # Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
> # Parent  b64713f57a22301b6f481cd8e401542f46356c94
> lock: show about possibility of lock corruption for empty locker

> +  $ echo foo > foo
> +  $ hg add foo
> +  waiting for lock on working directory of $TESTTMP/t2, but might be  
> corrupted (glob)

Maybe "..., but the lock might be corrupted"?

It seems better to be very explicit when mentioning "working directory"  
and "corrupted" in the same sentence.

> +  abort: working directory of $TESTTMP/t2: timed out waiting for lock,  
> but might be corrupted (glob)

> +  (use 'hg debuglocks' to check/unlock corrupted lock)
> +  [255]
> +
> +(check still corrupted)
> +
> +  $ hg debuglocks
> +  lock:  free
> +  wlock: user *, but corrupted (*s) (glob)
> +  [1]
> +
> +(check normal waiting/timedout messages for safety, too)
> +
> +  $ printf '1234' > .hg/wlock
> +
> +  $ hg add foo
> +  waiting for lock on working directory of $TESTTMP/t2 held by '1234'  
> (glob)
> +  abort: working directory of $TESTTMP/t2: timed out waiting for lock  
> held by 1234 (glob)
> +  [255]
> +
> +  $ printf 'foobar:1234' > .hg/wlock
> +
> +  $ hg add foo
> +  waiting for lock on working directory of $TESTTMP/t2 held by process  
> '1234' on host 'foobar' (glob)
> +  abort: working directory of $TESTTMP/t2: timed out waiting for lock  
> held by foobar:1234 (glob)
> +  [255]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Katsunori FUJIWARA - April 29, 2017, 6:37 p.m.
At Sat, 29 Apr 2017 11:28:14 -0400,
Matt Harbison wrote:
> 
> On Sat, 29 Apr 2017 04:38:10 -0400, FUJIWARA Katsunori  
> <foozy@lares.dti.ne.jp> wrote:
> 
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1493454424 -32400
> > #      Sat Apr 29 17:27:04 2017 +0900
> > # Branch stable
> > # Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
> > # Parent  b64713f57a22301b6f481cd8e401542f46356c94
> > lock: show about possibility of lock corruption for empty locker
> 
> > +  $ echo foo > foo
> > +  $ hg add foo
> > +  waiting for lock on working directory of $TESTTMP/t2, but might be  
> > corrupted (glob)
> 
> Maybe "..., but the lock might be corrupted"?
> 
> It seems better to be very explicit when mentioning "working directory"  
> and "corrupted" in the same sentence.

Thank you for suggestion!

I'll send revised series after waiting for other comments for a while.

> > +  abort: working directory of $TESTTMP/t2: timed out waiting for lock,  
> > but might be corrupted (glob)
> 
> > +  (use 'hg debuglocks' to check/unlock corrupted lock)
> > +  [255]
> > +
> > +(check still corrupted)
> > +
> > +  $ hg debuglocks
> > +  lock:  free
> > +  wlock: user *, but corrupted (*s) (glob)
> > +  [1]
> > +
> > +(check normal waiting/timedout messages for safety, too)
> > +
> > +  $ printf '1234' > .hg/wlock
> > +
> > +  $ hg add foo
> > +  waiting for lock on working directory of $TESTTMP/t2 held by '1234'  
> > (glob)
> > +  abort: working directory of $TESTTMP/t2: timed out waiting for lock  
> > held by 1234 (glob)
> > +  [255]
> > +
> > +  $ printf 'foobar:1234' > .hg/wlock
> > +
> > +  $ hg add foo
> > +  waiting for lock on working directory of $TESTTMP/t2 held by process  
> > '1234' on host 'foobar' (glob)
> > +  abort: working directory of $TESTTMP/t2: timed out waiting for lock  
> > held by foobar:1234 (glob)
> > +  [255]
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@mercurial-scm.org
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - April 30, 2017, 1:13 p.m.
On Sat, 29 Apr 2017 17:38:10 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1493454424 -32400
> #      Sat Apr 29 17:27:04 2017 +0900
> # Branch stable
> # Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
> # Parent  b64713f57a22301b6f481cd8e401542f46356c94
> lock: show about possibility of lock corruption for empty locker

The change looks good to me.

> --- a/mercurial/lock.py
> +++ b/mercurial/lock.py
> @@ -161,7 +161,8 @@ class lock(object):
>          for new-style locks.
>          """
>          try:
> -            return self.vfs.readlock(self.f)
> +            # normalize white space by strip() to detect corruption easily
> +            return self.vfs.readlock(self.f).strip()

I'm not sure if stripping whitespace makes sense. Do we write '\n' to a
lockfile on Windows?

> --- a/mercurial/scmutil.py
> +++ b/mercurial/scmutil.py
> @@ -151,10 +151,20 @@ def callcatch(ui, func):
>      # Mercurial-specific first, followed by built-in and library exceptions
>      except error.LockHeld as inst:
>          if inst.errno == errno.ETIMEDOUT:
> -            reason = _('timed out waiting for lock held by %s') % inst.locker
> +            if inst.locker:
> +                reason = _('timed out waiting for lock held by %s'
> +                           ) % inst.locker
> +            else:
> +                reason = _('timed out waiting for lock, but might be corrupted')
>          else:
> -            reason = _('lock held by %s') % inst.locker
> +            if inst.locker:
> +                reason = _('lock held by %s') % inst.locker
> +            else:
> +                reason = _('lock held, but might be corrupted')

I would simply say "lock held by (unknown)" since I don't think a corrupted
lock file is more likely to be empty.

>          ui.warn(_("abort: %s: %s\n") % (inst.desc or inst.filename, reason))
> +        if not inst.locker:
> +            ui.warn('(%s)\n' %
> +                    _("use 'hg debuglocks' to check/unlock corrupted lock"))
Katsunori FUJIWARA - April 30, 2017, 5:07 p.m.
At Sun, 30 Apr 2017 22:13:57 +0900,
Yuya Nishihara wrote:
> 
> On Sat, 29 Apr 2017 17:38:10 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1493454424 -32400
> > #      Sat Apr 29 17:27:04 2017 +0900
> > # Branch stable
> > # Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
> > # Parent  b64713f57a22301b6f481cd8e401542f46356c94
> > lock: show about possibility of lock corruption for empty locker
> 
> The change looks good to me.
> 
> > --- a/mercurial/lock.py
> > +++ b/mercurial/lock.py
> > @@ -161,7 +161,8 @@ class lock(object):
> >          for new-style locks.
> >          """
> >          try:
> > -            return self.vfs.readlock(self.f)
> > +            # normalize white space by strip() to detect corruption easily
> > +            return self.vfs.readlock(self.f).strip()
> 
> I'm not sure if stripping whitespace makes sense. Do we write '\n' to a
> lockfile on Windows?

I apply strip() on locker, because we can't assume that corrupted file
always causes empty string. It might consist of white space, and
showing such string as locker is meaningless for users.

But, I'm OK to omit strip() on non-None locker string.


> > --- a/mercurial/scmutil.py
> > +++ b/mercurial/scmutil.py
> > @@ -151,10 +151,20 @@ def callcatch(ui, func):
> >      # Mercurial-specific first, followed by built-in and library exceptions
> >      except error.LockHeld as inst:
> >          if inst.errno == errno.ETIMEDOUT:
> > -            reason = _('timed out waiting for lock held by %s') % inst.locker
> > +            if inst.locker:
> > +                reason = _('timed out waiting for lock held by %s'
> > +                           ) % inst.locker
> > +            else:
> > +                reason = _('timed out waiting for lock, but might be corrupted')
> >          else:
> > -            reason = _('lock held by %s') % inst.locker
> > +            if inst.locker:
> > +                reason = _('lock held by %s') % inst.locker
> > +            else:
> > +                reason = _('lock held, but might be corrupted')
> 
> I would simply say "lock held by (unknown)" since I don't think a corrupted
> lock file is more likely to be empty.

Are you OK for a hint message itself like below for an empty lock
file, because an empty lock file is more likely to be corrupted,
especially after timeout ?

  abort: loock held by unknown
  (it might be corrupted, use 'hg debuglocks' to check/unlock)


> >          ui.warn(_("abort: %s: %s\n") % (inst.desc or inst.filename, reason))
> > +        if not inst.locker:
> > +            ui.warn('(%s)\n' %
> > +                    _("use 'hg debuglocks' to check/unlock corrupted lock"))
>
Yuya Nishihara - May 1, 2017, 2:43 a.m.
On Mon, 01 May 2017 02:07:56 +0900, FUJIWARA Katsunori wrote:
> At Sun, 30 Apr 2017 22:13:57 +0900,
> Yuya Nishihara wrote:
> > On Sat, 29 Apr 2017 17:38:10 +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > > # Date 1493454424 -32400
> > > #      Sat Apr 29 17:27:04 2017 +0900
> > > # Branch stable
> > > # Node ID 650d9f6bf1e7c1f6dee9d82616af2a56aaf56a2e
> > > # Parent  b64713f57a22301b6f481cd8e401542f46356c94
> > > lock: show about possibility of lock corruption for empty locker
> > 
> > The change looks good to me.
> > 
> > > --- a/mercurial/lock.py
> > > +++ b/mercurial/lock.py
> > > @@ -161,7 +161,8 @@ class lock(object):
> > >          for new-style locks.
> > >          """
> > >          try:
> > > -            return self.vfs.readlock(self.f)
> > > +            # normalize white space by strip() to detect corruption easily
> > > +            return self.vfs.readlock(self.f).strip()
> > 
> > I'm not sure if stripping whitespace makes sense. Do we write '\n' to a
> > lockfile on Windows?
> 
> I apply strip() on locker, because we can't assume that corrupted file
> always causes empty string. It might consist of white space, and
> showing such string as locker is meaningless for users.

That's the problem in ui layer. Maybe we can show the locker with quotes?

  lock held by '%s'

> > I would simply say "lock held by (unknown)" since I don't think a corrupted
> > lock file is more likely to be empty.
> 
> Are you OK for a hint message itself like below for an empty lock
> file, because an empty lock file is more likely to be corrupted,
> especially after timeout ?
> 
>   abort: loock held by unknown
>   (it might be corrupted, use 'hg debuglocks' to check/unlock)

Yeah, showing a hint seems reasonable.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -1347,9 +1347,13 @@  class localrepository(object):
                 self.ui.warn(
                     _("waiting for lock on %s held by process %r "
                       "on host %r\n") % (desc, pid, host))
-            else:
+            elif inst.locker:
                 self.ui.warn(_("waiting for lock on %s held by %r\n") %
                              (desc, inst.locker))
+            else:
+                self.ui.warn(_("waiting for lock on %s,"
+                               " but might be corrupted\n") %
+                             (desc))
             # default to 600 seconds timeout
             l = lockmod.lock(vfs, lockname,
                              int(self.ui.config("ui", "timeout", "600")),
diff --git a/mercurial/lock.py b/mercurial/lock.py
--- a/mercurial/lock.py
+++ b/mercurial/lock.py
@@ -161,7 +161,8 @@  class lock(object):
         for new-style locks.
         """
         try:
-            return self.vfs.readlock(self.f)
+            # normalize white space by strip() to detect corruption easily
+            return self.vfs.readlock(self.f).strip()
         except (OSError, IOError) as why:
             if why.errno == errno.ENOENT:
                 return None
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -151,10 +151,20 @@  def callcatch(ui, func):
     # Mercurial-specific first, followed by built-in and library exceptions
     except error.LockHeld as inst:
         if inst.errno == errno.ETIMEDOUT:
-            reason = _('timed out waiting for lock held by %s') % inst.locker
+            if inst.locker:
+                reason = _('timed out waiting for lock held by %s'
+                           ) % inst.locker
+            else:
+                reason = _('timed out waiting for lock, but might be corrupted')
         else:
-            reason = _('lock held by %s') % inst.locker
+            if inst.locker:
+                reason = _('lock held by %s') % inst.locker
+            else:
+                reason = _('lock held, but might be corrupted')
         ui.warn(_("abort: %s: %s\n") % (inst.desc or inst.filename, reason))
+        if not inst.locker:
+            ui.warn('(%s)\n' %
+                    _("use 'hg debuglocks' to check/unlock corrupted lock"))
     except error.LockUnavailable as inst:
         ui.warn(_("abort: could not lock %s: %s\n") %
                (inst.desc or inst.filename, inst.strerror))
diff --git a/tests/test-no-symlinks.t b/tests/test-no-symlinks.t
--- a/tests/test-no-symlinks.t
+++ b/tests/test-no-symlinks.t
@@ -64,6 +64,12 @@  Test handling lock corruption
   lock:  free
   wlock: free
 
+  $ cat >> .hg/hgrc <<EOF
+  > [ui]
+  > # for immediate timeout
+  > timeout = 0
+  > EOF
+
 (emulate lock corruption by empty file)
 
   $ printf '' > .hg/wlock
@@ -72,3 +78,33 @@  Test handling lock corruption
   lock:  free
   wlock: user *, but corrupted (*s) (glob)
   [1]
+
+  $ echo foo > foo
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2, but might be corrupted (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock, but might be corrupted (glob)
+  (use 'hg debuglocks' to check/unlock corrupted lock)
+  [255]
+
+(check still corrupted)
+
+  $ hg debuglocks
+  lock:  free
+  wlock: user *, but corrupted (*s) (glob)
+  [1]
+
+(check normal waiting/timedout messages for safety, too)
+
+  $ printf '1234' > .hg/wlock
+
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2 held by '1234' (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock held by 1234 (glob)
+  [255]
+
+  $ printf 'foobar:1234' > .hg/wlock
+
+  $ hg add foo
+  waiting for lock on working directory of $TESTTMP/t2 held by process '1234' on host 'foobar' (glob)
+  abort: working directory of $TESTTMP/t2: timed out waiting for lock held by foobar:1234 (glob)
+  [255]