Patchwork [2,of,2,RFC] localrepo: make it possible to use flock in a repo

Submitter Jun Wu
Date May 19, 2017, 4:33 p.m.
Message ID <e59c24bd146290c91094.1495211592@x1c>
Permalink /patch/20735/
State Changes Requested

Comments

Jun Wu - May 19, 2017, 4:33 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1495051320 25200
#      Wed May 17 13:02:00 2017 -0700
# Node ID e59c24bd146290c910945e84ac27a7cb4edbf4dd
# Parent  d1ddcac58ac7085b1b0238b74e38871343730c91
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r e59c24bd1462
localrepo: make it possible to use flock in a repo

This patch adds a "flock" repo requirement. Once it is set, Mercurial will
attempt to use flock to lock the repo and store on supported platforms.

The flock will be performed on a directory, so no extra files will be created:

  wlock: flock(".hg")
  lock:  flock(".hg/store")

This makes it impossible to have deadlock or repo corruption (no /proc
mount) when running hg inside different Linux pid namespaces with a shared
filesystem.
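
For illustration only (this is not the patch's implementation and the helper
name is invented), flock on a directory boils down to opening the directory
and flocking its file descriptor:

  import fcntl
  import os

  def trylockdir(path, wait=False):
      # A directory can be opened read-only; its fd is then passed to flock().
      fd = os.open(path, os.O_RDONLY)
      flags = fcntl.LOCK_EX
      if not wait:
          flags |= fcntl.LOCK_NB
      try:
          fcntl.flock(fd, flags)
      except (IOError, OSError):
          # with LOCK_NB this means another process holds the lock
          os.close(fd)
          raise
      return fd  # closing the fd releases the lock

  # e.g. wlockfd = trylockdir('.hg'); lockfd = trylockdir('.hg/store')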

For concerns about repo corruption on network filesystems or portability,
this should be safe because util.makelock will double-check the filesystem
type, so nfs/cifs/sshfs and Windows will just error out correctly.

Note: in unusual setups, such as .hg/store living on NFS while .hg is on
local disk, the lock is not safe. But locks are already broken in that setup,
so this patch at least does not make things worse. It even makes things
better in the local shared-repo use case [1].

[1]: Consider two repos sharing a store: repo1/.hg and repo2/.hg and
shared/.hg/store. We should really make "repo.lock()" write a lock file at
"shared/.hg/store/lock" but not "repo1/.hg/lock" or "repo2/.hg/lock".  With
this patch, "shared/.hg/store" gets locked properly and there is no such
problem.
Jun Wu - May 19, 2017, 4:41 p.m.
Excerpts from Jun Wu's message of 2017-05-19 09:33:12 -0700:
> Note: in unusual setups, such as .hg/store living on NFS while .hg is on
> local disk, the lock is not safe. But locks are already broken in that setup,
> so this patch at least does not make things worse. It even makes things
> better in the local shared-repo use case [1].
> 
> [1]: Consider two repos sharing a store: repo1/.hg and repo2/.hg and
> shared/.hg/store. We should really make "repo.lock()" write a lock file at
> "shared/.hg/store/lock" but not "repo1/.hg/lock" or "repo2/.hg/lock".  With
> this patch, "shared/.hg/store" gets locked properly and there is no such
> problem.

Sorry, but I realized this is not true. I thought both wlock and lock used
vfs, but one of them uses svfs. So it should just work as expected.
Gregory Szorc - May 20, 2017, 1:03 a.m.
On Fri, May 19, 2017 at 9:33 AM, Jun Wu <quark@fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1495051320 25200
> #      Wed May 17 13:02:00 2017 -0700
> # Node ID e59c24bd146290c910945e84ac27a7cb4edbf4dd
> # Parent  d1ddcac58ac7085b1b0238b74e38871343730c91
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r e59c24bd1462
> localrepo: make it possible to use flock in a repo
>
> This patch adds a "flock" repo requirement. Once it is set, Mercurial will
> attempt to use flock to lock the repo and store on supported platforms.
>
> The flock will be performed on a directory, so no extra files will be created:
>
>   wlock: flock(".hg")
>   lock:  flock(".hg/store")
>
> This makes it impossible to have deadlock or repo corruption (no /proc
> mount) when running hg inside different Linux pid namespaces with a shared
> filesystem.
>
> For concerns about repo corruption on network filesystems or portability,
> this should be safe because util.makelock will double-check the filesystem
> type, so nfs/cifs/sshfs and Windows will just error out correctly.
>
> Note: in unusual setups, such as .hg/store living on NFS while .hg is on
> local disk, the lock is not safe. But locks are already broken in that setup,
> so this patch at least does not make things worse. It even makes things
> better in the local shared-repo use case [1].
>
> [1]: Consider two repos sharing a store: repo1/.hg and repo2/.hg and
> shared/.hg/store. We should really make "repo.lock()" write a lock file at
> "shared/.hg/store/lock" but not "repo1/.hg/lock" or "repo2/.hg/lock".  With
> this patch, "shared/.hg/store" gets locked properly and there is no such
> problem.
>

From a high level, this approach seems reasonable. There are some obvious
missing components to the implementation:

* Docs for the new requirement in internals/requirements.txt
* Ability to create new repos with the requirement

I initially thought it surprising that we didn't check for flock support at
repo open time as part of validating requirements. But your commit message
explains that we'll error at lock time upon missing flock support. Since we
don't have reader locks and I think it is user hostile to lock readers out
of repos no longer supporting flock, I think this is fine. But there should
be a test demonstrating the behavior of a flock requirement without flock
support.

Also, I'm going to bikeshed the requirement name. Requirements are global
and will persist for all of time. With that context in mind, "flock" is
arguably too generic. What this feature/requirement actually resembles is
"use flock() for the locking strategy used by the 'store'
layout/requirement which implies use of flock() on the lock and wlock
files." The meaning of the "flock" requirement can thus only be interpreted
in the presence of another requirement, such as "store." After all, if
there is a "store2" requirement, then "flock" likely takes on new meaning
(since we'd likely use a different locking strategy). I believe
requirements should be narrowly tailored to the exact feature they support.
Otherwise if their meaning changes over time, that opens us up to bugs and
possibly repo corruption. So I think a more appropriate requirement name is
something like "store-flock" so the limits of its application (to the lock
and wlock files for the "store" requirement) are explicit and not subject
to reinterpretation. Semantically, this is probably ideally expressed as a
sub-requirement. e.g. "store+flock." But I'd rather punt on that discussion
until we have a few more requirements and the relationship between them is
less clear.


> [...]
Jun Wu - May 20, 2017, 3:26 a.m.
Thanks for the detailed comment! Replied inline.

Excerpts from Gregory Szorc's message of 2017-05-19 18:03:58 -0700:
> From a high level, this approach seems reasonable. There are some obvious
> missing components to the implementation:
> 
> * Docs for the new requirement in internals/requirements.txt
> * Ability to create new repos with the requirement
>
> I initially thought it surprising that we didn't check for flock support at
> repo open time as part of validating requirements. But your commit message
> explains that we'll error at lock time upon missing flock support. Since we
> don't have reader locks and I think it is user hostile to lock readers out
> of repos no longer supporting flock, I think this is fine. But there should
> be a test demonstrating the behavior of a flock requirement without flock
> support.

The lack of tests, docs, and debug commands (to turn this on and off) will be
addressed once this graduates from RFC.

> Also, I'm going to bikeshed the requirement name. Requirements are global
> and will persist for all of time. With that context in mind, "flock" is
> arguably too generic. What this feature/requirement actually resembles is
> "use flock() for the locking strategy used by the 'store'

More precisely, the requirement is "svfs.base" being a different directory
from "vfs.base".

> layout/requirement which implies use of flock() on the lock and wlock
> files." The meaning of the "flock" requirement can thus only be interpreted
> in the presence of another requirement, such as "store." After all, if
> there is a "store2" requirement, then "flock" likely takes on new meaning

If "store2" provides a directory, like repo.svfs = vfs('.hg/store2').
The existing code will just work without modification.

So I don't think it's necessary to distinguish "store-flock" and
"store2-flock".

> (since we'd likely use a different locking strategy). I believe
> requirements should be narrowly tailored to the exact feature they support.
> Otherwise if their meaning changes over time, that opens us up to bugs and
> possibly repo corruption. So I think a more appropriate requirement name is
> something like "store-flock" so the limits of its application (to the lock
> and wlock files for the "store" requirement) are explicit and not subject
> to reinterpretation. Semantically, this is probably ideally expressed as a
> sub-requirement. e.g. "store+flock." But I'd rather punt on that discussion
> until we have a few more requirements and the relationship between them is
> less clear.

On second thought, I think it may be possible to just unify the locks if
svfs.base == vfs.base. That could be done by letting repo.lock() return
repo.wlock() in this special case (no store + flock).
Gregory Szorc - May 20, 2017, 3:49 a.m.
On Fri, May 19, 2017 at 8:26 PM, Jun Wu <quark@fb.com> wrote:

> Thanks for the detailed comment! Replied inline.
>
> Excerpts from Gregory Szorc's message of 2017-05-19 18:03:58 -0700:
> > From a high level, this approach seems reasonable. There are some obvious
> > missing components to the implementation:
> >
> > * Docs for the new requirement in internals/requirements.txt
> > * Ability to create new repos with the requirement
> >
> > I initially thought it surprising that we didn't check for flock support at
> > repo open time as part of validating requirements. But your commit message
> > explains that we'll error at lock time upon missing flock support. Since we
> > don't have reader locks and I think it is user hostile to lock readers out
> > of repos no longer supporting flock, I think this is fine. But there should
> > be a test demonstrating the behavior of a flock requirement without flock
> > support.
>
> The lack of tests, docs, and debug commands (to turn this on and off) will be
> addressed once this graduates from RFC.
>
> > Also, I'm going to bikeshed the requirement name. Requirements are global
> > and will persist for all of time. With that context in mind, "flock" is
> > arguably too generic. What this feature/requirement actually resembles is
> > "use flock() for the locking strategy used by the 'store'
>
> More precisely, the requirement is "svfs.base" being a different directory
> from "vfs.base".
>
> > layout/requirement which implies use of flock() on the lock and wlock
> > files." The meaning of the "flock" requirement can thus only be
> interpreted
> > in the presence of another requirement, such as "store." After all, if
> > there is a "store2" requirement, then "flock" likely takes on new meaning
>
> If "store2" provides a directory, like repo.svfs = vfs('.hg/store2').
> The existing code will just work without modification.
>

OK. I failed to understand that it is locking directories and not files.

I don't want to go down this rat hole, but since it may help my point...

The next store format will likely use some form of "generational store."
Files protected by transactions that aren't append only (like bookmarks and
phases) will exist in a directory corresponding to their
transaction/generation number. e.g. .hg/store/txn0/bookmarks,
.hg/store/txn1500/phaseroots. This allows readers to have consistent
snapshots since we never rewrite existing bytes: we just copy or create
files and have a pointer to the "current" generation. In this model you
have reader locks so you know when there are no more active consumers for a
generation and can GC its files.

Anyway, the locking model may be radically different. You can't rely on it
being based on whole directories. So the semantics of a "flock" requirement
could vary drastically depending on the store and other repo features in
use. And I think having the semantics of a requirement vary depending on
the presence of other requirements is not ideal. You want a requirement to
mean one thing and one thing only and not be subject to multiple
interpretations.


>
> So I don't think it's necessary to distinguish "store-flock" and
> "store2-flock".
>
> > (since we'd likely use a different locking strategy). I believe
> > requirements should be narrowly tailored to the exact feature they support.
> > Otherwise if their meaning changes over time, that opens us up to bugs and
> > possibly repo corruption. So I think a more appropriate requirement name is
> > something like "store-flock" so the limits of its application (to the lock
> > and wlock files for the "store" requirement) are explicit and not subject
> > to reinterpretation. Semantically, this is probably ideally expressed as a
> > sub-requirement. e.g. "store+flock." But I'd rather punt on that discussion
> > until we have a few more requirements and the relationship between them is
> > less clear.
>
> On second thought, I think it may be possible to just unify the locks if
> svfs.base == vfs.base. That could be done by letting repo.lock() return
> repo.wlock() in this special case (no store + flock).
>

The "store" requirement was added in Mercurial 0.9.2. The chances of this
special case of
no store + flock occurring in 2017 are approximately 0%. I dare say we
should add code to
prevent combinations of requirements like this because they make no sense
and are a footgun waiting to fire. To prove my point, I just created a repo
with format.usestore=false and experimental.treemanifest=true. It did seem
to work, surprisingly. But it would be foolish to run such a config given
the pre-store format has been out of style for over 10 years.
Jun Wu - May 20, 2017, 4:06 a.m.
Excerpts from Gregory Szorc's message of 2017-05-19 20:49:47 -0700:
> OK. I failed to understand that it is locking directories and not files.
> 
> I don't want to go down this rat hole, but since it may help my point...
> 
> The next store format will likely use some form of "generational store."
> Files protected by transactions that aren't append only (like bookmarks and
> phases) will exist in a directory corresponding to their
> transaction/generation number. e.g. .hg/store/txn0/bookmarks,
> .hg/store/txn1500/phaseroots. This allows readers to have consistent
> snapshots since we never rewrite existing bytes: we just copy or create
> files and have a pointer to the "current" generation. In this model you
> have reader locks so you know when there are no more active consumers for a
> generation and can GC its files.
>
> Anyway, the locking model may be radically different. You can't rely on it
> being based on whole directories. So the semantics of a "flock" requirement
> could vary drastically depending on the store and other repo features in
> use. And I think having the semantics of a requirement vary depending on
> the presence of other requirements is not ideal. You want a requirement to
> mean one thing and one thing only and not be subject to multiple
> interpretations.

How about creating files then? For every existing file-existence lock at
$PATH, create $PATH.flock first (but never delete or replace it), and then
flock it. That would be compatible with every file-existence lock (like two
locks under the same directory).
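
Roughly, as a sketch (the helper name is invented, just to illustrate the
idea, not a concrete proposal for lock.py):

  import fcntl
  import os

  def flocksidecar(lockpath):
      # Leave the existing file-existence lock at ``lockpath`` untouched;
      # instead flock a persistent ``lockpath + '.flock'`` sidecar that is
      # created once and never deleted or replaced.
      fd = os.open(lockpath + '.flock', os.O_RDWR | os.O_CREAT, 0o664)
      try:
          fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
      except (IOError, OSError):
          os.close(fd)
          raise
      return fd  # closing the fd releases the flock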

> The "store" requirement was added in Mercurial 0.9.2. The chances of this
> special case of
> no store + flock occurring in 2017 are approximately 0%. I dare say we
> should add code to
> prevent combinations of requirements like this because they make no sense
> and are a footgun waiting to fire. To prove my point, I just created a repo
> with format.usestore=false and experimental.treemanifest=true. It did seem
> to work, surprisingly. But it would be foolish to run such a config given
> the pre-store format has been out of style for over 10 years.
Pierre-Yves David - May 20, 2017, 4:38 p.m.
On 05/20/2017 03:03 AM, Gregory Szorc wrote:
> On Fri, May 19, 2017 at 9:33 AM, Jun Wu <quark@fb.com> wrote:
> [...]
>
> From a high level, this approach seems reasonable. There are some
> obvious missing components to the implementation:
>
> * Docs for the new requirement in internals/requirements.txt
> * Ability to create new repos with the requirement

+1. The usual way to do this is to add a 'format.usexyz' config option. 
Adding support for upgrading the repository in `hg debugupgraderepo` 
would be nice too.
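
For example (the option name here is hypothetical, following the
'format.usexyz' pattern), repo creation could be gated on something like:

  [format]
  useflock = true

which `hg init` could then record as a "flock" entry in .hg/requires, and
which `hg debugupgraderepo` could add to existing repos.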

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -260,4 +260,5 @@  class localrepository(object):
         'relshared',
         'dotencode',
+        'flock',
     }
     openerreqs = {
@@ -367,4 +368,9 @@  class localrepository(object):
                 self.requirements, self.sharedpath, vfsmod.vfs)
         self.spath = self.store.path
+
+        if 'flock' in self.requirements and self.spath == self.path:
+            raise error.RepoError(
+                _('flock requires store path and repo path to be different'))
+
         self.svfs = self.store.vfs
         self.sjoin = self.store.join
@@ -1335,5 +1341,5 @@  class localrepository(object):
 
     def _lock(self, vfs, lockname, wait, releasefn, acquirefn, desc,
-              inheritchecker=None, parentenvvar=None):
+              inheritchecker=None, parentenvvar=None, flockpath=None):
         parentlock = None
         # the contents of parentenvvar are used by the underlying lock to
@@ -1345,5 +1351,5 @@  class localrepository(object):
                              acquirefn=acquirefn, desc=desc,
                              inheritchecker=inheritchecker,
-                             parentlock=parentlock)
+                             parentlock=parentlock, flockpath=flockpath)
         except error.LockHeld as inst:
             if not wait:
@@ -1391,6 +1397,12 @@  class localrepository(object):
             return l
 
+        if 'flock' in self.requirements:
+            flockpath = ''
+        else:
+            flockpath = None
+
         l = self._lock(self.svfs, "lock", wait, None,
-                       self.invalidate, _('repository %s') % self.origroot)
+                       self.invalidate, _('repository %s') % self.origroot,
+                       flockpath=flockpath)
         self._lockref = weakref.ref(l)
         return l
@@ -1429,9 +1441,14 @@  class localrepository(object):
             self._filecache['dirstate'].refresh()
 
+        if 'flock' in self.requirements:
+            flockpath = ''
+        else:
+            flockpath = None
+
         l = self._lock(self.vfs, "wlock", wait, unlock,
                        self.invalidatedirstate, _('working directory of %s') %
                        self.origroot,
                        inheritchecker=self._wlockchecktransaction,
-                       parentenvvar='HG_WLOCK_LOCKER')
+                       parentenvvar='HG_WLOCK_LOCKER', flockpath=flockpath)
         self._wlockref = weakref.ref(l)
         return l
diff --git a/tests/hghave.py b/tests/hghave.py
--- a/tests/hghave.py
+++ b/tests/hghave.py
@@ -217,4 +217,18 @@  def has_fifo():
         return False
 
+@check("flock", "flock on whitelisted filesystems")
+def has_flock():
+    try:
+        import fcntl
+        fcntl.flock
+    except ImportError:
+        return False
+    from mercurial import util
+    try:
+        fstype = util.getfstype('.')
+    except OSError:
+        return False
+    return fstype in util._flockfswhitelist
+
 @check("killdaemons", 'killdaemons.py support')
 def has_killdaemons():
diff --git a/tests/test-lock-badness.t b/tests/test-lock-badness.t
--- a/tests/test-lock-badness.t
+++ b/tests/test-lock-badness.t
@@ -1,7 +1,15 @@ 
+#testcases default useflock
 #require unix-permissions no-root no-windows
 
+#if useflock
+#require flock
+#endif
+
 Prepare
 
   $ hg init a
+#if useflock
+  $ echo flock >> a/.hg/requires
+#endif
   $ echo a > a/a
   $ hg -R a ci -A -m a
@@ -11,4 +19,7 @@  Prepare
   updating to branch default
   1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+#if useflock
+  $ echo flock >> b/.hg/requires
+#endif
 
 Test that raising an exception in the release function doesn't cause the lock to choke