Patchwork [STABLE,v2] hg: obtain lock when creating share from pooled repo (issue5104)

login
register
mail settings
Submitter Gregory Szorc
Date Feb. 28, 2016, 2:21 a.m.
Message ID <37ba948d189f0f8b3a07.1456626073@ubuntu-vm-main>
Download mbox | patch
Permalink /patch/13438/
State Superseded
Headers show

Comments

Gregory Szorc - Feb. 28, 2016, 2:21 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1456625878 28800
#      Sat Feb 27 18:17:58 2016 -0800
# Branch stable
# Node ID 37ba948d189f0f8b3a0726c4c89038a452d6fd4c
# Parent  cb6a952efbf48d306a7c82d7fe46115f072cbb1d
hg: obtain lock when creating share from pooled repo (issue5104)

There are race conditions between clients performing a shared clone
to pooled storage:

1) Clients race to create the new shared repo in the pool directory
2) 1 client is seeding the repo in the pool directory and another goes
   to share it before it is fully cloned

We prevent these race conditions by obtaining a lock in the pool
directory that is derived from the name of the repo we will be
accessing.

To test this, a simple generic "lockdelay" extension has been added.
The extension inserts an optional, configurable delay before or after
lock acquisition. In the test, we delay 2 seconds after lock acquisition
in the first process and 1 second before lock acquisition in the 2nd
process. This means the first process has 1s to obtain the lock. There
is a race condition here. If we encounter it in the wild, we could
change the dummy extension to wait on the lock file to appear instead
of relying on timing. But that's more complicated. Let's see what
happens first.
Gregory Szorc - Feb. 28, 2016, 2:26 a.m.
On Sat, Feb 27, 2016 at 6:21 PM, Gregory Szorc <gregory.szorc@gmail.com>
wrote:

> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1456625878 28800
> #      Sat Feb 27 18:17:58 2016 -0800
> # Branch stable
> # Node ID 37ba948d189f0f8b3a0726c4c89038a452d6fd4c
> # Parent  cb6a952efbf48d306a7c82d7fe46115f072cbb1d
> hg: obtain lock when creating share from pooled repo (issue5104)
>
> There are race conditions between clients performing a shared clone
> to pooled storage:
>
> 1) Clients race to create the new shared repo in the pool directory
> 2) 1 client is seeding the repo in the pool directory and another goes
>    to share it before it is fully cloned
>
> We prevent these race conditions by obtaining a lock in the pool
> directory that is derived from the name of the repo we will be
> accessing.
>
> To test this, a simple generic "lockdelay" extension has been added.
> The extension inserts an optional, configurable delay before or after
> lock acquisition. In the test, we delay 2 seconds after lock acquisition
> in the first process and 1 second before lock acquisition in the 2nd
> process. This means the first process has 1s to obtain the lock. There
> is a race condition here. If we encounter it in the wild, we could
> change the dummy extension to wait on the lock file to appear instead
> of relying on timing. But that's more complicated. Let's see what
> happens first.
>
> diff --git a/mercurial/hg.py b/mercurial/hg.py
> --- a/mercurial/hg.py
> +++ b/mercurial/hg.py
> @@ -330,27 +330,40 @@ def clonewithshare(ui, peeropts, sharepa
>      revs = None
>      if rev:
>          if not srcpeer.capable('lookup'):
>              raise error.Abort(_("src repository does not support "
>                                 "revision lookup and so doesn't "
>                                 "support clone by revision"))
>          revs = [srcpeer.lookup(r) for r in rev]
>
> +    # Obtain a lock before checking for or cloning the pooled repo
> otherwise
> +    # 2 clients may race creating or populating it.
> +    pooldir = os.path.dirname(sharepath)
> +    # lock class requires the directory to exist.
> +    try:
> +        util.makedir(pooldir, False)
> +    except OSError as e:
> +        if e.errno != errno.EEXIST:
> +            raise
> +
> +    poolvfs = scmutil.vfs(pooldir)
>      basename = os.path.basename(sharepath)
>
> -    if os.path.exists(sharepath):
> -        ui.status(_('(sharing from existing pooled repository %s)\n') %
> -                  basename)
> -    else:
> -        ui.status(_('(sharing from new pooled repository %s)\n') %
> basename)
> -        # Always use pull mode because hardlinks in share mode don't work
> well.
> -        # Never update because working copies aren't necessary in share
> mode.
> -        clone(ui, peeropts, source, dest=sharepath, pull=True,
> -              rev=rev, update=False, stream=stream)
> +    with lock.lock(poolvfs, '%s.lock' % basename):
> +        if os.path.exists(sharepath):
> +            ui.status(_('(sharing from existing pooled repository %s)\n')
> %
> +                      basename)
> +        else:
> +            ui.status(_('(sharing from new pooled repository %s)\n') %
> basename)
> +            # Always use pull mode because hardlinks in share mode don't
> work
> +            # well. Never update because working copies aren't necessary
> in
> +            # share mode.
> +            clone(ui, peeropts, source, dest=sharepath, pull=True,
> +                  rev=rev, update=False, stream=stream)
>
>      sharerepo = repository(ui, path=sharepath)
>      share(ui, sharerepo, dest=dest, update=update, bookmarks=False)
>
>      # We need to perform a pull against the dest repo to fetch bookmarks
>      # and other non-store data that isn't shared by default. In the case
> of
>      # non-existing shared repo, this means we pull from the remote twice.
> This
>      # is a bit weird. But at the time it was implemented, there wasn't an
> easy
> diff --git a/tests/lockdelay.py b/tests/lockdelay.py
> new file mode 100644
> --- /dev/null
> +++ b/tests/lockdelay.py
> @@ -0,0 +1,26 @@
> +# Dummy extension that adds a delay after acquiring a lock.
> +#
> +# This extension can be used to test race conditions between lock
> acquisition.
> +
> +from __future__ import absolute_import
> +
> +import os
> +import time
> +
> +from mercurial import (
> +    lock as lockmod,
> +)
> +
> +class delaylock(lockmod.lock):
> +    def lock(self):
> +        delay = float(os.environ.get('HGPRELOCKDELAY', '0.0'))
> +        if delay:
> +            time.sleep(delay)
> +        res = super(delaylock, self).lock()
> +        delay = float(os.environ.get('HGPOSTLOCKDELAY', '0.0'))
> +        if delay:
> +            time.sleep(delay)
> +        return res
> +
> +def extsetup(ui):
> +    lockmod.lock = delaylock
> \ No newline at end of file
>

Oops. This upsets test-check-code.t. I'll send a V3.


> diff --git a/tests/test-clone.t b/tests/test-clone.t
> --- a/tests/test-clone.t
> +++ b/tests/test-clone.t
> @@ -1020,8 +1020,54 @@ Test that auto sharing doesn't cause fai
>    $ hg -R a id -r 0
>    acb14030fe0a
>    $ hg id -R remote -r 0
>    abort: repository remote not found!
>    [255]
>    $ hg --config share.pool=share -q clone -e "python
> \"$TESTDIR/dummyssh\"" a ssh://user@dummy/remote
>    $ hg -R remote id -r 0
>    acb14030fe0a
> +
> +Cloning into pooled storage doesn't race (issue5104)
> +
> +  $ HGPOSTLOCKDELAY=2.0 hg --config share.pool=racepool --config
> extensions.lockdelay=$TESTDIR/lockdelay.py clone source1a share-destrace1 >
> race1.log 2>&1 &
> +  $ HGPRELOCKDELAY=1.0 hg --config share.pool=racepool --config
> extensions.lockdelay=$TESTDIR/lockdelay.py clone source1a share-destrace2
> > race2.log 2>&1
> +  $ wait
> +
> +  $ hg -R share-destrace1 log -r tip
> +  changeset:   2:e5bfe23c0b47
> +  bookmark:    bookA
> +  tag:         tip
> +  user:        test
> +  date:        Thu Jan 01 00:00:00 1970 +0000
> +  summary:     1a
> +
> +
> +  $ hg -R share-destrace2 log -r tip
> +  changeset:   2:e5bfe23c0b47
> +  bookmark:    bookA
> +  tag:         tip
> +  user:        test
> +  date:        Thu Jan 01 00:00:00 1970 +0000
> +  summary:     1a
> +
> +  $ cat race1.log
> +  (sharing from new pooled repository
> b5f04eac9d8f7a6a9fcb070243cccea7dc5ea0c1)
> +  requesting all changes
> +  adding changesets
> +  adding manifests
> +  adding file changes
> +  added 3 changesets with 3 changes to 1 files
> +  updating working directory
> +  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  searching for changes
> +  no changes found
> +  adding remote bookmark bookA
> +
> +  $ cat race2.log
> +  (sharing from existing pooled repository
> b5f04eac9d8f7a6a9fcb070243cccea7dc5ea0c1)
> +  updating working directory
> +  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
> +  waiting for lock on repository share-destrace2 held by * (glob)
> +  got lock after \d+ seconds (re)
> +  searching for changes
> +  no changes found
> +  adding remote bookmark bookA
>

Patch

diff --git a/mercurial/hg.py b/mercurial/hg.py
--- a/mercurial/hg.py
+++ b/mercurial/hg.py
@@ -330,27 +330,40 @@  def clonewithshare(ui, peeropts, sharepa
     revs = None
     if rev:
         if not srcpeer.capable('lookup'):
             raise error.Abort(_("src repository does not support "
                                "revision lookup and so doesn't "
                                "support clone by revision"))
         revs = [srcpeer.lookup(r) for r in rev]
 
+    # Obtain a lock before checking for or cloning the pooled repo otherwise
+    # 2 clients may race creating or populating it.
+    pooldir = os.path.dirname(sharepath)
+    # lock class requires the directory to exist.
+    try:
+        util.makedir(pooldir, False)
+    except OSError as e:
+        if e.errno != errno.EEXIST:
+            raise
+
+    poolvfs = scmutil.vfs(pooldir)
     basename = os.path.basename(sharepath)
 
-    if os.path.exists(sharepath):
-        ui.status(_('(sharing from existing pooled repository %s)\n') %
-                  basename)
-    else:
-        ui.status(_('(sharing from new pooled repository %s)\n') % basename)
-        # Always use pull mode because hardlinks in share mode don't work well.
-        # Never update because working copies aren't necessary in share mode.
-        clone(ui, peeropts, source, dest=sharepath, pull=True,
-              rev=rev, update=False, stream=stream)
+    with lock.lock(poolvfs, '%s.lock' % basename):
+        if os.path.exists(sharepath):
+            ui.status(_('(sharing from existing pooled repository %s)\n') %
+                      basename)
+        else:
+            ui.status(_('(sharing from new pooled repository %s)\n') % basename)
+            # Always use pull mode because hardlinks in share mode don't work
+            # well. Never update because working copies aren't necessary in
+            # share mode.
+            clone(ui, peeropts, source, dest=sharepath, pull=True,
+                  rev=rev, update=False, stream=stream)
 
     sharerepo = repository(ui, path=sharepath)
     share(ui, sharerepo, dest=dest, update=update, bookmarks=False)
 
     # We need to perform a pull against the dest repo to fetch bookmarks
     # and other non-store data that isn't shared by default. In the case of
     # non-existing shared repo, this means we pull from the remote twice. This
     # is a bit weird. But at the time it was implemented, there wasn't an easy
diff --git a/tests/lockdelay.py b/tests/lockdelay.py
new file mode 100644
--- /dev/null
+++ b/tests/lockdelay.py
@@ -0,0 +1,26 @@ 
+# Dummy extension that adds a delay after acquiring a lock.
+#
+# This extension can be used to test race conditions between lock acquisition.
+
+from __future__ import absolute_import
+
+import os
+import time
+
+from mercurial import (
+    lock as lockmod,
+)
+
+class delaylock(lockmod.lock):
+    def lock(self):
+        delay = float(os.environ.get('HGPRELOCKDELAY', '0.0'))
+        if delay:
+            time.sleep(delay)
+        res = super(delaylock, self).lock()
+        delay = float(os.environ.get('HGPOSTLOCKDELAY', '0.0'))
+        if delay:
+            time.sleep(delay)
+        return res
+
+def extsetup(ui):
+    lockmod.lock = delaylock
\ No newline at end of file
diff --git a/tests/test-clone.t b/tests/test-clone.t
--- a/tests/test-clone.t
+++ b/tests/test-clone.t
@@ -1020,8 +1020,54 @@  Test that auto sharing doesn't cause fai
   $ hg -R a id -r 0
   acb14030fe0a
   $ hg id -R remote -r 0
   abort: repository remote not found!
   [255]
   $ hg --config share.pool=share -q clone -e "python \"$TESTDIR/dummyssh\"" a ssh://user@dummy/remote
   $ hg -R remote id -r 0
   acb14030fe0a
+
+Cloning into pooled storage doesn't race (issue5104)
+
+  $ HGPOSTLOCKDELAY=2.0 hg --config share.pool=racepool --config extensions.lockdelay=$TESTDIR/lockdelay.py clone source1a share-destrace1 > race1.log 2>&1 &
+  $ HGPRELOCKDELAY=1.0 hg --config share.pool=racepool --config extensions.lockdelay=$TESTDIR/lockdelay.py clone source1a share-destrace2  > race2.log 2>&1
+  $ wait
+
+  $ hg -R share-destrace1 log -r tip
+  changeset:   2:e5bfe23c0b47
+  bookmark:    bookA
+  tag:         tip
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     1a
+  
+
+  $ hg -R share-destrace2 log -r tip
+  changeset:   2:e5bfe23c0b47
+  bookmark:    bookA
+  tag:         tip
+  user:        test
+  date:        Thu Jan 01 00:00:00 1970 +0000
+  summary:     1a
+  
+  $ cat race1.log
+  (sharing from new pooled repository b5f04eac9d8f7a6a9fcb070243cccea7dc5ea0c1)
+  requesting all changes
+  adding changesets
+  adding manifests
+  adding file changes
+  added 3 changesets with 3 changes to 1 files
+  updating working directory
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  searching for changes
+  no changes found
+  adding remote bookmark bookA
+
+  $ cat race2.log
+  (sharing from existing pooled repository b5f04eac9d8f7a6a9fcb070243cccea7dc5ea0c1)
+  updating working directory
+  1 files updated, 0 files merged, 0 files removed, 0 files unresolved
+  waiting for lock on repository share-destrace2 held by * (glob)
+  got lock after \d+ seconds (re)
+  searching for changes
+  no changes found
+  adding remote bookmark bookA