Patchwork [STABLE] posix: avoid race condition while checking for symlinking capability

login
register
mail settings
Submitter Matt Mackall
Date Nov. 6, 2015, 9:24 p.m.
Message ID <1446845070.11504.27.camel@selenic.com>
Download mbox | patch
Permalink /patch/11312/
State Rejected
Headers show

Comments

Matt Mackall - Nov. 6, 2015, 9:24 p.m.
On Fri, 2015-11-06 at 10:55 -0600, Mathias De Maré wrote:
> # HG changeset patch
> # User Mathias De Maré <mathias.demare@gmail.com>
> # Date 1446828659 -3600
> #      Fri Nov 06 17:50:59 2015 +0100
> # Branch stable
> # Node ID f8ff6c7234fec490e5ab98633bdcabbb403d48a1
> # Parent  e7c618cee8df35aefedad15b991d628bae1c60c8
> posix: avoid race condition while checking for symlinking capability
> 
> We start several workers (one per core), so it's possible
> two workers get the same result for 'mktemp'.
> This can result in one of the workers failing to create a symlink,
> causing a percentage of the symlinks to be created as regular files.
> 
> This happens very rarely, but I've seen it occur on a repository
> with ~6000 symbolic links and with a machine with 32 cores.

This making another directory is expensive, and I think we should be
heeding the wisdom of the comment: the race doesn't matter because we
get an EEXIST error. Instead, just wrap the test in a loop and retry on
collision:



I pulled this out into a test script that tried this continuously and
it's solid under load. Any objections?

-- 
Mathematics is the supreme nostalgia of our time.
Mathias De Maré - Nov. 9, 2015, 8:05 a.m.
On Fri, Nov 6, 2015 at 10:24 PM, Matt Mackall <mpm@selenic.com> wrote:

> On Fri, 2015-11-06 at 10:55 -0600, Mathias De Maré wrote:
> > # HG changeset patch
> > # User Mathias De Maré <mathias.demare@gmail.com>
> > # Date 1446828659 -3600
> > #      Fri Nov 06 17:50:59 2015 +0100
> > # Branch stable
> > # Node ID f8ff6c7234fec490e5ab98633bdcabbb403d48a1
> > # Parent  e7c618cee8df35aefedad15b991d628bae1c60c8
> > posix: avoid race condition while checking for symlinking capability
> >
> > We start several workers (one per core), so it's possible
> > two workers get the same result for 'mktemp'.
> > This can result in one of the workers failing to create a symlink,
> > causing a percentage of the symlinks to be created as regular files.
> >
> > This happens very rarely, but I've seen it occur on a repository
> > with ~6000 symbolic links and with a machine with 32 cores.
>
> This making another directory is expensive, and I think we should be
> heeding the wisdom of the comment: the race doesn't matter because we
> get an EEXIST error. Instead, just wrap the test in a loop and retry on
> collision:
>
> diff -r dae888685988 mercurial/posix.py
> --- a/mercurial/posix.py        Thu Nov 05 17:30:10 2015 -0600
> +++ b/mercurial/posix.py        Fri Nov 06 15:15:24 2015 -0600
> @@ -170,22 +170,26 @@
>      """check whether the given path is on a symlink-capable filesystem"""
>      # mktemp is not racy because symlink creation will fail if the
>      # file already exists
> -    name = tempfile.mktemp(dir=path, prefix='hg-checklink-')
> -    try:
> -        fd = tempfile.NamedTemporaryFile(dir=path, prefix='hg-checklink-')
> +    while True:
> +        name = tempfile.mktemp(dir=path, prefix='hg-checklink-')
>          try:
> -            os.symlink(os.path.basename(fd.name), name)
> -            os.unlink(name)
> -            return True
> -        finally:
> -            fd.close()
> -    except AttributeError:
> -        return False
> -    except OSError as inst:
> -        # sshfs might report failure while successfully creating the link
> -        if inst[0] == errno.EIO and os.path.exists(name):
> -            os.unlink(name)
> -        return False
> +            fd = tempfile.NamedTemporaryFile(dir=path,
> prefix='hg-checklink-')
> +            try:
> +                os.symlink(os.path.basename(fd.name), name)
> +                os.unlink(name)
> +                return True
> +            except OSError as inst:
> +                # link creation might race, try again
> +                if inst[0] == errno.EEXIST:
> +                    continue
> +                # sshfs might report failure while successfully creating
> the link
> +                if inst[0] == errno.EIO and os.path.exists(name):
> +                    os.unlink(name)
> +                return False
> +            finally:
> +                fd.close()
> +        except AttributeError:
> +            return False
>
>  def checkosfilename(path):
>      '''Check that the base-relative path is a valid filename on this
> platform.
>
>
> I pulled this out into a test script that tried this continuously and
> it's solid under load. Any objections?
>
Looks good to me. I can't reproduce the issue with this fix :-)

Thanks!
Mathias

Patch

diff -r dae888685988 mercurial/posix.py
--- a/mercurial/posix.py	Thu Nov 05 17:30:10 2015 -0600
+++ b/mercurial/posix.py	Fri Nov 06 15:15:24 2015 -0600
@@ -170,22 +170,26 @@ 
     """check whether the given path is on a symlink-capable filesystem"""
     # mktemp is not racy because symlink creation will fail if the
     # file already exists
-    name = tempfile.mktemp(dir=path, prefix='hg-checklink-')
-    try:
-        fd = tempfile.NamedTemporaryFile(dir=path, prefix='hg-checklink-')
+    while True:
+        name = tempfile.mktemp(dir=path, prefix='hg-checklink-')
         try:
-            os.symlink(os.path.basename(fd.name), name)
-            os.unlink(name)
-            return True
-        finally:
-            fd.close()
-    except AttributeError:
-        return False
-    except OSError as inst:
-        # sshfs might report failure while successfully creating the link
-        if inst[0] == errno.EIO and os.path.exists(name):
-            os.unlink(name)
-        return False
+            fd = tempfile.NamedTemporaryFile(dir=path, prefix='hg-checklink-')
+            try:
+                os.symlink(os.path.basename(fd.name), name)
+                os.unlink(name)
+                return True
+            except OSError as inst:
+                # link creation might race, try again
+                if inst[0] == errno.EEXIST:
+                    continue
+                # sshfs might report failure while successfully creating the link
+                if inst[0] == errno.EIO and os.path.exists(name):
+                    os.unlink(name)
+                return False
+            finally:
+                fd.close()
+        except AttributeError:
+            return False
 
 def checkosfilename(path):
     '''Check that the base-relative path is a valid filename on this platform.