Patchwork [1,of,2] pathauditor: change parts verification order to be root first

login
register
mail settings
Submitter Durham Goode
Date Feb. 12, 2016, 1:24 a.m.
Message ID <127e46787efbcec30834.1455240254@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13124/
State Accepted
Headers show

Comments

Durham Goode - Feb. 12, 2016, 1:24 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1455239073 28800
#      Thu Feb 11 17:04:33 2016 -0800
# Node ID 127e46787efbcec30834c7334ed1d4f509da9c3a
# Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
pathauditor: change parts verification order to be root first

Previously, when we verified the parts of a path in the auditor, we would
validate the deepest directory first, then it's parent, and so on up to the
root. If there happened to be a symlink in the chain, that meant our first check
would likely traverse that symlink.  In some cases that symlink might point to
a network filesystem that is expensive, and therefore this simple check could be
very slow.

The fix is to check the path parts starting at the root and working our way
down.

This has a minor performance difference in that we used to be able to short
circuit from the audit if we reached a directory that had already been checked.
Now we can't, but the cost is N dictionary look ups, where N is the number of
parts in the path, which should be fairly minor.
Pierre-Yves David - Feb. 12, 2016, 3:11 p.m.
On 02/12/2016 01:24 AM, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1455239073 28800
> #      Thu Feb 11 17:04:33 2016 -0800
> # Node ID 127e46787efbcec30834c7334ed1d4f509da9c3a
> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
> pathauditor: change parts verification order to be root first
>
> Previously, when we verified the parts of a path in the auditor, we would
> validate the deepest directory first, then it's parent, and so on up to the
> root. If there happened to be a symlink in the chain, that meant our first check
> would likely traverse that symlink.  In some cases that symlink might point to
> a network filesystem that is expensive, and therefore this simple check could be
> very slow.
>
> The fix is to check the path parts starting at the root and working our way
> down.
>
> This has a minor performance difference in that we used to be able to short
> circuit from the audit if we reached a directory that had already been checked.
> Now we can't, but the cost is N dictionary look ups, where N is the number of
> parts in the path, which should be fairly minor.
>
> diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
> --- a/mercurial/pathutil.py
> +++ b/mercurial/pathutil.py
> @@ -83,16 +83,17 @@ class pathauditor(object):
>           parts.pop()
>           normparts.pop()
>           prefixes = []
> -        while parts:
> -            prefix = os.sep.join(parts)
> -            normprefix = os.sep.join(normparts)
> +        # It's important that we check the path parts starting from the root.
> +        # This means we won't accidentaly traverse a symlink into some other
> +        # filesystem (which is potentially expensive to access).
> +        for i in range(0, len(parts)):

range(len(parts)) is equivalent, I've updated the patch.

I would probably go for 'enumarate', but I don't want to amend this too 
much.

Patch

diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
--- a/mercurial/pathutil.py
+++ b/mercurial/pathutil.py
@@ -83,16 +83,17 @@  class pathauditor(object):
         parts.pop()
         normparts.pop()
         prefixes = []
-        while parts:
-            prefix = os.sep.join(parts)
-            normprefix = os.sep.join(normparts)
+        # It's important that we check the path parts starting from the root.
+        # This means we won't accidentaly traverse a symlink into some other
+        # filesystem (which is potentially expensive to access).
+        for i in range(0, len(parts)):
+            prefix = os.sep.join(parts[:i + 1])
+            normprefix = os.sep.join(normparts[:i + 1])
             if normprefix in self.auditeddir:
-                break
+                continue
             if self._realfs:
                 self._checkfs(prefix, path)
             prefixes.append(normprefix)
-            parts.pop()
-            normparts.pop()
 
         self.audited.add(normpath)
         # only add prefixes to the cache after checking everything: we don't