Submitter | Durham Goode |
---|---|
Date | Feb. 12, 2016, 1:24 a.m. |
Message ID | <127e46787efbcec30834.1455240254@dev8486.prn1.facebook.com> |
Download | mbox | patch |
Permalink | /patch/13124/ |
State | Accepted |
Headers | show |
Comments
On 02/12/2016 01:24 AM, Durham Goode wrote: > # HG changeset patch > # User Durham Goode <durham@fb.com> > # Date 1455239073 28800 > # Thu Feb 11 17:04:33 2016 -0800 > # Node ID 127e46787efbcec30834c7334ed1d4f509da9c3a > # Parent a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca > pathauditor: change parts verification order to be root first > > Previously, when we verified the parts of a path in the auditor, we would > validate the deepest directory first, then it's parent, and so on up to the > root. If there happened to be a symlink in the chain, that meant our first check > would likely traverse that symlink. In some cases that symlink might point to > a network filesystem that is expensive, and therefore this simple check could be > very slow. > > The fix is to check the path parts starting at the root and working our way > down. > > This has a minor performance difference in that we used to be able to short > circuit from the audit if we reached a directory that had already been checked. > Now we can't, but the cost is N dictionary look ups, where N is the number of > parts in the path, which should be fairly minor. > > diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py > --- a/mercurial/pathutil.py > +++ b/mercurial/pathutil.py > @@ -83,16 +83,17 @@ class pathauditor(object): > parts.pop() > normparts.pop() > prefixes = [] > - while parts: > - prefix = os.sep.join(parts) > - normprefix = os.sep.join(normparts) > + # It's important that we check the path parts starting from the root. > + # This means we won't accidentaly traverse a symlink into some other > + # filesystem (which is potentially expensive to access). > + for i in range(0, len(parts)): range(len(parts)) is equivalent, I've updated the patch. I would probably go for 'enumarate', but I don't want to amend this too much.
Patch
diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py --- a/mercurial/pathutil.py +++ b/mercurial/pathutil.py @@ -83,16 +83,17 @@ class pathauditor(object): parts.pop() normparts.pop() prefixes = [] - while parts: - prefix = os.sep.join(parts) - normprefix = os.sep.join(normparts) + # It's important that we check the path parts starting from the root. + # This means we won't accidentaly traverse a symlink into some other + # filesystem (which is potentially expensive to access). + for i in range(0, len(parts)): + prefix = os.sep.join(parts[:i + 1]) + normprefix = os.sep.join(normparts[:i + 1]) if normprefix in self.auditeddir: - break + continue if self._realfs: self._checkfs(prefix, path) prefixes.append(normprefix) - parts.pop() - normparts.pop() self.audited.add(normpath) # only add prefixes to the cache after checking everything: we don't