Patchwork [4,of,4] dirstate: walk returns None for files under symlink directory

login
register
mail settings
Submitter Durham Goode
Date Feb. 5, 2013, 3:20 a.m.
Message ID <6d9b67fed529efcff1f2.1360034449@dev350.prn1.facebook.com>
Download mbox | patch
Permalink /patch/806/
State Changes Requested
Headers show

Comments

Durham Goode - Feb. 5, 2013, 3:20 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1360016835 28800
# Node ID 6d9b67fed529efcff1f210abd694c551d85cdea5
# Parent  1c47ffd98e0ee0ddccfe339e2a28cf1841be52a4
dirstate: walk returns None for files under symlink directory

Previously dirstate.walk would return a stat object for files in the dmap
that were under a symlink directory.  Now it will return None (when the unknown
parameter is True) to indicate that they are no longer considered part of the
repository.

In a situation like this:
  mkdir foo && touch foo/a && hg commit -Am "a"
  mv foo bar
  ln -s bar foo

'hg status' will now show '! foo/a', whereas before it incorrectly considered
'foo/a' to be unchanged.

On a large repository this brings addremove from 6.3 seconds down to 3.65 (42%)
since addremove no longer has to stat every directory of every file to determine
if the file is inside a symlink directory. I see no perf hit to any other commands.

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -709,10 +709,37 @@ 
 
         # step 3: report unseen items in the dmap hash
         if not skipstep3 and not exact:
-            visit = sorted([f for f in dmap if f not in results and matchfn(f)])
-            nf = iter(visit).next
-            for st in util.statfiles([join(i) for i in visit]):
-                results[nf()] = st
+            audit_path = scmutil.pathauditor(self._root)
+            def _auditpath(path):
+                try:
+                    audit_path(path)
+                except (OSError, util.Abort):
+                    return False
+
+                return True
+
+            if unknown:
+                # unknown == True means we walked the full directory tree above.
+                # So if a file is not seen it was either a) not matching matchfn
+                # b) ignored, c) missing, or d) under a symlink directory.
+                unvisitedmatches = [f for f in dmap if f not in
+                    results and matchfn(f)]
+                for nf in iter(unvisitedmatches):
+                    # Report ignored items in the dmap as long as they are not
+                    # under a symlink directory.
+                    if ignore(nf) and _auditpath(nf):
+                        results[nf] = util.statfiles([join(nf)])[0]
+                    else:
+                        # It's either missing or under a symlink directory
+                        results[nf] = None
+            else:
+                # We may not have walked the full directory tree above,
+                # so stat everything we missed.
+                visit = sorted([f for f in dmap if f not in
+                    results and matchfn(f)])
+                nf = iter(visit).next
+                for st in util.statfiles([join(i) for i in visit]):
+                    results[nf()] = st
         for s in subrepos:
             del results[s]
         del results['.hg']
diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -733,24 +733,26 @@ 
     rejected = []
     m.bad = lambda x, y: rejected.append(x)
 
+    def _auditpath(path):
+        try:
+            audit_path(path)
+        except (OSError, util.Abort):
+            return False
+
+        return True
+
     ctx = repo[None]
     walkresults = repo.dirstate.walk(m, sorted(ctx.substate), True, False)
     for abs in sorted(walkresults):
-        good = True
-        try:
-            audit_path(abs)
-        except (OSError, util.Abort):
-            good = False
-
         st = walkresults[abs]
         dstate = repo.dirstate[abs]
-        if good and dstate == '?':
+        if dstate == '?' and _auditpath(abs):
             unknown.append(abs)
             if repo.ui.verbose or not m.exact(abs):
                 rel = m.rel(abs)
                 repo.ui.status(_('adding %s\n') % ((pats and rel) or abs))
         elif (dstate != 'r' and
-              (not good or not st or
+              (not st or
                (stat.S_ISDIR(st.st_mode) and not stat.S_ISLNK(st.st_mode)))):
             deleted.append(abs)
             if repo.ui.verbose or not m.exact(abs):
diff --git a/tests/test-symlinks.t b/tests/test-symlinks.t
--- a/tests/test-symlinks.t
+++ b/tests/test-symlinks.t
@@ -149,6 +149,10 @@ 
   adding foo/a
   $ mv foo bar
   $ ln -s bar foo
+  $ hg status
+  ! foo/a
+  ? bar/a
+  ? foo
 
 now addremove should remove old files