Patchwork [2,of,4] addremove: don't call lexists, isdir, and islink

login
register
mail settings
Submitter Durham Goode
Date Feb. 5, 2013, 3:20 a.m.
Message ID <f8d0ae2b9805840a73f4.1360034447@dev350.prn1.facebook.com>
Download mbox | patch
Permalink /patch/804/
State Accepted
Commit d1582dd6288e959d668e2841e9d076cd442d43bf
Headers show

Comments

Durham Goode - Feb. 5, 2013, 3:20 a.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1360015580 28800
# Node ID f8d0ae2b9805840a73f4c3e44d64043cf47691de
# Parent  3457575aad13d6e7b8ccb9a45f9e49cf1f2b27fe
addremove: don't call lexists, isdir, and islink

The dirstate walk results contain the stat information for each path, so we
don't need to query it again.  On a large repo this makes addremove go from
8.35 seconds to 7.1 (15%).
Matt Mackall - Feb. 5, 2013, 9:30 p.m.
On Mon, 2013-02-04 at 19:20 -0800, Durham Goode wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1360015580 28800
> # Node ID f8d0ae2b9805840a73f4c3e44d64043cf47691de
> # Parent  3457575aad13d6e7b8ccb9a45f9e49cf1f2b27fe
> addremove: don't call lexists, isdir, and islink
> 
> The dirstate walk results contain the stat information for each path, so we
> don't need to query it again.  On a large repo this makes addremove go from
> 8.35 seconds to 7.1 (15%).

...
> -    for abs in repo.walk(m):
...
> +    walkresults = repo.dirstate.walk(m, sorted(ctx.substate), True, False)
> +    for abs in sorted(walkresults):
...

Uhh, what's this bit about??

repo.walk:

    def walk(self, match, node=None):
        '''                                                                                                                                                                                       
        walk recursively through the directory tree or a given                                                                                                                                    
        changeset, finding all files matched by the match                                                                                                                                         
        function                                                                                                                                                                                  
        '''
	return self[node].walk(match)

workingcontext.walk:

    def walk(self, match):
        return sorted(self._repo.dirstate.walk(match, sorted(self.substate),
                                               True, False))

I see, we're sneaking in a nano-optimization that's busting through two
layers of abstraction to avoid the overhead of two function calls.

As slow as Python is, it's not going to be a measurable performance
improvement as it's outside the loop. I just spent ten minutes trying to
understand this change. At about .1us saved per addremove, we're going
to need about 6000 Facebook-sized projects doing addremove on every
commit to break even. And that's assuming no one else ever tries to
understand this line of code.

Let's not do that sort of thing.

Patch

diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -733,8 +733,9 @@ 
     rejected = []
     m.bad = lambda x, y: rejected.append(x)
 
-    for abs in repo.walk(m):
-        target = repo.wjoin(abs)
+    ctx = repo[None]
+    walkresults = repo.dirstate.walk(m, sorted(ctx.substate), True, False)
+    for abs in sorted(walkresults):
         good = True
         try:
             audit_path(abs)
@@ -743,14 +744,15 @@ 
         rel = m.rel(abs)
         exact = m.exact(abs)
 
+        st = walkresults[abs]
         dstate = repo.dirstate[abs]
         if good and dstate == '?':
             unknown.append(abs)
             if repo.ui.verbose or not exact:
                 repo.ui.status(_('adding %s\n') % ((pats and rel) or abs))
         elif (dstate != 'r' and
-              (not good or not os.path.lexists(target) or
-               (os.path.isdir(target) and not os.path.islink(target)))):
+              (not good or not st or
+               (stat.S_ISDIR(st.st_mode) and not stat.S_ISLNK(st.st_mode)))):
             deleted.append(abs)
             if repo.ui.verbose or not exact:
                 repo.ui.status(_('removing %s\n') % ((pats and rel) or abs))