Patchwork dirstate._walkexplicit: don't bother normalizing ''

login
register
mail settings
Submitter Siddharth Agarwal
Date March 30, 2015, 3:40 a.m.
Message ID <da9074bcbd34008b2615.1427686810@devbig136.prn2.facebook.com>
Download mbox | patch
Permalink /patch/8365/
State Rejected
Headers show

Comments

Siddharth Agarwal - March 30, 2015, 3:40 a.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1427678928 25200
#      Sun Mar 29 18:28:48 2015 -0700
# Node ID da9074bcbd34008b2615f019fd57d203b80425e1
# Parent  efa094701a05d58d505c3b0c3b3c73dba4e51e97
dirstate._walkexplicit: don't bother normalizing ''

The overwhelmingly common case is running commands like 'hg diff' with no
arguments. Therefore the only file that'll be listed is the root directory.
Normalizing that's just a waste of time.

This means that for a plain 'hg diff' we'll never need to construct the
foldmap, saving us a significant chunk of time.

On case-insensitive HFS+ on OS X, for a large repository with over 200,000
files, this brings down 'hg diff' from 2.97 seconds to 2.36.
Martin von Zweigbergk - March 30, 2015, 4:31 a.m.
On Sun, Mar 29, 2015 at 8:41 PM Siddharth Agarwal <sid0@fb.com> wrote:

> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1427678928 25200
> #      Sun Mar 29 18:28:48 2015 -0700
> # Node ID da9074bcbd34008b2615f019fd57d203b80425e1
> # Parent  efa094701a05d58d505c3b0c3b3c73dba4e51e97
> dirstate._walkexplicit: don't bother normalizing ''
>
> The overwhelmingly common case is running commands like 'hg diff' with no
> arguments. Therefore the only file that'll be listed is the root directory.
> Normalizing that's just a waste of time.
>
> This means that for a plain 'hg diff' we'll never need to construct the
> foldmap, saving us a significant chunk of time.
>
> On case-insensitive HFS+ on OS X, for a large repository with over 200,000
> files, this brings down 'hg diff' from 2.97 seconds to 2.36.
>
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -635,10 +635,15 @@
>
>          alldirs = None
>          for ff in files:
> -            if normalize:
> -                nf = normalize(normpath(ff), False, True)
> +            # constructing the foldmap is expensive, so don't do it for
> the
> +            # common case where files is ['']
> +            if ff != '':
> +                if normalize:
> +                    nf = normalize(normpath(ff), False, True)
> +                else:
> +                    nf = normpath(ff)
>              else:
> -                nf = normpath(ff)
> +                nf = '.'
>

Nit: It might be simpler to call normpath() first:

nf = normpath(ff)
if nf != '.' and normalize:
  nf = normalize(nf, False, True)

Slightly off topic: When does normpath have any effect anyway? The paths
get here through a matcher and the matcher seems to normalize all paths.
The only effect normpath seems to have is to convert '' to '.' (i.e. back
to what it was before we converted from '.' to '' a few lines up).


>              if nf in results:
>                  continue
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Siddharth Agarwal - March 30, 2015, 6:25 a.m.
On 03/29/2015 09:31 PM, Martin von Zweigbergk wrote:
>
>
> On Sun, Mar 29, 2015 at 8:41 PM Siddharth Agarwal <sid0@fb.com
> <mailto:sid0@fb.com>> wrote:
>
>     # HG changeset patch
>     # User Siddharth Agarwal <sid0@fb.com <mailto:sid0@fb.com>>
>     # Date 1427678928 25200
>     #      Sun Mar 29 18:28:48 2015 -0700
>     # Node ID da9074bcbd34008b2615f019fd57d203b80425e1
>     # Parent  efa094701a05d58d505c3b0c3b3c73dba4e51e97
>     dirstate._walkexplicit: don't bother normalizing ''
>
>     The overwhelmingly common case is running commands like 'hg diff'
>     with no
>     arguments. Therefore the only file that'll be listed is the root
>     directory.
>     Normalizing that's just a waste of time.
>
>     This means that for a plain 'hg diff' we'll never need to
>     construct the
>     foldmap, saving us a significant chunk of time.
>
>     On case-insensitive HFS+ on OS X, for a large repository with over
>     200,000
>     files, this brings down 'hg diff' from 2.97 seconds to 2.36.
>
>     diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
>     --- a/mercurial/dirstate.py
>     +++ b/mercurial/dirstate.py
>     @@ -635,10 +635,15 @@
>
>              alldirs = None
>              for ff in files:
>     -            if normalize:
>     -                nf = normalize(normpath(ff), False, True)
>     +            # constructing the foldmap is expensive, so don't do
>     it for the
>     +            # common case where files is ['']
>     +            if ff != '':
>     +                if normalize:
>     +                    nf = normalize(normpath(ff), False, True)
>     +                else:
>     +                    nf = normpath(ff)
>                  else:
>     -                nf = normpath(ff)
>     +                nf = '.'
>
>
> Nit: It might be simpler to call normpath() first:
>
> nf = normpath(ff)
> if nf != '.' and normalize:
>   nf = normalize(nf, False, True)
>
> Slightly off topic: When does normpath have any effect anyway? The
> paths get here through a matcher and the matcher seems to normalize
> all paths. The only effect normpath seems to have is to convert '' to
> '.' (i.e. back to what it was before we converted from '.' to '' a few
> lines up).

I suspect you're completely correct. Let me double check then send out
patches cleaning this up.

- Siddharth

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -635,10 +635,15 @@ 
 
         alldirs = None
         for ff in files:
-            if normalize:
-                nf = normalize(normpath(ff), False, True)
+            # constructing the foldmap is expensive, so don't do it for the
+            # common case where files is ['']
+            if ff != '':
+                if normalize:
+                    nf = normalize(normpath(ff), False, True)
+                else:
+                    nf = normpath(ff)
             else:
-                nf = normpath(ff)
+                nf = '.'
             if nf in results:
                 continue