Patchwork [3,of,3] treemanifest: further optimize treemanifest.matches()

login
register
mail settings
Submitter Drew Gottlieb
Date April 6, 2015, 10:48 p.m.
Message ID <64c838dd0204e52e5a6a.1428360515@waste.org>
Download mbox | patch
Permalink /patch/8516/
State Accepted
Headers show

Comments

Drew Gottlieb - April 6, 2015, 10:48 p.m.
# HG changeset patch
# User Drew Gottlieb <drgott@google.com>
# Date 1428342713 25200
#      Mon Apr 06 10:51:53 2015 -0700
# Node ID 64c838dd0204e52e5a6aeae92af7950f323ecf87
# Parent  3d7b11f525d9439dfdaae57bdaaf5a41ed4be5a9
treemanifest: further optimize treemanifest.matches()

The matches function was previously traversing all submanifests to look for
matching files, even though it was possible to know if a submanifest won't
contain any matches.

This change adds a visitdir function on the match object to decide quickly if
a directory should be visited when traversing. The function also decides if
_all_ subdirectories should be traversed.

Adding this logic as methods on the match object also makes the logic
modifiable by extensions, such as largefiles.

An example of a command this speeds up is running
  hg status --rev .^ python/
on the Mozilla repo with the treemanifest experiment enabled.
It goes from 2.03s to 1.85s.

More improvements to speed from this change will happen when treemanifests are
lazily loaded. Because a flat manifest is still loaded and then converted
into treemanifests, speed improvements are limited.

This change has no negative effect on speed. For a worst-case example, this
command is not negatively impacted:
  hg status --rev .^ 'relglob:*.js'
on the Mozilla repo. It goes from 2.83s to 2.82s.
Martin von Zweigbergk - April 6, 2015, 11:08 p.m.
As usual, I've already reviewed these and they look good to me.

On Mon, Apr 6, 2015 at 3:50 PM Drew Gottlieb <drgott@google.com> wrote:

> # HG changeset patch
> # User Drew Gottlieb <drgott@google.com>
> # Date 1428342713 25200
> #      Mon Apr 06 10:51:53 2015 -0700
> # Node ID 64c838dd0204e52e5a6aeae92af7950f323ecf87
> # Parent  3d7b11f525d9439dfdaae57bdaaf5a41ed4be5a9
> treemanifest: further optimize treemanifest.matches()
>
> The matches function was previously traversing all submanifests to look for
> matching files, even though it was possible to know if a submanifest won't
> contain any matches.
>
> This change adds a visitdir function on the match object to decide quickly
> if
> a directory should be visited when traversing. The function also decides if
> _all_ subdirectories should be traversed.
>
> Adding this logic as methods on the match object also makes the logic
> modifiable by extensions, such as largefiles.
>
> An example of a command this speeds up is running
>   hg status --rev .^ python/
> on the Mozilla repo with the treemanifest experiment enabled.
> It goes from 2.03s to 1.85s.
>
> More improvements to speed from this change will happen when treemanifests
> are
> lazily loaded. Because a flat manifest is still loaded and then converted
> into treemanifests, speed improvements are limited.
>
> This change has no negative effect on speed. For a worst-case example, this
> command is not negatively impacted:
>   hg status --rev .^ 'relglob:*.js'
> on the Mozilla repo. It goes from 2.83s to 2.82s.
>
> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
> --- a/mercurial/manifest.py
> +++ b/mercurial/manifest.py
> @@ -581,11 +581,19 @@
>
>          return self._matches(match)
>
> -    def _matches(self, match):
> +    def _matches(self, match, alldirs=False):
>          '''recursively generate a new manifest filtered by the match
> argument.
> -        '''
> +
> +        Will visit all subdirectories if alldirs is True, otherwise it
> will
> +        only visit subdirectories for which match.visitdir is True.'''
>
>          ret = treemanifest(self._dir)
> +        if not alldirs:
> +            # substring to strip trailing slash
> +            visit = match.visitdir(self._dir[:-1] or '.')
> +            if not visit:
> +                return ret
> +            alldirs = (visit == 'all')
>
>          for fn in self._files:
>              fullp = self._subpath(fn)
> @@ -596,7 +604,7 @@
>                  ret._flags[fn] = self._flags[fn]
>
>          for dir, subm in self._dirs.iteritems():
> -            m = subm._matches(match)
> +            m = subm._matches(match, alldirs)
>              if not m._isempty():
>                  ret._dirs[dir] = m
>
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -9,6 +9,8 @@
>  import util, pathutil
>  from i18n import _
>
> +propertycache = util.propertycache
> +
>  def _rematcher(regex):
>      '''compile the regexp with the best available regexp engine and
> return a
>      matcher function'''
> @@ -157,6 +159,20 @@
>          else: optimal roots'''
>          return self._files
>
> +    @propertycache
> +    def _dirs(self):
> +        return set(util.dirs(self._fmap)) | set(['.'])
> +
> +    def visitdir(self, dir):
> +        '''Helps while traversing a directory tree. Returns the string
> 'all' if
> +        the given directory and all subdirectories should be visited.
> Otherwise
> +        returns True or False indicating whether the given directory
> should be
> +        visited. If 'all' is returned, calling this method on a
> subdirectory
> +        gives an undefined result.'''
> +        if not self._fmap or self.exact(dir):
> +            return 'all'
> +        return dir in self._dirs
> +
>      def exact(self, f):
>          '''Returns True if f is in .files().'''
>          return f in self._fmap
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
>
Pierre-Yves David - April 7, 2015, 6:13 p.m.
On 04/06/2015 04:08 PM, Martin von Zweigbergk wrote:
> As usual, I've already reviewed these and they look good to me.
>
> On Mon, Apr 6, 2015 at 3:50 PM Drew Gottlieb <drgott@google.com
> <mailto:drgott@google.com>> wrote:
>
>     # HG changeset patch
>     # User Drew Gottlieb <drgott@google.com <mailto:drgott@google.com>>
>     # Date 1428342713 25200
>     #      Mon Apr 06 10:51:53 2015 -0700
>     # Node ID 64c838dd0204e52e5a6aeae92af795__0f323ecf87
>     # Parent  3d7b11f525d9439dfdaae57bdaaf5a__41ed4be5a9
>     treemanifest: further optimize treemanifest.matches()

These three are pushed to the clowncopter. Thanks.

Patch

diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -581,11 +581,19 @@ 
 
         return self._matches(match)
 
-    def _matches(self, match):
+    def _matches(self, match, alldirs=False):
         '''recursively generate a new manifest filtered by the match argument.
-        '''
+
+        Will visit all subdirectories if alldirs is True, otherwise it will
+        only visit subdirectories for which match.visitdir is True.'''
 
         ret = treemanifest(self._dir)
+        if not alldirs:
+            # substring to strip trailing slash
+            visit = match.visitdir(self._dir[:-1] or '.')
+            if not visit:
+                return ret
+            alldirs = (visit == 'all')
 
         for fn in self._files:
             fullp = self._subpath(fn)
@@ -596,7 +604,7 @@ 
                 ret._flags[fn] = self._flags[fn]
 
         for dir, subm in self._dirs.iteritems():
-            m = subm._matches(match)
+            m = subm._matches(match, alldirs)
             if not m._isempty():
                 ret._dirs[dir] = m
 
diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -9,6 +9,8 @@ 
 import util, pathutil
 from i18n import _
 
+propertycache = util.propertycache
+
 def _rematcher(regex):
     '''compile the regexp with the best available regexp engine and return a
     matcher function'''
@@ -157,6 +159,20 @@ 
         else: optimal roots'''
         return self._files
 
+    @propertycache
+    def _dirs(self):
+        return set(util.dirs(self._fmap)) | set(['.'])
+
+    def visitdir(self, dir):
+        '''Helps while traversing a directory tree. Returns the string 'all' if
+        the given directory and all subdirectories should be visited. Otherwise
+        returns True or False indicating whether the given directory should be
+        visited. If 'all' is returned, calling this method on a subdirectory
+        gives an undefined result.'''
+        if not self._fmap or self.exact(dir):
+            return 'all'
+        return dir in self._dirs
+
     def exact(self, f):
         '''Returns True if f is in .files().'''
         return f in self._fmap