Patchwork [1,of,2] match: adding support for matching files inside a directory

Submitter via Mercurial-devel
Date Feb. 10, 2017, 11:52 p.m.
Message ID <2d9523f80c5b5fbace1b.1486770769@rdamazio.mtv.corp.google.com>
Permalink /patch/18419/
State Superseded

Comments

via Mercurial-devel - Feb. 10, 2017, 11:52 p.m.
# HG changeset patch
# User Rodrigo Damazio Bovendorp <rdamazio@google.com>
# Date 1486768320 28800
#      Fri Feb 10 15:12:00 2017 -0800
# Node ID 2d9523f80c5b5fbace1b0566fb47bed7468369b0
# Parent  a95fc01aaffe805bcc4c02a822b82a1162fa35b9
match: adding support for matching files inside a directory

This adds a new "rootfilesin" matcher type which matches files inside a
directory, but not any subdirectories (so it matches non-recursively).
This has the "root" prefix per foozy's plan for other matchers (rootglob,
rootpath, cwdre, etc.).
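
To make the recursive/non-recursive distinction concrete, here is a minimal standalone sketch. It assumes the regular expressions built in the patch's `_regex()` hunk and substitutes the standard library's `re.escape` for Mercurial's `util.re.escape`; the helper names `path_regex`, `rootfilesin_regex` and `matching` are hypothetical and exist only for this illustration. The file names are taken from tests/test-walk.t.

```python
import re

def path_regex(pat):
    # Modeled on the existing 'path' branch: the directory itself or
    # anything beneath it, at any depth (recursive).
    if pat == '.':
        return ''
    return '^' + re.escape(pat) + '(?:/|$)'

def rootfilesin_regex(pat):
    # Modeled on the new 'rootfilesin' branch: exactly one more path
    # component after the directory, so subdirectories are excluded.
    escaped = '' if pat == '.' else re.escape(pat) + '/'
    return '^' + escaped + '[^/]*$'

FILES = [
    'beans/black',
    'fennel',
    'mammals/Procyonidae/raccoon',
    'mammals/skunk',
]

def matching(regex):
    # Hypothetical helper: list the files a regex would select.
    return [f for f in FILES if re.match(regex, f)]

print(matching(path_regex('mammals')))         # recursive
print(matching(rootfilesin_regex('mammals')))  # direct children only
print(matching(rootfilesin_regex('.')))        # files at the repository root
```

Under these assumptions, `path:mammals` selects both `mammals/skunk` and `mammals/Procyonidae/raccoon`, while `rootfilesin:mammals` selects only `mammals/skunk`.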
Katsunori FUJIWARA - Feb. 12, 2017, 7:23 p.m.
At Fri, 10 Feb 2017 15:52:49 -0800,
Rodrigo Damazio Bovendorp via Mercurial-devel wrote:
> 
> # HG changeset patch
> # User Rodrigo Damazio Bovendorp <rdamazio@google.com>
> # Date 1486768320 28800
> #      Fri Feb 10 15:12:00 2017 -0800
> # Node ID 2d9523f80c5b5fbace1b0566fb47bed7468369b0
> # Parent  a95fc01aaffe805bcc4c02a822b82a1162fa35b9
> match: adding support for matching files inside a directory
> 
> This adds a new "rootfilesin" matcher type which matches files inside a
> directory, but not any subdirectories (so it matches non-recursively).
> This has the "root" prefix per foozy's plan for other matchers (rootglob,
> rootpath, cwdre, etc.).

LGTM, but a nitpick below for the changes to "_regex()".

 
> diff -r a95fc01aaffe -r 2d9523f80c5b mercurial/help/patterns.txt
> --- a/mercurial/help/patterns.txt	Wed Feb 08 14:37:38 2017 -0800
> +++ b/mercurial/help/patterns.txt	Fri Feb 10 15:12:00 2017 -0800
> @@ -13,7 +13,10 @@
>  
>  To use a plain path name without any pattern matching, start it with
>  ``path:``. These path names must completely match starting at the
> -current repository root.
> +current repository root, and when the path points to a directory, it is matched
> +recursively. To match all files in a directory non-recursively (not including
> +any files in subdirectories), ``rootfilesin:`` can be used, specifying an
> +absolute path (relative to the repository root).
>  
>  To use an extended glob, start a name with ``glob:``. Globs are rooted
>  at the current directory; a glob such as ``*.c`` will only match files
> @@ -39,12 +42,15 @@
>  All patterns, except for ``glob:`` specified in command line (not for
>  ``-I`` or ``-X`` options), can match also against directories: files
>  under matched directories are treated as matched.
> +For ``-I`` and ``-X`` options, ``glob:`` will match directories recursively.
>  
>  Plain examples::
>  
> -  path:foo/bar   a name bar in a directory named foo in the root
> -                 of the repository
> -  path:path:name a file or directory named "path:name"
> +  path:foo/bar        a name bar in a directory named foo in the root
> +                      of the repository
> +  path:path:name      a file or directory named "path:name"
> +  rootfilesin:foo/bar the files in a directory called foo/bar, but not any files
> +                      in its subdirectories and not a file bar in directory foo
>  
>  Glob examples::
>  
> @@ -52,6 +58,8 @@
>    *.c            any name ending in ".c" in the current directory
>    **.c           any name ending in ".c" in any subdirectory of the
>                   current directory including itself.
> +  foo/*          any file in directory foo plus all its subdirectories,
> +                 recursively
>    foo/*.c        any name ending in ".c" in the directory foo
>    foo/**.c       any name ending in ".c" in any subdirectory of foo
>                   including itself.
> diff -r a95fc01aaffe -r 2d9523f80c5b mercurial/match.py
> --- a/mercurial/match.py	Wed Feb 08 14:37:38 2017 -0800
> +++ b/mercurial/match.py	Fri Feb 10 15:12:00 2017 -0800
> @@ -104,7 +104,10 @@
>          a pattern is one of:
>          'glob:<glob>' - a glob relative to cwd
>          're:<regexp>' - a regular expression
> -        'path:<path>' - a path relative to repository root
> +        'path:<path>' - a path relative to repository root, which is matched
> +                        recursively
> +        'rootfilesin:<path>' - a path relative to repository root, which is
> +                        matched non-recursively (will not match subdirectories)
>          'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
>          'relpath:<path>' - a path relative to cwd
>          'relre:<regexp>' - a regexp that needn't match the start of a name
> @@ -153,7 +156,7 @@
>          elif patterns:
>              kindpats = self._normalize(patterns, default, root, cwd, auditor)
>              if not _kindpatsalwaysmatch(kindpats):
> -                self._files = _roots(kindpats)
> +                self._files = _explicitfiles(kindpats)
>                  self._anypats = self._anypats or _anypats(kindpats)
>                  self.patternspat, pm = _buildmatch(ctx, kindpats, '$',
>                                                     listsubrepos, root)
> @@ -286,7 +289,7 @@
>          for kind, pat in [_patsplit(p, default) for p in patterns]:
>              if kind in ('glob', 'relpath'):
>                  pat = pathutil.canonpath(root, cwd, pat, auditor)
> -            elif kind in ('relglob', 'path'):
> +            elif kind in ('relglob', 'path', 'rootfilesin'):
>                  pat = util.normpath(pat)
>              elif kind in ('listfile', 'listfile0'):
>                  try:
> @@ -447,7 +450,8 @@
>      if ':' in pattern:
>          kind, pat = pattern.split(':', 1)
>          if kind in ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
> -                    'listfile', 'listfile0', 'set', 'include', 'subinclude'):
> +                    'listfile', 'listfile0', 'set', 'include', 'subinclude',
> +                    'rootfilesin'):
>              return kind, pat
>      return default, pattern
>  
> @@ -540,6 +544,14 @@
>          if pat == '.':
>              return ''
>          return '^' + util.re.escape(pat) + '(?:/|$)'
> +    if kind == 'rootfilesin':
> +        if pat == '.':
> +            escaped = ''
> +        else:
> +            # Pattern is a directory name.
> +            escaped = util.re.escape(pat) + '/'
> +        # Anything after the pattern must be a non-directory.
> +        return '^' + escaped + '[^/]*$'

'[^/]+$' seems safer for "matching against a file", IMHO.
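
A small sketch of the difference between the two quantifiers, assuming the same regex construction as the hunk above but with the standard library's `re.escape` standing in for `util.re.escape`. The `quantifier` parameter and the `STAR`/`PLUS` names are hypothetical, added only so both variants can be compared side by side.

```python
import re

def rootfilesin_regex(pat, quantifier):
    # Same shape as the patch; only the quantifier on [^/] varies.
    escaped = '' if pat == '.' else re.escape(pat) + '/'
    return '^' + escaped + '[^/]' + quantifier + '$'

STAR = re.compile(rootfilesin_regex('beans', '*'))  # as in the patch
PLUS = re.compile(rootfilesin_regex('beans', '+'))  # as suggested here

for name in ('beans/black', 'beans/', 'beans/sub/file'):
    print(name, bool(STAR.match(name)), bool(PLUS.match(name)))
```

Both variants accept `beans/black` and reject `beans/sub/file`. They differ only on a name with an empty final component such as `beans/`, which `*` accepts and `+` rejects.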

>      if kind == 'relglob':
>          return '(?:|.*/)' + _globre(pat) + globsuffix
>      if kind == 'relpath':
> @@ -614,6 +626,8 @@
>  
>      >>> _roots([('glob', 'g/*', ''), ('glob', 'g', ''), ('glob', 'g*', '')])
>      ['g', 'g', '.']
> +    >>> _roots([('rootfilesin', 'g', ''), ('rootfilesin', '', '')])
> +    ['g', '.']
>      >>> _roots([('relpath', 'r', ''), ('path', 'p/p', ''), ('path', '', '')])
>      ['r', 'p/p', '.']
>      >>> _roots([('relglob', 'rg*', ''), ('re', 're/', ''), ('relre', 'rr', '')])
> @@ -628,15 +642,28 @@
>                      break
>                  root.append(p)
>              r.append('/'.join(root) or '.')
> -        elif kind in ('relpath', 'path'):
> +        elif kind in ('relpath', 'path', 'rootfilesin'):
>              r.append(pat or '.')
>          else: # relglob, re, relre
>              r.append('.')
>      return r
>  
> +def _explicitfiles(kindpats):
> +    '''Returns the potential explicit filenames from the patterns.
> +
> +    >>> _explicitfiles([('path', 'foo/bar', '')])
> +    ['foo/bar']
> +    >>> _explicitfiles([('rootfilesin', 'foo/bar', '')])
> +    []
> +    '''
> +    # Keep only the pattern kinds where one can specify filenames (vs only
> +    # directory names).
> +    filable = [kp for kp in kindpats if kp[0] not in ('rootfilesin')]
> +    return _roots(filable)
> +
>  def _anypats(kindpats):
>      for kind, pat, source in kindpats:
> -        if kind in ('glob', 're', 'relglob', 'relre', 'set'):
> +        if kind in ('glob', 're', 'relglob', 'relre', 'set', 'rootfilesin'):
>              return True
>  
>  _commentre = None
> diff -r a95fc01aaffe -r 2d9523f80c5b tests/test-walk.t
> --- a/tests/test-walk.t	Wed Feb 08 14:37:38 2017 -0800
> +++ b/tests/test-walk.t	Fri Feb 10 15:12:00 2017 -0800
> @@ -112,6 +112,74 @@
>    f  beans/navy      ../beans/navy
>    f  beans/pinto     ../beans/pinto
>    f  beans/turtle    ../beans/turtle
> +
> +  $ hg debugwalk 'rootfilesin:'
> +  f  fennel      ../fennel
> +  f  fenugreek   ../fenugreek
> +  f  fiddlehead  ../fiddlehead
> +  $ hg debugwalk -I 'rootfilesin:'
> +  f  fennel      ../fennel
> +  f  fenugreek   ../fenugreek
> +  f  fiddlehead  ../fiddlehead
> +  $ hg debugwalk 'rootfilesin:.'
> +  f  fennel      ../fennel
> +  f  fenugreek   ../fenugreek
> +  f  fiddlehead  ../fiddlehead
> +  $ hg debugwalk -I 'rootfilesin:.'
> +  f  fennel      ../fennel
> +  f  fenugreek   ../fenugreek
> +  f  fiddlehead  ../fiddlehead
> +  $ hg debugwalk -X 'rootfilesin:'
> +  f  beans/black                     ../beans/black
> +  f  beans/borlotti                  ../beans/borlotti
> +  f  beans/kidney                    ../beans/kidney
> +  f  beans/navy                      ../beans/navy
> +  f  beans/pinto                     ../beans/pinto
> +  f  beans/turtle                    ../beans/turtle
> +  f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
> +  f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
> +  f  mammals/Procyonidae/raccoon     Procyonidae/raccoon
> +  f  mammals/skunk                   skunk
> +  $ hg debugwalk 'rootfilesin:fennel'
> +  $ hg debugwalk -I 'rootfilesin:fennel'
> +  $ hg debugwalk 'rootfilesin:skunk'
> +  $ hg debugwalk -I 'rootfilesin:skunk'
> +  $ hg debugwalk 'rootfilesin:beans'
> +  f  beans/black     ../beans/black
> +  f  beans/borlotti  ../beans/borlotti
> +  f  beans/kidney    ../beans/kidney
> +  f  beans/navy      ../beans/navy
> +  f  beans/pinto     ../beans/pinto
> +  f  beans/turtle    ../beans/turtle
> +  $ hg debugwalk -I 'rootfilesin:beans'
> +  f  beans/black     ../beans/black
> +  f  beans/borlotti  ../beans/borlotti
> +  f  beans/kidney    ../beans/kidney
> +  f  beans/navy      ../beans/navy
> +  f  beans/pinto     ../beans/pinto
> +  f  beans/turtle    ../beans/turtle
> +  $ hg debugwalk 'rootfilesin:mammals'
> +  f  mammals/skunk  skunk
> +  $ hg debugwalk -I 'rootfilesin:mammals'
> +  f  mammals/skunk  skunk
> +  $ hg debugwalk 'rootfilesin:mammals/'
> +  f  mammals/skunk  skunk
> +  $ hg debugwalk -I 'rootfilesin:mammals/'
> +  f  mammals/skunk  skunk
> +  $ hg debugwalk -X 'rootfilesin:mammals'
> +  f  beans/black                     ../beans/black
> +  f  beans/borlotti                  ../beans/borlotti
> +  f  beans/kidney                    ../beans/kidney
> +  f  beans/navy                      ../beans/navy
> +  f  beans/pinto                     ../beans/pinto
> +  f  beans/turtle                    ../beans/turtle
> +  f  fennel                          ../fennel
> +  f  fenugreek                       ../fenugreek
> +  f  fiddlehead                      ../fiddlehead
> +  f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
> +  f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
> +  f  mammals/Procyonidae/raccoon     Procyonidae/raccoon
> +
>    $ hg debugwalk .
>    f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
>    f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi

Patch

diff -r a95fc01aaffe -r 2d9523f80c5b mercurial/help/patterns.txt
--- a/mercurial/help/patterns.txt	Wed Feb 08 14:37:38 2017 -0800
+++ b/mercurial/help/patterns.txt	Fri Feb 10 15:12:00 2017 -0800
@@ -13,7 +13,10 @@ 
 
 To use a plain path name without any pattern matching, start it with
 ``path:``. These path names must completely match starting at the
-current repository root.
+current repository root, and when the path points to a directory, it is matched
+recursively. To match all files in a directory non-recursively (not including
+any files in subdirectories), ``rootfilesin:`` can be used, specifying an
+absolute path (relative to the repository root).
 
 To use an extended glob, start a name with ``glob:``. Globs are rooted
 at the current directory; a glob such as ``*.c`` will only match files
@@ -39,12 +42,15 @@ 
 All patterns, except for ``glob:`` specified in command line (not for
 ``-I`` or ``-X`` options), can match also against directories: files
 under matched directories are treated as matched.
+For ``-I`` and ``-X`` options, ``glob:`` will match directories recursively.
 
 Plain examples::
 
-  path:foo/bar   a name bar in a directory named foo in the root
-                 of the repository
-  path:path:name a file or directory named "path:name"
+  path:foo/bar        a name bar in a directory named foo in the root
+                      of the repository
+  path:path:name      a file or directory named "path:name"
+  rootfilesin:foo/bar the files in a directory called foo/bar, but not any files
+                      in its subdirectories and not a file bar in directory foo
 
 Glob examples::
 
@@ -52,6 +58,8 @@ 
   *.c            any name ending in ".c" in the current directory
   **.c           any name ending in ".c" in any subdirectory of the
                  current directory including itself.
+  foo/*          any file in directory foo plus all its subdirectories,
+                 recursively
   foo/*.c        any name ending in ".c" in the directory foo
   foo/**.c       any name ending in ".c" in any subdirectory of foo
                  including itself.
diff -r a95fc01aaffe -r 2d9523f80c5b mercurial/match.py
--- a/mercurial/match.py	Wed Feb 08 14:37:38 2017 -0800
+++ b/mercurial/match.py	Fri Feb 10 15:12:00 2017 -0800
@@ -104,7 +104,10 @@ 
         a pattern is one of:
         'glob:<glob>' - a glob relative to cwd
         're:<regexp>' - a regular expression
-        'path:<path>' - a path relative to repository root
+        'path:<path>' - a path relative to repository root, which is matched
+                        recursively
+        'rootfilesin:<path>' - a path relative to repository root, which is
+                        matched non-recursively (will not match subdirectories)
         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
         'relpath:<path>' - a path relative to cwd
         'relre:<regexp>' - a regexp that needn't match the start of a name
@@ -153,7 +156,7 @@ 
         elif patterns:
             kindpats = self._normalize(patterns, default, root, cwd, auditor)
             if not _kindpatsalwaysmatch(kindpats):
-                self._files = _roots(kindpats)
+                self._files = _explicitfiles(kindpats)
                 self._anypats = self._anypats or _anypats(kindpats)
                 self.patternspat, pm = _buildmatch(ctx, kindpats, '$',
                                                    listsubrepos, root)
@@ -286,7 +289,7 @@ 
         for kind, pat in [_patsplit(p, default) for p in patterns]:
             if kind in ('glob', 'relpath'):
                 pat = pathutil.canonpath(root, cwd, pat, auditor)
-            elif kind in ('relglob', 'path'):
+            elif kind in ('relglob', 'path', 'rootfilesin'):
                 pat = util.normpath(pat)
             elif kind in ('listfile', 'listfile0'):
                 try:
@@ -447,7 +450,8 @@ 
     if ':' in pattern:
         kind, pat = pattern.split(':', 1)
         if kind in ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
-                    'listfile', 'listfile0', 'set', 'include', 'subinclude'):
+                    'listfile', 'listfile0', 'set', 'include', 'subinclude',
+                    'rootfilesin'):
             return kind, pat
     return default, pattern
 
@@ -540,6 +544,14 @@ 
         if pat == '.':
             return ''
         return '^' + util.re.escape(pat) + '(?:/|$)'
+    if kind == 'rootfilesin':
+        if pat == '.':
+            escaped = ''
+        else:
+            # Pattern is a directory name.
+            escaped = util.re.escape(pat) + '/'
+        # Anything after the pattern must be a non-directory.
+        return '^' + escaped + '[^/]*$'
     if kind == 'relglob':
         return '(?:|.*/)' + _globre(pat) + globsuffix
     if kind == 'relpath':
@@ -614,6 +626,8 @@ 
 
     >>> _roots([('glob', 'g/*', ''), ('glob', 'g', ''), ('glob', 'g*', '')])
     ['g', 'g', '.']
+    >>> _roots([('rootfilesin', 'g', ''), ('rootfilesin', '', '')])
+    ['g', '.']
     >>> _roots([('relpath', 'r', ''), ('path', 'p/p', ''), ('path', '', '')])
     ['r', 'p/p', '.']
     >>> _roots([('relglob', 'rg*', ''), ('re', 're/', ''), ('relre', 'rr', '')])
@@ -628,15 +642,28 @@ 
                     break
                 root.append(p)
             r.append('/'.join(root) or '.')
-        elif kind in ('relpath', 'path'):
+        elif kind in ('relpath', 'path', 'rootfilesin'):
             r.append(pat or '.')
         else: # relglob, re, relre
             r.append('.')
     return r
 
+def _explicitfiles(kindpats):
+    '''Returns the potential explicit filenames from the patterns.
+
+    >>> _explicitfiles([('path', 'foo/bar', '')])
+    ['foo/bar']
+    >>> _explicitfiles([('rootfilesin', 'foo/bar', '')])
+    []
+    '''
+    # Keep only the pattern kinds where one can specify filenames (vs only
+    # directory names).
+    filable = [kp for kp in kindpats if kp[0] not in ('rootfilesin')]
+    return _roots(filable)
+
 def _anypats(kindpats):
     for kind, pat, source in kindpats:
-        if kind in ('glob', 're', 'relglob', 'relre', 'set'):
+        if kind in ('glob', 're', 'relglob', 'relre', 'set', 'rootfilesin'):
             return True
 
 _commentre = None
diff -r a95fc01aaffe -r 2d9523f80c5b tests/test-walk.t
--- a/tests/test-walk.t	Wed Feb 08 14:37:38 2017 -0800
+++ b/tests/test-walk.t	Fri Feb 10 15:12:00 2017 -0800
@@ -112,6 +112,74 @@ 
   f  beans/navy      ../beans/navy
   f  beans/pinto     ../beans/pinto
   f  beans/turtle    ../beans/turtle
+
+  $ hg debugwalk 'rootfilesin:'
+  f  fennel      ../fennel
+  f  fenugreek   ../fenugreek
+  f  fiddlehead  ../fiddlehead
+  $ hg debugwalk -I 'rootfilesin:'
+  f  fennel      ../fennel
+  f  fenugreek   ../fenugreek
+  f  fiddlehead  ../fiddlehead
+  $ hg debugwalk 'rootfilesin:.'
+  f  fennel      ../fennel
+  f  fenugreek   ../fenugreek
+  f  fiddlehead  ../fiddlehead
+  $ hg debugwalk -I 'rootfilesin:.'
+  f  fennel      ../fennel
+  f  fenugreek   ../fenugreek
+  f  fiddlehead  ../fiddlehead
+  $ hg debugwalk -X 'rootfilesin:'
+  f  beans/black                     ../beans/black
+  f  beans/borlotti                  ../beans/borlotti
+  f  beans/kidney                    ../beans/kidney
+  f  beans/navy                      ../beans/navy
+  f  beans/pinto                     ../beans/pinto
+  f  beans/turtle                    ../beans/turtle
+  f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
+  f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
+  f  mammals/Procyonidae/raccoon     Procyonidae/raccoon
+  f  mammals/skunk                   skunk
+  $ hg debugwalk 'rootfilesin:fennel'
+  $ hg debugwalk -I 'rootfilesin:fennel'
+  $ hg debugwalk 'rootfilesin:skunk'
+  $ hg debugwalk -I 'rootfilesin:skunk'
+  $ hg debugwalk 'rootfilesin:beans'
+  f  beans/black     ../beans/black
+  f  beans/borlotti  ../beans/borlotti
+  f  beans/kidney    ../beans/kidney
+  f  beans/navy      ../beans/navy
+  f  beans/pinto     ../beans/pinto
+  f  beans/turtle    ../beans/turtle
+  $ hg debugwalk -I 'rootfilesin:beans'
+  f  beans/black     ../beans/black
+  f  beans/borlotti  ../beans/borlotti
+  f  beans/kidney    ../beans/kidney
+  f  beans/navy      ../beans/navy
+  f  beans/pinto     ../beans/pinto
+  f  beans/turtle    ../beans/turtle
+  $ hg debugwalk 'rootfilesin:mammals'
+  f  mammals/skunk  skunk
+  $ hg debugwalk -I 'rootfilesin:mammals'
+  f  mammals/skunk  skunk
+  $ hg debugwalk 'rootfilesin:mammals/'
+  f  mammals/skunk  skunk
+  $ hg debugwalk -I 'rootfilesin:mammals/'
+  f  mammals/skunk  skunk
+  $ hg debugwalk -X 'rootfilesin:mammals'
+  f  beans/black                     ../beans/black
+  f  beans/borlotti                  ../beans/borlotti
+  f  beans/kidney                    ../beans/kidney
+  f  beans/navy                      ../beans/navy
+  f  beans/pinto                     ../beans/pinto
+  f  beans/turtle                    ../beans/turtle
+  f  fennel                          ../fennel
+  f  fenugreek                       ../fenugreek
+  f  fiddlehead                      ../fiddlehead
+  f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
+  f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
+  f  mammals/Procyonidae/raccoon     Procyonidae/raccoon
+
   $ hg debugwalk .
   f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
   f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
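
One Python detail in the `_explicitfiles()` hunk may be worth illustrating: `('rootfilesin')` is a parenthesized string rather than a one-element tuple, so `kp[0] not in ('rootfilesin')` is a substring test. The sketch below is standalone and hedged: `explicit_kinds` is a hypothetical stand-in for the list comprehension in the patch, and the kind `'root'` is invented purely to show the difference. `KNOWN_KINDS` copies the kinds listed in the `_patsplit()` hunk.

```python
KNOWN_KINDS = ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
               'listfile', 'listfile0', 'set', 'include', 'subinclude')

def explicit_kinds(kindpats, excluded):
    # Hypothetical stand-in for the comprehension in _explicitfiles().
    return [kp for kp in kindpats if kp[0] not in excluded]

kindpats = [
    ('path', 'foo/bar', ''),
    ('rootfilesin', 'foo/bar', ''),
    ('root', 'x', ''),  # invented kind, for illustration only
]

# Parenthesized string: 'in' performs substring matching.
as_string = explicit_kinds(kindpats, ('rootfilesin'))
# One-element tuple: 'in' performs membership testing.
as_tuple = explicit_kinds(kindpats, ('rootfilesin',))

print([kp[0] for kp in as_string])
print([kp[0] for kp in as_tuple])

# None of the kinds accepted by _patsplit() is a substring of
# 'rootfilesin', so both forms agree for every kind in that list.
print(any(kind in 'rootfilesin' for kind in KNOWN_KINDS))
```

With the kinds listed in the patch, the two spellings filter identically; they would diverge only if a kind whose name is a substring of `rootfilesin` were introduced later.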