Patchwork minifileset: allow 'path:' patterns to have an explicit trailing slash

login
register
mail settings
Submitter Matt Harbison
Date Feb. 13, 2018, 3:17 a.m.
Message ID <e99e6917138593d2dddf.1518491855@Envy>
Download mbox | patch
Permalink /patch/27779/
State New
Headers show

Comments

Matt Harbison - Feb. 13, 2018, 3:17 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1518488713 18000
#      Mon Feb 12 21:25:13 2018 -0500
# Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
# Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
minifileset: allow 'path:' patterns to have an explicit trailing slash

We allow for it on the command line, with `hg status`, for example.  I thought
that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern
from somewhere, but I don't see it now.
Yuya Nishihara - Feb. 13, 2018, 11:27 a.m.
On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1518488713 18000
> #      Mon Feb 12 21:25:13 2018 -0500
> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
> minifileset: allow 'path:' patterns to have an explicit trailing slash
> 
> We allow for it on the command line, with `hg status`, for example.  I thought
> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern
> from somewhere, but I don't see it now.
> 
> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
> --- a/mercurial/minifileset.py
> +++ b/mercurial/minifileset.py
> @@ -26,7 +26,7 @@
>                      raise error.ParseError(_('reserved character: %s') % c)
>              return lambda n, s: n.endswith(ext)
>          elif name.startswith('path:'): # directory or full path test
> -            p = name[5:] # prefix
> +            p = name[5:] if name[-1] != '/' else name[5:-1] # prefix

Doesn't it mean 'a/' matches 'a'?

Can't we reuse some parts of the match module to build a function or regexp
from a pattern string?
Matt Harbison - Feb. 13, 2018, 12:58 p.m.
> On Feb 13, 2018, at 6:27 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1518488713 18000
>> #      Mon Feb 12 21:25:13 2018 -0500
>> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
>> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
>> minifileset: allow 'path:' patterns to have an explicit trailing slash
>> 
>> We allow for it on the command line, with `hg status`, for example.  I thought
>> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern
>> from somewhere, but I don't see it now.
>> 
>> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
>> --- a/mercurial/minifileset.py
>> +++ b/mercurial/minifileset.py
>> @@ -26,7 +26,7 @@
>>                     raise error.ParseError(_('reserved character: %s') % c)
>>             return lambda n, s: n.endswith(ext)
>>         elif name.startswith('path:'): # directory or full path test
>> -            p = name[5:] # prefix
>> +            p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
> 
> Doesn't it mean 'a/' matches 'a'?

Yes. (But 'a' won’t match 'ab'.)  Basically, I spent some time last week writing ignore rules for some converted repos, and got into the habit of appending a trailing '/' to ensure the match is a directory, and not just a substring.  When I did that here, it took awhile to figure out why the path was being ignored.  ('path:' only matches directories)

> Can't we reuse some parts of the match module to build a function or regexp
> from a pattern string?

Probably.  I’ve seen a couple cases where a regex pattern would be useful.  I just assumed those other match types were part of the performance concern that was the reason for splitting out the mini language in the first place.
Yuya Nishihara - Feb. 13, 2018, 1:38 p.m.
On Tue, 13 Feb 2018 07:58:50 -0500, Matt Harbison wrote:
> 
> > On Feb 13, 2018, at 6:27 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > 
> >> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1518488713 18000
> >> #      Mon Feb 12 21:25:13 2018 -0500
> >> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
> >> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
> >> minifileset: allow 'path:' patterns to have an explicit trailing slash
> >> 
> >> We allow for it on the command line, with `hg status`, for example.  I thought
> >> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] == '/')" pattern
> >> from somewhere, but I don't see it now.
> >> 
> >> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
> >> --- a/mercurial/minifileset.py
> >> +++ b/mercurial/minifileset.py
> >> @@ -26,7 +26,7 @@
> >>                     raise error.ParseError(_('reserved character: %s') % c)
> >>             return lambda n, s: n.endswith(ext)
> >>         elif name.startswith('path:'): # directory or full path test
> >> -            p = name[5:] # prefix
> >> +            p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
> > 
> > Doesn't it mean 'a/' matches 'a'?
> 
> Yes. (But 'a' won’t match 'ab'.)

Ugh, I never thought 'path:hg/' would match the file 'hg', but it does
probably because of util.normpath().

  % hg debugwalk 'path:hg/'
  matcher: <patternmatcher patterns='(?:hg(?:/|$))'>
  f  hg  hg  exact

Perhaps applying normpath() would look saner than testing if name[-1] == '/'.

> Basically, I spent some time last week writing ignore rules for some converted repos, and got into the habit of appending a trailing '/' to ensure the match is a directory, and not just a substring.  When I did that here, it took awhile to figure out why the path was being ignored.  ('path:' only matches directories)
> 
> > Can't we reuse some parts of the match module to build a function or regexp
> > from a pattern string?
> 
> Probably.  I’ve seen a couple cases where a regex pattern would be useful.  I just assumed those other match types were part of the performance concern that was the reason for splitting out the mini language in the first place.

(CC Jun)

I think the O(n) concern came from how fileset filters n-length list, not
from the matcher function itself.

Patch

diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
--- a/mercurial/minifileset.py
+++ b/mercurial/minifileset.py
@@ -26,7 +26,7 @@ 
                     raise error.ParseError(_('reserved character: %s') % c)
             return lambda n, s: n.endswith(ext)
         elif name.startswith('path:'): # directory or full path test
-            p = name[5:] # prefix
+            p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
             pl = len(p)
             f = lambda n, s: n.startswith(p) and (len(n) == pl or n[pl] == '/')
             return f
diff --git a/tests/test-minifileset.py b/tests/test-minifileset.py
--- a/tests/test-minifileset.py
+++ b/tests/test-minifileset.py
@@ -25,6 +25,8 @@ 
 check('"path:a" & (**.b | **.c)', [('a/b.b', 0), ('a/c.c', 0)], [('b/c.c', 0)])
 check('(path:a & **.b) | **.c',
       [('a/b.b', 0), ('a/c.c', 0), ('b/c.c', 0)], [])
+check('path:a/', [('a/b.b', 0), ('a/c.c', 0)], [('ab/c.c', 0)])
+check('path:a', [('a/b.b', 0), ('a/c.c', 0)], [('ab/c.c', 0)])
 
 check('**.bin - size("<20B")', [('b.bin', 21)], [('a.bin', 11), ('b.txt', 21)])