Patchwork minifileset: allow 'path:' patterns to have an explicit trailing slash

login
register
mail settings
Submitter Matt Harbison
Date Feb. 16, 2018, 3:05 a.m.
Message ID <op.zeidjxh29lwrgf@envy>
Download mbox | patch
Permalink /patch/27988/
State New
Headers show

Comments

Matt Harbison - Feb. 16, 2018, 3:05 a.m.
On Tue, 13 Feb 2018 08:38:27 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Tue, 13 Feb 2018 07:58:50 -0500, Matt Harbison wrote:
>>
>> > On Feb 13, 2018, at 6:27 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>> >
>> >> On Mon, 12 Feb 2018 22:17:35 -0500, Matt Harbison wrote:
>> >> # HG changeset patch
>> >> # User Matt Harbison <matt_harbison@yahoo.com>
>> >> # Date 1518488713 18000
>> >> #      Mon Feb 12 21:25:13 2018 -0500
>> >> # Node ID e99e6917138593d2dddf7e0f5506dbd3f6c87743
>> >> # Parent  9b5df6e19a4f308e14703a8136cd0530c1e1d1a9
>> >> minifileset: allow 'path:' patterns to have an explicit trailing  
>> slash
>> >>
>> >> We allow for it on the command line, with `hg status`, for example.   
>> I thought
>> >> that I copied this "n.startswith(p) and (len(n) == pl or n[pl] ==  
>> '/')" pattern
>> >> from somewhere, but I don't see it now.
>> >>
>> >> diff --git a/mercurial/minifileset.py b/mercurial/minifileset.py
>> >> --- a/mercurial/minifileset.py
>> >> +++ b/mercurial/minifileset.py
>> >> @@ -26,7 +26,7 @@
>> >>                     raise error.ParseError(_('reserved character:  
>> %s') % c)
>> >>             return lambda n, s: n.endswith(ext)
>> >>         elif name.startswith('path:'): # directory or full path test
>> >> -            p = name[5:] # prefix
>> >> +            p = name[5:] if name[-1] != '/' else name[5:-1] # prefix
>> >
>> > Doesn't it mean 'a/' matches 'a'?
>>
>> Yes. (But 'a' won’t match 'ab'.)
>
> Ugh, I never thought 'path:hg/' would match the file 'hg', but it does
> probably because of util.normpath().
>
>   % hg debugwalk 'path:hg/'
>   matcher: <patternmatcher patterns='(?:hg(?:/|$))'>
>   f  hg  hg  exact

Hmm, that surprises me too.  It does look like util.normpath() is involved:


$ ../hg debugwalk 'path:hg/'
matcher: <patternmatcher patterns='(?:hg\\/(?:/|$))'>
..\hg\: The filename, directory name, or volume label syntax is incorrect

(Though that pattern looks like it would have problems matching against  
files in an actual directory.  In practice, it could walk tests/, though  
it double slashed the first separator.)

> Perhaps applying normpath() would look saner than testing if name[-1] ==  
> '/'.

Yeah, that was definitely a micro optimization.

>> Basically, I spent some time last week writing ignore rules for some  
>> converted repos, and got into the habit of appending a trailing '/' to  
>> ensure the match is a directory, and not just a substring.  When I did  
>> that here, it took awhile to figure out why the path was being  
>> ignored.  ('path:' only matches directories)
>>
>> > Can't we reuse some parts of the match module to build a function or  
>> regexp
>> > from a pattern string?
>>
>> Probably.  I’ve seen a couple cases where a regex pattern would be  
>> useful.  I just assumed those other match types were part of the  
>> performance concern that was the reason for splitting out the mini  
>> language in the first place.
>
> (CC Jun)
>
> I think the O(n) concern came from how fileset filters n-length list, not
> from the matcher function itself.

Unless I'm missing something, the only time patternmatcher walks ctx is if  
there's a 'set:' kind.  So if we filter that out that, the relative kinds  
(except relglob), and 'subinclude:', I don't see why we can't create one  
of those to build the match function.  That would allow regex,  
rootfilesin, and (rel)glob support too.
Yuya Nishihara - Feb. 18, 2018, 12:43 p.m.
On Thu, 15 Feb 2018 22:05:47 -0500, Matt Harbison wrote:
> >> Basically, I spent some time last week writing ignore rules for some  
> >> converted repos, and got into the habit of appending a trailing '/' to  
> >> ensure the match is a directory, and not just a substring.  When I did  
> >> that here, it took awhile to figure out why the path was being  
> >> ignored.  ('path:' only matches directories)
> >>
> >> > Can't we reuse some parts of the match module to build a function or  
> >> regexp
> >> > from a pattern string?
> >>
> >> Probably.  I’ve seen a couple cases where a regex pattern would be  
> >> useful.  I just assumed those other match types were part of the  
> >> performance concern that was the reason for splitting out the mini  
> >> language in the first place.
> >
> > (CC Jun)
> >
> > I think the O(n) concern came from how fileset filters n-length list, not
> > from the matcher function itself.
> 
> Unless I'm missing something, the only time patternmatcher walks ctx is if
> there's a 'set:' kind.

Perhaps. And we can effectively disable 'set:' by not passing ctx to matcher.
57d6c0c74b1b could be partially backed out if we want to handle unsupported
'set:' in matcher.

> So if we filter that out that, the relative kinds
> (except relglob), and 'subinclude:', I don't see why we can't create one  
> of those to build the match function.  That would allow regex,  
> rootfilesin, and (rel)glob support too.

Patch

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -199,7 +199,10 @@ 
          if kind in cwdrelativepatternkinds:
              pat = pathutil.canonpath(root, cwd, pat, auditor)
          elif kind in ('relglob', 'path', 'rootfilesin'):
+            isdir = pat[-1] in br'\/'
              pat = util.normpath(pat)
+            if isdir:
+                pat += b'/'
          elif kind in ('listfile', 'listfile0'):
              try:
                  files = util.readfile(pat)