Patchwork match: adding non-recursive directory matching

Submitter via Mercurial-devel
Date Oct. 8, 2016, 4:58 p.m.
Message ID <545efe5a72efdce925a6.1475945935@waste.org>
Permalink /patch/16952/
State Rejected
Delegated to: Pierre-Yves David

Comments

via Mercurial-devel - Oct. 8, 2016, 4:58 p.m.
# HG changeset patch
# User Rodrigo Damazio Bovendorp <rdamazio@google.com>
# Date 1475944120 25200
#      Sat Oct 08 09:28:40 2016 -0700
# Node ID 545efe5a72efdce925a6a3fd3774b350c90b5c55
# Parent  dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39
match: adding non-recursive directory matching

This allows one to match all files in a directory, without matching anything in subdirectories.
It's implemented almost identically to path:, except for the regex termination, which doesn't
allow more than one / after the directory name.
Pierre-Yves David - Oct. 16, 2016, 1:50 p.m.
On 10/08/2016 06:58 PM, Rodrigo Damazio Bovendorp via Mercurial-devel wrote:
> # HG changeset patch
> # User Rodrigo Damazio Bovendorp <rdamazio@google.com>
> # Date 1475944120 25200
> #      Sat Oct 08 09:28:40 2016 -0700
> # Node ID 545efe5a72efdce925a6a3fd3774b350c90b5c55
> # Parent  dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39
> match: adding non-recursive directory matching
>
> This allows one to match all files in a directory, without matching anything in subdirectories.
> It's implemented almost identically to path:, except for the regex termination, which doesn't
> allow more than one / after the directory name.
>
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -105,6 +105,9 @@
>          'glob:<glob>' - a glob relative to cwd
>          're:<regexp>' - a regular expression
>          'path:<path>' - a path relative to repository root
> +        'files:<path>' - a path relative to repository root, which is matched
> +                         non-recursively (files inside the directory will match,
> +                         but subdirectories and files in them won't

The feature seems useful and we should have it.

The current behavior is a bit strange to me, because the directory is
implicitly recursed by just one level (the directory content). Could we
have an xxx:<path> where the path is never recursed into at all? Listing a
directory's content would then be an explicit 'xxx:my/directory/path/*'.

We could use 'exact' or 'norecursion' for xxx.

>          'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
>          'relpath:<path>' - a path relative to cwd
>          'relre:<regexp>' - a regexp that needn't match the start of a name
> @@ -286,7 +289,7 @@
>          for kind, pat in [_patsplit(p, default) for p in patterns]:
>              if kind in ('glob', 'relpath'):
>                  pat = pathutil.canonpath(root, cwd, pat, auditor)
> -            elif kind in ('relglob', 'path'):
> +            elif kind in ('relglob', 'path', 'files'):
>                  pat = util.normpath(pat)
>              elif kind in ('listfile', 'listfile0'):
>                  try:
> @@ -447,7 +450,8 @@
>      if ':' in pattern:
>          kind, pat = pattern.split(':', 1)
>          if kind in ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
> -                    'listfile', 'listfile0', 'set', 'include', 'subinclude'):
> +                    'listfile', 'listfile0', 'set', 'include', 'subinclude',
> +                    'files'):
>              return kind, pat
>      return default, pattern
>
> @@ -540,6 +544,19 @@
>          if pat == '.':
>              return ''
>          return '^' + util.re.escape(pat) + '(?:/|$)'
> +    if kind == 'files':
> +        # Match one of:
> +        # For pat = 'some/dir':
> +        # some/dir
> +        # some/dir/
> +        # some/dir/filename
> +        # For pat = '' or pat = '.':
> +        # filename
> +        if pat == '.':
> +            escaped = ''
> +        else:
> +            escaped = util.re.escape(pat)
> +        return '^' + escaped + '(?:^|/|$)[^/]*$'
>      if kind == 'relglob':
>          return '(?:|.*/)' + _globre(pat) + globsuffix
>      if kind == 'relpath':
> @@ -628,7 +645,7 @@
>                      break
>                  root.append(p)
>              r.append('/'.join(root) or '.')
> -        elif kind in ('relpath', 'path'):
> +        elif kind in ('relpath', 'path', 'files'):
>              r.append(pat or '.')
>          else: # relglob, re, relre
>              r.append('.')
> diff --git a/tests/test-locate.t b/tests/test-locate.t
> --- a/tests/test-locate.t
> +++ b/tests/test-locate.t
> @@ -52,6 +52,12 @@
>    t/b
>    t/e.h
>    t/x
> +  $ hg locate files:
> +  b
> +  t.h
> +  $ hg locate files:.
> +  b
> +  t.h
>    $ hg locate -r 0 a
>    a
>    $ hg locate -r 0 NONEXISTENT
> @@ -119,6 +125,13 @@
>    ../t/e.h (glob)
>    ../t/x (glob)
>
> +  $ hg files files:
> +  ../b (glob)
> +  ../t.h (glob)
> +  $ hg files files:.
> +  ../b (glob)
> +  ../t.h (glob)
> +
>    $ hg locate b
>    ../b (glob)
>    ../t/b (glob)
> diff --git a/tests/test-walk.t b/tests/test-walk.t
> --- a/tests/test-walk.t
> +++ b/tests/test-walk.t
> @@ -112,6 +112,8 @@
>    f  beans/navy      ../beans/navy
>    f  beans/pinto     ../beans/pinto
>    f  beans/turtle    ../beans/turtle
> +  $ hg debugwalk -I 'files:mammals'
> +  f  mammals/skunk  skunk
>    $ hg debugwalk .
>    f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
>    f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
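To make the difference between path: and files: concrete, here is a minimal
sketch of the two regexes from the match.py hunk above. It assumes Python's
standard re module in place of Mercurial's util.re wrapper, and the helper
names path_regex, files_regex and matches are made up for illustration; they
are not part of match.py.

```python
import re


def path_regex(pat):
    # 'path:' as in the existing code: the directory itself or anything below it.
    if pat == '.':
        return ''
    return '^' + re.escape(pat) + '(?:/|$)'


def files_regex(pat):
    # 'files:' as proposed in the patch: at most one path component may follow
    # the directory name, so files in subdirectories do not match.
    escaped = '' if pat == '.' else re.escape(pat)
    return '^' + escaped + '(?:^|/|$)[^/]*$'


def matches(regex, name):
    # Hypothetical helper: True if the regex matches from the start of name.
    return re.match(regex, name) is not None


if __name__ == '__main__':
    for name in ('mammals/skunk', 'mammals/Procyonidae/cacomistle'):
        print(name,
              'path:', matches(path_regex('mammals'), name),
              'files:', matches(files_regex('mammals'), name))
```

Only the tail of the regex differs: path: stops checking after the optional
'/', while files: requires the rest of the name to contain no further '/'.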
Augie Fackler - Oct. 18, 2016, 12:25 a.m.
On Sun, Oct 16, 2016 at 03:50:10PM +0200, Pierre-Yves David wrote:
>
>
> On 10/08/2016 06:58 PM, Rodrigo Damazio Bovendorp via Mercurial-devel wrote:
> > # HG changeset patch
> > # User Rodrigo Damazio Bovendorp <rdamazio@google.com>
> > # Date 1475944120 25200
> > #      Sat Oct 08 09:28:40 2016 -0700
> > # Node ID 545efe5a72efdce925a6a3fd3774b350c90b5c55
> > # Parent  dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39
> > match: adding non-recursive directory matching
> >
> > This allows one to match all files in a directory, without matching anything in subdirectories.
> > It's implemented almost identically to path:, except for the regex termination, which doesn't
> > allow more than one / after the directory name.
> >
> > diff --git a/mercurial/match.py b/mercurial/match.py
> > --- a/mercurial/match.py
> > +++ b/mercurial/match.py
> > @@ -105,6 +105,9 @@
> >          'glob:<glob>' - a glob relative to cwd
> >          're:<regexp>' - a regular expression
> >          'path:<path>' - a path relative to repository root
> > +        'files:<path>' - a path relative to repository root, which is matched
> > +                         non-recursively (files inside the directory will match,
> > +                         but subdirectories and files in them won't
>
> The feature seems useful and we should have it.
>
> The current behavior is a bit strange to me. because we have directory being
> implicitly recursed of just 1 level (directory content). Could we have a
> xxx:<path> where path is never recursed for anything. Listing a directory
> content would be an explicite 'xxx:my/directory/path/*'
>
> We could use 'exact' or 'norecursion' for xxx.

exact: works for me. I think norecursion: is too long, since users will
need to type this.
Augie Fackler - Oct. 18, 2016, 1:40 p.m.
> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> After coordinating on irc to figure out what this proposal actually
>> is, I've noticed that the semantics of this "exact" proposal are
>> exactly what "glob" does today, which means (I think) that
>> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
>> I missing?
> 
> Maybe we want a "glob" relative to the repo root?

As far as I can tell, it already is. "relglob:" is relative to your location in the repo according to the docs.
Yuya Nishihara - Oct. 18, 2016, 1:52 p.m.
On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> > On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
> >> After coordinating on irc to figure out what this proposal actually
> >> is, I've noticed that the semantics of this "exact" proposal are
> >> exactly what "glob" does today, which means (I think) that
> >> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
> >> I missing?
> > 
> > Maybe we want a "glob" relative to the repo root?
> 
> As far as I can tell, it already is. "relglob:" is relative to your
> location in the repo according to the docs.

Unfortunately it isn't.

        'glob:<glob>' - a glob relative to cwd
        'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)

Don't ask me why. ;-)
Augie Fackler - Oct. 18, 2016, 2:12 p.m.
On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
>> > On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
>> >> After coordinating on irc to figure out what this proposal actually
>> >> is, I've noticed that the semantics of this "exact" proposal are
>> >> exactly what "glob" does today, which means (I think) that
>> >> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
>> >> I missing?
>> >
>> > Maybe we want a "glob" relative to the repo root?
>>
>> As far as I can tell, it already is. "relglob:" is relative to your
>> location in the repo according to the docs.
>
> Unfortunately that isn't.
>
>         'glob:<glob>' - a glob relative to cwd
>         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
>
> Don't ask me why. ;-)

Oh wat. It looks like narrowhg might change this behavior in narrowed
repositories, thus my additional confusion.

Maybe we should add "absglob" that is always repo-root-absolute. How
do we feel about that overall?
Yuya Nishihara - Oct. 18, 2016, 2:39 p.m.
On Tue, 18 Oct 2016 10:12:07 -0400, Augie Fackler wrote:
> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> >> > On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
> >> >> After coordinating on irc to figure out what this proposal actually
> >> >> is, I've noticed that the semantics of this "exact" proposal are
> >> >> exactly what "glob" does today, which means (I think) that
> >> >> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
> >> >> I missing?
> >> >
> >> > Maybe we want a "glob" relative to the repo root?
> >>
> >> As far as I can tell, it already is. "relglob:" is relative to your
> >> location in the repo according to the docs.
> >
> > Unfortunately that isn't.
> >
> >         'glob:<glob>' - a glob relative to cwd
> >         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
> >
> > Don't ask me why. ;-)
> 
> Oh wat. It looks like narrowhg might change this behavior in narrowed
> repositories, thus my additional confusion.
> 
> Maybe we should add "absglob" that is always repo-root-absolute. How
> do we feel about that overall?

Sounds good to me.
via Mercurial-devel - Oct. 20, 2016, 4:19 p.m.
The issue is that glob:foo/* is recursive in some cases - e.g. "hg files -I
glob:contrib/*" in the hg repo gives me subdirectories of contrib
recursively (including e.g. contrib/docker/apache-server, two levels down).
After discussing a bit more offline with Martin: I'll check if that's a bug
in the matcher's visitdir (rather than a design limitation of glob) before
following up on this change.


Katsunori FUJIWARA - Oct. 21, 2016, 3:13 p.m.
At Tue, 18 Oct 2016 10:12:07 -0400,
Augie Fackler wrote:
> 
> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> >> > On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
> >> >> After coordinating on irc to figure out what this proposal actually
> >> >> is, I've noticed that the semantics of this "exact" proposal are
> >> >> exactly what "glob" does today, which means (I think) that
> >> >> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
> >> >> I missing?
> >> >
> >> > Maybe we want a "glob" relative to the repo root?
> >>
> >> As far as I can tell, it already is. "relglob:" is relative to your
> >> location in the repo according to the docs.
> >
> > Unfortunately that isn't.
> >
> >         'glob:<glob>' - a glob relative to cwd
> >         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
> >
> > Don't ask me why. ;-)
> 
> Oh wat. It looks like narrowhg might change this behavior in narrowed
> repositories, thus my additional confusion.
> 
> Maybe we should add "absglob" that is always repo-root-absolute. How
> do we feel about that overall?

FYI, current pattern matching is implemented as described below. This was
discussed in the "non-recursive directory matching" session of the 4.0
sprint. Sorry for posting this translation of
http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 (in Japanese) so
late; it was in my backlog from the last sprint.

  ============ ======= ======= ===========
  pattern type root-ed cwd-ed  any-of-path
  ============ ======= ======= ===========
  wildcard     ---     glob    relglob
  regexp       re      ---     relre
  raw string   path    relpath ---
  ============ ======= ======= ===========

  If rule is read in from file (e.g. .hgignore):

    * "glob" is treated as "relglob"
    * "re" is treated as "relre"

  This is mentioned in "hg help patterns" and "hg help hgignore", but
  the syntax names "relglob" and "relre" themselves aren't explained.

  "end of name" matching is required:

    * for glob/relglob as PATTERN (e.g. argument in command line), but
    * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern syntaxes

  For example, file "foo/bar/baz" is:

    * not matched by "hg files glob:foo/bar"
    * but matched by "hg files -I glob:foo/bar"

  This isn't mentioned in any help document :-<, and the latter case
  seems to cause the issue mentioned in this patch series.
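
The PATTERN versus -I/-X difference described above can be sketched as
follows. This is a simplified illustration, not Mercurial's actual code:
glob_to_regex is a hypothetical translator that handles only '**', '*' and
'?', and the two suffix values are an assumption about what the matcher
appends ('$' for command-line patterns, '(?:/|$)' for includes/excludes),
based on the globsuffix parameter visible in the patch.

```python
import re

# Assumed suffixes: patterns must match to the end of the name,
# includes/excludes may also stop at a directory boundary.
PATTERN_SUFFIX = '$'
INCLUDE_SUFFIX = '(?:/|$)'


def glob_to_regex(glob):
    # Tiny subset of glob syntax; everything else is treated literally.
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith('**', i):
            out.append('.*')       # '**' crosses directory separators
            i += 2
        elif glob[i] == '*':
            out.append('[^/]*')    # '*' stays within one path component
            i += 1
        elif glob[i] == '?':
            out.append('.')
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return ''.join(out)


def glob_matches(glob, path, suffix):
    return re.match(glob_to_regex(glob) + suffix, path) is not None


if __name__ == '__main__':
    print(glob_matches('foo/bar', 'foo/bar/baz', PATTERN_SUFFIX))
    print(glob_matches('foo/bar', 'foo/bar/baz', INCLUDE_SUFFIX))
```

Under these assumptions, 'contrib/*' used as an include also matches
'contrib/docker/apache-server', because the regex is allowed to end at the
'/' after 'contrib/docker'.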

How about introducing new systematic names like the ones below, to
reorganize the current complicated mapping between names and matching
behavior? ("End of name" matching could be enabled by an "-eon" suffix
or similar.)

  ============ ======== ======= ===========
  pattern type root-ed  cwd-ed  any-of-path
  ============ ======== ======= ===========
  wildcard     rootglob cwdglob anyglob
  regexp       rootre   cwdre   anyre
  raw string   rootpath cwdpath anypath
  ============ ======== ======= ===========

Of course, we should take care of backward compatibility for .hgignore and
the like (e.g. a config knob to warn or abort on a new syntax name in
.hgignore).
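
Read side by side, the two tables imply the following renaming. This is a
small sketch only: the dictionary and variable names are illustrative, the
proposed names are taken from the table above, and none of them exist in
Mercurial at this point in the discussion.

```python
# Existing syntax name -> proposed systematic name, per the two tables.
CURRENT_TO_PROPOSED = {
    'glob': 'cwdglob',
    'relglob': 'anyglob',
    're': 'rootre',
    'relre': 'anyre',
    'path': 'rootpath',
    'relpath': 'cwdpath',
}

BASES = ('root', 'cwd', 'any')   # what the pattern is relative to
KINDS = ('glob', 're', 'path')   # wildcard, regexp, raw string

# The full 3x3 grid of proposed names.
ALL_PROPOSED = [base + kind for kind in KINDS for base in BASES]

# Cells that are '---' in the table of current syntaxes.
MISSING_TODAY = sorted(set(ALL_PROPOSED) - set(CURRENT_TO_PROPOSED.values()))

if __name__ == '__main__':
    print(MISSING_TODAY)
```

The three names left over are the combinations that have no equivalent in
the current syntax table.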



----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Oct. 22, 2016, 9:35 a.m.
On Thu, 20 Oct 2016 09:19:24 -0700, Rodrigo Damazio wrote:
> The issue is that glob:foo/* is recursive in some cases - e.g. "hg files -I
> glob:contrib/*" in the hg repo gives me subdirectories of contrib
> recursively (including e.g. contrib/docker/apache-server, two levels down).
> After discussing a bit more offline with Martin: I'll check if that's a bug
> in the matcher's visitdir (rather than a design limitation of glob) before
> following up on this change.

It appears -I is designed to include subdirectories.

https://selenic.com/repo/hg/file/3.9.2/mercurial/match.py#l135
Pierre-Yves David - Oct. 24, 2016, 1:16 p.m.
On 10/22/2016 11:35 AM, Yuya Nishihara wrote:
> On Thu, 20 Oct 2016 09:19:24 -0700, Rodrigo Damazio wrote:
>> The issue is that glob:foo/* is recursive in some cases - e.g. "hg files -I
>> glob:contrib/*" in the hg repo gives me subdirectories of contrib
>> recursively (including e.g. contrib/docker/apache-server, two levels down).
>> After discussing a bit more offline with Martin: I'll check if that's a bug
>> in the matcher's visitdir (rather than a design limitation of glob) before
>> following up on this change.
>
> It appears -I designed to include subdirectories.
>
> https://selenic.com/repo/hg/file/3.9.2/mercurial/match.py#l135

So the solution to Rodrigo's use case would be an "--include-exact" flag?
Pierre-Yves David - Oct. 24, 2016, 1:21 p.m.
On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> At Tue, 18 Oct 2016 10:12:07 -0400,
> Augie Fackler wrote:
>>
>> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>>> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
>>>>> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
>>>>>> After coordinating on irc to figure out what this proposal actually
>>>>>> is, I've noticed that the semantics of this "exact" proposal are
>>>>>> exactly what "glob" does today, which means (I think) that
>>>>>> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
>>>>>> I missing?
>>>>>
>>>>> Maybe we want a "glob" relative to the repo root?
>>>>
>>>> As far as I can tell, it already is. "relglob:" is relative to your
>>>> location in the repo according to the docs.
>>>
>>> Unfortunately that isn't.
>>>
>>>         'glob:<glob>' - a glob relative to cwd
>>>         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
>>>
>>> Don't ask me why. ;-)
>>
>> Oh wat. It looks like narrowhg might change this behavior in narrowed
>> repositories, thus my additional confusion.
>>
>> Maybe we should add "absglob" that is always repo-root-absolute. How
>> do we feel about that overall?
>
> FYI, current pattern matching is implemented as below. This was
> chatted in "non-recursive directory matching" session of 4.0 sprint,
> and sorry for my late posting of this translation from
> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in Japanese, as
> my backlog of the last sprint.
>
>   ============ ======= ======= ===========
>   pattern type root-ed cwd-ed  any-of-path
>   ============ ======= ======= ===========
>   wildcard     ---     glob    relglob
>   regexp       re      ---     relre
>   raw string   path    relpath ---
>   ============ ======= ======= ===========
>
>   If rule is read in from file (e.g. .hgignore):
>
>     * "glob" is treated as "relglob"
>     * "re" is treated as "relre"
>
>   This is mentioned in "hg help patterns" and "hg help hgignore", but
>   syntax name "relglob" and "relre" themselves aren't explained.
>
>   "end of name" matching is required:
>
>     * for glob/relglob as PATTERN (e.g. argument in command line), but
>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern syntaxes
>
>   For example, file "foo/bar/baz" is:
>
>     * not matched at "hg files glob:foo/bar"
>     * but matched at "hg file -I glob:foo/bar"
>
>   This isn't mentioned in any help document :-<, and the latter seems
>   to cause the issue mentioned in this patch series.
>
> How about introducing new systematic names like below to re-organize
> current complicated mapping between names and matching ? (and enable
> "end of name" matching by "-eon" suffix or so)
>
>   ============ ======== ======= ===========
>   pattern type root-ed  cwd-ed  any-of-path
>   ============ ======== ======= ===========
>   wildcard     rootglob cwdglob anyglob
>   regexp       rootre   cwdre   anyre
>   raw string   rootpath cwdpath anypath
>   ============ ======== ======= ===========

Moving toward a more regular and clear feature set and naming seems like a
win. I'm +1 for moving in that direction.

Cheers,
via Mercurial-devel - Oct. 24, 2016, 5:34 p.m.
It sounds like we'd like to do 3 somewhat orthogonal things:
- allow user to specify the directory the pattern is relative to
(root/cwd/any)
- allow the user to specify recursiveness/non-recursiveness consistently
(not covered by the *path patterns, but could be the defined behavior for
the globs)
- clean up the matcher API (discussed during Sprint)

Doing all 3 together would probably take some time and a lot of
back-and-forth, so I'm wondering if it'd be ok to start by updating this
patch to implement "rootglob" with consistent recursiveness behavior, and
we can then more slowly add the other patterns and converge on a cleaner
API?

Also, for discussion: I assume the *path patterns will be recursive when
they reference a directory. Do we also want a non-recursive equivalent
(rootexact, rootfiles, rootnonrecursive or something like that)?

Thanks
Rodrigo



via Mercurial-devel - Oct. 25, 2016, 7:40 a.m.
Sending updated patch via pushgate (description changed).


Katsunori FUJIWARA - Oct. 25, 2016, 11:31 p.m.
At Mon, 24 Oct 2016 10:34:52 -0700,
Rodrigo Damazio wrote:
> 
> It sounds like we'd like to do 3 somewhat orthogonal things:
> - allow user to specify the directory the pattern is relative to
> (root/cwd/any)
> - allow the user to specify recursiveness/non-recursiveness consistently
> (not covered by the *path patterns, but could be the defined behavior for
> the globs)
> - clean up the matcher API (discussed during Sprint)
> 
> Doing all 3 together would probably take some time and a lot of
> back-and-forth, so I'm wondering if it'd be ok to start by updating this
> patch to implement "rootglob" with consistent recursiveness behavior, and
> we can then more slowly add the other patterns and converge on a cleaner
> API?

(let's suspend posting the revised series during the code freeze period, to
focus on stabilization :-))

    https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze

In my previous reply, I assumed that the newly introduced syntaxes would:

  - match recursively by default, regardless of how they are passed
    (command line, -I/-X, ...), for consistency with almost all of the
    existing syntaxes

    Only glob/relglob as PATTERN on the command line require "end of name"
    matching.

  - require an additional "-eon" ("end of name") suffix for non-recursive
    matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)

But according to your revised patch, the "rootglob" syntax matches
non-recursively. Are you assuming the following?

  - newly introduced syntaxes match non-recursively by default
  - recursive matching requires an additional suffix (e.g. "-recursive")

On the other hand, you assume that the newly introduced *path syntaxes
will be recursive, as quoted below. Are you assuming that the default
recursiveness differs between the *glob and *path syntaxes?

> Also, for discussion: I assume the *path patterns will be recursive when
> they reference a directory. Do we also want a non-recursive equivalent
> (rootexact, rootfiles, rootnonrecursive or something like that)?

IMHO, having the patch description explain how recursive matching will
be controlled in the future would help reviewers evaluate your patch.


BTW, bikeshedding about the name of the additional suffix:

  - for non-recursive matching, in the "recursive matching by default" case

    - "-eon"

      "end of name matching" is a term I coined only for explanation,
      so let's choose a better one :-)

    - "-exact" for non-recursive matching

      this might confuse developers, because the current implementation
      already uses the term "exact" to mean "matching without any
      special handling".

        https://selenic.com/repo/hg/file/438173c41587/mercurial/match.py#l100

    - "-nonrecursive"

      this is too long, isn't it ?

    - "-file"

      this seems better (short and understandable for end users)

  - for recursive matching, in "non-recursive matching by default" case

    - "-recursive"

      this is too long, isn't it ?

    - "-dir"

      this seems better (short and understandable for end users)

> Thanks
> Rodrigo
> 
> 
> 
> On Mon, Oct 24, 2016 at 6:21 AM, Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
> 
> >
> >
> > On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> >
> >> At Tue, 18 Oct 2016 10:12:07 -0400,
> >> Augie Fackler wrote:
> >>
> >>>
> >>> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> >>>
> >>>> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> >>>>
> >>>>> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org> wrote:
> >>>>>>
> >>>>>>> After coordinating on irc to figure out what this proposal actually
> >>>>>>> is, I've noticed that the semantics of this "exact" proposal are
> >>>>>>> exactly what "glob" does today, which means (I think) that
> >>>>>>> "files:foo/bar" should be representable as "glob:foo/bar/*" - what am
> >>>>>>> I missing?
> >>>>>>>
> >>>>>>
> >>>>>> Maybe we want a "glob" relative to the repo root?
> >>>>>>
> >>>>>
> >>>>> As far as I can tell, it already is. "relglob:" is relative to your
> >>>>> location in the repo according to the docs.
> >>>>>
> >>>>
> >>>> Unfortunately that isn't.
> >>>>
> >>>>         'glob:<glob>' - a glob relative to cwd
> >>>>         'relglob:<glob>' - an unrooted glob (*.c matches C files in all
> >>>> dirs)
> >>>>
> >>>> Don't ask me why. ;-)
> >>>>
> >>>
> >>> Oh wat. It looks like narrowhg might change this behavior in narrowed
> >>> repositories, thus my additional confusion.
> >>>
> >>> Maybe we should add "absglob" that is always repo-root-absolute. How
> >>> do we feel about that overall?
> >>>
> >>
> >> FYI, current pattern matching is implemented as below. This was
> >> chatted in "non-recursive directory matching" session of 4.0 sprint,
> >> and sorry for my late posting of this translation from
> >> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in Japanese, as
> >> my backlog of the last sprint.
> >>
> >>   ============ ======= ======= ===========
> >>   pattern type root-ed cwd-ed  any-of-path
> >>   ============ ======= ======= ===========
> >>   wildcard     ---     glob    relglob
> >>   regexp       re      ---     relre
> >>   raw string   path    relpath ---
> >>   ============ ======= ======= ===========
> >>
> >>   If rule is read in from file (e.g. .hgignore):
> >>
> >>     * "glob" is treated as "relglob"
> >>     * "re" is treated as "relre"
> >>
> >>   This is mentioned in "hg help patterns" and "hg help hgignore", but
> >>   syntax name "relglob" and "relre" themselves aren't explained.
> >>
> >>   "end of name" matching is required:
> >>
> >>     * for glob/relglob as PATTERN (e.g. argument in command line), but
> >>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern syntaxes
> >>
> >>   For example, file "foo/bar/baz" is:
> >>
> >>     * not matched at "hg files glob:foo/bar"
> >>     * but matched at "hg file -I glob:foo/bar"
> >>
> >>   This isn't mentioned in any help document :-<, and the latter seems
> >>   to cause the issue mentioned in this patch series.
> >>
> >> How about introducing new systematic names like below to re-organize
> >> current complicated mapping between names and matching ? (and enable
> >> "end of name" matching by "-eon" suffix or so)
> >>
> >>   ============ ======== ======= ===========
> >>   pattern type root-ed  cwd-ed  any-of-path
> >>   ============ ======== ======= ===========
> >>   wildcard     rootglob cwdglob anyglob
> >>   regexp       rootre   cwdre   anyre
> >>   raw string   rootpath cwdpath anypath
> >>   ============ ======== ======= ===========
> >>
> >
> > Moving toward a more regular and clear feature set and naming seems a win.
> > I'm +1 for moving in that direction.
> >
> > Cheers,
> >
> > --
> > Pierre-Yves David
> >
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
via Mercurial-devel - Oct. 26, 2016, 2:51 a.m.
On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

>
> At Mon, 24 Oct 2016 10:34:52 -0700,
> Rodrigo Damazio wrote:
> >
> > [1  <text/plain; UTF-8 (7bit)>]
> > It sounds like we'd like to do 3 somewhat orthogonal things:
> > - allow user to specify the directory the pattern is relative to
> > (root/cwd/any)
> > - allow the user to specify recursiveness/non-recursiveness consistently
> > (not covered by the *path patterns, but could be the defined behavior for
> > the globs)
> > - clean up the matcher API (discussed during Sprint)
> >
> > Doing all 3 together would probably take some time and a lot of
> > back-and-forth, so I'm wondering if it'd be ok to start by updating this
> > patch to implement "rootglob" with consistent recursiveness behavior, and
> > we can then more slowly add the other patterns and converge on a cleaner
> > API?
>
> (let's suspend posting revised series while code freeze period, to
> focus on stabilization :-))
>

Sure, I understand you're under the freeze. Feel free to prioritize
reviewing my patches appropriately.
(note that the new patch is based on default, not stable)

>     https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze
>
> In my previous reply, I assume that newly introduced syntaxes do:
>
>   - match recursively by default regardless of the way of passing
>     (command line, -I/-X, ....), because of similarity with almost all
>     of existing syntaxes
>
>     Only glob/relglob as PATTERN in command line require "end of name"
>     matching.
>
>   - require additional "-eon" ("end of name") suffix for non-recursive
>     matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)
>
> But according to your revised patch, "rootglob" syntax matches
> non-recursively. Would you assume as below ?
>
>   - newly introduced syntaxes match non-recursively by default
>   - recursive matching requires any additional suffix (e.g. "-recursive")
>

Ah, the assumption is slightly different: for glob types specifically, we
do a full match, so to get recursive matching the user ends the pattern
with /** or similar. This lets the user choose non-recursive or recursive
matching by using * or ** as appropriate.
I'd suggest that regex types also do a full match, and the user can end
them with .* if they want prefix matching.
I believe this is simpler and more flexible than having 18 different
pattern types just to account for the different matching behaviors. I
considered making partial matching the default instead, but decided against
it when writing the patch, because then it would be impossible to make a
pattern non-recursive with a modifier without doubling the number of
matchers, as you suggested.
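
A minimal Python sketch of the full-match idea follows. It is only an
illustration under stated assumptions: glob_to_regex and full_match are
hypothetical helpers written for this example, they handle only "*", "**"
and "?", and they are not the translation used in mercurial/match.py.

```python
import re

def glob_to_regex(pat):
    """Translate a tiny glob subset into a fully anchored regex.

    Hypothetical helper: "**" crosses directory separators, while "*" and
    "?" stay within one path component.
    """
    out = []
    i = 0
    while i < len(pat):
        if pat.startswith('**', i):
            out.append('.*')
            i += 2
        elif pat[i] == '*':
            out.append('[^/]*')
            i += 1
        elif pat[i] == '?':
            out.append('[^/]')
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    # Anchoring at both ends is what makes this a full match.
    return '^' + ''.join(out) + '$'

def full_match(pat, path):
    return re.match(glob_to_regex(pat), path) is not None

print(full_match('foo/bar/*', 'foo/bar/baz'))       # True
print(full_match('foo/bar/*', 'foo/bar/baz/qux'))   # False
print(full_match('foo/bar/**', 'foo/bar/baz/qux'))  # True
print(full_match('foo/bar', 'foo/bar/baz'))         # False
```

Because the regex is anchored at the end, recursiveness is decided entirely
by whether the pattern itself ends with "*" or "**".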

> On the other hand, you assume that newly introduced *path syntaxes
> will be recursive, as below. Would you assume that default
> recursive-ness is different between *glob and *path syntaxes ?
>

path would be recursive, as would a glob that ends with ** or a regex that
ends with .*


> > Also, for discussion: I assume the *path patterns will be recursive when
> > they reference a directory. Do we also want a non-recursive equivalent
> > (rootexact, rootfiles, rootnonrecursive or something like that)?
>
> IMHO, making patch description explain how recursive matching will be
> controlled in the future helps reviewers to evaluate your patch.
>

I'm happy to update the documentation on my patch to better reflect the
full-matching characteristic, if it's OK to push a new version of the patch
:)


> BTW, bikeshedding about name of additional suffix:
>
>   - for non-recursive matching, in "recursive matching by default" case
>
>     - "-eon"
>
>       "end of name matching" is my coined word only for explanation,
>       and let's choose better one :-)
>
>     - "-exact" for non-recursive matching
>
>       this might confuse developers, because current implementation
>       already uses "exact" term as "matching without any special
>       handling".
>
>         https://selenic.com/repo/hg/file/438173c41587/mercurial/match.py#l100
>
>     - "-nonrecursive"
>
>       this is too long, isn't it ?
>
>     - "-file"
>
>       this seems better (short and understandable for end users)
>
>   - for recursive matching, in "non-recursive matching by default" case
>
>     - "-recursive"
>
>       this is too long, isn't it ?
>
>     - "-dir"
>
>       this seems better (short and understandable for end users)


Katsunori FUJIWARA - Oct. 26, 2016, 7:17 a.m.
At Tue, 25 Oct 2016 19:51:59 -0700,
Rodrigo Damazio wrote:
> 
> On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> wrote:
> 
> >
> > At Mon, 24 Oct 2016 10:34:52 -0700,
> > Rodrigo Damazio wrote:
> > >
> > > [1  <text/plain; UTF-8 (7bit)>]
> > > It sounds like we'd like to do 3 somewhat orthogonal things:
> > > - allow user to specify the directory the pattern is relative to
> > > (root/cwd/any)
> > > - allow the user to specify recursiveness/non-recursiveness consistently
> > > (not covered by the *path patterns, but could be the defined behavior for
> > > the globs)
> > > - clean up the matcher API (discussed during Sprint)
> > >
> > > Doing all 3 together would probably take some time and a lot of
> > > back-and-forth, so I'm wondering if it'd be ok to start by updating this
> > > patch to implement "rootglob" with consistent recursiveness behavior, and
> > > we can then more slowly add the other patterns and converge on a cleaner
> > > API?
> >
> > (let's suspend posting revised series while code freeze period, to
> > focus on stabilization :-))
> >
> 
> Sure, I understand you're under the freeze. Feel free to prioritize
> reviewing my patches appropriately.
> (notice the new patch is based on default, not stable)
> 
>     https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze
> >
> > In my previous reply, I assume that newly introduced syntaxes do:
> >
> >   - match recursively by default regardless of the way of passing
> >     (command line, -I/-X, ....), because of similarity with almost all
> >     of existing syntaxes
> >
> >     Only glob/relglob as PATTERN in command line require "end of name"
> >     matching.
> >
> >   - require additional "-eon" ("end of name") suffix for non-recursive
> >     matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)
> >
> > But according to your revised patch, "rootglob" syntax matches
> > non-recursively. Would you assume as below ?
> >
> >   - newly introduced syntaxes match non-recursively by default
> >   - recursive matching requires any additional suffix (e.g. "-recursive")
> >
> 
> Ah, the assumption is slightly different - the assumption is that for glob
> types, specifically, we're doing a full match, so that to get recursiveness
> at the end the user should specify /** or similar. This allows the user to
> do recursive or non-recursive matching by using * or ** as appropriate.
> I'll suggest that regex types also do a full match, and the user can end
> them with .* if they want it to be a prefix.
> I believe this is simpler and more flexible than having 18 different
> pattern types just to account for the different behavior of the matching. I
> considered that we could, likewise, make partial matching be the default,
> but I decided against that when making the patch because then it'd be
> impossible to make them non-recursive by a modifier, without doubling the
> number of matchers as you suggested.

You are right that glob and re patterns can switch between recursive
and non-recursive matching within the pattern itself, so controlling
recursiveness with an extra suffix on the syntax name is redundant for
them.

I also forgot that appending "(?:$)" or "(?:$|/)" to a "re:" pattern,
depending on recursiveness, might cause trouble for complicated
regexps :-<
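
As one illustration of the kind of trouble this could cause (an assumed
example, not one given in the thread): if the terminator is appended to a
regex containing a top-level alternation, it binds only to the last branch.
The helper names below are hypothetical.

```python
import re

def append_suffix(pattern, recursive):
    """Naively append the terminator to a user regex (hypothetical helper)."""
    return pattern + ('(?:$|/)' if recursive else '(?:$)')

def append_suffix_grouped(pattern, recursive):
    """Group the user regex first so the terminator applies to all of it."""
    return '(?:' + pattern + ')' + ('(?:$|/)' if recursive else '(?:$)')

naive = append_suffix('foo|bar', recursive=False)
grouped = append_suffix_grouped('foo|bar', recursive=False)

print(bool(re.match(naive, 'foobaz')))    # True: "$" binds only to "bar"
print(bool(re.match(grouped, 'foobaz')))  # False
print(bool(re.match(grouped, 'bar')))     # True
```

Grouping avoids this particular problem, but it is only a sketch; other
regex constructs may need separate care.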


> > On the other hand, you assume that newly introduced *path syntaxes
> > will be recursive, as below. Would you assume that default
> > recursive-ness is different between *glob and *path syntaxes ?
> >
> 
> path would be recursive, as will glob that ends with ** or regex that ends
> with .*
> 
> 
> > > Also, for discussion: I assume the *path patterns will be recursive when
> > > they reference a directory. Do we also want a non-recursive equivalent
> > > (rootexact, rootfiles, rootnonrecursive or something like that)?

How about adding the syntax types "file" and "dir"?

  ===== ============= =================
  type  for recursive for non-recursive
  ===== ============= =================
  glob  use "**"      use "*"
  re    omit "$"      append "$"
  path  always(*1)    ----
  file  ----          always
  dir   always(*2)    ----
  ===== ============= =================

  (*1) match against both file and directory
  (*2) match against only directory

"dir" might be overkill, though :-) (would it be useful for resolving
name collisions when merging, or the like?)
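
The "path" and "file" rows can be contrasted with a short Python sketch.
The semantics of "file" are not spelled out in the thread; this example
assumes the behavior described for "files:" in the original patch, namely
matching only entries directly inside the named directory. The function
names path_regex and file_regex are hypothetical.

```python
import re

def path_regex(directory):
    # "path": the entry itself and everything below it (recursive).
    return re.compile('^' + re.escape(directory) + r'(?:/|$)')

def file_regex(directory):
    # "file" (assumed semantics): only entries directly inside the directory.
    return re.compile('^' + re.escape(directory) + r'/[^/]+$')

files = ['foo/a.c', 'foo/sub/b.c', 'foobar/c.c']
by_path = [f for f in files if path_regex('foo').match(f)]
by_file = [f for f in files if file_regex('foo').match(f)]

print(by_path)  # ['foo/a.c', 'foo/sub/b.c']
print(by_file)  # ['foo/a.c']
```

A "dir" type cannot be expressed this way, since a regex over path names
alone does not know whether an entry is a file or a directory.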

> >
> > IMHO, making patch description explain how recursive matching will be
> > controlled in the future helps reviewers to evaluate your patch.
> >
> 
> I'm happy to update the documentation on my patch to better reflect the
> full-matching characteristic, if it's OK to push a new version of the patch
> :)
> 
> 
> > BTW, bikeshedding about name of additional suffix:
> >
> >   - for non-recursive matching, in "recursive matching by default" case
> >
> >     - "-eon"
> >
> >       "end of name matching" is my coined word only for explanation,
> >       and let's choose better one :-)
> >
> >     - "-exact" for non-recursive matching
> >
> >       this might confuse developers, because current implementation
> >       already uses "exact" term as "matching without any special
> >       handling".
> >
> >         https://selenic.com/repo/hg/file/438173c41587/mercurial/match.py#l100
> >
> >     - "-nonrecursive"
> >
> >       this is too long, isn't it ?
> >
> >     - "-file"
> >
> >       this seems better (short and understandable for end users)
> >
> >   - for recursive matching, in "non-recursive matching by default" case
> >
> >     - "-recursive"
> >
> >       this is too long, isn't it ?
> >
> >     - "-dir"
> >
> >       this seems better (short and understandable for end users)
> 
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
via Mercurial-devel - Oct. 26, 2016, 9:02 p.m.
On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

>
> At Tue, 25 Oct 2016 19:51:59 -0700,
> Rodrigo Damazio wrote:
> >
> > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > wrote:
> >
> > >
> > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > Rodrigo Damazio wrote:
> > > >
> > > > [1  <text/plain; UTF-8 (7bit)>]
> > > > It sounds like we'd like to do 3 somewhat orthogonal things:
> > > > - allow user to specify the directory the pattern is relative to
> > > > (root/cwd/any)
> > > > - allow the user to specify recursiveness/non-recursiveness consistently
> > > > (not covered by the *path patterns, but could be the defined behavior for
> > > > the globs)
> > > > - clean up the matcher API (discussed during Sprint)
> > > >
> > > > Doing all 3 together would probably take some time and a lot of
> > > > back-and-forth, so I'm wondering if it'd be ok to start by updating this
> > > > patch to implement "rootglob" with consistent recursiveness behavior, and
> > > > we can then more slowly add the other patterns and converge on a cleaner
> > > > API?
> > >
> > > (let's suspend posting revised series while code freeze period, to
> > > focus on stabilization :-))
> > >
> >
> > Sure, I understand you're under the freeze. Feel free to prioritize
> > reviewing my patches appropriately.
> > (notice the new patch is based on default, not stable)
> >
> >     https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze
> > >
> > > In my previous reply, I assume that newly introduced syntaxes do:
> > >
> > >   - match recursively by default regardless of the way of passing
> > >     (command line, -I/-X, ....), because of similarity with almost all
> > >     of existing syntaxes
> > >
> > >     Only glob/relglob as PATTERN in command line require "end of name"
> > >     matching.
> > >
> > >   - require additional "-eon" ("end of name") suffix for non-recursive
> > >     matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)
> > >
> > > But according to your revised patch, "rootglob" syntax matches
> > > non-recursively. Would you assume as below ?
> > >
> > >   - newly introduced syntaxes match non-recursively by default
> > >   - recursive matching requires any additional suffix (e.g. "-recursive")
> > >
> >
> > Ah, the assumption is slightly different - the assumption is that for glob
> > types, specifically, we're doing a full match, so that to get recursiveness
> > at the end the user should specify /** or similar. This allows the user to
> > do recursive or non-recursive matching by using * or ** as appropriate.
> > I'll suggest that regex types also do a full match, and the user can end
> > them with .* if they want it to be a prefix.
> > I believe this is simpler and more flexible than having 18 different
> > pattern types just to account for the different behavior of the matching. I
> > considered that we could, likewise, make partial matching be the default,
> > but I decided against that when making the patch because then it'd be
> > impossible to make them non-recursive by a modifier, without doubling the
> > number of matchers as you suggested.
>
> It is right that glob and re pattern can switch
> recursive/non-recursive by its own pattern, and controlling
> recursive-ness by extra suffix of syntax name is redundant for them.
>
> I also forgot that adding "(?:$)" or "(?:$|/)" to "re:" pattern
> correctly according to recursive-ness might cause trouble for
> complicated regexp :-<


>
> > > On the other hand, you assume that newly introduced *path syntaxes
> > > will be recursive, as below. Would you assume that default
> > > recursive-ness is different between *glob and *path syntaxes ?
> > >
> >
> > path would be recursive, as will glob that ends with ** or regex that ends
> > with .*
> >
> >
> > > > Also, for discussion: I assume the *path patterns will be recursive when
> > > > they reference a directory. Do we also want a non-recursive equivalent
> > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
>
> How about adding syntax type "file"/"dir" ?
>
>   ===== ============= =================
>   type  for recursive for non-recursive
>   ===== ============= =================
>   glob  use "**"      use "*"
>   re    omit "$"      append "$"
>   path  always(*1)    ----
>   file  ----          always
>   dir   always(*2)    ----
>   ===== ============= =================
>
>   (*1) match against both file and directory
>   (*2) match against only directory
>
> "dir" might be overkill, though :-) (is it useful in resolving name
> collision at merging or so ?)
>

foozy, thanks so much for the review and discussion.
It sounds like we agree about the glob behavior, then, so let me know if
you'd like any changes to the latest version of this patch other than
improved documentation. I'm happy to send an updated version as soon as
someone is ready to review it.

As I understand it, the difference between dir and path (and between the
original version of this patch and file) would be that dir and file validate
the type of the entry being matched, so that passing a file name to dir or a
directory name to file would be an error. Is that what you have in mind? The
current matchers don't have a good mechanism for verifying the type, so some
significant rewiring would be needed to pass that information down.
Another thought is that supporting file and dir would encourage developers
to rely on smarter name collision support (and also case collision
support); one could argue that the added complexity isn't justified.


> > >
> > > IMHO, making patch description explain how recursive matching will be
> > > controlled in the future helps reviewers to evaluate your patch.
> > >
> >
> > I'm happy to update the documentation on my patch to better reflect the
> > full-matching characteristic, if it's OK to push a new version of the patch
> > :)
> >
> >
> > > BTW, bikeshedding about name of additional suffix:
> > >
> > >   - for non-recursive matching, in "recursive matching by default" case
> > >
> > >     - "-eon"
> > >
> > >       "end of name matching" is my coined word only for explanation,
> > >       and let's choose better one :-)
> > >
> > >     - "-exact" for non-recursive matching
> > >
> > >       this might confuse developers, because current implementation
> > >       already uses "exact" term as "matching without any special
> > >       handling".
> > >
> > >         https://selenic.com/repo/hg/file/438173c41587/mercurial/match.py#l100
> > >
> > >     - "-nonrecursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-file"
> > >
> > >       this seems better (short and understandable for end users)
> > >
> > >   - for recursive matching, in "non-recursive matching by default" case
> > >
> > >     - "-recursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-dir"
> > >
> > >       this seems better (short and understandable for end users)
> >
> >
> > > > Thanks
> > > > Rodrigo
> > > >
> > > >
> > > >
> > > > On Mon, Oct 24, 2016 at 6:21 AM, Pierre-Yves David <
> > > > pierre-yves.david@ens-lyon.org> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> > > > >
> > > > >> At Tue, 18 Oct 2016 10:12:07 -0400,
> > > > >> Augie Fackler wrote:
> > > > >>
> > > > >>>
> > > > >>> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org>
> > > wrote:
> > > > >>>
> > > > >>>> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> > > > >>>>
> > > > >>>>> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org>
> wrote:
> > > > >>>>>>
> > > > >>>>>>> After coordinating on irc to figure out what this proposal
> > > actually
> > > > >>>>>>> is, I've noticed that the semantics of this "exact" proposal
> are
> > > > >>>>>>> exactly what "glob" does today, which means (I think) that
> > > > >>>>>>> "files:foo/bar" should be representable as "glob:foo/bar/*" -
> > > what am
> > > > >>>>>>> I missing?
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> Maybe we want a "glob" relative to the repo root?
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> As far as I can tell, it already is. "relglob:" is relative to
> your
> > > > >>>>> location in the repo according to the docs.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Unfortunately that isn't.
> > > > >>>>
> > > > >>>>         'glob:<glob>' - a glob relative to cwd
> > > > >>>>         'relglob:<glob>' - an unrooted glob (*.c matches C
> files in
> > > all
> > > > >>>> dirs)
> > > > >>>>
> > > > >>>> Don't ask me why. ;-)
> > > > >>>>
> > > > >>>
> > > > >>> Oh wat. It looks like narrowhg might change this behavior in
> narrowed
> > > > >>> repositories, thus my additional confusion.
> > > > >>>
> > > > >>> Maybe we should add "absglob" that is always repo-root-absolute.
> How
> > > > >>> do we feel about that overall?
> > > > >>>
> > > > >>
> > > > >> FYI, current pattern matching is implemented as below. This was
> > > > >> chatted in "non-recursive directory matching" session of 4.0
> sprint,
> > > > >> and sorry for my late posting of this translation from
> > > > >> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in
> Japanese,
> > > as
> > > > >> my backlog of the last sprint.
> > > > >>
> > > > >>   ============ ======= ======= ===========
> > > > >>   pattern type root-ed cwd-ed  any-of-path
> > > > >>   ============ ======= ======= ===========
> > > > >>   wildcard     ---     glob    relglob
> > > > >>   regexp       re      ---     relre
> > > > >>   raw string   path    relpath ---
> > > > >>   ============ ======= ======= ===========
> > > > >>
> > > > >>   If rule is read in from file (e.g. .hgignore):
> > > > >>
> > > > >>     * "glob" is treated as "relglob"
> > > > >>     * "re" is treated as "relre"
> > > > >>
> > > > >>   This is mentioned in "hg help patterns" and "hg help hgignore",
> but
> > > > >>   syntax name "relglob" and "relre" themselves aren't explained.
> > > > >>
> > > > >>   "end of name" matching is required:
> > > > >>
> > > > >>     * for glob/relglob as PATTERN (e.g. argument in command
> line), but
> > > > >>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern
> > > syntaxes
> > > > >>
> > > > >>   For example, file "foo/bar/baz" is:
> > > > >>
> > > > >>     * not matched at "hg files glob:foo/bar"
> > > > >>     * but matched at "hg file -I glob:foo/bar"
> > > > >>
> > > > >>   This isn't mentioned in any help document :-<, and the latter
> seems
> > > > >>   to cause the issue mentioned in this patch series.
> > > > >>
> > > > >> How about introducing new systematic names like below to
> re-organize
> > > > >> current complicated mapping between names and matching ? (and
> enable
> > > > >> "end of name" matching by "-eon" suffix or so)
> > > > >>
> > > > >>   ============ ======== ======= ===========
> > > > >>   pattern type root-ed  cwd-ed  any-of-path
> > > > >>   ============ ======== ======= ===========
> > > > >>   wildcard     rootglob cwdglob anyglob
> > > > >>   regexp       rootre   cwdre   anyre
> > > > >>   raw string   rootpath cwdpath anypath
> > > > >>   ============ ======== ======= ===========
> > > > >>
> > > > >
> > > > > Moving toward a more regular and clear feature set and naming
> seems a
> > > win.
> > > > > I'm +1 for moving in that direction.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > --
> > > > > Pierre-Yves David
> > > > >
> > > > [2  <text/html; UTF-8 (quoted-printable)>]
> > > >
> > >
> > > ----------------------------------------------------------------------
> > > [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
> > >
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
>
via Mercurial-devel - Nov. 7, 2016, 10:38 p.m.
On Wed, Oct 26, 2016 at 2:03 PM Rodrigo Damazio via Mercurial-devel <
mercurial-devel@mercurial-scm.org> wrote:

> On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
> foozy@lares.dti.ne.jp> wrote:
>
>
> At Tue, 25 Oct 2016 19:51:59 -0700,
> Rodrigo Damazio wrote:
> >
> > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> foozy@lares.dti.ne.jp>
> > wrote:
> >
> > >
> > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > Rodrigo Damazio wrote:
> > > >
> > > > It sounds like we'd like to do 3 somewhat orthogonal things:
> > > > - allow user to specify the directory the pattern is relative to
> > > >   (root/cwd/any)
> > > > - allow the user to specify recursiveness/non-recursiveness consistently
> > > >   (not covered by the *path patterns, but could be the defined behavior
> > > >   for the globs)
> > > > - clean up the matcher API (discussed during Sprint)
> > > >
> > > > Doing all 3 together would probably take some time and a lot of
> > > > back-and-forth, so I'm wondering if it'd be ok to start by updating this
> > > > patch to implement "rootglob" with consistent recursiveness behavior, and
> > > > we can then more slowly add the other patterns and converge on a cleaner
> > > > API?
>
>
I'm obviously biased by working on the same project as you, but starting
with rootglob makes sense to me. The matcher API cleanup, whatever that is,
will probably be only insignificantly harder because of rootglob. Even if we
add all nine (or more?) patterns suggested by Foozy, I don't think it
will matter much for the refactoring.

However, I think the rootglob pattern has more impact on our users than it
does on our codebase, so what we may want to do now is to document it
better. I haven't thought much about it, but your patch didn't seem to
include any documentation. I'm thinking that one of the tables in this
thread should be in `hg help patterns` (i.e. mercurial/help/patterns.txt)
and we can perhaps think about how we want that text to look once we add
the other patterns Foozy suggested.

What do others think?



> > >
> > > (let's suspend posting revised series while code freeze period, to
> > > focus on stabilization :-))
> > >
> >
> > Sure, I understand you're under the freeze. Feel free to prioritize
> > reviewing my patches appropriately.
> > (notice the new patch is based on default, not stable)
> >
> >     https://www.mercurial-scm.org/wiki/TimeBasedReleasePlan#Code_Freeze
> > >
> > > In my previous reply, I assume that newly introduced syntaxes do:
> > >
> > >   - match recursively by default regardless of the way of passing
> > >     (command line, -I/-X, ....), because of similarity with almost all
> > >     of existing syntaxes
> > >
> > >     Only glob/relglob as PATTERN in command line require "end of name"
> > >     matching.
> > >
> > >   - require additional "-eon" ("end of name") suffix for non-recursive
> > >     matching (e.g. "rootglob-eon", "cwdre-eon", "anypath-eon", ...)
> > >
> > > But according to your revised patch, "rootglob" syntax matches
> > > non-recursively. Would you assume as below ?
> > >
> > >   - newly introduced syntaxes match non-recursively by default
> > >   - recursive matching requires any additional suffix (e.g.
> "-recursive")
> > >
> >
> > Ah, the assumption is slightly different - the assumption is that for glob
> > types, specifically, we're doing a full match, so that to get recursiveness
> > at the end the user should specify /** or similar. This allows the user to
> > do recursive or non-recursive matching by using * or ** as appropriate.
> > I'll suggest that regex types also do a full match, and the user can end
> > them with .* if they want it to be a prefix.
> > I believe this is simpler and more flexible than having 18 different
> > pattern types just to account for the different behavior of the matching. I
> > considered that we could, likewise, make partial matching be the default,
> > but I decided against that when making the patch because then it'd be
> > impossible to make them non-recursive by a modifier, without doubling the
> > number of matchers as you suggested.
>
> It is right that glob and re pattern can switch
> recursive/non-recursive by its own pattern, and controlling
> recursive-ness by extra suffix of syntax name is redundant for them.
>
> I also forgot that adding "(?:$)" or "(?:$|/)" to "re:" pattern
> correctly according to recursive-ness might cause trouble for
> complicated regexp :-<
>
>
>
> > On the other hand, you assume that newly introduced *path syntaxes
> > > will be recursive, as below. Would you assume that default
> > > recursive-ness is different between *glob and *path syntaxes ?
> > >
> >
> > path would be recursive, as will glob that ends with ** or regex that ends
> > with .*
> >
> >
> > > > Also, for discussion: I assume the *path patterns will be recursive when
> > > > they reference a directory. Do we also want a non-recursive equivalent
> > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
>
> How about adding syntax type "file"/"dir" ?
>
>   ===== ============= =================
>   type  for recursive for non-recursive
>   ===== ============= =================
>   glob  use "**"      use "*"
>   re    omit "$"      append "$"
>   path  always(*1)    ----
>   file  ----          always
>   dir   always(*2)    ----
>   ===== ============= =================
>
>   (*1) match against both file and directory
>   (*2) match against only directory
>
> "dir" might be overkill, though :-) (is it useful in resolving name
> collision at merging or so ?)
>
>
> foozy, thanks so much for the review and discussion.
> Sounds like we do agree about the glob behavior then, so let me know if
> you'd like any changes to the latest version of this patch, other than
> improving documentation. I'm happy to send an updated version as soon as
> someone is ready to review.
>
> I understand the difference between dir and path (and between the original
> version of this patch and file) would be that they'd validate the type of
> entry being matched (so that passing a filename to dir or dir name to file
> would be an error) - is that what you have in mind? The current matchers
> don't have a good mechanism to verify the type, so some significant
> rewiring would need to be done to pass that information down.
> Another thought is that by supporting file and dir, you're incentivizing
> developers to rely on smarter name collision support (and also case
> collisions) - one could argue that there's no reason for the complexity
> caused by that.
>
>
> > >
> > > IMHO, making patch description explain how recursive matching will be
> > > controlled in the future helps reviewers to evaluate your patch.
> > >
> >
> > I'm happy to update the documentation on my patch to better reflect the
> > full-matching characteristic, if it's OK to push a new version of the patch
> > :)
> >
> >
> > > BTW, bikeshedding about name of additional suffix:
> > >
> > >   - for non-recursive matching, in "recursive matching by default" case
> > >
> > >     - "-eon"
> > >
> > >       "end of name matching" is my coined word only for explanation,
> > >       and let's choose better one :-)
> > >
> > >     - "-exact" for non-recursive matching
> > >
> > >       this might confuse developers, because current implementation
> > >       already uses "exact" term as "matching without any special
> > >       handling".
> > >
> > >         https://selenic.com/repo/hg/file/438173c41587/mercurial/match.py#l100
> > >
> > >     - "-nonrecursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-file"
> > >
> > >       this seems better (short and understandable for end users)
> > >
> > >   - for recursive matching, in "non-recursive matching by default" case
> > >
> > >     - "-recursive"
> > >
> > >       this is too long, isn't it ?
> > >
> > >     - "-dir"
> > >
> > >       this seems better (short and understandable for end users)
> >
> >
> > > > Thanks
> > > > Rodrigo
> > > >
> > > >
> > > >
> > > > On Mon, Oct 24, 2016 at 6:21 AM, Pierre-Yves David <
> > > > pierre-yves.david@ens-lyon.org> wrote:
> > > >
> > > > >
> > > > >
> > > > > On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> > > > >
> > > > >> At Tue, 18 Oct 2016 10:12:07 -0400,
> > > > >> Augie Fackler wrote:
> > > > >>
> > > > >>>
> > > > >>> On Tue, Oct 18, 2016 at 9:52 AM, Yuya Nishihara <yuya@tcha.org>
> > > wrote:
> > > > >>>
> > > > >>>> On Tue, 18 Oct 2016 09:40:36 -0400, Augie Fackler wrote:
> > > > >>>>
> > > > >>>>> On Oct 18, 2016, at 09:38, Yuya Nishihara <yuya@tcha.org>
> wrote:
> > > > >>>>>>
> > > > >>>>>>> After coordinating on irc to figure out what this proposal
> > > actually
> > > > >>>>>>> is, I've noticed that the semantics of this "exact" proposal
> are
> > > > >>>>>>> exactly what "glob" does today, which means (I think) that
> > > > >>>>>>> "files:foo/bar" should be representable as "glob:foo/bar/*" -
> > > what am
> > > > >>>>>>> I missing?
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>> Maybe we want a "glob" relative to the repo root?
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>> As far as I can tell, it already is. "relglob:" is relative to
> your
> > > > >>>>> location in the repo according to the docs.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Unfortunately that isn't.
> > > > >>>>
> > > > >>>>         'glob:<glob>' - a glob relative to cwd
> > > > >>>>         'relglob:<glob>' - an unrooted glob (*.c matches C
> files in
> > > all
> > > > >>>> dirs)
> > > > >>>>
> > > > >>>> Don't ask me why. ;-)
> > > > >>>>
> > > > >>>
> > > > >>> Oh wat. It looks like narrowhg might change this behavior in
> narrowed
> > > > >>> repositories, thus my additional confusion.
> > > > >>>
> > > > >>> Maybe we should add "absglob" that is always repo-root-absolute.
> How
> > > > >>> do we feel about that overall?
> > > > >>>
> > > > >>
> > > > >> FYI, current pattern matching is implemented as below. This was
> > > > >> chatted in "non-recursive directory matching" session of 4.0
> sprint,
> > > > >> and sorry for my late posting of this translation from
> > > > >> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in
> Japanese,
> > > as
> > > > >> my backlog of the last sprint.
> > > > >>
> > > > >>   ============ ======= ======= ===========
> > > > >>   pattern type root-ed cwd-ed  any-of-path
> > > > >>   ============ ======= ======= ===========
> > > > >>   wildcard     ---     glob    relglob
> > > > >>   regexp       re      ---     relre
> > > > >>   raw string   path    relpath ---
> > > > >>   ============ ======= ======= ===========
> > > > >>
> > > > >>   If rule is read in from file (e.g. .hgignore):
> > > > >>
> > > > >>     * "glob" is treated as "relglob"
> > > > >>     * "re" is treated as "relre"
> > > > >>
> > > > >>   This is mentioned in "hg help patterns" and "hg help hgignore",
> but
> > > > >>   syntax name "relglob" and "relre" themselves aren't explained.
> > > > >>
> > > > >>   "end of name" matching is required:
> > > > >>
> > > > >>     * for glob/relglob as PATTERN (e.g. argument in command
> line), but
> > > > >>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern
> > > syntaxes
> > > > >>
> > > > >>   For example, file "foo/bar/baz" is:
> > > > >>
> > > > >>     * not matched at "hg files glob:foo/bar"
> > > > >>     * but matched at "hg file -I glob:foo/bar"
> > > > >>
> > > > >>   This isn't mentioned in any help document :-<, and the latter
> seems
> > > > >>   to cause the issue mentioned in this patch series.
>
>
`hg help patterns` actually has the following section that I suspect is
meant to say this, although it definitely could have been made clearer:

    All patterns, except for "glob:" specified in command line (not for "-I"
    or "-X" options), can match also against directories: files under matched
    directories are treated as matched.
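
To make the difference concrete, here is a rough Python sketch. It assumes a
literal glob is turned into an escaped regex, that a command-line PATTERN is
anchored with `$`, and that an -I/-X glob gets `(?:/|$)` instead. The helper
name `globregex` is made up for illustration; this is not the code in
mercurial/match.py.

```python
import re

def globregex(literal, as_include):
    """Hypothetical helper: regex for a wildcard-free glob.

    Assumption: only the trailing suffix differs between a command-line
    PATTERN ('$') and an -I/-X option ('(?:/|$)').
    """
    suffix = r"(?:/|$)" if as_include else r"$"
    return re.compile(re.escape(literal) + suffix)

name = "foo/bar/baz"
# Models "hg files glob:foo/bar": the name must end right after "foo/bar".
print(globregex("foo/bar", as_include=False).match(name) is not None)
# Models "hg files -I glob:foo/bar": a following "/" is also accepted.
print(globregex("foo/bar", as_include=True).match(name) is not None)
```

Under these assumptions only the second call reports a match, which is the
behavior the help text describes.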



> > > > >>
> > > > >> How about introducing new systematic names like below to
> re-organize
> > > > >> current complicated mapping between names and matching ? (and
> enable
> > > > >> "end of name" matching by "-eon" suffix or so)
> > > > >>
> > > > >>   ============ ======== ======= ===========
> > > > >>   pattern type root-ed  cwd-ed  any-of-path
> > > > >>   ============ ======== ======= ===========
> > > > >>   wildcard     rootglob cwdglob anyglob
> > > > >>   regexp       rootre   cwdre   anyre
> > > > >>   raw string   rootpath cwdpath anypath
> > > > >>   ============ ======== ======= ===========
> > > > >>
> > > > >
> > > > > Moving toward a more regular and clear feature set and naming
> seems a
> > > win.
> > > > > I'm +1 for moving in that direction.
> > > > >
> > > > > Cheers,
> > > > >
> > > > > --
> > > > > Pierre-Yves David
> > > > >
> > > > [2  <text/html; UTF-8 (quoted-printable)>]
> > > >
> > >
> > > ----------------------------------------------------------------------
> > > [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
> > >
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
>
Katsunori FUJIWARA - Nov. 15, 2016, 1:46 p.m.
At Mon, 7 Nov 2016 14:38:00 -0800,
Martin von Zweigbergk wrote:
> 
> On Wed, Oct 26, 2016 at 2:03 PM Rodrigo Damazio via Mercurial-devel <
> mercurial-devel@mercurial-scm.org> wrote:
> 
> > On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
> > foozy@lares.dti.ne.jp> wrote:
> >
> >
> > At Tue, 25 Oct 2016 19:51:59 -0700,
> > Rodrigo Damazio wrote:
> > >
> > > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> > foozy@lares.dti.ne.jp>
> > > wrote:
> > >
> > > >
> > > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > > Rodrigo Damazio wrote:
> > > > >

[snip]

> > > > > On Mon, Oct 24, 2016 at 6:21 AM, Pierre-Yves David <
> > > > > pierre-yves.david@ens-lyon.org> wrote:
> > > > >
> > > > > >
> > > > > > On 10/21/2016 05:13 PM, FUJIWARA Katsunori wrote:
> > > > > >

[snip]

> > > > > >> FYI, current pattern matching is implemented as below. This was
> > > > > >> chatted in "non-recursive directory matching" session of 4.0 sprint,
> > > > > >> and sorry for my late posting of this translation from
> > > > > >> http://d.hatena.ne.jp/flying-foozy/20140107/1389087728 in Japanese, as
> > > > > >> my backlog of the last sprint.
> > > > > >>
> > > > > >>   ============ ======= ======= ===========
> > > > > >>   pattern type root-ed cwd-ed  any-of-path
> > > > > >>   ============ ======= ======= ===========
> > > > > >>   wildcard     ---     glob    relglob
> > > > > >>   regexp       re      ---     relre
> > > > > >>   raw string   path    relpath ---
> > > > > >>   ============ ======= ======= ===========
> > > > > >>
> > > > > >>   If rule is read in from file (e.g. .hgignore):
> > > > > >>
> > > > > >>     * "glob" is treated as "relglob"
> > > > > >>     * "re" is treated as "relre"
> > > > > >>
> > > > > >>   This is mentioned in "hg help patterns" and "hg help hgignore", but
> > > > > >>   syntax name "relglob" and "relre" themselves aren't explained.
> > > > > >>
> > > > > >>   "end of name" matching is required:
> > > > > >>
> > > > > >>     * for glob/relglob as PATTERN (e.g. argument in command line), but
> > > > > >>     * not for glob/relglob as INCLUDES/EXCLUDES, or other pattern syntaxes
> > > > > >>
> > > > > >>   For example, file "foo/bar/baz" is:
> > > > > >>
> > > > > >>     * not matched at "hg files glob:foo/bar"
> > > > > >>     * but matched at "hg file -I glob:foo/bar"
> > > > > >>
> > > > > >>   This isn't mentioned in any help document :-<, and the latter seems
> > > > > >>   to cause the issue mentioned in this patch series.
> >
> >
> `hg help patterns` actually has the following section that I suspect is
> meant to say this, although it definitely could have been made clearer:
> 
>     All patterns, except for "glob:" specified in command line (not for "-I"
>     or "-X" options), can match also against directories: files under matched
>     directories are treated as matched.

Oops, I forgot my own patch to add this explanation, which was posted
after writing my blog entry described above :-<

    https://www.mercurial-scm.org/repo/hg/rev/50db996bccaf

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Katsunori FUJIWARA - Nov. 17, 2016, 3:52 p.m.
(sorry for late reply)

At Wed, 26 Oct 2016 14:02:48 -0700,
Rodrigo Damazio wrote:
> 
> On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> wrote:
> 
> >
> > At Tue, 25 Oct 2016 19:51:59 -0700,
> > Rodrigo Damazio wrote:
> > >
> > > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> > foozy@lares.dti.ne.jp>
> > > wrote:
> > >
> > > >
> > > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > > Rodrigo Damazio wrote:

[snip]

> > > On the other hand, you assume that newly introduced *path syntaxes
> > > > will be recursive, as below. Would you assume that default
> > > > recursive-ness is different between *glob and *path syntaxes ?
> > > >
> > >
> > > path would be recursive, as will glob that ends with ** or regex that
> > ends
> > > with .*
> > >
> > >
> > > > > Also, for discussion: I assume the *path patterns will be recursive
> > when
> > > > > they reference a directory. Do we also want a non-recursive
> > equivalent
> > > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
> >
> > How about adding syntax type "file"/"dir" ?
> >
> >   ===== ============= =================
> >   type  for recursive for non-recursive
> >   ===== ============= =================
> >   glob  use "**"      use "*"
> >   re    omit "$"      append "$"
> >   path  always(*1)    ----
> >   file  ----          always
> >   dir   always(*2)    ----
> >   ===== ============= =================
> >
> >   (*1) match against both file and directory
> >   (*2) match against only directory
> >
> > "dir" might be overkill, though :-) (is it useful in resolving name
> > collision at merging or so ?)
> >
> 
> foozy, thanks so much for the review and discussion.
> Sounds like we do agree about the glob behavior then, so let me know if
> you'd like any changes to the latest version of this patch, other than
> improving documentation. I'm happy to send an updated version as soon as
> someone is ready to review.
> 
> I understand the difference between dir and path (and between the original
> version of this patch and file) would be that they'd validate the type of
> entry being matched (so that passing a filename to dir or dir name to file
> would be an error) - is that what you have in mind?

Yes > "passing a filename to dir or dir name to file would be an error"


> The current matchers
> don't have a good mechanism to verify the type, so some significant
> rewiring would need to be done to pass that information down.

The current match implementation uses two additional pattern suffixes,
'(?:/|$)' and '$', to control recursive matching of "glob" and "path".
The former allows recursive matching (for "glob" and "path"), and the
latter doesn't (only for "glob").

I'm simply thinking of using this technique to implement the pattern
types "file" and "dir".

    path:PATTERN => ESCAPED-PATTERN(?:/|$)
    file:PATTERN => ESCAPED-PATTERN$
    dir:PATTERN  => ESCAPED-PATTERN/
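
As a minimal sketch of that mapping in plain Python (assuming `re.escape`
stands in for ESCAPED-PATTERN and that matching is anchored at the start of
the name; the `SUFFIXES` table and both helpers are hypothetical and not
taken from mercurial/match.py):

```python
import re

# Hypothetical table for illustration only: pattern type -> regex suffix.
SUFFIXES = {
    "path": r"(?:/|$)",  # the name itself, or anything below it
    "file": r"$",        # the name itself only
    "dir": r"/",         # only names below the directory
}

def toregex(kind, pattern):
    """Escape the raw pattern and append the suffix for its type."""
    return re.compile(re.escape(pattern) + SUFFIXES[kind])

def matches(kind, pattern, name):
    # re.match anchors at the start of the name.
    return toregex(kind, pattern).match(name) is not None

names = ("foo/bar", "foo/bar/baz", "foo/barbaz")
for kind in ("path", "file", "dir"):
    print(kind, [n for n in names if matches(kind, "foo/bar", n)])
```

With these suffixes, "path" accepts both "foo/bar" and "foo/bar/baz", "file"
accepts only "foo/bar", and "dir" accepts only "foo/bar/baz". None of them
accepts "foo/barbaz".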


> Another thought is that by supporting file and dir, you're incentivizing
> developers to rely on smarter name collision support (and also case
> collisions) - one could argue that there's no reason for the complexity
> caused by that.

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
via Mercurial-devel - Nov. 17, 2016, 9:19 p.m.
On Thu, Nov 17, 2016 at 7:52 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

>
> (sorry for late reply)
>
> At Wed, 26 Oct 2016 14:02:48 -0700,
> Rodrigo Damazio wrote:
> >
> > On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
> foozy@lares.dti.ne.jp>
> > wrote:
> >
> > >
> > > At Tue, 25 Oct 2016 19:51:59 -0700,
> > > Rodrigo Damazio wrote:
> > > >
> > > > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> > > foozy@lares.dti.ne.jp>
> > > > wrote:
> > > >
> > > > >
> > > > > At Mon, 24 Oct 2016 10:34:52 -0700,
> > > > > Rodrigo Damazio wrote:
>
> [snip]
>
> > > > On the other hand, you assume that newly introduced *path syntaxes
> > > > > will be recursive, as below. Would you assume that default
> > > > > recursive-ness is different between *glob and *path syntaxes ?
> > > > >
> > > >
> > > > path would be recursive, as will glob that ends with ** or regex that
> > > ends
> > > > with .*
> > > >
> > > >
> > > > > > Also, for discussion: I assume the *path patterns will be
> recursive
> > > when
> > > > > > they reference a directory. Do we also want a non-recursive
> > > equivalent
> > > > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
> > >
> > > How about adding syntax type "file"/"dir" ?
> > >
> > >   ===== ============= =================
> > >   type  for recursive for non-recursive
> > >   ===== ============= =================
> > >   glob  use "**"      use "*"
> > >   re    omit "$"      append "$"
> > >   path  always(*1)    ----
> > >   file  ----          always
> > >   dir   always(*2)    ----
> > >   ===== ============= =================
> > >
> > >   (*1) match against both file and directory
> > >   (*2) match against only directory
> > >
> > > "dir" might be overkill, though :-) (is it useful in resolving name
> > > collision at merging or so ?)
> > >
> >
> > foozy, thanks so much for the review and discussion.
> > Sounds like we do agree about the glob behavior then, so let me know if
> > you'd like any changes to the latest version of this patch, other than
> > improving documentation. I'm happy to send an updated version as soon as
> > someone is ready to review.
> >
> > I understand the difference between dir and path (and between the
> original
> > version of this patch and file) would be that they'd validate the type of
> > entry being matched (so that passing a filename to dir or dir name to
> file
> > would be an error) - is that what you have in mind?
>
> Yes > "passing a filename to dir or dir name to file would be an error"
>
>
> > The current matchers
> > don't have a good mechanism to verify the type, so some significant
> > rewiring would need to be done to pass that information down.
>
> The current match implementation uses two additional pattern suffixes,
> '(?:/|$)' and '$', to control recursive matching of "glob" and "path".
> The former allows recursive matching (for "glob" and "path"), and the
> latter doesn't (only for "glob").
>
> I'm simply thinking of using this technique to implement the pattern
> types "file" and "dir".
>
>     path:PATTERN => ESCAPED-PATTERN(?:/|$)
>     file:PATTERN => ESCAPED-PATTERN$
>     dir:PATTERN  => ESCAPED-PATTERN/
>

Yes, "files:" was the original version of this patch and the case I really
care about :) I changed it to rootglob after your comments.
Which way would be preferred to move forward?
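
For comparison, the full-match glob behavior discussed for rootglob can be
sketched as below. This is only an illustration under stated assumptions: it
handles a tiny glob subset ('*' stays within one path segment, '**' crosses
'/'), the helper names are hypothetical, and it is not the patch itself.

```python
import re

def globtoregex(glob):
    """Translate a tiny glob subset: '**' crosses '/', '*' does not."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            out.append(".*")
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))

def fullmatch(glob, name):
    # Full match: the pattern must consume the whole name.
    return globtoregex(glob).fullmatch(name) is not None

print(fullmatch("foo/bar/*", "foo/bar/baz"))       # file directly in foo/bar
print(fullmatch("foo/bar/*", "foo/bar/sub/qux"))   # file in a subdirectory
print(fullmatch("foo/bar/**", "foo/bar/sub/qux"))  # same file, recursive glob
```

In this sketch "foo/bar/*" plays the role of the original "files:foo/bar":
it matches files directly inside the directory but nothing in
subdirectories, while "foo/bar/**" matches recursively.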
via Mercurial-devel - Nov. 24, 2016, 3:55 a.m.
Hi guys - any comments on the preferred way forward?

(I do have a follow-up patch for optimizing visitdir accordingly, but don't
want to send it until this one is agreed upon)


On Thu, Nov 17, 2016 at 1:19 PM, Rodrigo Damazio <rdamazio@google.com>
wrote:

>
>
> On Thu, Nov 17, 2016 at 7:52 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp
> > wrote:
>
>>
>> (sorry for late reply)
>>
>> At Wed, 26 Oct 2016 14:02:48 -0700,
>> Rodrigo Damazio wrote:
>> >
>> > On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
>> foozy@lares.dti.ne.jp>
>> > wrote:
>> >
>> > >
>> > > At Tue, 25 Oct 2016 19:51:59 -0700,
>> > > Rodrigo Damazio wrote:
>> > > >
>> > > > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
>> > > foozy@lares.dti.ne.jp>
>> > > > wrote:
>> > > >
>> > > > >
>> > > > > At Mon, 24 Oct 2016 10:34:52 -0700,
>> > > > > Rodrigo Damazio wrote:
>>
>> [snip]
>>
>> > > > On the other hand, you assume that newly introduced *path syntaxes
>> > > > > will be recursive, as below. Would you assume that default
>> > > > > recursive-ness is different between *glob and *path syntaxes ?
>> > > > >
>> > > >
>> > > > path would be recursive, as will glob that ends with ** or regex
>> that
>> > > ends
>> > > > with .*
>> > > >
>> > > >
>> > > > > > Also, for discussion: I assume the *path patterns will be
>> recursive
>> > > when
>> > > > > > they reference a directory. Do we also want a non-recursive
>> > > equivalent
>> > > > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
>> > >
>> > > How about adding syntax type "file"/"dir" ?
>> > >
>> > >   ===== ============= =================
>> > >   type  for recursive for non-recursive
>> > >   ===== ============= =================
>> > >   glob  use "**"      use "*"
>> > >   re    omit "$"      append "$"
>> > >   path  always(*1)    ----
>> > >   file  ----          always
>> > >   dir   always(*2)    ----
>> > >   ===== ============= =================
>> > >
>> > >   (*1) match against both file and directory
>> > >   (*2) match against only directory
>> > >
>> > > "dir" might be overkill, though :-) (is it useful in resolving name
>> > > collision at merging or so ?)
>> > >
>> >
>> > foozy, thanks so much for the review and discussion.
>> > Sounds like we do agree about the glob behavior then, so let me know if
>> > you'd like any changes to the latest version of this patch, other than
>> > improving documentation. I'm happy to send an updated version as soon as
>> > someone is ready to review.
>> >
>> > I understand the difference between dir and path (and between the
>> original
>> > version of this patch and file) would be that they'd validate the type
>> of
>> > entry being matched (so that passing a filename to dir or dir name to
>> file
>> > would be an error) - is that what you have in mind?
>>
>> Yes > "passing a filename to dir or dir name to file would be an error"
>>
>>
>> > The current matchers
>> > don't have a good mechanism to verify the type, so some significant
>> > rewiring would need to be done to pass that information down.
>>
>> The current match implementation uses two additional pattern suffixes,
>> '(?:/|$)' and '$', to control recursive matching of "glob" and "path".
>> The former allows recursive matching (for "glob" and "path"), and the
>> latter doesn't (only for "glob").
>>
>> I'm simply thinking of using this technique to implement the pattern
>> types "file" and "dir".
>>
>>     path:PATTERN => ESCAPED-PATTERN(?:/|$)
>>     file:PATTERN => ESCAPED-PATTERN$
>>     dir:PATTERN  => ESCAPED-PATTERN/
>>
>
> Yes, "files:" was the original version of this patch and the case I really
> care about :) I changed it to rootglob after your comments.
> Which way would be preferred to move forward?
>
>
Katsunori FUJIWARA - Nov. 24, 2016, 3:28 p.m.
At Wed, 23 Nov 2016 19:55:16 -0800,
Rodrigo Damazio wrote:
> 
> Hi guys - any comments on the preferred way forward?
> 
> (I do have a follow-up patch for optimizing visitdir accordingly, but don't
> want to send it until this one is agreed upon)

Sorry for the long delay!

> On Thu, Nov 17, 2016 at 1:19 PM, Rodrigo Damazio <rdamazio@google.com>
> wrote:
> 
> >
> >
> > On Thu, Nov 17, 2016 at 7:52 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp
> > > wrote:
> >
> >>
> >> (sorry for late reply)
> >>
> >> At Wed, 26 Oct 2016 14:02:48 -0700,
> >> Rodrigo Damazio wrote:
> >> >
> >> > On Wed, Oct 26, 2016 at 12:17 AM, FUJIWARA Katsunori <
> >> foozy@lares.dti.ne.jp>
> >> > wrote:
> >> >
> >> > >
> >> > > At Tue, 25 Oct 2016 19:51:59 -0700,
> >> > > Rodrigo Damazio wrote:
> >> > > >
> >> > > > On Tue, Oct 25, 2016 at 4:31 PM, FUJIWARA Katsunori <
> >> > > foozy@lares.dti.ne.jp>
> >> > > > wrote:
> >> > > >
> >> > > > >
> >> > > > > At Mon, 24 Oct 2016 10:34:52 -0700,
> >> > > > > Rodrigo Damazio wrote:
> >>
> >> [snip]
> >>
> >> > > > On the other hand, you assume that newly introduced *path syntaxes
> >> > > > > will be recursive, as below. Would you assume that default
> >> > > > > recursive-ness is different between *glob and *path syntaxes ?
> >> > > > >
> >> > > >
> >> > > > path would be recursive, as will glob that ends with ** or regex
> >> that
> >> > > ends
> >> > > > with .*
> >> > > >
> >> > > >
> >> > > > > > Also, for discussion: I assume the *path patterns will be
> >> recursive
> >> > > when
> >> > > > > > they reference a directory. Do we also want a non-recursive
> >> > > equivalent
> >> > > > > > (rootexact, rootfiles, rootnonrecursive or something like that)?
> >> > >
> >> > > How about adding syntax type "file"/"dir" ?
> >> > >
> >> > >   ===== ============= =================
> >> > >   type  for recursive for non-recursive
> >> > >   ===== ============= =================
> >> > >   glob  use "**"      use "*"
> >> > >   re    omit "$"      append "$"
> >> > >   path  always(*1)    ----
> >> > >   file  ----          always
> >> > >   dir   always(*2)    ----
> >> > >   ===== ============= =================
> >> > >
> >> > >   (*1) match against both file and directory
> >> > >   (*2) match against only directory
> >> > >
> >> > > "dir" might be overkill, though :-) (is it useful in resolving name
> >> > > collision at merging or so ?)
> >> > >
> >> >
> >> > foozy, thanks so much for the review and discussion.
> >> > Sounds like we do agree about the glob behavior then, so let me know if
> >> > you'd like any changes to the latest version of this patch, other than
> >> > improving documentation. I'm happy to send an updated version as soon as
> >> > someone is ready to review.
> >> >
> >> > I understand the difference between dir and path (and between the
> >> original
> >> > version of this patch and file) would be that they'd validate the type
> >> of
> >> > entry being matched (so that passing a filename to dir or dir name to
> >> file
> >> > would be an error) - is that what you have in mind?
> >>
> >> Yes > "passing a filename to dir or dir name to file would be an error"
> >>
> >>
> >> > The current matchers
> >> > don't have a good mechanism to verify the type, so some significant
> >> > rewiring would need to be done to pass that information down.
> >>
> >> Current match implement uses two additional pattern suffix '(?:/|$)'
> >> and '$' to control recursive matching of "glob" and "path". The former
> >> allows to match recursively (for "glob" and "path"), and the latter
> >> doesn't (only for "glob").
> >>
> >> I simply think using this technique to implement pattern types "file"
> >> and "dir".
> >>
> >>     path:PATTERN => ESCAPED-PATTERN(?:/|$)
> >>     file:PATTERN => ESCAPED-PATTERN$
> >>     dir:PATTERN  => ESCAPED-PATTERN/
> >>
> >
> > Yes, "files:" was the original version of this patch and the case I really
> > care about :) I changed it to rootglob after your comments.
> > Which way would be preferred to move forward?

"files:" is "path:" family, and "rootglob:" is "glob:" family. As we
concluded before, "path:" by itself can't control the recursion of
matching well.

Therefore, I think that "files:" should be implemented if needed,
regardless of whether "rootglob:" is implemented.

Of course, we first need a high-level view of this area :-)


BTW, it is a little ambiguous (at least, for me) that "files:foo"
matches both the file "foo" and the files directly under the directory
"foo". A name other than "files:" might resolve this ambiguity, but I
don't have a better (and short enough) one :-<

  ========== ==== ======= ===========
  pattern    foo  foo/bar foo/bar/baz
  ========== ==== ======= ===========
  path:foo    o     o         o

  files:foo   o     o         x

  file:foo    o     x         x
  dir:foo     x     o         o
  ========== ==== ======= ===========
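
To make the table concrete, here is a minimal sketch of how the suffix
technique could produce these four rows (o = match, x = no match). The
helper names and the regex suffixes are assumptions for illustration, in
particular the "files:" suffix, which is only a guess at "at most one more
path component". This is not the code from the patch or from match.py.

```python
import re

# Hypothetical suffixes, one per pattern kind (illustration only).
SUFFIXES = {
    "path": r"(?:/|$)",        # the entry itself or anything below it
    "files": r"(?:/[^/]+)?$",  # the entry itself or its direct children
    "file": r"$",              # the entry itself only
    "dir": r"/",               # only things below the entry
}


def compile_pattern(kind, path):
    """Escape the literal path and append the suffix for this kind."""
    return re.compile(re.escape(path) + SUFFIXES[kind])


def row(kind, path, candidates):
    """Return one table row: 'o' for a match, 'x' otherwise."""
    regex = compile_pattern(kind, path)
    return ["o" if regex.match(c) else "x" for c in candidates]


CANDIDATES = ["foo", "foo/bar", "foo/bar/baz"]
for kind in ("path", "files", "file", "dir"):
    print(kind + ":foo", row(kind, "foo", CANDIDATES))
```

Only the suffix differs between the kinds, which is why the approach needs
no knowledge of whether "foo" is actually a file or a directory on disk.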


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Augie Fackler - Dec. 16, 2016, 1:19 a.m.
> On Nov 24, 2016, at 10:28 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote:
> 
>>> Yes, "files:" was the original version of this patch and the case I really
>>> care about :) I changed it to rootglob after your comments.
>>> Which way would be preferred to move forward?
> 
> "files:" is "path:" family, and "rootglob:" is "glob:" family. As we
> concluded before, "path:" itself can't control recursion of matching
> well.
> 
> Therefore, I think that "files:" should be implemented if needed,
> regardless of implementing "rootglob:".
> 
> Of course, we need high point view of this area, at first :-)
> 
> 
> BTW, it is a little ambiguous (at least, for me) that "files:foo"
> matches against both file "foo" and files just under directory
> "foo". Name other than "files:" may resolve this ambiguity, but I
> don't have any better (and short enough) name :-<
> 
>  ========== ==== ======= ===========
>  pattern    foo  foo/bar foo/bar/baz
>  ========== ==== ======= ===========
>  path:foo    o     o         o
> 
>  files:foo   o     o         x
> 
>  file:foo    o     x         x
>  dir:foo     x     o         o
>  ========== ==== ======= ===========
> 

Scanning the plan page, I see that there’s a *lot* of work that could be done and no consensus as yet, but that the only immediate use case seems to be the rootfile/rootglob case. Is there some path forward we could agree on that would unblock those immediate needs for narrowhg and not make things harder in the future?

Alternatively, would we be okay with a slight refactor of the matcher so that narrowhg can introduce a custom filesonly: matcher for the time being so we can keep making forward progress there?  I don’t know the matcher code well enough to be able to guess if this is a reasonable path so we can be unblocked.

(It’s very hard for me to justify the amount of work implied by reaching consensus on FileNamePatternsPlan and then executing the entire thing when what we need is solvable today with a sub-hour patch to existing code, thus my trying to find a solution we can all live with.)

Thanks!
Augie
Pierre-Yves David - Dec. 16, 2016, 2:21 p.m.
On 12/16/2016 02:19 AM, Augie Fackler wrote:
>
>> On Nov 24, 2016, at 10:28 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote:
>>
>>>> Yes, "files:" was the original version of this patch and the case I really
>>>> care about :) I changed it to rootglob after your comments.
>>>> Which way would be preferred to move forward?
>>
>> "files:" is "path:" family, and "rootglob:" is "glob:" family. As we
>> concluded before, "path:" itself can't control recursion of matching
>> well.
>>
>> Therefore, I think that "files:" should be implemented if needed,
>> regardless of implementing "rootglob:".
>>
>> Of course, we need high point view of this area, at first :-)
>>
>>
>> BTW, it is a little ambiguous (at least, for me) that "files:foo"
>> matches against both file "foo" and files just under directory
>> "foo". Name other than "files:" may resolve this ambiguity, but I
>> don't have any better (and short enough) name :-<
>>
>>  ========== ==== ======= ===========
>>  pattern    foo  foo/bar foo/bar/baz
>>  ========== ==== ======= ===========
>>  path:foo    o     o         o
>>
>>  files:foo   o     o         x
>>
>>  file:foo    o     x         x
>>  dir:foo     x     o         o
>>  ========== ==== ======= ===========
>>
>
> Scanning the plan page, I see that there’s a *lot* of work that could be done and no consensus as yet, but that the only immediate use case seems to be the rootfile/rootglob case. Is there some path forward we could agree on that would unblock those immediate needs for narrowhg and not make things harder in the future?
>
> Alternatively, would we be okay with a slight refactor of the matcher so that narrowhg can introduce a custom filesonly: matcher for the time being so we can keep making forward progress there?  I don’t know the matcher code well enough to be able to guess if this is a reasonable path so we can be unblocked.
>
> (It’s very hard for to justify the amount of work implied by reaching consensus on FileNamePatternsPlan and then executing the entire thing when what we need is solvable today with a sub-hour patch to existing code, thus my trying to find a solution we can all live with.)

As far as I understand, Foozy's findings show that the feature narrowhg 
needs is already there and nothing new is necessary.

You can add "set:" in front of your glob to make it non-recursive in all 
cases: "set:your/directory/you/want/to/match/files/in/*"

If this does not fit your needs, it probably means I got your use case 
wrong. In that case, can you re-explain the issue you are trying to solve 
here?


At the project level, it will make sense to clean up the pattern 
matching at some point, and Foozy's wiki work will help us do that.

Cheers.
via Mercurial-devel - Dec. 20, 2016, 5 a.m.
Unfortunately, while set would match the right files, because of the way
the code is structured, it provides no way to avoid visiting the
directories inside the non-recursive match - the set needs to first collect
all the files in all subdirectories (match.py, _expandset) and then filter
that down to the desired ones. In plain hg repos, that's just much slower -
in the context of narrowhg, the repo will simply not have the manifests for
those subdirectories and trying to visit them will crash.
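
The difference between "collect everything, then filter" and pruning while
walking can be sketched on a toy tree. Everything below (the TREE layout,
walk() and its visit callback) is hypothetical and only illustrates the
idea; it is not how _expandset or visitdir are written.

```python
# Toy manifest: directories are dicts, files are None.
TREE = {
    "README": None,
    "browser": {
        "a.js": None,
        "b.js": None,
        "app": {"c.js": None, "deep": {"d.js": None}},
    },
}


def walk(tree, prefix="", visit=lambda d: True, visited=None):
    """Yield file paths, descending only into directories visit() approves."""
    for name, node in sorted(tree.items()):
        path = prefix + name
        if node is None:
            yield path
        elif visit(path):
            if visited is not None:
                visited.append(path)  # record every directory we open
            yield from walk(node, path + "/", visit, visited)


def directly_under(path, directory):
    """True if path is a file immediately inside directory."""
    rest = path[len(directory) + 1:]
    return path.startswith(directory + "/") and "/" not in rest


# Eager: open every directory, then filter the full file list.
eager_dirs = []
eager = [f for f in walk(TREE, visited=eager_dirs)
         if directly_under(f, "browser")]

# Pruned: never open anything except the one directory we care about.
pruned_dirs = []
pruned = [f for f in walk(TREE, visit=lambda d: d == "browser",
                          visited=pruned_dirs)
          if directly_under(f, "browser")]

print(eager == pruned, eager_dirs, pruned_dirs)
```

Both variants return the same files, but the eager one opens
"browser/app" and "browser/app/deep" as well, which is exactly the step
that cannot succeed when those manifests are absent.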

The follow-up change to this one (which I haven't sent yet but is written)
is updating visitdir to allow non-recursiveness, which btw makes something
like "hg files -I rootglob:browser/*" about 4-5x faster in the firefox repo.


Pierre-Yves David - Dec. 20, 2016, 1:47 p.m.
On 12/20/2016 06:00 AM, Rodrigo Damazio wrote:
> Unfortunately, while set would match the right files, because of the way
> the code is structured, it provides no way to not try visiting the
> directories inside the non-recursive match - the set needs to first
> collect all the files in all subdirectories (match.py, _expandset) and
> then filter that down to the desired ones. In plain hg repos, that's
> just much slower - in the context of narrowhg, the repo will simply not
> have the manifests for those subdirectories and trying to visit them
> will crash.

Okay, so it seems the current tools let you express the right request, 
but shortcomings of the -implementation- prevent that request from 
working properly with narrowhg (and have a performance impact).

Did I get that right?

> The follow-up change to this one (which I haven't sent yet but is
> written) is updating visitdir to allow non-recursiveness, which btw
> makes something like "hg files -I rootglob:browser/*" about 4-5x faster
> in the firefox repo.

And, if I read you right, the implementation of 'rootglob:' in your 
patch has the same implementation issue, but you have another patch 
that improves the implementation so that it behaves in a way you can 
use (and is faster).

Did I get that right too?


If I got these two pieces right, it looks like we could just apply the 
'visitdir' improvement to 'set:your/glob/*' and have your use case 
covered without jumping into UI changes. Would that work for you?

via Mercurial-devel - Dec. 21, 2016, 3:21 a.m.
On Tue, Dec 20, 2016 at 5:47 AM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 12/20/2016 06:00 AM, Rodrigo Damazio wrote:
>
>> Unfortunately, while set would match the right files, because of the way
>> the code is structured, it provides no way to not try visiting the
>> directories inside the non-recursive match - the set needs to first
>> collect all the files in all subdirectories (match.py, _expandset) and
>> then filter that down to the desired ones. In plain hg repos, that's
>> just much slower - in the context of narrowhg, the repo will simply not
>> have the manifests for those subdirectories and trying to visit them
>> will crash.
>>
>
> Okay, so this seems like the current tools allow you to specify the right
> request but shortcoming of the -implementation- are preventing that request
> to work probably with narrowhg (and have performance impacts)
>
> Did I got that right ?


Yes.

>> The follow-up change to this one (which I haven't sent yet but is
>> written) is updating visitdir to allow non-recursiveness, which btw
>> makes something like "hg files -I rootglob:browser/*" about 4-5x faster
>> in the firefox repo.
>>
>
> And, If I read you right, the implementation of 'rootglob:' you provided
> in your patch have the same implementation issue, but you have another
> patch to improve the implementation to behave a way you can use (and is
> faster).
>
> Did I got that right too ?
>

Yes.

> If I got these two pieces right, it looks like we could just apply the
> improvement to 'visitdir' to 'set:your/glob/*' and have your usecase filled
> while not jumping into UI changes. Would that work for you ?
>

Not without a third set of changes, since set expansion doesn't use
visitdir (or the matcher being built) at all - the dependency is that
building the matcher depends on expanding the set (and thus the set can't
depend on the matcher).
It would technically be doable for re:, but I'm wary of getting into the
business of parsing and special-casing regexes to assume what they match or
don't.


Pierre-Yves David - Dec. 27, 2016, 10:14 a.m.
On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>     If I got these two pieces right, it looks like we could just apply
>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>     usecase filled while not jumping into UI changes. Would that work
>     for you ?
>
>
> Not without a third set of changes, since set expansion doesn't use
> visitdir (or the matcher being built) at all - the dependency is that
> building the matcher depends on expanding the set (and thus the set
> can't depend on the matcher).
> It would technically be doable for re:, but I'm wary of getting into the
> business of parsing and special-casing regexes to assume what they match
> or don't.

Rodrigo and I chatted directly about this a couple of days ago. Here is 
a quick summary of my new understanding of the situation.

Fileset
-------

Fileset (behind "set:") can give the right result, but it is powered by 
fairly old code that follows the old revset principle of "get 
everything and then run filters on that everything". That does not fit 
Rodrigo's needs at all. It was easy to make 'set:' a bit smarter in the 
simple case, but then we run into the issue that the matcher class uses 
'set:' in a strange, non-lazy way that does not take advantage of the 
'visitdir' feature Rodrigo/Google needs.

So in short, fileset needs a rework before being usable in a demanding 
context.


Current path restriction capability
-----------------------------------

The 'Match' class already has logic to restrict the paths visited 
(implemented in the 'visitdir' method). To clarify, this logic has no 
effect on the returned matches; it is only an optimization of which 
directories we visit. It seems to only kick in when treemanifest is used.
This logic already works with a couple of pattern types (all patterns use 
the same class). However, that logic currently does not support the case 
where one wants to select some subdirectory and skip the rest of the 
subtrees under it.

note: Rodrigo, you seem to have a good understanding of the logic. Do 
you think you could document the attributes involved (_includeroots, 
_includedirs, _excluderoots, etc.)? That would help the next poor 
souls looking at the code a lot.

Way forward
-----------

That limitation in the matcher class optimization is the main blocker 
for Rodrigo/Google's needs. The optimization is independent of the UI 
we actually provide to users, as all patterns use the same matcher class 
and some existing ones could already benefit from this optimization.

Rodrigo seems to have a patch to update the matcher code to track and 
optimize the "subdir-but-not-subtree" case. He has not submitted this 
patch yet. Submitting that patch seems the next step to me. It will 
get the matcher code into a state that can actually be used for the 
narrowhg+treemanifest use case.

Once that code is in, it seems easy to make sure various patterns use it 
in basic, easily recognizable cases. We poked at updating the code to have 
a basic regexp matching a subtree recognized as such, and that was quite easy.


Rodrigo, does that match your current understanding of the situation?

Cheers,
via Mercurial-devel - Jan. 24, 2017, 1:02 a.m.
Getting back to this after the end-of-year hiatus (yes, I know it happens
to be during another code freeze :) I seem to have good timing).

On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>
>
> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>
>>     If I got these two pieces right, it looks like we could just apply
>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>>     usecase filled while not jumping into UI changes. Would that work
>>     for you ?
>>
>>
>> Not without a third set of changes, since set expansion doesn't use
>> visitdir (or the matcher being built) at all - the dependency is that
>> building the matcher depends on expanding the set (and thus the set
>> can't depend on the matcher).
>> It would technically be doable for re:, but I'm wary of getting into the
>> business of parsing and special-casing regexes to assume what they match
>> or don't.
>>
>
> Rodrigo and I chatted directly about this a couple of days ago. Here is a
> quick summary of my new understanding of the situation.
>
> Fileset
> -------
>
> Fileset (behind "set:") can give the right result, but it is powered by
> not very modern code, it follow the old revset principle of "get everything
> and then run filters on that everything". That does not fit Rodrigo needs
> at all. It was easy to make 'set:' a bit smarter in the simple case but
> then we get into the issue that the matcher class is using 'set:' in a
> strange, non-lazy, way that does not use all the 'visitdir' feature
> Rodrigo/Google needs.
>
> So in short, fileset needs a rework before being usable in a demanding
> context.
>
>
> Current path restriction capability
> -----------------------------------
>
> The 'Match' class already have logic to restrict the path visited
> (implemented in the 'visitdir' method). To clarify, this logic as no effect
> on the returned match but is only an optimization for the directory we
> visit. It seems to only kicks in when treemanifest is used.
> This logic already works with a couple of patterns type (all pattern use
> the same class). However, that logic currently do not support the case were
> one want to select some subdirectory and skips the rest of the subtrees
> under it.
>

That is correct.

> note: Rodrigo, you seems to have a good understanding of the logic. Do you
> think you could document the involved attributes (_includeroots,
> _includedirs, _excluderoots, etc) That would help a lot the next poor souls
> looking at the code.
>

Sure. It took me a while to understand that "roots" means "recursive
directories" and "dirs" means "non-recursive directories" in that code - it
all became much more clear after that. I'll be sure to add comments in my
patch and/or rename the attributes.
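
For readers new to that code, the roots/dirs distinction can be sketched
as follows. This is a simplified, hypothetical stand-in (should_visit,
parents, and the use of '' for the repository root are assumptions), not
the actual visitdir implementation or its attribute names.

```python
def parents(path):
    """Yield every ancestor directory of path, ending with '' (repo root)."""
    while "/" in path:
        path = path.rsplit("/", 1)[0]
        yield path
    if path:
        yield ""


def should_visit(directory, roots, dirs):
    """Decide whether a walker needs to open this directory.

    roots: directories included recursively
    dirs:  directories included non-recursively (direct files only)
    """
    # Inside (or equal to) a recursive include: always visit.
    if directory in roots or any(p in roots for p in parents(directory)):
        return True
    # A non-recursive include: visit it, but not its subdirectories.
    if directory in dirs:
        return True
    # An ancestor of something included: we must pass through it.
    return any(directory in parents(target) for target in roots | dirs)


roots = {"lib"}
dirs = {"browser"}
for d in ("", "lib/x/y", "browser", "browser/app", "docs"):
    print(repr(d), should_visit(d, roots, dirs))
```

With "browser" in dirs rather than roots, "browser/app" is never opened,
which is the "subdir-but-not-subtree" behaviour discussed above.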


>
> Way forward
> -----------
>
> That limitation in the matcher class optimization is the main blocker for
> Rodrigo/Google needs. The optimization is independent of the UI part we
> actually provides to user as all patterns use the same matcher class and
> some existing class could already benefit from this optimization.
>
> Rodrigo seems to have a patch to update the matcher code to track and
> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
> yet. Submitting that patches seems the next step to me. It will get the
> matcher code in a state that can actually be used for the
> narrowhg+treemanifest usecase.
>
> Once that code is in, it seems easy to make sure various patterns use it
> basic, easily recognizable cases. We poked at updating the code to have
> basic regexp matching a subtree recognized as such and that was quite easy.
>
>
> Rodrigo, does that match your current understanding of the situation?
>

It does.
And just to clarify on the patches - I sent an initial patch, then after
comments changed it significantly, so those are two different changes:

   - The first implements a "files:" matcher which matches all files inside
   a directory, non-recursively. This has no wildcards, so special-casing it
   in visitdir and any other places needed results in clean and simple code
   ("if it's files:, don't recurse").
   - The second implements "rootglob:" which allows any number of wildcards
   at any point in the path, and is part of Foozy's plan for the new set of
   matchers. This adds some complexity in splitting dirs and roots (mentioned
   above) by having to parse the wildcards, and then the visitdir change looks
   less clean ("if it's a rootglob that has a single /* wildcard at the end,
   then don't recurse" - other cases are possible but start to get more
   complex).

For these reasons, I'd still prefer to get "files:" or similar in, but I'm
open to doing it either way. Please advise on the preferred way and I'll
send an updated patch (2 patches really - one for the matcher, one for the
visitdir optimization which makes it work with narrow).

Thanks
Rodrigo
via Mercurial-devel - Jan. 26, 2017, 4:54 a.m.
On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
<mercurial-devel@mercurial-scm.org> wrote:
> Getting back to this after the end-of-year hiatus (yes, I know it happens to
> be during another code freeze :) I seem to have good timing).
>
> On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
> <pierre-yves.david@ens-lyon.org> wrote:
>>
>>
>>
>> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>>>
>>>     If I got these two pieces right, it looks like we could just apply
>>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>>>     usecase filled while not jumping into UI changes. Would that work
>>>     for you ?
>>>
>>>
>>> Not without a third set of changes, since set expansion doesn't use
>>> visitdir (or the matcher being built) at all - the dependency is that
>>> building the matcher depends on expanding the set (and thus the set
>>> can't depend on the matcher).
>>> It would technically be doable for re:, but I'm wary of getting into the
>>> business of parsing and special-casing regexes to assume what they match
>>> or don't.
>>
>>
>> Rodrigo and I chatted directly about this a couple of days ago. Here is a
>> quick summary of my new understanding of the situation.
>>
>> Fileset
>> -------
>>
>> Fileset (behind "set:") can give the right result, but it is powered by
>> not very modern code; it follows the old revset principle of "get everything
>> and then run filters on that everything". That does not fit Rodrigo's needs at
>> all. It was easy to make 'set:' a bit smarter in the simple case, but then we
>> get into the issue that the matcher class is using 'set:' in a strange,
>> non-lazy way that does not use the 'visitdir' feature Rodrigo/Google
>> needs.
>>
>> So in short, fileset needs a rework before being usable in a demanding
>> context.
>>
>>
>> Current path restriction capability
>> -----------------------------------
>>
>> The 'Match' class already has logic to restrict the paths visited
>> (implemented in the 'visitdir' method). To clarify, this logic has no effect
>> on the returned match but is only an optimization for the directories we
>> visit. It seems to only kick in when treemanifest is used.
>> This logic already works with a couple of pattern types (all patterns use
>> the same class). However, that logic currently does not support the case where
>> one wants to select some subdirectory and skip the rest of the subtrees
>> under it.
>
>
> That is correct.
>
>> note: Rodrigo, you seem to have a good understanding of the logic. Do you
>> think you could document the involved attributes (_includeroots,
>> _includedirs, _excluderoots, etc.)? That would help the next poor souls
>> looking at the code a lot.
>
>
> Sure. It took me a while to understand that "roots" means "recursive
> directories" and "dirs" means "non-recursive directories" in that code - it
> all became much more clear after that. I'll be sure to add comments in my
> patch and/or rename the attributes.
>
>>
>>
>> Way forward
>> -----------
>>
>> That limitation in the matcher class optimization is the main blocker for
>> Rodrigo/Google's needs. The optimization is independent of the UI part we
>> actually provide to users, as all patterns use the same matcher class and
>> some existing patterns could already benefit from this optimization.
>>
>> Rodrigo seems to have a patch to update the matcher code to track and
>> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
>> yet. Submitting that patch seems the next step to me. It will get the
>> matcher code in a state that can actually be used for the
>> narrowhg+treemanifest use case.
>>
>> Once that code is in, it seems easy to make sure various patterns use it in
>> basic, easily recognizable cases. We poked at updating the code to have a
>> basic regexp matching a subtree recognized as such and that was quite easy.
>>
>>
>> Rodrigo, does that match your current understanding of the situation?
>
>
> It does.
> And just to clarify on the patches - I sent an initial patch, then after
> comments changed it significantly, so those are two different changes:
>
> The first implements a "files:" matcher which matches all files inside a
> directory, non-recursively. This has no wildcards, so special-casing it in
> visitdir and any other places needed results in clean and simple code ("if
> it's files:, don't recurse").
> The second implements "rootglob:" which allows any number of wildcards at
> any point in the path, and is part of Foozy's plan for the new set of matchers.
> This adds some complexity in splitting dirs and roots (mentioned above) by
> having to parse the wildcards, and then the visitdir change looks less clean
> ("if it's a rootglob that has a single /* wildcard at the end, then don't
> recurse" - other cases are possible but start to get more complex).
>
> For these reasons, I'd still prefer to get "files:" or similar in, but I'm
> open to doing it either way. Please advise on the preferred way and I'll
> send an updated patch (2 patches really - one for the matcher, one for the
> visitdir optimization which makes it work with narrow).

I'm fine with not doing rootglob:, but if I read foozy's proposal
right, the proposed files: will be what he would call rootfiles:. I
liked his proposal for a systematic naming, and if I got that right, I
think we should call it that from the beginning so we don't end up
with more aliases. I'd also like "rootfiles:foo" to *not* match the
file "foo", but only files in the directory "foo/". I mention that
because last I heard, he was unsure about that himself. Foozy, do you
agree?


>
> Thanks
> Rodrigo
>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Katsunori FUJIWARA - Jan. 26, 2017, 7:19 p.m.
At Wed, 25 Jan 2017 20:54:37 -0800,
Martin von Zweigbergk wrote:
> 
> On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
> <mercurial-devel@mercurial-scm.org> wrote:
> > Getting back to this after the end-of-year hiatus (yes, I know it happens to
> > be during another code freeze :) I seem to have good timing).
> >
> > On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
> > <pierre-yves.david@ens-lyon.org> wrote:
> >>
> >>
> >>
> >> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
> >>>
> >>>     If I got these two pieces right, it looks like we could just apply
> >>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
> >>>     usecase filled while not jumping into UI changes. Would that work
> >>>     for you ?
> >>>
> >>>
> >>> Not without a third set of changes, since set expansion doesn't use
> >>> visitdir (or the matcher being built) at all - the dependency is that
> >>> building the matcher depends on expanding the set (and thus the set
> >>> can't depend on the matcher).
> >>> It would technically be doable for re:, but I'm wary of getting into the
> >>> business of parsing and special-casing regexes to assume what they match
> >>> or don't.
> >>
> >>
> >> Rodrigo and I chatted directly about this a couple of days ago. Here is a
> >> quick summary of my new understanding of the situation.
> >>
> >> Fileset
> >> -------
> >>
> > >> Fileset (behind "set:") can give the right result, but it is powered by
> > >> not very modern code; it follows the old revset principle of "get everything
> > >> and then run filters on that everything". That does not fit Rodrigo's needs at
> > >> all. It was easy to make 'set:' a bit smarter in the simple case, but then we
> > >> get into the issue that the matcher class is using 'set:' in a strange,
> > >> non-lazy way that does not use the 'visitdir' feature Rodrigo/Google
> > >> needs.
> > >>
> > >> So in short, fileset needs a rework before being usable in a demanding
> > >> context.
> > >>
> > >>
> > >> Current path restriction capability
> > >> -----------------------------------
> > >>
> > >> The 'Match' class already has logic to restrict the paths visited
> > >> (implemented in the 'visitdir' method). To clarify, this logic has no effect
> > >> on the returned match but is only an optimization for the directories we
> > >> visit. It seems to only kick in when treemanifest is used.
> > >> This logic already works with a couple of pattern types (all patterns use
> > >> the same class). However, that logic currently does not support the case where
> > >> one wants to select some subdirectory and skip the rest of the subtrees
> > >> under it.
> > >
> > >
> > > That is correct.
> > >
> > >> note: Rodrigo, you seem to have a good understanding of the logic. Do you
> > >> think you could document the involved attributes (_includeroots,
> > >> _includedirs, _excluderoots, etc.)? That would help the next poor souls
> > >> looking at the code a lot.
> >
> >
> > Sure. It took me a while to understand that "roots" means "recursive
> > directories" and "dirs" means "non-recursive directories" in that code - it
> > all became much more clear after that. I'll be sure to add comments in my
> > patch and/or rename the attributes.
> >
> >>
> >>
> >> Way forward
> >> -----------
> >>
> > >> That limitation in the matcher class optimization is the main blocker for
> > >> Rodrigo/Google's needs. The optimization is independent of the UI part we
> > >> actually provide to users, as all patterns use the same matcher class and
> > >> some existing patterns could already benefit from this optimization.
> > >>
> > >> Rodrigo seems to have a patch to update the matcher code to track and
> > >> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
> > >> yet. Submitting that patch seems the next step to me. It will get the
> > >> matcher code in a state that can actually be used for the
> > >> narrowhg+treemanifest use case.
> > >>
> > >> Once that code is in, it seems easy to make sure various patterns use it in
> > >> basic, easily recognizable cases. We poked at updating the code to have a
> > >> basic regexp matching a subtree recognized as such and that was quite easy.
> > >>
> > >>
> > >> Rodrigo, does that match your current understanding of the situation?
> > >
> > >
> > > It does.
> > > And just to clarify on the patches - I sent an initial patch, then after
> > > comments changed it significantly, so those are two different changes:
> > >
> > > The first implements a "files:" matcher which matches all files inside a
> > > directory, non-recursively. This has no wildcards, so special-casing it in
> > > visitdir and any other places needed results in clean and simple code ("if
> > > it's files:, don't recurse").
> > > The second implements "rootglob:" which allows any number of wildcards at
> > > any point in the path, and is part of Foozy's plan for the new set of matchers.
> > > This adds some complexity in splitting dirs and roots (mentioned above) by
> > > having to parse the wildcards, and then the visitdir change looks less clean
> > > ("if it's a rootglob that has a single /* wildcard at the end, then don't
> > > recurse" - other cases are possible but start to get more complex).
> > >
> > > For these reasons, I'd still prefer to get "files:" or similar in, but I'm
> > > open to doing it either way. Please advise on the preferred way and I'll
> > send an updated patch (2 patches really - one for the matcher, one for the
> > visitdir optimization which makes it work with narrow).
> 
> I'm fine with not doing rootglob:, but if I read foozy's proposal
> right, the proposed files: will be what he would call rootfiles:. I
> liked his proposal for a systematic naming, and if I got that right, I
> think we should call it that from the beginning so we don't end up
> with more aliases.

Yeah, we should avoid naming confusion!

> I'd also like "rootfiles:foo" to *not* match the
> file "foo", but only files in the directory "foo/". I mention that
> because last I heard, he was unsure about that himself. Foozy, do you
> agree?

I don't have a strong opinion against a mode "XXX:" that matches
both "just this file" and "files directly under this directory".

Therefore, I agree with adding the new mode "XXX:", if it is needed (and
Rodrigo/Google think so).

But the name "files:" doesn't seem to suit the "XXX:" mode (at least, to
me).

Even if it matched only "files directly under this directory",
"files:" still wouldn't suit it.

Maybe the root cause of my bad feeling is that "foo" in "files:foo" should
be a directory in both cases, even though the mode name "files:" has
little "(under) this directory" flavor.

If it is possible to combine the 2 modes below to solve the issues of
Rodrigo/Google, I'm +1 for splitting "XXX:" into them, because naming
"YYY:" should be easier than naming "XXX:".

  - "file:" matching against "just this file"
  - "YYY:" matching against "files directly under this directory"

Are there any better (and short enough) names for "XXX:" or "YYY:"
than "files:" ?

  - "filesin:" (Files-In) for "YYY:" (<=> "files under" as "dir:") ?
  - "fileorin:" (File-Or-(Files-)In) for "XXX:" ?

Of course, I'm also OK with naming "XXX:" or "YYY:" as "files:", to go
forward :-)
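
For reference, the three candidate semantics above could be sketched as
plain regular expressions. This is only an illustration under my own
assumptions: the helper names below are hypothetical, and the patterns are
not necessarily what match.py would generate for these modes.

```python
import re


def file_re(path):
    """'file:'-like mode: matches just this file."""
    return re.compile(re.escape(path) + r'\Z')


def filesin_re(path):
    """'filesin:'-like mode ("YYY:"): files directly under this directory."""
    # Exactly one more path component after the directory, no further '/'.
    return re.compile(re.escape(path) + r'/[^/]+\Z')


def fileorin_re(path):
    """'fileorin:'-like mode ("XXX:"): either of the two cases above."""
    return re.compile(re.escape(path) + r'(?:/[^/]+)?\Z')
```

For example, filesin_re('foo').match('foo/a') succeeds, while it returns None
for both 'foo' and 'foo/a/b'. Since fileorin_re is just the union of the other
two, splitting "XXX:" into "file:" and "YYY:" would lose nothing.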

via Mercurial-devel - Jan. 26, 2017, 7:24 p.m.
On Thu, Jan 26, 2017 at 11:19 AM, FUJIWARA Katsunori
<foozy@lares.dti.ne.jp> wrote:
>
> At Wed, 25 Jan 2017 20:54:37 -0800,
> Martin von Zweigbergk wrote:
>>
>> On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
>> <mercurial-devel@mercurial-scm.org> wrote:
>> > Getting back to this after the end-of-year hiatus (yes, I know it happens to
>> > be during another code freeze :) I seem to have good timing).
>> >
>> > On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
>> > <pierre-yves.david@ens-lyon.org> wrote:
>> >>
>> >>
>> >>
>> >> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
>> >>>
>> >>>     If I got these two pieces right, it looks like we could just apply
>> >>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
>> >>>     usecase filled while not jumping into UI changes. Would that work
>> >>>     for you ?
>> >>>
>> >>>
>> >>> Not without a third set of changes, since set expansion doesn't use
>> >>> visitdir (or the matcher being built) at all - the dependency is that
>> >>> building the matcher depends on expanding the set (and thus the set
>> >>> can't depend on the matcher).
>> >>> It would technically be doable for re:, but I'm wary of getting into the
>> >>> business of parsing and special-casing regexes to assume what they match
>> >>> or don't.
>> >>
>> >>
>> >> Rodrigo and I chatted directly about this a couple of days ago. Here is a
>> >> quick summary of my new understanding of the situation.
>> >>
>> >> Fileset
>> >> -------
>> >>
>> >> Fileset (behind "set:") can give the right result, but it is powered by
>> >> not very modern code; it follows the old revset principle of "get everything
>> >> and then run filters on that everything". That does not fit Rodrigo's needs at
>> >> all. It was easy to make 'set:' a bit smarter in the simple case, but then we
>> >> get into the issue that the matcher class is using 'set:' in a strange,
>> >> non-lazy way that does not use the 'visitdir' feature Rodrigo/Google
>> >> needs.
>> >>
>> >> So in short, fileset needs a rework before being usable in a demanding
>> >> context.
>> >>
>> >>
>> >> Current path restriction capability
>> >> -----------------------------------
>> >>
>> >> The 'Match' class already has logic to restrict the paths visited
>> >> (implemented in the 'visitdir' method). To clarify, this logic has no effect
>> >> on the returned match but is only an optimization for the directories we
>> >> visit. It seems to only kick in when treemanifest is used.
>> >> This logic already works with a couple of pattern types (all patterns use
>> >> the same class). However, that logic currently does not support the case where
>> >> one wants to select some subdirectory and skip the rest of the subtrees
>> >> under it.
>> >
>> >
>> > That is correct.
>> >
>> >> note: Rodrigo, you seem to have a good understanding of the logic. Do you
>> >> think you could document the involved attributes (_includeroots,
>> >> _includedirs, _excluderoots, etc.)? That would help the next poor souls
>> >> looking at the code a lot.
>> >
>> >
>> > Sure. It took me a while to understand that "roots" means "recursive
>> > directories" and "dirs" means "non-recursive directories" in that code - it
>> > all became much more clear after that. I'll be sure to add comments in my
>> > patch and/or rename the attributes.
>> >
>> >>
>> >>
>> >> Way forward
>> >> -----------
>> >>
>> >> That limitation in the matcher class optimization is the main blocker for
>> >> Rodrigo/Google's needs. The optimization is independent of the UI part we
>> >> actually provide to users, as all patterns use the same matcher class and
>> >> some existing patterns could already benefit from this optimization.
>> >>
>> >> Rodrigo seems to have a patch to update the matcher code to track and
>> >> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
>> >> yet. Submitting that patch seems the next step to me. It will get the
>> >> matcher code in a state that can actually be used for the
>> >> narrowhg+treemanifest use case.
>> >>
>> >> Once that code is in, it seems easy to make sure various patterns use it in
>> >> basic, easily recognizable cases. We poked at updating the code to have a
>> >> basic regexp matching a subtree recognized as such and that was quite easy.
>> >>
>> >>
>> >> Rodrigo, does that match your current understanding of the situation?
>> >
>> >
>> > It does.
>> > And just to clarify on the patches - I sent an initial patch, then after
>> > comments changed it significantly, so those are two different changes:
>> >
>> > The first implements a "files:" matcher which matches all files inside a
>> > directory, non-recursively. This has no wildcards, so special-casing it in
>> > visitdir and any other places needed results in clean and simple code ("if
>> > it's files:, don't recurse").
>> > The second implements "rootglob:" which allows any number of wildcards at
>> > any point in the path, and is part of Foozy's plan for the new set of matchers.
>> > This adds some complexity in splitting dirs and roots (mentioned above) by
>> > having to parse the wildcards, and then the visitdir change looks less clean
>> > ("if it's a rootglob that has a single /* wildcard at the end, then don't
>> > recurse" - other cases are possible but start to get more complex).
>> >
>> > For these reasons, I'd still prefer to get "files:" or similar in, but I'm
>> > open to doing it either way. Please advise on the preferred way and I'll
>> > send an updated patch (2 patches really - one for the matcher, one for the
>> > visitdir optimization which makes it work with narrow).
>>
>> I'm fine with not doing rootglob:, but if I read foozy's proposal
>> right, the proposed files: will be what he would call rootfiles:. I
>> liked his proposal for a systematic naming, and if I got that right, I
>> think we should call it that from the beginning so we don't end up
>> with more aliases.
>
> Yeah, we should avoid naming confusion!
>
>> I'd also like "rootfiles:foo" to *not* match the
>> file "foo", but only files in the directory "foo/". I mention that
>> because last I heard, he was unsure about that himself. Foozy, do you
>> agree?
>
> I don't have a strong opinion against a mode "XXX:" that matches
> both "just this file" and "files directly under this directory".
>
> Therefore, I agree with adding the new mode "XXX:", if it is needed (and
> Rodrigo/Google think so).
>
> But the name "files:" doesn't seem to suit the "XXX:" mode (at least, to
> me).
>
> Even if it matched only "files directly under this directory",
> "files:" still wouldn't suit it.
>
> Maybe the root cause of my bad feeling is that "foo" in "files:foo" should
> be a directory in both cases, even though the mode name "files:" has
> little "(under) this directory" flavor.
>
> If it is possible to combine the 2 modes below to solve the issues of
> Rodrigo/Google, I'm +1 for splitting "XXX:" into them, because naming
> "YYY:" should be easier than naming "XXX:".
>
>   - "file:" matching against "just this file"
>   - "YYY:" matching against "files directly under this directory"
>
> Are there any better (and short enough) names for "XXX:" or "YYY:"
> than "files:" ?
>
>   - "filesin:" (Files-In) for "YYY:" (<=> "files under" as "dir:") ?
>   - "fileorin:" (File-Or-(Files-)In) for "XXX:" ?
>
> Of course, I'm also OK with naming "XXX:" or "YYY:" as "files:", to go
> forward :-)

I like "file:" and "filesin:" for those two cases. But we should add
the "root" prefix so we don't have to do that later, right?

via Mercurial-devel - Jan. 27, 2017, 1:27 a.m.
All sounds very reasonable, and "filesin:" or "rootfilesin:" LGTM.


Katsunori FUJIWARA - Jan. 27, 2017, 8:49 a.m.
At Thu, 26 Jan 2017 11:24:20 -0800,
Martin von Zweigbergk wrote:
> 
> On Thu, Jan 26, 2017 at 11:19 AM, FUJIWARA Katsunori
> <foozy@lares.dti.ne.jp> wrote:
> >
> > At Wed, 25 Jan 2017 20:54:37 -0800,
> > Martin von Zweigbergk wrote:
> >>
> >> On Mon, Jan 23, 2017 at 5:02 PM, Rodrigo Damazio via Mercurial-devel
> >> <mercurial-devel@mercurial-scm.org> wrote:
> >> > Getting back to this after the end-of-year hiatus (yes, I know it happens to
> >> > be during another code freeze :) I seem to have good timing).
> >> >
> >> > On Tue, Dec 27, 2016 at 2:14 AM, Pierre-Yves David
> >> > <pierre-yves.david@ens-lyon.org> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On 12/21/2016 04:21 AM, Rodrigo Damazio wrote:
> >> >>>
> >> >>>     If I got these two pieces right, it looks like we could just apply
> >> >>>     the improvement to 'visitdir' to 'set:your/glob/*' and have your
> >> >>>     usecase filled while not jumping into UI changes. Would that work
> >> >>>     for you ?
> >> >>>
> >> >>>
> >> >>> Not without a third set of changes, since set expansion doesn't use
> >> >>> visitdir (or the matcher being built) at all - the dependency is that
> >> >>> building the matcher depends on expanding the set (and thus the set
> >> >>> can't depend on the matcher).
> >> >>> It would technically be doable for re:, but I'm wary of getting into the
> >> >>> business of parsing and special-casing regexes to assume what they match
> >> >>> or don't.
> >> >>
> >> >>
> >> >> Rodrigo and I chatted directly about this a couple of days ago. Here is a
> >> >> quick summary of my new understanding of the situation.
> >> >>
> >> >> Fileset
> >> >> -------
> >> >>
> >> >> Fileset (behind "set:") can give the right result, but it is powered by
> >> >> not very modern code, it follow the old revset principle of "get everything
> >> >> and then run filters on that everything". That does not fit Rodrigo needs at
> >> >> all. It was easy to make 'set:' a bit smarter in the simple case but then we
> >> >> get into the issue that the matcher class is using 'set:' in a strange,
> >> >> non-lazy, way that does not use all the 'visitdir' feature Rodrigo/Google
> >> >> needs.
> >> >>
> >> >> So in short, fileset needs a rework before being usable in a demanding
> >> >> context.
> >> >>
> >> >>
> >> >> Current path restriction capability
> >> >> -----------------------------------
> >> >>
> >> >> The 'Match' class already has logic to restrict the paths visited
> >> >> (implemented in the 'visitdir' method). To clarify, this logic has no effect
> >> >> on the returned matches; it is only an optimization for the directories we
> >> >> visit. It seems to only kick in when treemanifest is used.
> >> >> This logic already works with a couple of pattern types (all patterns use
> >> >> the same class). However, that logic currently does not support the case where
> >> >> one wants to select some subdirectory and skip the rest of the subtrees
> >> >> under it.
> >> >
> >> >
> >> > That is correct.
> >> >
> >> >> note: Rodrigo, you seem to have a good understanding of the logic. Do you
> >> >> think you could document the involved attributes (_includeroots,
> >> >> _includedirs, _excluderoots, etc.)? That would help a lot the next poor souls
> >> >> looking at the code.
> >> >
> >> >
> >> > Sure. It took me a while to understand that "roots" means "recursive
> >> > directories" and "dirs" means "non-recursive directories" in that code - it
> >> > all became much more clear after that. I'll be sure to add comments in my
> >> > patch and/or rename the attributes.
> >> >
> >> >>
> >> >>
> >> >> Way forward
> >> >> -----------
> >> >>
> >> >> That limitation in the matcher class optimization is the main blocker for
> >> >> Rodrigo/Google's needs. The optimization is independent of the UI part we
> >> >> actually provide to users, as all patterns use the same matcher class and
> >> >> some existing class could already benefit from this optimization.
> >> >>
> >> >> Rodrigo seems to have a patch to update the matcher code to track and
> >> >> optimize the "subdir-but-not-subtree" case. He has not submitted this patch
> >> >> yet. Submitting that patch seems the next step to me. It will get the
> >> >> matcher code in a state that can actually be used for the
> >> >> narrowhg+treemanifest usecase.
> >> >>
> >> >> Once that code is in, it seems easy to make sure various patterns use it
> >> >> in basic, easily recognizable cases. We poked at updating the code to have
> >> >> a basic regexp matching a subtree recognized as such and that was quite easy.
> >> >>
> >> >>
> >> >> Rodrigo, does that match your current understanding of the situation?
> >> >
> >> >
> >> > It does.
> >> > And just to clarify on the patches - I sent an initial patch, then after
> >> > comments changed it significantly, so those are two different changes:
> >> >
> >> > The first implements a "files:" matcher which matches all files inside a
> >> > directory, non-recursively. This has no wildcards, so special-casing it in
> >> > visitdir and any other places needed results in clean and simple code ("if
> >> > it's files:, don't recurse").
> >> > The second implements "rootglob:" which allows any number of wildcards at
> >> > any point in the path, and is part of Foozy's plan for the new set of matchers.
> >> > This adds some complexity in splitting dirs and roots (mentioned above) by
> >> > having to parse the wildcards, and then the visitdir change looks less clean
> >> > ("if it's a rootglob that has a single /* wildcard at the end, then don't
> >> > recurse" - other cases are possible but start to get more complex).
> >> >
> >> > For these reasons, I'd still prefer to get "files:" or similar in, but I'm
> >> > open to doing it either way. Please advise on the preferred way and I'll
> >> > send an updated patch (2 patches really - one for the matcher, one for the
> >> > visitdir optimization which makes it work with narrow).
> >>
> >> I'm fine with not doing rootglob:, but if I read foozy's proposal
> >> right, the proposed files: will be what he would call rootfiles:. I
> >> liked his proposal for a systematic naming, and if I got that right, I
> >> think we should call it that from the beginning so we don't end up
> >> with more aliases.
> >
> > Yeah, we should avoid naming confusion!
> >
> >> I'd also like "rootfiles:foo" to *not* match the
> >> file "foo", but only files in the directory "foo/". I mention that
> >> because last I heard, he was unsure about that himself. Foozy, do you
> >> agree?
> >
> > I don't have a strong opinion against a mode "XXX:" that matches
> > both "just this file" and "files directly under this directory".
> >
> > Therefore, I agree with adding a new mode "XXX:", if it is needed (and
> > Rodrigo/Google think so).
> >
> > But the name "files:" doesn't seem to suit the "XXX:" mode (at least, for
> > me).
> >
> > Even if the mode matches only "files directly under this directory",
> > "files:" still doesn't suit it.
> >
> > Maybe the root cause of my bad feeling is that "foo" in "files:foo" should
> > be the directory in both cases, even though the mode name "files:" has
> > less of an "(under) this directory" flavor.
> >
> > If it is possible to combine 2 modes below for solving issues of
> > Rodrigo/Google, I'm +1 for splitting "XXX:" into them, because naming
> > "YYY:" should be easier than naming "XXX:".
> >
> >   - "file:" matching against "just this file"
> >   - "YYY:" matching against "files directly under this directory"
> >
> > Are there any better (and short enough) names for "XXX:" or "YYY:"
> > than "files:" ?
> >
> >   - "filesin:" (Files-In) for "YYY:" (<=> "files under" as "dir:") ?
> >   - "fileorin:" (File-Or-(Files-)In) for "XXX:" ?
> >
> > Of course, I'm also OK with naming "XXX:" or "YYY:" as "files:", to go
> > forward :-)
> 
> I like "file:" and "filesin:" for those two cases. But we should add
> the "root" prefix so we don't have to do that later, right?

Oh, sorry. I intentionally omitted the relative-to prefix (e.g. "root")
to focus the discussion on the "mode".

Yes, as you mentioned, we should compose the actual pattern syntax name
from the relative-to prefix and the mode name, IMHO.

Katsunori FUJIWARA - Jan. 27, 2017, 9:03 a.m.
At Thu, 26 Jan 2017 17:27:17 -0800,
Rodrigo Damazio wrote:
> 
> All sounds very reasonable, and "filesin:" or "rootfilesin:" LGTM.

Is it OK for your solution that "rootfilesin:FOO" doesn't match
the file "FOO", even though your patch posted in this thread made
"files:FOO" do so? Or is combining "rootfile:" and "rootfilesin:"
acceptable for your solution?


via Mercurial-devel - Jan. 27, 2017, 11:14 p.m.
On Fri, Jan 27, 2017 at 1:03 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

>
> At Thu, 26 Jan 2017 17:27:17 -0800,
> Rodrigo Damazio wrote:
> >
> > All sounds very reasonable, and "filesin:" or "rootfilesin:" LGTM.
>
> Is it OK for your solution that "rootfilesin:FOO" doesn't match
> against "file FOO", even though your patch posted in this thread made
> "files:FOO" do so ? or, is combining "rootfile:" and "rootfilesin"
> acceptable for your solution ?
>

Yes, not matching files is fine, and actually the easiest to implement (the
regex is simpler and our custom server doesn't support files anyway).
For that, rootfilesin:foo/bar can produce the regex ^foo/bar/[^/]+$ or similar,
which would not match a file called bar. visitdir would have to be updated
accordingly, of course, but that shouldn't be too hard (and I can take the
opportunity to add some comments to the code).
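
As a rough illustration of the regex idea above (a minimal sketch, not the
actual patch): the helper name rootfilesin_regex is hypothetical, and it
assumes '/'-separated paths relative to the repository root.

```python
import re


def rootfilesin_regex(directory):
    """Hypothetical helper: regex for files directly inside `directory`.

    Assumes '/'-separated paths relative to the repository root.
    """
    directory = directory.strip('/')
    if not directory:
        # Files directly in the repository root: no '/' anywhere in the path.
        return r'^[^/]+$'
    # Exactly one more path component after the directory, so nothing in
    # subdirectories (and no file named like the directory itself) matches.
    return '^' + re.escape(directory) + r'/[^/]+$'


pattern = re.compile(rootfilesin_regex('foo/bar'))
for path in ('foo/bar/a.txt', 'foo/bar', 'foo/bar/sub/a.txt'):
    print(path, bool(pattern.match(path)))
```

Of the three sample paths, only foo/bar/a.txt matches; a file called
foo/bar and anything below foo/bar/sub do not.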

If that looks good to you, let me know and I'll send an updated patch.
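
For the visitdir side, the "don't recurse" decision could be sketched roughly
as follows. This is an assumption-laden illustration, not Mercurial's code:
should_visit is a hypothetical standalone function, and "roots" / "dirs"
follow the reading given earlier in this thread (recursive and non-recursive
directories respectively).

```python
def should_visit(directory, roots, dirs):
    """Hypothetical sketch: does a tree walk need to descend into `directory`?

    roots: directories matched recursively (whole subtree).
    dirs:  directories matched non-recursively (direct files only).
    All paths are '/'-separated and relative to the repository root;
    '' denotes the root itself.
    """
    directory = directory.strip('/')

    # Anything at or below a recursive root must be visited.
    for root in roots:
        if root == '' or directory == root or directory.startswith(root + '/'):
            return True

    # A directory must also be visited if it is one of the targets or an
    # ancestor of one; otherwise the walk could never reach the target.
    for target in list(roots) + list(dirs):
        if (directory == '' or target == directory
                or target.startswith(directory + '/')):
            return True

    # Below a non-recursive directory (or in an unrelated subtree): skip.
    return False


print(should_visit('foo', set(), {'foo/bar'}))          # ancestor of a target
print(should_visit('foo/bar', set(), {'foo/bar'}))      # the target itself
print(should_visit('foo/bar/sub', set(), {'foo/bar'}))  # below a non-recursive dir
```

The three calls print True, True and False: the walk reaches foo/bar but
skips the subtrees under it.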

Katsunori FUJIWARA - Jan. 29, 2017, 11:15 a.m.
At Fri, 27 Jan 2017 15:14:38 -0800,
Rodrigo Damazio wrote:
> 
> On Fri, Jan 27, 2017 at 1:03 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> wrote:
> 
> >
> > At Thu, 26 Jan 2017 17:27:17 -0800,
> > Rodrigo Damazio wrote:
> > >
> > > All sounds very reasonable, and "filesin:" or "rootfilesin:" LGTM.
> >
> > Is it OK for your solution that "rootfilesin:FOO" doesn't match
> > against "file FOO", even though your patch posted in this thread made
> > "files:FOO" do so ? or, is combining "rootfile:" and "rootfilesin"
> > acceptable for your solution ?
> >
> 
> Yes, not matching files is fine, and actually the easiest to implement (the
> regex is simpler and our custom server doesn't support files anyway).
> For that, rootfilesin:foo/bar can produce regex ^foo/bar/[^/]+$ or similar
> which would not match a file called bar. visitdir would have to be updated
> accordingly, of course, but that shouldn't be too hard (and i can take the
> opportunity to add some comments to the code).
> 
> If that looks good to you, let me know and I'll send an updated patch.

Sure, LGTM
via Mercurial-devel - Feb. 4, 2017, 4:26 a.m.
Finally working on this again.
One point which I discussed with Martin offline: which feels more intuitive
as a prefix, "root" or "abs"? (So, "rootfilesin" or "absfilesin"?) We think
it's "abs", but wanted to make sure that's OK with others.

Thanks


Katsunori FUJIWARA - Feb. 5, 2017, 12:01 p.m.
At Fri, 3 Feb 2017 20:26:04 -0800,
Rodrigo Damazio wrote:
> 
> Finally working on this again.
> On point which I discussed with Martin offline - which feels more intuitive
> as a prefix, "root" or "abs"? (so, "rootfilesin" or "absfilesin"?) We think
> it's "abs", but wanted to make sure that's OK with others.

In the FileNamePatternsPlan page, I chose "root" as the name of one of the
points to which patterns are relative ("root", "cwd", and "any").

I'm OK with "abs", if others think that it is more intuitive for
"relative to the root".

BTW, are the "cwd" and "any" prefixes OK with "abs"?
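
To make the naming scheme concrete, here is a minimal sketch of how a pattern
kind could be split into its relative-to prefix and its mode. The function
split_kind and the two tuples are hypothetical, and only the names already
mentioned in this thread ("root", "cwd", "any", "file", "filesin") are
assumed.

```python
# Relative-to prefixes and modes named in this thread (assumed, not final).
PREFIXES = ('root', 'cwd', 'any')
MODES = ('file', 'filesin')


def split_kind(kind):
    """Hypothetical helper: split e.g. 'rootfilesin' into ('root', 'filesin')."""
    for prefix in PREFIXES:
        mode = kind[len(prefix):]
        if kind.startswith(prefix) and mode in MODES:
            return prefix, mode
    raise ValueError('unknown pattern kind: %r' % kind)


print(split_kind('rootfilesin'))
print(split_kind('cwdfile'))
```

With such a split, every mode is available for every relative-to point under
a predictable name, which is the reason for adding the prefix from the start.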

> Thanks
> 
> 
> On Sun, Jan 29, 2017 at 3:15 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> wrote:
> 
> >
> > At Fri, 27 Jan 2017 15:14:38 -0800,
> > Rodrigo Damazio wrote:
> > >
> > > [1  <multipart/alternative (7bit)>]
> > > [1.1  <text/plain; UTF-8 (7bit)>]
> > > On Fri, Jan 27, 2017 at 1:03 AM, FUJIWARA Katsunori <
> > foozy@lares.dti.ne.jp>
> > > wrote:
> > >
> > > >
> > > > At Thu, 26 Jan 2017 17:27:17 -0800,
> > > > Rodrigo Damazio wrote:
> > > > >
> > > > > [1  <multipart/alternative (7bit)>]
> > > > > [1.1  <text/plain; UTF-8 (7bit)>]
> > > > > All sounds very reasonable, and "filesin:" or "rootfilesin:" LGTM.
> > > >
> > > > Is it OK for your solution that "rootfilesin:FOO" doesn't match
> > > > against "file FOO", even though your patch posted in this thread made
> > > > "files:FOO" do so ? or, is combining "rootfile:" and "rootfilesin"
> > > > acceptable for your solution ?
> > > >
> > >
> > > Yes, not matching files is fine, and actually the easiest to implement
> > (the
> > > regex is simpler and our custom server doesn't support files anyway).
> > > For that, rootfilesin:foo/bar can produce regex ^foo/bar/[^/]+$ or
> > similar
> > > which would not match a file called bar. visitdir would have to be
> > updated
> > > accordingly, of course, but that shouldn't be too hard (and i can take
> > the
> > > opportunity to add some comments to the code).
> > >
> > > If that looks good to you, let me know and I'll send an updated patch.
> >
> > Sure, LGTM
> >
> > --
> > ----------------------------------------------------------------------
> > [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
> >
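
The regex shape proposed in the quoted reply above (^foo/bar/[^/]+$ for rootfilesin:foo/bar) can be tried out with a small standalone sketch. The helper name rootfilesin_regex and the handling of '' and '.' as the repository root are assumptions made for illustration; this is not the code that was eventually sent, and it uses the standard library's re.escape rather than Mercurial's util.re.escape.

```python
import re


def rootfilesin_regex(pat):
    """Hypothetical helper: regex for files directly inside directory pat.

    Follows the shape suggested in the thread (^foo/bar/[^/]+$). Treating
    '' and '.' as the repository root is an assumption of this sketch.
    """
    if pat in ('', '.'):
        # Repository root: any non-empty path that contains no slash.
        return r'^[^/]+$'
    return '^' + re.escape(pat) + r'/[^/]+$'


matcher = re.compile(rootfilesin_regex('foo/bar'))
for path in ('foo/bar/baz.c', 'foo/bar', 'foo/bar/sub/baz.c', 'foo/barbaz'):
    # Only the first path is a file directly inside foo/bar.
    print(path, bool(matcher.match(path)))
```

Because the pattern requires a slash followed by at least one non-slash character, a file whose full name is foo/bar does not match, which is the behaviour agreed on in the exchange above.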
via Mercurial-devel - Feb. 6, 2017, 8:16 p.m.
On Sun, Feb 5, 2017 at 4:01 AM, FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
wrote:

> At Fri, 3 Feb 2017 20:26:04 -0800,
> Rodrigo Damazio wrote:
> >
> > Finally working on this again.
> > One point which I discussed with Martin offline: which feels more
> > intuitive as a prefix, "root" or "abs"? (So, "rootfilesin" or
> > "absfilesin"?) We think it's "abs", but wanted to make sure that's OK
> > with others.
>
> In the FileNamePatternsPlan page, I chose "root" as the name of the point to
> which patterns are relative ("root", "cwd", and "any").
>
> I'm OK with "abs", if others think that it is more intuitive for
> "relative to the root".
>

Alright, I'll keep "root", it sounds more consistent when put that way.
(Augie also seems to prefer root)

> BTW, are the "cwd" and "any" prefixes OK with "abs"?



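For comparison with the discussion above, the regex built by the 'files' branch of the patch below can be exercised on its own. This is a minimal sketch: files_regex is a hypothetical standalone name, and the standard library's re.escape stands in for Mercurial's util.re.escape.

```python
import re


def files_regex(pat):
    # Mirrors the kind == 'files' branch of the patch; re.escape is a
    # stand-in for util.re.escape (an assumption of this sketch).
    escaped = '' if pat == '.' else re.escape(pat)
    return '^' + escaped + '(?:^|/|$)[^/]*$'


def matches(pat, path):
    # True when path is selected by the pattern files:<pat>.
    return re.match(files_regex(pat), path) is not None


for path in ('mammals/skunk', 'mammals/Procyonidae/cacomistle', 'mammals'):
    print(path, matches('mammals', path))
```

Unlike the rootfilesin: form discussed in the thread, this regex also accepts a path equal to the pattern itself, so files:FOO matches a file named FOO.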
Patch

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -105,6 +105,9 @@ 
         'glob:<glob>' - a glob relative to cwd
         're:<regexp>' - a regular expression
         'path:<path>' - a path relative to repository root
+        'files:<path>' - a path relative to repository root, which is matched
+                         non-recursively (files inside the directory will match,
+                         but subdirectories and files in them won't)
         'relglob:<glob>' - an unrooted glob (*.c matches C files in all dirs)
         'relpath:<path>' - a path relative to cwd
         'relre:<regexp>' - a regexp that needn't match the start of a name
@@ -286,7 +289,7 @@ 
         for kind, pat in [_patsplit(p, default) for p in patterns]:
             if kind in ('glob', 'relpath'):
                 pat = pathutil.canonpath(root, cwd, pat, auditor)
-            elif kind in ('relglob', 'path'):
+            elif kind in ('relglob', 'path', 'files'):
                 pat = util.normpath(pat)
             elif kind in ('listfile', 'listfile0'):
                 try:
@@ -447,7 +450,8 @@ 
     if ':' in pattern:
         kind, pat = pattern.split(':', 1)
         if kind in ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
-                    'listfile', 'listfile0', 'set', 'include', 'subinclude'):
+                    'listfile', 'listfile0', 'set', 'include', 'subinclude',
+                    'files'):
             return kind, pat
     return default, pattern
 
@@ -540,6 +544,19 @@ 
         if pat == '.':
             return ''
         return '^' + util.re.escape(pat) + '(?:/|$)'
+    if kind == 'files':
+        # Match one of:
+        # For pat = 'some/dir':
+        # some/dir
+        # some/dir/
+        # some/dir/filename
+        # For pat = '' or pat = '.':
+        # filename
+        if pat == '.':
+            escaped = ''
+        else:
+            escaped = util.re.escape(pat)
+        return '^' + escaped + '(?:^|/|$)[^/]*$'
     if kind == 'relglob':
         return '(?:|.*/)' + _globre(pat) + globsuffix
     if kind == 'relpath':
@@ -628,7 +645,7 @@ 
                     break
                 root.append(p)
             r.append('/'.join(root) or '.')
-        elif kind in ('relpath', 'path'):
+        elif kind in ('relpath', 'path', 'files'):
             r.append(pat or '.')
         else: # relglob, re, relre
             r.append('.')
diff --git a/tests/test-locate.t b/tests/test-locate.t
--- a/tests/test-locate.t
+++ b/tests/test-locate.t
@@ -52,6 +52,12 @@ 
   t/b
   t/e.h
   t/x
+  $ hg locate files:
+  b
+  t.h
+  $ hg locate files:.
+  b
+  t.h
   $ hg locate -r 0 a
   a
   $ hg locate -r 0 NONEXISTENT
@@ -119,6 +125,13 @@ 
   ../t/e.h (glob)
   ../t/x (glob)
 
+  $ hg files files:
+  ../b (glob)
+  ../t.h (glob)
+  $ hg files files:.
+  ../b (glob)
+  ../t.h (glob)
+
   $ hg locate b
   ../b (glob)
   ../t/b (glob)
diff --git a/tests/test-walk.t b/tests/test-walk.t
--- a/tests/test-walk.t
+++ b/tests/test-walk.t
@@ -112,6 +112,8 @@ 
   f  beans/navy      ../beans/navy
   f  beans/pinto     ../beans/pinto
   f  beans/turtle    ../beans/turtle
+  $ hg debugwalk -I 'files:mammals'
+  f  mammals/skunk  skunk
   $ hg debugwalk .
   f  mammals/Procyonidae/cacomistle  Procyonidae/cacomistle
   f  mammals/Procyonidae/coatimundi  Procyonidae/coatimundi