Patchwork [2,of,2] match: add a subclass for dirstate normalizing of the matched patterns

Submitter Matt Harbison
Date April 12, 2015, 8:52 p.m.
Message ID <6172eed8aa036002775a.1428871933@Envy>
Permalink /patch/8625/
State Superseded
Commit baa11dde8c0e79ab84b51918e6b89edaf1eb0409

Comments

Matt Harbison - April 12, 2015, 8:52 p.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1428817161 14400
#      Sun Apr 12 01:39:21 2015 -0400
# Node ID 6172eed8aa036002775a2ed02df47be5df02acc7
# Parent  75835458befcf5ddcef740c1a2ef0d5ce6804928
match: add a subclass for dirstate normalizing of the matched patterns

This class is only needed on case insensitive filesystems, and only for wdir
context matches.  It allows the user to not match the case of the items in the
filesystem, especially for naming directories, which dirstate doesn't handle[1].
Making dirstate handle mismatched directory cases is too expensive[2].

Since dirstate doesn't apply to committed csets, this is only created by
overriding basectx.match() in workingctx, and only on icasefs.  The default
arguments have been dropped, because the ctx must be passed to the matcher in
order to function.

For operations that can apply to both wdir and some other context, this ends up
normalizing the filename to the case as it exists in the filesystem, and using
that case for the lookup in the other context.  See the diff example in the
test.
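
The normalization described above can be sketched roughly as follows.  This is
an illustration only, not the code in this patch: normalize_case() and its
fold map are hypothetical names, and str.lower() stands in for whatever case
folding the real dirstate uses.

```python
def normalize_case(path, tracked):
    """Return `path` spelled the way it appears in `tracked`.

    The comparison ignores case and covers leading directories as well as
    whole file names.  Unknown paths are returned unchanged.
    """
    foldmap = {}
    for name in tracked:
        parts = name.split('/')
        # Record every directory prefix, plus the full file name.
        for i in range(1, len(parts) + 1):
            prefix = '/'.join(parts[:i])
            foldmap.setdefault(prefix.lower(), prefix)
    return foldmap.get(path.lower(), path)


tracked = ['CapsDir1/CapsDir/AbC.txt', 'CapsDir1/CapsDir/SubDir/Def.txt']
print(normalize_case('capsdir1/capsdir', tracked))  # CapsDir1/CapsDir
```

The normalized name is then what gets looked up in the other context, which is
why the on-disk spelling shows up in the diff output.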

Previously, given a directory with an inexact case:

  - add worked as expected

  - diff, forget and status would silently ignore the request

  - files would exit with 1

  - commit, revert and remove would fail (even when the commands leading up to
    them worked):

        $ hg ci -m "AbCDef" capsdir1/capsdir
        abort: CapsDir1/CapsDir: no match under directory!

        $ hg revert -r '.^' capsdir1/capsdir
        capsdir1\capsdir: no such file in rev 64dae27060b7

        $ hg remove capsdir1/capsdir
        not removing capsdir1\capsdir: no tracked files
        [1]

Globs are normalized, so that -I and -X don't need to be specified with a
case match.  Without that, the second-to-last remove in the test (with -X)
removes the files, leaving nothing for the last remove.  However, specifying
the files as 'glob:**.Txt' does not work.  Perhaps this requires
're.IGNORECASE'?
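
For reference, the 're.IGNORECASE' idea looks roughly like this when done with
the standard library alone.  This is a sketch and not how match.py builds its
regexes: icase_glob() is a hypothetical helper, and fnmatch's '*' crosses '/'
boundaries, unlike Mercurial's glob syntax where only '**' does.

```python
import fnmatch
import re


def icase_glob(pattern):
    """Compile a shell-style glob into a case-insensitive regex."""
    return re.compile(fnmatch.translate(pattern), re.IGNORECASE)


name = 'CapsDir1/CapsDir/AbC.txt'

# Case-sensitive: '.Txt' does not match '.txt'.
print(bool(re.compile(fnmatch.translate('*.Txt')).match(name)))  # False

# Case-insensitive: the same glob now matches.
print(bool(icase_glob('*.Txt').match(name)))  # True
```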

There are only a handful of places that create matchers directly, instead of
being routed through the context.match() method.  Some may benefit from changing
over to using ctx.match() as a factory function:

  revset.checkstatus()
  revset.contains()
  revset.filelog()
  revset._matchfiles()
  localrepository._loadfilter()
  ignore.ignore()
  fileset.subrepo()
  filemerge._picktool()
  overrides.addlargefiles()
  lfcommands.lfconvert()
  kwtemplate.__init__()
  eolfile.__init__()
  eolfile.checkrev()
  acl.buildmatch()

Currently, a toplevel subrepo can be named with an inexact case.  However, the
path auditor gets in the way of naming _anything_ in the subrepo if the top
level case doesn't match.

  --- a/tests/test-subrepo-deep-nested-change.t
  +++ b/tests/test-subrepo-deep-nested-change.t
  @@ -170,8 +170,15 @@
     R sub1/sub2/test.txt
     $ hg update -Cq
     $ touch sub1/sub2/folder/bar
  +#if icasefs
  +  $ hg addremove Sub1/sub2
  +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
  +  [255]
  +  $ hg -q addremove sub1/sub2
  +#else
     $ hg addremove sub1/sub2
     adding sub1/sub2/folder/bar (glob)
  +#endif
     $ hg status -S
     A sub1/sub2/folder/bar
     ? foo/bar/abc

The narrowmatcher class may need to be tweaked when that is fixed.


[1] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
[2] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
Siddharth Agarwal - April 13, 2015, 5:36 p.m.
On 04/12/2015 04:52 PM, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1428817161 14400
> #      Sun Apr 12 01:39:21 2015 -0400
> # Node ID 6172eed8aa036002775a2ed02df47be5df02acc7
> # Parent  75835458befcf5ddcef740c1a2ef0d5ce6804928
> match: add a subclass for dirstate normalizing of the matched patterns
>
> This class is only needed on case insensitive filesystems, and only for wdir
> context matches.  It allows the user to not match the case of the items in the
> filesystem- especially for naming directories, which dirstate doesn't handle[1].
> Making dirstate handle mismatched directory cases is too expensive[2].
>
> Since dirstate doesn't apply to committed csets, this is only created by
> overriding basectx.match() in workingctx, and only on icasefs.  The default
> arguments have been dropped, because the ctx must be passed to the matcher in
> order to function.
>
> For operations that can apply to both wdir and some other context, this ends up
> normalizing the filename to the case as it exists in the filesystem, and using
> that case for the lookup in the other context.  See the diff example in the
> test.
>
> Previously, given a directory with an inexact case:
>
>   - add worked as expected
>
>   - diff, forget and status would silently ignore the request
>
>   - files would exit with 1
>
>   - commit, revert and remove would fail (even when the commands leading up to
>     them worked):
>
>         $ hg ci -m "AbCDef" capsdir1/capsdir
>         abort: CapsDir1/CapsDir: no match under directory!
>
>         $ hg revert -r '.^' capsdir1/capsdir
>         capsdir1\capsdir: no such file in rev 64dae27060b7
>
>         $ hg remove capsdir1/capsdir
>         not removing capsdir1\capsdir: no tracked files
>         [1]
>
> Globs are normalized, so that the -I and -X don't need to be specified with a
> case match.  Without that, the second last remove (with -X) removes the files,
> leaving nothing for the last remove.  However, specifying the files as
> 'glob:**.Txt' does not work.  Perhaps this requires 're.IGNORECASE'?
>
> There are only a handful of places that create matchers directly, instead of
> being routed through the context.match() method.  Some may benefit from changing
> over to using ctx.match() as a factory function:
>
>   revset.checkstatus()
>   revset.contains()
>   revset.filelog()
>   revset._matchfiles()
>   localrepository._loadfilter()
>   ignore.ignore()
>   fileset.subrepo()
>   filemerge._picktool()
>   overrides.addlargefiles()
>   lfcommands.lfconvert()
>   kwtemplate.__init__()
>   eolfile.__init__()
>   eolfile.checkrev()
>   acl.buildmatch()
>
> Currently, a toplevel subrepo can be named with an inexact case.  However, the
> path auditor gets in the way of naming _anything_ in the subrepo if the top
> level case doesn't match.

So this is a TODO then?

>
>   --- a/tests/test-subrepo-deep-nested-change.t
>   +++ b/tests/test-subrepo-deep-nested-change.t
>   @@ -170,8 +170,15 @@
>      R sub1/sub2/test.txt
>      $ hg update -Cq
>      $ touch sub1/sub2/folder/bar
>   +#if icasefs
>   +  $ hg addremove Sub1/sub2
>   +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
>   +  [255]
>   +  $ hg -q addremove sub1/sub2
>   +#else
>      $ hg addremove sub1/sub2
>      adding sub1/sub2/folder/bar (glob)
>   +#endif
>      $ hg status -S
>      A sub1/sub2/folder/bar
>      ? foo/bar/abc
>
> The narrowmatcher class may need to be tweaked when that is fixed.
>
>
> [1] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
> [2] http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
>
> diff --git a/mercurial/context.py b/mercurial/context.py
> --- a/mercurial/context.py
> +++ b/mercurial/context.py
> @@ -1424,6 +1424,19 @@
>              finally:
>                  wlock.release()
>  
> +    def match(self, pats=[], include=None, exclude=None, default='glob'):
> +        r = self._repo
> +
> +        # Only a case insensitive filesystem needs magic to translate user input
> +        # to actual case in the filesystem.
> +        if not util.checkcase(r.root):
> +            return matchmod.icasefsmatcher(r.root, r.getcwd(), pats, include,
> +                                           exclude, default, False, r.auditor,
> +                                           self)
> +        return matchmod.match(r.root, r.getcwd(), pats,
> +                              include, exclude, default,
> +                              auditor=r.auditor, ctx=self)
> +
>      def _filtersuspectsymlink(self, files):
>          if not files or self._repo.dirstate._checklink:
>              return files
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -273,6 +273,34 @@
>      def rel(self, f):
>          return self._matcher.rel(self._path + "/" + f)
>  
> +class icasefsmatcher(match):
> +    """A matcher for wdir on case insenstive filesystems, which normalizes the
> +    given patterns to the case in the filesystem.
> +    """
> +
> +    def __init__(self, root, cwd, patterns, include, exclude, default, exact,
> +                 auditor, ctx):
> +        init = super(icasefsmatcher, self).__init__
> +        self._dsnormalize = ctx.repo().dirstate.normalize
> +
> +        init(root, cwd, patterns, include, exclude, default, exact, auditor,
> +             ctx)
> +
> +        # Exact matches must be based off of the actual user input, otherwise
> +        # inexact case matches are treated as exact, and not noted without -v.
> +        if not exact and self._files:
> +            self._fmap = set(_roots(self._kp))
> +
> +    def _normalize(self, patterns, default, root, cwd, auditor):

We shouldn't apply case normalization on exact matchers at all, I think.

Other than that this looks fine. dirstate.normalize is a little more
expensive than necessary but the number of patterns is usually very small.

- Siddharth

> +        self._kp = super(icasefsmatcher, self)._normalize(patterns, default,
> +                                                          root, cwd, auditor)
> +        kindpats = []
> +        for kind, pats in self._kp:
> +            if kind not in ('re', 'relre'):  # regex can't be normalized
> +                pats = self._dsnormalize(pats)
> +            kindpats.append((kind, pats))
> +        return kindpats
> +
>  def patkind(pattern, default=None):
>      '''If pattern is 'kind:pat' with a known kind, return kind.'''
>      return _patsplit(pattern, default)[0]
> diff --git a/tests/test-add.t b/tests/test-add.t
> --- a/tests/test-add.t
> +++ b/tests/test-add.t
> @@ -176,12 +176,48 @@
>    $ mkdir CapsDir1/CapsDir/SubDir
>    $ echo def > CapsDir1/CapsDir/SubDir/Def.txt
>  
> -  $ hg add -v capsdir1/capsdir
> +  $ hg add capsdir1/capsdir
>    adding CapsDir1/CapsDir/AbC.txt (glob)
>    adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>  
>    $ hg forget capsdir1/capsdir/abc.txt
>    removing CapsDir1/CapsDir/AbC.txt (glob)
> +
> +  $ hg forget capsdir1/capsdir
> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
> +
> +  $ hg add capsdir1
> +  adding CapsDir1/CapsDir/AbC.txt (glob)
> +  adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
> +
> +  $ hg ci -m "AbCDef" capsdir1/capsdir
> +
> +  $ hg status -A capsdir1/capsdir
> +  C CapsDir1/CapsDir/AbC.txt
> +  C CapsDir1/CapsDir/SubDir/Def.txt
> +
> +  $ hg files capsdir1/capsdir
> +  CapsDir1/CapsDir/AbC.txt (glob)
> +  CapsDir1/CapsDir/SubDir/Def.txt (glob)
> +
> +  $ echo xyz > CapsDir1/CapsDir/SubDir/Def.txt
> +  $ hg ci -m xyz capsdir1/capsdir/subdir/def.txt
> +
> +  $ hg revert -r '.^' capsdir1/capsdir
> +  reverting CapsDir1/CapsDir/SubDir/Def.txt (glob)
> +
> +  $ hg diff capsdir1/capsdir
> +  diff -r 5112e00e781d CapsDir1/CapsDir/SubDir/Def.txt
> +  --- a/CapsDir1/CapsDir/SubDir/Def.txt	Thu Jan 01 00:00:00 1970 +0000
> +  +++ b/CapsDir1/CapsDir/SubDir/Def.txt	* +0000 (glob)
> +  @@ -1,1 +1,1 @@
> +  -xyz
> +  +def
> +
> +  $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
> +  $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
> +  removing CapsDir1/CapsDir/AbC.txt (glob)
> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>  #endif
>  
>    $ cd ..
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Matt Harbison - April 13, 2015, 11:50 p.m.
On Mon, 13 Apr 2015 13:36:53 -0400, Siddharth Agarwal  
<sid@less-broken.com> wrote:

> On 04/12/2015 04:52 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1428817161 14400
>> #      Sun Apr 12 01:39:21 2015 -0400
>> # Node ID 6172eed8aa036002775a2ed02df47be5df02acc7
>> # Parent  75835458befcf5ddcef740c1a2ef0d5ce6804928
>> match: add a subclass for dirstate normalizing of the matched patterns
>>
>> This class is only needed on case insensitive filesystems, and only for  
>> wdir
>> context matches.  It allows the user to not match the case of the items  
>> in the
>> filesystem- especially for naming directories, which dirstate doesn't  
>> handle[1].
>> Making dirstate handle mismatched directory cases is too expensive[2].
>>
>> Since dirstate doesn't apply to committed csets, this is only created by
>> overriding basectx.match() in workingctx, and only on icasefs.  The  
>> default
>> arguments have been dropped, because the ctx must be passed to the  
>> matcher in
>> order to function.
>>
>> For operations that can apply to both wdir and some other context, this  
>> ends up
>> normalizing the filename to the case as it exists in the filesystem,  
>> and using
>> that case for the lookup in the other context.  See the diff example in  
>> the
>> test.
>>
>> Previously, given a directory with an inexact case:
>>
>>   - add worked as expected
>>
>>   - diff, forget and status would silently ignore the request
>>
>>   - files would exit with 1
>>
>>   - commit, revert and remove would fail (even when the commands  
>> leading up to
>>     them worked):
>>
>>         $ hg ci -m "AbCDef" capsdir1/capsdir
>>         abort: CapsDir1/CapsDir: no match under directory!
>>
>>         $ hg revert -r '.^' capsdir1/capsdir
>>         capsdir1\capsdir: no such file in rev 64dae27060b7
>>
>>         $ hg remove capsdir1/capsdir
>>         not removing capsdir1\capsdir: no tracked files
>>         [1]
>>
>> Globs are normalized, so that the -I and -X don't need to be specified  
>> with a
>> case match.  Without that, the second last remove (with -X) removes the  
>> files,
>> leaving nothing for the last remove.  However, specifying the files as
>> 'glob:**.Txt' does not work.  Perhaps this requires 're.IGNORECASE'?
>>
>> There are only a handful of places that create matchers directly,  
>> instead of
>> being routed through the context.match() method.  Some may benefit from  
>> changing
>> over to using ctx.match() as a factory function:
>>
>>   revset.checkstatus()
>>   revset.contains()
>>   revset.filelog()
>>   revset._matchfiles()
>>   localrepository._loadfilter()
>>   ignore.ignore()
>>   fileset.subrepo()
>>   filemerge._picktool()
>>   overrides.addlargefiles()
>>   lfcommands.lfconvert()
>>   kwtemplate.__init__()
>>   eolfile.__init__()
>>   eolfile.checkrev()
>>   acl.buildmatch()
>>
>> Currently, a toplevel subrepo can be named with an inexact case.   
>> However, the
>> path auditor gets in the way of naming _anything_ in the subrepo if the  
>> top
>> level case doesn't match.
>
> So this is a TODO then?

Yes.  It might be tricky though, because localrepository._checknested()  
checks 'prefix in ctx.substate', which might mean normalizing the keys in  
workingctx.substate.
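
Something along these lines is what I mean by normalizing the keys.  It is only
a sketch: find_subrepo() is a hypothetical helper, the substate entry below is
made up for illustration, and str.lower() is a stand-in for real case folding.

```python
def find_subrepo(prefix, substate):
    """Return the key of `substate` that equals `prefix` ignoring case.

    Returns None when no subrepo path matches.
    """
    folded = {key.lower(): key for key in substate}
    return folded.get(prefix.lower())


# Hypothetical substate: subrepo path -> (source, revision, kind)
substate = {'sub1': ('../sub1', 'abcdef012345', 'hg')}

print('Sub1' in substate)              # False: plain lookup is case sensitive
print(find_subrepo('Sub1', substate))  # sub1
```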

>>
>>   --- a/tests/test-subrepo-deep-nested-change.t
>>   +++ b/tests/test-subrepo-deep-nested-change.t
>>   @@ -170,8 +170,15 @@
>>      R sub1/sub2/test.txt
>>      $ hg update -Cq
>>      $ touch sub1/sub2/folder/bar
>>   +#if icasefs
>>   +  $ hg addremove Sub1/sub2
>>   +  abort: path 'Sub1\sub2' is inside nested repo 'Sub1'
>>   +  [255]
>>   +  $ hg -q addremove sub1/sub2
>>   +#else
>>      $ hg addremove sub1/sub2
>>      adding sub1/sub2/folder/bar (glob)
>>   +#endif
>>      $ hg status -S
>>      A sub1/sub2/folder/bar
>>      ? foo/bar/abc
>>
>> The narrowmatcher class may need to be tweaked when that is fixed.
>>
>>
>> [1]  
>> http://www.selenic.com/pipermail/mercurial-devel/2015-April/068183.html
>> [2]  
>> http://www.selenic.com/pipermail/mercurial-devel/2015-April/068191.html
>>
>> diff --git a/mercurial/context.py b/mercurial/context.py
>> --- a/mercurial/context.py
>> +++ b/mercurial/context.py
>> @@ -1424,6 +1424,19 @@
>>              finally:
>>                  wlock.release()
>>
>> +    def match(self, pats=[], include=None, exclude=None,  
>> default='glob'):
>> +        r = self._repo
>> +
>> +        # Only a case insensitive filesystem needs magic to translate  
>> user input
>> +        # to actual case in the filesystem.
>> +        if not util.checkcase(r.root):
>> +            return matchmod.icasefsmatcher(r.root, r.getcwd(), pats,  
>> include,
>> +                                           exclude, default, False,  
>> r.auditor,
>> +                                           self)
>> +        return matchmod.match(r.root, r.getcwd(), pats,
>> +                              include, exclude, default,
>> +                              auditor=r.auditor, ctx=self)
>> +
>>      def _filtersuspectsymlink(self, files):
>>          if not files or self._repo.dirstate._checklink:
>>              return files
>> diff --git a/mercurial/match.py b/mercurial/match.py
>> --- a/mercurial/match.py
>> +++ b/mercurial/match.py
>> @@ -273,6 +273,34 @@
>>      def rel(self, f):
>>          return self._matcher.rel(self._path + "/" + f)
>>
>> +class icasefsmatcher(match):
>> +    """A matcher for wdir on case insenstive filesystems, which  
>> normalizes the
>> +    given patterns to the case in the filesystem.
>> +    """
>> +
>> +    def __init__(self, root, cwd, patterns, include, exclude, default,  
>> exact,
>> +                 auditor, ctx):
>> +        init = super(icasefsmatcher, self).__init__
>> +        self._dsnormalize = ctx.repo().dirstate.normalize
>> +
>> +        init(root, cwd, patterns, include, exclude, default, exact,  
>> auditor,
>> +             ctx)
>> +
>> +        # Exact matches must be based off of the actual user input,  
>> otherwise
>> +        # inexact case matches are treated as exact, and not noted  
>> without -v.
>> +        if not exact and self._files:
>> +            self._fmap = set(_roots(self._kp))
>> +
>> +    def _normalize(self, patterns, default, root, cwd, auditor):
>
> We shouldn't apply case normalization on exact matchers at all, I think.

Agreed.  The superclass doesn't call _normalize() if exact.

What this is trying to say is that _fmap needs to be updated with roots in  
the user specified case, so that m.exact('name') is testing against what  
the user provided.  Otherwise, existing tests drop 'adding xyz' lines when  
only the case is different.  matchmod.exact([a, b, c]) bypasses this new  
class always.

I wasn't sure if both constructors should take the same parameters, in the  
same order, for sanity.  If not, I can drop the 'exact' variable.  This is  
only created with exact == False anyway.
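
A toy model of the _fmap point, in case it helps.  ToyMatcher is a hypothetical
class, far simpler than matchmod.match, and the normalize callable is an
assumption standing in for dirstate.normalize.

```python
class ToyMatcher(object):
    """Match against on-disk names, but answer exact() from user input."""

    def __init__(self, user_roots, normalize):
        # What the user typed, unchanged; exact() consults this.
        self._fmap = set(user_roots)
        # The same roots in filesystem case; used for actual matching.
        self._roots = [normalize(r) for r in user_roots]

    def exact(self, f):
        return f in self._fmap

    def __call__(self, f):
        return any(f == r or f.startswith(r + '/') for r in self._roots)


ondisk = {'capsdir1/capsdir/abc.txt': 'CapsDir1/CapsDir/AbC.txt'}
m = ToyMatcher(['capsdir1/capsdir/abc.txt'],
               lambda p: ondisk.get(p.lower(), p))

f = 'CapsDir1/CapsDir/AbC.txt'
print(m(f))        # True: the file is matched
print(m.exact(f))  # False: the case differs from what was typed
```

Because exact() is False here, a command that prints its 'adding ...' line for
files that were not named exactly still prints it.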

> Other than that this looks fine. dirstate.normalize is a little more
> expensive than necessary but the number of patterns is usually very  
> small.
>
> - Siddharth
>
>> +        self._kp = super(icasefsmatcher, self)._normalize(patterns,  
>> default,
>> +                                                          root, cwd,  
>> auditor)
>> +        kindpats = []
>> +        for kind, pats in self._kp:
>> +            if kind not in ('re', 'relre'):  # regex can't be  
>> normalized
>> +                pats = self._dsnormalize(pats)
>> +            kindpats.append((kind, pats))
>> +        return kindpats
>> +
>>  def patkind(pattern, default=None):
>>      '''If pattern is 'kind:pat' with a known kind, return kind.'''
>>      return _patsplit(pattern, default)[0]
>> diff --git a/tests/test-add.t b/tests/test-add.t
>> --- a/tests/test-add.t
>> +++ b/tests/test-add.t
>> @@ -176,12 +176,48 @@
>>    $ mkdir CapsDir1/CapsDir/SubDir
>>    $ echo def > CapsDir1/CapsDir/SubDir/Def.txt
>>
>> -  $ hg add -v capsdir1/capsdir
>> +  $ hg add capsdir1/capsdir
>>    adding CapsDir1/CapsDir/AbC.txt (glob)
>>    adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>
>>    $ hg forget capsdir1/capsdir/abc.txt
>>    removing CapsDir1/CapsDir/AbC.txt (glob)
>> +
>> +  $ hg forget capsdir1/capsdir
>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg add capsdir1
>> +  adding CapsDir1/CapsDir/AbC.txt (glob)
>> +  adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg ci -m "AbCDef" capsdir1/capsdir
>> +
>> +  $ hg status -A capsdir1/capsdir
>> +  C CapsDir1/CapsDir/AbC.txt
>> +  C CapsDir1/CapsDir/SubDir/Def.txt
>> +
>> +  $ hg files capsdir1/capsdir
>> +  CapsDir1/CapsDir/AbC.txt (glob)
>> +  CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ echo xyz > CapsDir1/CapsDir/SubDir/Def.txt
>> +  $ hg ci -m xyz capsdir1/capsdir/subdir/def.txt
>> +
>> +  $ hg revert -r '.^' capsdir1/capsdir
>> +  reverting CapsDir1/CapsDir/SubDir/Def.txt (glob)
>> +
>> +  $ hg diff capsdir1/capsdir
>> +  diff -r 5112e00e781d CapsDir1/CapsDir/SubDir/Def.txt
>> +  --- a/CapsDir1/CapsDir/SubDir/Def.txt	Thu Jan 01 00:00:00 1970 +0000
>> +  +++ b/CapsDir1/CapsDir/SubDir/Def.txt	* +0000 (glob)
>> +  @@ -1,1 +1,1 @@
>> +  -xyz
>> +  +def
>> +
>> +  $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
>> +  $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
>> +  removing CapsDir1/CapsDir/AbC.txt (glob)
>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>  #endif
>>
>>    $ cd ..
Pierre-Yves David - April 14, 2015, 8:27 p.m.
On 04/13/2015 01:36 PM, Siddharth Agarwal wrote:
> We shouldn't apply case normalization on exact matchers at all, I think.
>
> Other than that this looks fine. dirstate.normalize is a little more
> expensive than necessary but the number of patterns is usually very small.
>
> - Siddharth

So, what's the state of this? Should we queue it or expect a V2?
Siddharth Agarwal - April 14, 2015, 8:31 p.m.
On 04/13/2015 07:50 PM, Matt Harbison wrote:
> Yes.  It might be tricky though, because
> localrepository._checknested() checks 'prefix in ctx.substate', which
> might mean normalizing the keys in workingctx.substate.

That's fine -- could you add a note in the comments?

> Agreed.  The superclass doesn't call _normalize() if exact.
>
> What this is trying to say is that _fmap needs to be updated with
> roots in the user specified case, so that m.exact('name') is testing
> against what the user provided.  Otherwise, existing tests drop
> 'adding xyz' lines when only the case is different. 
> matchmod.exact([a, b, c]) bypasses this new class always.
>
> I wasn't sure if both constructors should take the same parameters, in
> the same order, for sanity.  If not, I can drop the 'exact' variable. 
> This is only created with exact == False anyway.

Yeah, drop the exact variable I think. That makes it way less confusing
to readers (which we should optimize for over writers).

- Siddharth

>
>> Other than that this looks fine. dirstate.normalize is a little more
>> expensive than necessary but the number of patterns is usually very
>> small.
>>
>> - Siddharth
>>
>>> +        self._kp = super(icasefsmatcher, self)._normalize(patterns,
>>> default,
>>> +                                                          root,
>>> cwd, auditor)
>>> +        kindpats = []
>>> +        for kind, pats in self._kp:
>>> +            if kind not in ('re', 'relre'):  # regex can't be
>>> normalized
>>> +                pats = self._dsnormalize(pats)
>>> +            kindpats.append((kind, pats))
>>> +        return kindpats
>>> +
>>>  def patkind(pattern, default=None):
>>>      '''If pattern is 'kind:pat' with a known kind, return kind.'''
>>>      return _patsplit(pattern, default)[0]
>>> diff --git a/tests/test-add.t b/tests/test-add.t
>>> --- a/tests/test-add.t
>>> +++ b/tests/test-add.t
>>> @@ -176,12 +176,48 @@
>>>    $ mkdir CapsDir1/CapsDir/SubDir
>>>    $ echo def > CapsDir1/CapsDir/SubDir/Def.txt
>>>
>>> -  $ hg add -v capsdir1/capsdir
>>> +  $ hg add capsdir1/capsdir
>>>    adding CapsDir1/CapsDir/AbC.txt (glob)
>>>    adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>>
>>>    $ hg forget capsdir1/capsdir/abc.txt
>>>    removing CapsDir1/CapsDir/AbC.txt (glob)
>>> +
>>> +  $ hg forget capsdir1/capsdir
>>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>> +
>>> +  $ hg add capsdir1
>>> +  adding CapsDir1/CapsDir/AbC.txt (glob)
>>> +  adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>> +
>>> +  $ hg ci -m "AbCDef" capsdir1/capsdir
>>> +
>>> +  $ hg status -A capsdir1/capsdir
>>> +  C CapsDir1/CapsDir/AbC.txt
>>> +  C CapsDir1/CapsDir/SubDir/Def.txt
>>> +
>>> +  $ hg files capsdir1/capsdir
>>> +  CapsDir1/CapsDir/AbC.txt (glob)
>>> +  CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>> +
>>> +  $ echo xyz > CapsDir1/CapsDir/SubDir/Def.txt
>>> +  $ hg ci -m xyz capsdir1/capsdir/subdir/def.txt
>>> +
>>> +  $ hg revert -r '.^' capsdir1/capsdir
>>> +  reverting CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>> +
>>> +  $ hg diff capsdir1/capsdir
>>> +  diff -r 5112e00e781d CapsDir1/CapsDir/SubDir/Def.txt
>>> +  --- a/CapsDir1/CapsDir/SubDir/Def.txt    Thu Jan 01 00:00:00 1970
>>> +0000
>>> +  +++ b/CapsDir1/CapsDir/SubDir/Def.txt    * +0000 (glob)
>>> +  @@ -1,1 +1,1 @@
>>> +  -xyz
>>> +  +def
>>> +
>>> +  $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
>>> +  $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
>>> +  removing CapsDir1/CapsDir/AbC.txt (glob)
>>> +  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
>>>  #endif
>>>
>>>    $ cd ..

Patch

diff --git a/mercurial/context.py b/mercurial/context.py
--- a/mercurial/context.py
+++ b/mercurial/context.py
@@ -1424,6 +1424,19 @@ 
             finally:
                 wlock.release()
 
+    def match(self, pats=[], include=None, exclude=None, default='glob'):
+        r = self._repo
+
+        # Only a case insensitive filesystem needs magic to translate user input
+        # to actual case in the filesystem.
+        if not util.checkcase(r.root):
+            return matchmod.icasefsmatcher(r.root, r.getcwd(), pats, include,
+                                           exclude, default, False, r.auditor,
+                                           self)
+        return matchmod.match(r.root, r.getcwd(), pats,
+                              include, exclude, default,
+                              auditor=r.auditor, ctx=self)
+
     def _filtersuspectsymlink(self, files):
         if not files or self._repo.dirstate._checklink:
             return files
diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -273,6 +273,34 @@ 
     def rel(self, f):
         return self._matcher.rel(self._path + "/" + f)
 
+class icasefsmatcher(match):
+    """A matcher for wdir on case insensitive filesystems, which normalizes the
+    given patterns to the case in the filesystem.
+    """
+
+    def __init__(self, root, cwd, patterns, include, exclude, default, exact,
+                 auditor, ctx):
+        init = super(icasefsmatcher, self).__init__
+        self._dsnormalize = ctx.repo().dirstate.normalize
+
+        init(root, cwd, patterns, include, exclude, default, exact, auditor,
+             ctx)
+
+        # Exact matches must be based off of the actual user input, otherwise
+        # inexact case matches are treated as exact, and not noted without -v.
+        if not exact and self._files:
+            self._fmap = set(_roots(self._kp))
+
+    def _normalize(self, patterns, default, root, cwd, auditor):
+        self._kp = super(icasefsmatcher, self)._normalize(patterns, default,
+                                                          root, cwd, auditor)
+        kindpats = []
+        for kind, pats in self._kp:
+            if kind not in ('re', 'relre'):  # regex can't be normalized
+                pats = self._dsnormalize(pats)
+            kindpats.append((kind, pats))
+        return kindpats
+
 def patkind(pattern, default=None):
     '''If pattern is 'kind:pat' with a known kind, return kind.'''
     return _patsplit(pattern, default)[0]
diff --git a/tests/test-add.t b/tests/test-add.t
--- a/tests/test-add.t
+++ b/tests/test-add.t
@@ -176,12 +176,48 @@ 
   $ mkdir CapsDir1/CapsDir/SubDir
   $ echo def > CapsDir1/CapsDir/SubDir/Def.txt
 
-  $ hg add -v capsdir1/capsdir
+  $ hg add capsdir1/capsdir
   adding CapsDir1/CapsDir/AbC.txt (glob)
   adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
 
   $ hg forget capsdir1/capsdir/abc.txt
   removing CapsDir1/CapsDir/AbC.txt (glob)
+
+  $ hg forget capsdir1/capsdir
+  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
+
+  $ hg add capsdir1
+  adding CapsDir1/CapsDir/AbC.txt (glob)
+  adding CapsDir1/CapsDir/SubDir/Def.txt (glob)
+
+  $ hg ci -m "AbCDef" capsdir1/capsdir
+
+  $ hg status -A capsdir1/capsdir
+  C CapsDir1/CapsDir/AbC.txt
+  C CapsDir1/CapsDir/SubDir/Def.txt
+
+  $ hg files capsdir1/capsdir
+  CapsDir1/CapsDir/AbC.txt (glob)
+  CapsDir1/CapsDir/SubDir/Def.txt (glob)
+
+  $ echo xyz > CapsDir1/CapsDir/SubDir/Def.txt
+  $ hg ci -m xyz capsdir1/capsdir/subdir/def.txt
+
+  $ hg revert -r '.^' capsdir1/capsdir
+  reverting CapsDir1/CapsDir/SubDir/Def.txt (glob)
+
+  $ hg diff capsdir1/capsdir
+  diff -r 5112e00e781d CapsDir1/CapsDir/SubDir/Def.txt
+  --- a/CapsDir1/CapsDir/SubDir/Def.txt	Thu Jan 01 00:00:00 1970 +0000
+  +++ b/CapsDir1/CapsDir/SubDir/Def.txt	* +0000 (glob)
+  @@ -1,1 +1,1 @@
+  -xyz
+  +def
+
+  $ hg remove -f 'glob:**.txt' -X capsdir1/capsdir
+  $ hg remove -f 'glob:**.txt' -I capsdir1/capsdir
+  removing CapsDir1/CapsDir/AbC.txt (glob)
+  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
 #endif
 
   $ cd ..
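
The normalization step in icasefsmatcher._normalize() can be illustrated outside of Mercurial with a minimal sketch. This is not the patch's code: make_normalizer() is a hypothetical stand-in for dirstate.normalize (the real one works from the dirstate and the filesystem, not from a plain list), and normalize_kindpats() is an illustrative name. The only behavior taken from the patch is that every (kind, pattern) pair is passed through the normalizer unless the kind is 're' or 'relre'.

```python
import posixpath


def make_normalizer(tracked_files):
    """Return a function mapping a path in any case to its stored case.

    Hypothetical stand-in for dirstate.normalize: it knows every tracked
    file and each of its parent directories, keyed by lower-cased path.
    """
    foldmap = {}
    for f in tracked_files:
        path = f
        while path:
            foldmap.setdefault(path.lower(), path)
            path = posixpath.dirname(path)

    def normalize(path):
        # Paths that are not known are returned unchanged.
        return foldmap.get(path.lower(), path)

    return normalize


def normalize_kindpats(kindpats, normalize):
    """Normalize every (kind, pattern) pair except the regex kinds."""
    result = []
    for kind, pat in kindpats:
        if kind not in ('re', 'relre'):  # regex can't be normalized
            pat = normalize(pat)
        result.append((kind, pat))
    return result


normalize = make_normalizer([
    'CapsDir1/CapsDir/AbC.txt',
    'CapsDir1/CapsDir/SubDir/Def.txt',
])
kindpats = [('path', 'capsdir1/capsdir'), ('re', 'capsdir1/.*')]
normalized = normalize_kindpats(kindpats, normalize)
print(normalized)
```

In this sketch the 'path' pattern is rewritten to the case stored for the directory, while the 're' pattern is passed through untouched, mirroring the directory-case handling exercised by the test-add.t changes above.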