Patchwork dirstate: do full walk with icasefs even if ignored and unknown isn't wanted

login
register
mail settings
Submitter Matt Harbison
Date April 2, 2015, 7:12 p.m.
Message ID <d55f5d6eeb470d1675f9.1428001940@MATT7H-PC.attotech.com>
Download mbox | patch
Permalink /patch/8457/
State Changes Requested
Headers show

Comments

Matt Harbison - April 2, 2015, 7:12 p.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1427855782 14400
#      Tue Mar 31 22:36:22 2015 -0400
# Node ID d55f5d6eeb470d1675f917136b8dcd4f99308e6a
# Parent  37a2b446985f2ef77b9690a0548c8630828b7412
dirstate: do full walk with icasefs even if ignored and unknown isn't wanted

This is needed in order to let forget work when specifying a directory, but not
matching its case in the filesystem.  The previous forget behavior was to
silently ignore the given directory if it was not an exact case match.  That is
unexpected, because forgetting a file with a case mismatch works, as does adding
a directory with a case mismatch.

'dirignore' cannot be always() with icasefs, because in this test case, the work
list from self._walkexplicit() is "('CapsDir1/CapsDir', 'capsdir1/capsdir')".
But if directories are always ignored, it drops that tuple from the work list,
and never loops through step 2 to find the underlying file.

Testing child directories with 'ignore' instead of 'dirignore' seems like a
typo?  Without s/ignore/dirignore/, if 'ignore' is always(), it won't recurse
into child directories to look for files.  (So in this test case, 'SubDir' is
not added to the work list for processing.)  If ignore is set to something other
than always() when ignored files are not requested, dirstate.status() ends up
putting the ignored files into the unknown list.  Some callers are not expecting
these files, and cause test failures.  The most common failure is the identify
command, which appends a '+' if any list in repo.status() is not empty (it
doesn't explicitly ask for ignored or unknown files).

Perhaps the ignore/dirignore check can be optimized as:

        'if (f != nf) and dirignore(nf) or ignore(nf)'

in order to short-circuit the test on case sensitive filesystems, as it was
prior to this change.  Doing that now would cloud the actual changes required
here though.

The non-normalized file is passed to the matchers, consistent with how 'hg add'
was fixed in 2bb13f2b778c.  The problem with passing 'nf' is that for this test,
the 'nf' values inside the 'f, kind, st in entries' loop are:

  capsdir1/capsdir/AbC.txt
  capsdir1/capsdir/SubDir
  CapsDir1/CapsDir/SubDir/Def.txt

The upper case form is rejected by the matcher.
Siddharth Agarwal - April 2, 2015, 10:15 p.m.
On 04/02/2015 12:12 PM, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1427855782 14400
> #      Tue Mar 31 22:36:22 2015 -0400
> # Node ID d55f5d6eeb470d1675f917136b8dcd4f99308e6a
> # Parent  37a2b446985f2ef77b9690a0548c8630828b7412
> dirstate: do full walk with icasefs even if ignored and unknown isn't wanted

There's a very significant performance cost to this :(

Instead of this, can we create a new matcher that contains normalized
patterns? Seems like that'll be a cleaner solution with roughly zero
perf cost.

- Siddharth
Matt Harbison - April 2, 2015, 11:53 p.m.
On Thu, 02 Apr 2015 18:15:31 -0400, Siddharth Agarwal  
<sid@less-broken.com> wrote:

> On 04/02/2015 12:12 PM, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1427855782 14400
>> #      Tue Mar 31 22:36:22 2015 -0400
>> # Node ID d55f5d6eeb470d1675f917136b8dcd4f99308e6a
>> # Parent  37a2b446985f2ef77b9690a0548c8630828b7412
>> dirstate: do full walk with icasefs even if ignored and unknown isn't  
>> wanted
>
> There's a very significant performance cost to this :(
>
> Instead of this, can we create a new matcher that contains normalized
> patterns? Seems like that'll be a cleaner solution with roughly zero
> perf cost.
>
> - Siddharth

I'm not sure that there's a way to create a new one- it is passed from  
commands.py to repo.status() in cmdutil.forget().  The fact that the files  
don't show in the result of status() is how these are missed.

Does it seem reasonable to override basectx.match() in workingctx, to  
normalize the patterns there?  Then we aren't normalizing when referring  
to other revisions.  (But maybe that will cause problems for commands that  
can refer to multiple revisions, like "hg diff -r '.^' File.Txt"?)

Or can we just push this normalizing into the matcher, after it has broken  
down the pattern lists?  We really can only normalize path and relpath  
types.  Fixing up the matcher in either of these ways would seem to fix a  
bunch of these problems.

--Matt
Siddharth Agarwal - April 3, 2015, 12:06 a.m.
On 04/02/2015 04:53 PM, Matt Harbison wrote:
>
> Or can we just push this normalizing into the matcher, after it has
> broken down the pattern lists?  We really can only normalize path and
> relpath types.  Fixing up the matcher in either of these ways would
> seem to fix a bunch of these problems.

That sounds pretty reasonable, but we'll also need to make sure the
(significant) perf cost of normalization is only paid for when
necessary. In particular, it seems to be required for working contexts
but not others.

- Siddharth
Augie Fackler - April 3, 2015, 2:58 p.m.
On Thu, Apr 02, 2015 at 05:06:20PM -0700, Siddharth Agarwal wrote:
> On 04/02/2015 04:53 PM, Matt Harbison wrote:
> >
> > Or can we just push this normalizing into the matcher, after it has
> > broken down the pattern lists?  We really can only normalize path and
> > relpath types.  Fixing up the matcher in either of these ways would
> > seem to fix a bunch of these problems.
>
> That sounds pretty reasonable, but we'll also need to make sure the
> (significant) perf cost of normalization is only paid for when
> necessary. In particular, it seems to be required for working contexts
> but not others.

Perhaps the right thing is to have a workingmatcher or similar that
knows to apply filesystem-specific normalizations? Then hide things in
a factory function?

>
> - Siddharth
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -721,6 +721,11 @@  class dirstate(object):
         elif unknown:
             ignore = self._ignore
             dirignore = self._dirignore
+        elif self._checkcase:
+            # Don't drop dir recursion with icasefs, in case the matcher only
+            # specifies a dir with a case that does NOT match the filesystem.
+            ignore = util.always
+            dirignore = self._dirignore
         else:
             # if not unknown and not ignored, drop dir recursion and step 2
             ignore = util.always
@@ -776,33 +781,33 @@  class dirstate(object):
                         continue
                     raise
                 for f, kind, st in entries:
+                    f = nd and (nd + "/" + f) or f
                     if normalizefile:
                         # even though f might be a directory, we're only
                         # interested in comparing it to files currently in the
                         # dmap -- therefore normalizefile is enough
-                        nf = normalizefile(nd and (nd + "/" + f) or f, True,
-                                           True)
+                        nf = normalizefile(f, True, True)
                     else:
-                        nf = nd and (nd + "/" + f) or f
+                        nf = f
                     if nf not in results:
                         if kind == dirkind:
-                            if not ignore(nf):
+                            if not dirignore(nf):
                                 if matchtdir:
                                     matchtdir(nf)
                                 wadd(nf)
-                            if nf in dmap and (matchalways or matchfn(nf)):
+                            if nf in dmap and (matchalways or matchfn(f)):
                                 results[nf] = None
                         elif kind == regkind or kind == lnkkind:
                             if nf in dmap:
-                                if matchalways or matchfn(nf):
+                                if matchalways or matchfn(f):
                                     results[nf] = st
-                            elif ((matchalways or matchfn(nf))
+                            elif ((matchalways or matchfn(f))
                                   and not ignore(nf)):
                                 # unknown file -- normalize if necessary
                                 if not alreadynormed:
                                     nf = normalize(nf, False, True)
                                 results[nf] = st
-                        elif nf in dmap and (matchalways or matchfn(nf)):
+                        elif nf in dmap and (matchalways or matchfn(f)):
                             results[nf] = None
 
         for nd, d in work:
diff --git a/tests/test-add.t b/tests/test-add.t
--- a/tests/test-add.t
+++ b/tests/test-add.t
@@ -182,6 +182,9 @@  Test that adding a directory doesn't req
 
   $ hg forget capsdir1/capsdir/abc.txt
   removing CapsDir1/CapsDir/AbC.txt (glob)
+
+  $ hg forget -v capsdir1/capsdir
+  removing CapsDir1/CapsDir/SubDir/Def.txt (glob)
 #endif
 
   $ cd ..