Patchwork dirstate: drop explicitly-specified files that shouldn't match (BC)

login
register
mail settings
Submitter Yuya Nishihara
Date Feb. 11, 2018, 4:09 a.m.
Message ID <ba8846546795996af10f.1518322190@mimosa>
Download mbox | patch
Permalink /patch/27523/
State Accepted
Headers show

Comments

Yuya Nishihara - Feb. 11, 2018, 4:09 a.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1516963719 -32400
#      Fri Jan 26 19:48:39 2018 +0900
# Node ID ba8846546795996af10fe93db4317b855018d749
# Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
dirstate: drop explicitly-specified files that shouldn't match (BC)

Before, wctx.walk() could include files excluded by -X pattern, which
disagrees with wctx.matches() and ctx.walk()/matches() behavior. This patch
fixes the problem by testing stat results against the matcher if the matcher
may contain false paths.

I have no idea if the fix should be made before the workaround for case-
insensitive filesystems, but that shouldn't matter since match.anypats()
means 'not match.isexact()'.

This patch also makes sparse.py to not exclude explicit paths on walk()
because it appears that sparse depends on the buggy behavior.

.. bc::

   Working-directory commands now respect ``-X PATTERN`` no matter if PATTERN
   matches explicitly-specified FILEs. For example, ``hg add foo -X foo`` no
   longer add the file ``foo``.
Augie Fackler - Feb. 11, 2018, 1:58 p.m.
On Sun, Feb 11, 2018 at 01:09:50PM +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1516963719 -32400
> #      Fri Jan 26 19:48:39 2018 +0900
> # Node ID ba8846546795996af10fe93db4317b855018d749
> # Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
> dirstate: drop explicitly-specified files that shouldn't match (BC)

Hmm. I'm not sure I like this behavior change. The current behavior is
consistent with how an explicitly-tracked file shows up in status even
if it's matched by .hgignore.

Any particular reason to go this route?

>
> Before, wctx.walk() could include files excluded by -X pattern, which
> disagrees with wctx.matches() and ctx.walk()/matches() behavior. This patch
> fixes the problem by testing stat results against the matcher if the matcher
> may contain false paths.
>
> I have no idea if the fix should be made before the workaround for case-
> insensitive filesystems, but that shouldn't matter since match.anypats()
> means 'not match.isexact()'.
>
> This patch also makes sparse.py to not exclude explicit paths on walk()
> because it appears that sparse depends on the buggy behavior.
>
> .. bc::
>
>    Working-directory commands now respect ``-X PATTERN`` no matter if PATTERN
>    matches explicitly-specified FILEs. For example, ``hg add foo -X foo`` no
>    longer add the file ``foo``.
>
> diff --git a/hgext/sparse.py b/hgext/sparse.py
> --- a/hgext/sparse.py
> +++ b/hgext/sparse.py
> @@ -194,7 +194,11 @@ def _setupdirstate(ui):
>      """
>
>      def walk(orig, self, match, subrepos, unknown, ignored, full=True):
> -        match = matchmod.intersectmatchers(match, self._sparsematcher)
> +        # hack to not exclude explicitly-specified paths so that they can
> +        # be warned later on e.g. dirstate.add()
> +        em = matchmod.exact(match._root, match._cwd, match.files())
> +        sm = matchmod.unionmatcher([self._sparsematcher, em])
> +        match = matchmod.intersectmatchers(match, sm)
>          return orig(self, match, subrepos, unknown, ignored, full)
>
>      extensions.wrapfunction(dirstate.dirstate, 'walk', walk)
> diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
> --- a/mercurial/dirstate.py
> +++ b/mercurial/dirstate.py
> @@ -787,6 +787,17 @@ class dirstate(object):
>                      else:
>                          badfn(ff, encoding.strtolocal(inst.strerror))
>
> +        # match.files() may contain explicitly-specified paths that shouldn't
> +        # be taken; drop them from the list of files found. dirsfound/notfound
> +        # aren't filtered here because they will be tested later.
> +        if match.anypats():
> +            for f in list(results):
> +                if f == '.hg' or f in subrepos:
> +                    # keep sentinel to disable further out-of-repo walks
> +                    continue
> +                if not match(f):
> +                    del results[f]
> +
>          # Case insensitive filesystems cannot rely on lstat() failing to detect
>          # a case-only rename.  Prune the stat object for any file that does not
>          # match the case in the filesystem, if there are multiple files that
> diff --git a/tests/test-add.t b/tests/test-add.t
> --- a/tests/test-add.t
> +++ b/tests/test-add.t
> @@ -146,6 +146,13 @@ Issue683: peculiarity with hg revert of
>    M a
>    ? a.orig
>
> +excluded file shouldn't be added even if it is explicitly specified
> +
> +  $ hg add a.orig -X '*.orig'
> +  $ hg st
> +  M a
> +  ? a.orig
> +
>  Forgotten file can be added back (as either clean or modified)
>
>    $ hg forget b
> diff --git a/tests/test-sparse.t b/tests/test-sparse.t
> --- a/tests/test-sparse.t
> +++ b/tests/test-sparse.t
> @@ -129,6 +129,10 @@ Adding an excluded file should fail
>    (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
>    [255]
>
> +But adding a truly excluded file shouldn't count
> +
> +  $ hg add hide3 -X hide3
> +
>  Verify deleting sparseness while a file has changes fails
>
>    $ hg debugsparse --delete 'show*'
> diff --git a/tests/test-walk.t b/tests/test-walk.t
> --- a/tests/test-walk.t
> +++ b/tests/test-walk.t
> @@ -304,12 +304,10 @@
>    f  beans/turtle    beans/turtle
>    $ hg debugwalk -Xbeans/black beans/black
>    matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans\\/black(?:/|$))'>>
> -  f  beans/black  beans/black  exact
>    $ hg debugwalk -Xbeans/black -Ibeans/black
>    matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans\\/black(?:/|$))'>>
>    $ hg debugwalk -Xbeans beans/black
>    matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
> -  f  beans/black  beans/black  exact
>    $ hg debugwalk -Xbeans -Ibeans/black
>    matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
>    $ hg debugwalk 'glob:mammals/../beans/b*'
> @@ -345,17 +343,13 @@
>    [255]
>
>  Test explicit paths and excludes:
> -(BROKEN: nothing should be included, but wctx.walk() does)
>
>    $ hg debugwalk fennel -X fennel
>    matcher: <differencematcher m1=<patternmatcher patterns='(?:fennel(?:/|$))'>, m2=<includematcher includes='(?:fennel(?:/|$))'>>
> -  f  fennel  fennel  exact
>    $ hg debugwalk fennel -X 'f*'
>    matcher: <differencematcher m1=<patternmatcher patterns='(?:fennel(?:/|$))'>, m2=<includematcher includes='(?:f[^/]*(?:/|$))'>>
> -  f  fennel  fennel  exact
>    $ hg debugwalk beans/black -X 'path:beans'
>    matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
> -  f  beans/black  beans/black  exact
>    $ hg debugwalk -I 'path:beans/black' -X 'path:beans'
>    matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Feb. 11, 2018, 2:17 p.m.
On Sun, 11 Feb 2018 08:58:48 -0500, Augie Fackler wrote:
> On Sun, Feb 11, 2018 at 01:09:50PM +0900, Yuya Nishihara wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1516963719 -32400
> > #      Fri Jan 26 19:48:39 2018 +0900
> > # Node ID ba8846546795996af10fe93db4317b855018d749
> > # Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
> > dirstate: drop explicitly-specified files that shouldn't match (BC)
> 
> Hmm. I'm not sure I like this behavior change. The current behavior is
> consistent with how an explicitly-tracked file shows up in status even
> if it's matched by .hgignore.
> 
> Any particular reason to go this route?

The current behavior is inconsistent with other commands, notably log and
files.

  $ hg files hg -X hg

Even more, it's inconsistent with status --change.

  $ hg status --all hg -X hg
  C hg
  $ hg status --all hg -X hg --change .

So I believe it's a bug of dirsate.walk().
Matt Harbison - Feb. 11, 2018, 8:54 p.m.
On Sun, 11 Feb 2018 09:17:42 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 11 Feb 2018 08:58:48 -0500, Augie Fackler wrote:
>> On Sun, Feb 11, 2018 at 01:09:50PM +0900, Yuya Nishihara wrote:
>> > # HG changeset patch
>> > # User Yuya Nishihara <yuya@tcha.org>
>> > # Date 1516963719 -32400
>> > #      Fri Jan 26 19:48:39 2018 +0900
>> > # Node ID ba8846546795996af10fe93db4317b855018d749
>> > # Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
>> > dirstate: drop explicitly-specified files that shouldn't match (BC)
>>
>> Hmm. I'm not sure I like this behavior change. The current behavior is
>> consistent with how an explicitly-tracked file shows up in status even
>> if it's matched by .hgignore.
>>
>> Any particular reason to go this route?
>
> The current behavior is inconsistent with other commands, notably log and
> files.
>
>   $ hg files hg -X hg
>
> Even more, it's inconsistent with status --change.
>
>   $ hg status --all hg -X hg
>   C hg
>   $ hg status --all hg -X hg --change .
>
> So I believe it's a bug of dirsate.walk().

Related: https://bz.mercurial-scm.org/show_bug.cgi?id=4679
via Mercurial-devel - Feb. 13, 2018, 5:28 p.m.
I agree with Yuya (and apparently with former me; thanks for digging that
up, Matt). I think it's just an unintended consequence of how the dirstate
walk works, but I'm not sure. The exception for explicit files also
bothered me when I was working on the matcher code a year or so ago. I
actually added the exception to the matcher code because I thought it was
always working like that (not just for dirstate) in a83a7d27911e (match:
handle excludes using new differencematcher, 2017-05-16). It was only
recently that Yuya realized that it used to be inconsistent and that I
probably made it consistently bad because I didn't realize it was
inconsistent to start with, see 821d8a5ab4ff (match: do not weirdly include
explicit files excluded by -X option, 2018-01-16).

So I guess I just think it feels wrong to include explicit matches. I wish
I could point to something that would actually crash (e.g. narrowhg), but I
can't think of anything.

Also, thanks for working on this, Yuya! I feel a bit guilty for never
getting around to it.

On Sun, Feb 11, 2018 at 12:54 PM Matt Harbison <mharbison72@gmail.com>
wrote:

> On Sun, 11 Feb 2018 09:17:42 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
>
> > On Sun, 11 Feb 2018 08:58:48 -0500, Augie Fackler wrote:
> >> On Sun, Feb 11, 2018 at 01:09:50PM +0900, Yuya Nishihara wrote:
> >> > # HG changeset patch
> >> > # User Yuya Nishihara <yuya@tcha.org>
> >> > # Date 1516963719 -32400
> >> > #      Fri Jan 26 19:48:39 2018 +0900
> >> > # Node ID ba8846546795996af10fe93db4317b855018d749
> >> > # Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
> >> > dirstate: drop explicitly-specified files that shouldn't match (BC)
> >>
> >> Hmm. I'm not sure I like this behavior change. The current behavior is
> >> consistent with how an explicitly-tracked file shows up in status even
> >> if it's matched by .hgignore.
> >>
> >> Any particular reason to go this route?
> >
> > The current behavior is inconsistent with other commands, notably log and
> > files.
> >
> >   $ hg files hg -X hg
> >
> > Even more, it's inconsistent with status --change.
> >
> >   $ hg status --all hg -X hg
> >   C hg
> >   $ hg status --all hg -X hg --change .
> >
> > So I believe it's a bug of dirsate.walk().
>
> Related: https://bz.mercurial-scm.org/show_bug.cgi?id=4679
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Augie Fackler - Feb. 13, 2018, 5:48 p.m.
Queued, per the very helpful explanations. Thanks all!

> On Feb 13, 2018, at 12:28, Martin von Zweigbergk <martinvonz@google.com> wrote:
> 
> I agree with Yuya (and apparently with former me; thanks for digging that up, Matt). I think it's just an unintended consequence of how the dirstate walk works, but I'm not sure. The exception for explicit files also bothered me when I was working on the matcher code a year or so ago. I actually added the exception to the matcher code because I thought it was always working like that (not just for dirstate) in a83a7d27911e (match: handle excludes using new differencematcher, 2017-05-16). It was only recently that Yuya realized that it used to be inconsistent and that I probably made it consistently bad because I didn't realize it was inconsistent to start with, see 821d8a5ab4ff (match: do not weirdly include explicit files excluded by -X option, 2018-01-16).
> 
> So I guess I just think it feels wrong to include explicit matches. I wish I could point to something that would actually crash (e.g. narrowhg), but I can't think of anything.
> 
> Also, thanks for working on this, Yuya! I feel a bit guilty for never getting around to it.
> 
> On Sun, Feb 11, 2018 at 12:54 PM Matt Harbison <mharbison72@gmail.com <mailto:mharbison72@gmail.com>> wrote:
> On Sun, 11 Feb 2018 09:17:42 -0500, Yuya Nishihara <yuya@tcha.org <mailto:yuya@tcha.org>> wrote:
> 
> > On Sun, 11 Feb 2018 08:58:48 -0500, Augie Fackler wrote:
> >> On Sun, Feb 11, 2018 at 01:09:50PM +0900, Yuya Nishihara wrote:
> >> > # HG changeset patch
> >> > # User Yuya Nishihara <yuya@tcha.org <mailto:yuya@tcha.org>>
> >> > # Date 1516963719 -32400
> >> > #      Fri Jan 26 19:48:39 2018 +0900
> >> > # Node ID ba8846546795996af10fe93db4317b855018d749
> >> > # Parent  91aac8e6604d1aa08b2683c1d4c7d1936f226e48
> >> > dirstate: drop explicitly-specified files that shouldn't match (BC)
> >>
> >> Hmm. I'm not sure I like this behavior change. The current behavior is
> >> consistent with how an explicitly-tracked file shows up in status even
> >> if it's matched by .hgignore.
> >>
> >> Any particular reason to go this route?
> >
> > The current behavior is inconsistent with other commands, notably log and
> > files.
> >
> >   $ hg files hg -X hg
> >
> > Even more, it's inconsistent with status --change.
> >
> >   $ hg status --all hg -X hg
> >   C hg
> >   $ hg status --all hg -X hg --change .
> >
> > So I believe it's a bug of dirsate.walk().
> 
> Related: https://bz.mercurial-scm.org/show_bug.cgi?id=4679 <https://bz.mercurial-scm.org/show_bug.cgi?id=4679>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org <mailto:Mercurial-devel@mercurial-scm.org>
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel <https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel>

Patch

diff --git a/hgext/sparse.py b/hgext/sparse.py
--- a/hgext/sparse.py
+++ b/hgext/sparse.py
@@ -194,7 +194,11 @@  def _setupdirstate(ui):
     """
 
     def walk(orig, self, match, subrepos, unknown, ignored, full=True):
-        match = matchmod.intersectmatchers(match, self._sparsematcher)
+        # hack to not exclude explicitly-specified paths so that they can
+        # be warned later on e.g. dirstate.add()
+        em = matchmod.exact(match._root, match._cwd, match.files())
+        sm = matchmod.unionmatcher([self._sparsematcher, em])
+        match = matchmod.intersectmatchers(match, sm)
         return orig(self, match, subrepos, unknown, ignored, full)
 
     extensions.wrapfunction(dirstate.dirstate, 'walk', walk)
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -787,6 +787,17 @@  class dirstate(object):
                     else:
                         badfn(ff, encoding.strtolocal(inst.strerror))
 
+        # match.files() may contain explicitly-specified paths that shouldn't
+        # be taken; drop them from the list of files found. dirsfound/notfound
+        # aren't filtered here because they will be tested later.
+        if match.anypats():
+            for f in list(results):
+                if f == '.hg' or f in subrepos:
+                    # keep sentinel to disable further out-of-repo walks
+                    continue
+                if not match(f):
+                    del results[f]
+
         # Case insensitive filesystems cannot rely on lstat() failing to detect
         # a case-only rename.  Prune the stat object for any file that does not
         # match the case in the filesystem, if there are multiple files that
diff --git a/tests/test-add.t b/tests/test-add.t
--- a/tests/test-add.t
+++ b/tests/test-add.t
@@ -146,6 +146,13 @@  Issue683: peculiarity with hg revert of 
   M a
   ? a.orig
 
+excluded file shouldn't be added even if it is explicitly specified
+
+  $ hg add a.orig -X '*.orig'
+  $ hg st
+  M a
+  ? a.orig
+
 Forgotten file can be added back (as either clean or modified)
 
   $ hg forget b
diff --git a/tests/test-sparse.t b/tests/test-sparse.t
--- a/tests/test-sparse.t
+++ b/tests/test-sparse.t
@@ -129,6 +129,10 @@  Adding an excluded file should fail
   (include file with `hg debugsparse --include <pattern>` or use `hg add -s <file>` to include file directory while adding)
   [255]
 
+But adding a truly excluded file shouldn't count
+
+  $ hg add hide3 -X hide3
+
 Verify deleting sparseness while a file has changes fails
 
   $ hg debugsparse --delete 'show*'
diff --git a/tests/test-walk.t b/tests/test-walk.t
--- a/tests/test-walk.t
+++ b/tests/test-walk.t
@@ -304,12 +304,10 @@ 
   f  beans/turtle    beans/turtle
   $ hg debugwalk -Xbeans/black beans/black
   matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans\\/black(?:/|$))'>>
-  f  beans/black  beans/black  exact
   $ hg debugwalk -Xbeans/black -Ibeans/black
   matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans\\/black(?:/|$))'>>
   $ hg debugwalk -Xbeans beans/black
   matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
-  f  beans/black  beans/black  exact
   $ hg debugwalk -Xbeans -Ibeans/black
   matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
   $ hg debugwalk 'glob:mammals/../beans/b*'
@@ -345,17 +343,13 @@ 
   [255]
 
 Test explicit paths and excludes:
-(BROKEN: nothing should be included, but wctx.walk() does)
 
   $ hg debugwalk fennel -X fennel
   matcher: <differencematcher m1=<patternmatcher patterns='(?:fennel(?:/|$))'>, m2=<includematcher includes='(?:fennel(?:/|$))'>>
-  f  fennel  fennel  exact
   $ hg debugwalk fennel -X 'f*'
   matcher: <differencematcher m1=<patternmatcher patterns='(?:fennel(?:/|$))'>, m2=<includematcher includes='(?:f[^/]*(?:/|$))'>>
-  f  fennel  fennel  exact
   $ hg debugwalk beans/black -X 'path:beans'
   matcher: <differencematcher m1=<patternmatcher patterns='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>
-  f  beans/black  beans/black  exact
   $ hg debugwalk -I 'path:beans/black' -X 'path:beans'
   matcher: <differencematcher m1=<includematcher includes='(?:beans\\/black(?:/|$))'>, m2=<includematcher includes='(?:beans(?:/|$))'>>