Patchwork [03,of,17] match: handle excludes using new differencematcher

login
register
mail settings
Submitter via Mercurial-devel
Date May 25, 2017, 6:24 p.m.
Message ID <dc1a97dee1b5de53dda2.1495736684@martinvonz.svl.corp.google.com>
Download mbox | patch
Permalink /patch/20905/
State Accepted
Headers show

Comments

via Mercurial-devel - May 25, 2017, 6:24 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1494977808 25200
#      Tue May 16 16:36:48 2017 -0700
# Node ID dc1a97dee1b5de53dda25285ffd7ebcc16105549
# Parent  8446a121f00de0575739e5285af58d564119a052
match: handle excludes using new differencematcher

As I've said on earlier patches, I'm hoping to use more composition of
simpler matchers instead of the single complex matcher we currently
have. This extracts a first new matcher that composes two other
matchers. It matches if the first matcher matches but the second does
not. As such, we can use it for excludes, which this patch also
does. We'll remove the now-unncessary code for excludes in the next
patch.
Yuya Nishihara - May 26, 2017, 3:10 p.m.
On Thu, 25 May 2017 11:24:44 -0700, Martin von Zweigbergk via Mercurial-devel wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1494977808 25200
> #      Tue May 16 16:36:48 2017 -0700
> # Node ID dc1a97dee1b5de53dda25285ffd7ebcc16105549
> # Parent  8446a121f00de0575739e5285af58d564119a052
> match: handle excludes using new differencematcher

> +class differencematcher(basematcher):
> +    '''Composes two matchers by matching if the first matches and the second
> +    does not. Well, almost... If the user provides a pattern like "-X foo foo",
> +    Mercurial actually does match "foo" against that. That's because exact
> +    matches are treated specially. So, since this differencematcher is used for
> +    excludes, it needs to special-case exact matching.

Uh.

> +    def isexact(self):
> +        return self._m1.isexact()

I don't know if new behavior is correct, but before, isexact() would be False
if exclude patterns were specified. match.isexact is defined as follows:

    def isexact(self):
        return self.matchfn == self.exact

and matchfn would be a composition of matchfns[].

That said, the overall change looks great and no tests fail, so queued the
first 4 patches. Thanks.
via Mercurial-devel - May 26, 2017, 3:22 p.m.
On Fri, May 26, 2017 at 8:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Thu, 25 May 2017 11:24:44 -0700, Martin von Zweigbergk via Mercurial-devel wrote:
>> # HG changeset patch
>> # User Martin von Zweigbergk <martinvonz@google.com>
>> # Date 1494977808 25200
>> #      Tue May 16 16:36:48 2017 -0700
>> # Node ID dc1a97dee1b5de53dda25285ffd7ebcc16105549
>> # Parent  8446a121f00de0575739e5285af58d564119a052
>> match: handle excludes using new differencematcher
>
>> +class differencematcher(basematcher):
>> +    '''Composes two matchers by matching if the first matches and the second
>> +    does not. Well, almost... If the user provides a pattern like "-X foo foo",
>> +    Mercurial actually does match "foo" against that. That's because exact
>> +    matches are treated specially. So, since this differencematcher is used for
>> +    excludes, it needs to special-case exact matching.
>
> Uh.
>
>> +    def isexact(self):
>> +        return self._m1.isexact()
>
> I don't know if new behavior is correct, but before, isexact() would be False
> if exclude patterns were specified. match.isexact is defined as follows:
>
>     def isexact(self):
>         return self.matchfn == self.exact
>
> and matchfn would be a composition of matchfns[].

Ah, good catch. That was actually an intentional change I made that
wasn't really necessary, but just an optimization. Sorry, I should
have mentioned that. isexact() is used in various places to allow fast
paths. So by saying that an exact matcher minus something is still
exact allows for those fast paths. However, I don't think exact
matchers ever have excludes in hg core (despite the comment somewhere
in match.py saying that excludes and includes still apply), so this is
probably only relevant for extensions.

>
> That said, the overall change looks great and no tests fail, so queued the
> first 4 patches. Thanks.
Yuya Nishihara - May 26, 2017, 3:34 p.m.
On Fri, 26 May 2017 08:22:10 -0700, Martin von Zweigbergk wrote:
> On Fri, May 26, 2017 at 8:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Thu, 25 May 2017 11:24:44 -0700, Martin von Zweigbergk via Mercurial-devel wrote:
> >> # HG changeset patch
> >> # User Martin von Zweigbergk <martinvonz@google.com>
> >> # Date 1494977808 25200
> >> #      Tue May 16 16:36:48 2017 -0700
> >> # Node ID dc1a97dee1b5de53dda25285ffd7ebcc16105549
> >> # Parent  8446a121f00de0575739e5285af58d564119a052
> >> match: handle excludes using new differencematcher
> >
> >> +class differencematcher(basematcher):
> >> +    '''Composes two matchers by matching if the first matches and the second
> >> +    does not. Well, almost... If the user provides a pattern like "-X foo foo",
> >> +    Mercurial actually does match "foo" against that. That's because exact
> >> +    matches are treated specially. So, since this differencematcher is used for
> >> +    excludes, it needs to special-case exact matching.
> >
> > Uh.
> >
> >> +    def isexact(self):
> >> +        return self._m1.isexact()
> >
> > I don't know if new behavior is correct, but before, isexact() would be False
> > if exclude patterns were specified. match.isexact is defined as follows:
> >
> >     def isexact(self):
> >         return self.matchfn == self.exact
> >
> > and matchfn would be a composition of matchfns[].
> 
> Ah, good catch. That was actually an intentional change I made that
> wasn't really necessary, but just an optimization. Sorry, I should
> have mentioned that. isexact() is used in various places to allow fast
> paths. So by saying that an exact matcher minus something is still
> exact allows for those fast paths. However, I don't think exact
> matchers ever have excludes in hg core (despite the comment somewhere
> in match.py saying that excludes and includes still apply), so this is
> probably only relevant for extensions.

Got it, thanks. And that's why _files are filtered if exact. I really like
to see the matcher is getting saner.

Patch

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -142,10 +142,15 @@ 
                 kindpats.append((kind, pats, source))
             return kindpats
 
-    return matcher(root, cwd, normalize, patterns, include=include,
-                   exclude=exclude, default=default, exact=exact,
-                   auditor=auditor, ctx=ctx, listsubrepos=listsubrepos,
-                   warn=warn, badfn=badfn)
+    m = matcher(root, cwd, normalize, patterns, include=include, exclude=None,
+                default=default, exact=exact, auditor=auditor, ctx=ctx,
+                listsubrepos=listsubrepos, warn=warn, badfn=badfn)
+    if exclude:
+        em = matcher(root, cwd, normalize, [], include=exclude, exclude=None,
+                     default=default, exact=False, auditor=auditor, ctx=ctx,
+                     listsubrepos=listsubrepos, warn=warn, badfn=None)
+        m = differencematcher(m, em)
+    return m
 
 def exact(root, cwd, files, badfn=None):
     return match(root, cwd, files, exact=True, badfn=badfn)
@@ -418,6 +423,62 @@ 
                 (self._files, self.patternspat, self.includepat,
                  self.excludepat))
 
+class differencematcher(basematcher):
+    '''Composes two matchers by matching if the first matches and the second
+    does not. Well, almost... If the user provides a pattern like "-X foo foo",
+    Mercurial actually does match "foo" against that. That's because exact
+    matches are treated specially. So, since this differencematcher is used for
+    excludes, it needs to special-case exact matching.
+
+    The second matcher's non-matching-attributes (root, cwd, bad, explicitdir,
+    traversedir) are ignored.
+
+    TODO: If we want to keep the behavior described above for exact matches, we
+    should consider instead treating the above case something like this:
+    union(exact(foo), difference(pattern(foo), include(foo)))
+    '''
+    def __init__(self, m1, m2):
+        super(differencematcher, self).__init__(m1._root, m1._cwd)
+        self._m1 = m1
+        self._m2 = m2
+        self.bad = m1.bad
+        self.explicitdir = m1.explicitdir
+        self.traversedir = m1.traversedir
+
+    def matchfn(self, f):
+        return self._m1(f) and (not self._m2(f) or self._m1.exact(f))
+
+    @propertycache
+    def _files(self):
+        if self.isexact():
+            return [f for f in self._m1.files() if self(f)]
+        # If m1 is not an exact matcher, we can't easily figure out the set of
+        # files, because its files() are not always files. For example, if
+        # m1 is "path:dir" and m2 is "rootfileins:.", we don't
+        # want to remove "dir" from the set even though it would match m2,
+        # because the "dir" in m1 may not be a file.
+        return self._m1.files()
+
+    def visitdir(self, dir):
+        if self._m2.visitdir(dir) == 'all':
+            # There's a bug here: If m1 matches file 'dir/file' and m2 excludes
+            # 'dir' (recursively), we should still visit 'dir' due to the
+            # exception we have for exact matches.
+            return False
+        return bool(self._m1.visitdir(dir))
+
+    def isexact(self):
+        return self._m1.isexact()
+
+    def anypats(self):
+        return self._m1.anypats() or self._m2.anypats()
+
+    def prefix(self):
+        return not self.always() and not self.isexact() and not self.anypats()
+
+    def __repr__(self):
+        return ('<differencematcher m1=%r, m2=%r>' % (self._m1, self._m2))
+
 class subdirmatcher(basematcher):
     """Adapt a matcher to work on a subdirectory only.
 
diff --git a/tests/test-walk.t b/tests/test-walk.t
--- a/tests/test-walk.t
+++ b/tests/test-walk.t
@@ -76,7 +76,7 @@ 
   f  mammals/Procyonidae/raccoon     Procyonidae/raccoon
   f  mammals/skunk                   skunk
   $ hg debugwalk -X ../beans
-  matcher: <matcher files=[], patterns=None, includes=None, excludes='(?:beans(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans(?:/|$))', excludes=None>>
   f  fennel                          ../fennel
   f  fenugreek                       ../fenugreek
   f  fiddlehead                      ../fiddlehead
@@ -146,7 +146,7 @@ 
   f  fenugreek   ../fenugreek
   f  fiddlehead  ../fiddlehead
   $ hg debugwalk -X 'rootfilesin:'
-  matcher: <matcher files=[], patterns=None, includes=None, excludes='(?:^[^/]+$)'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:^[^/]+$)', excludes=None>>
   f  beans/black                     ../beans/black
   f  beans/borlotti                  ../beans/borlotti
   f  beans/kidney                    ../beans/kidney
@@ -194,7 +194,7 @@ 
   matcher: <matcher files=[], patterns=None, includes='(?:^mammals/[^/]+$)', excludes=None>
   f  mammals/skunk  skunk
   $ hg debugwalk -X 'rootfilesin:mammals'
-  matcher: <matcher files=[], patterns=None, includes=None, excludes='(?:^mammals/[^/]+$)'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:^mammals/[^/]+$)', excludes=None>>
   f  beans/black                     ../beans/black
   f  beans/borlotti                  ../beans/borlotti
   f  beans/kidney                    ../beans/kidney
@@ -289,35 +289,35 @@ 
   matcher: <matcher files=['beans'], patterns='(?:beans(?:/|$))', includes='(?:beans\\/black(?:/|$))', excludes=None>
   f  beans/black  beans/black
   $ hg debugwalk -Xbeans/black beans
-  matcher: <matcher files=['beans'], patterns='(?:beans(?:/|$))', includes=None, excludes='(?:beans\\/black(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=['beans'], patterns='(?:beans(?:/|$))', includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>>
   f  beans/borlotti  beans/borlotti
   f  beans/kidney    beans/kidney
   f  beans/navy      beans/navy
   f  beans/pinto     beans/pinto
   f  beans/turtle    beans/turtle
   $ hg debugwalk -Xbeans/black -Ibeans
-  matcher: <matcher files=[], patterns=None, includes='(?:beans(?:/|$))', excludes='(?:beans\\/black(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes='(?:beans(?:/|$))', excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>>
   f  beans/borlotti  beans/borlotti
   f  beans/kidney    beans/kidney
   f  beans/navy      beans/navy
   f  beans/pinto     beans/pinto
   f  beans/turtle    beans/turtle
   $ hg debugwalk -Xbeans/black beans/black
-  matcher: <matcher files=['beans/black'], patterns='(?:beans\\/black(?:/|$))', includes=None, excludes='(?:beans\\/black(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=['beans/black'], patterns='(?:beans\\/black(?:/|$))', includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>>
   f  beans/black  beans/black  exact
   $ hg debugwalk -Xbeans/black -Ibeans/black
-  matcher: <matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes='(?:beans\\/black(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>>
   $ hg debugwalk -Xbeans beans/black
-  matcher: <matcher files=['beans/black'], patterns='(?:beans\\/black(?:/|$))', includes=None, excludes='(?:beans(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=['beans/black'], patterns='(?:beans\\/black(?:/|$))', includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans(?:/|$))', excludes=None>>
   f  beans/black  beans/black  exact
   $ hg debugwalk -Xbeans -Ibeans/black
-  matcher: <matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes='(?:beans(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=[], patterns=None, includes='(?:beans\\/black(?:/|$))', excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:beans(?:/|$))', excludes=None>>
   $ hg debugwalk 'glob:mammals/../beans/b*'
   matcher: <matcher files=['beans'], patterns='(?:beans\\/b[^/]*$)', includes=None, excludes=None>
   f  beans/black     beans/black
   f  beans/borlotti  beans/borlotti
   $ hg debugwalk '-X*/Procyonidae' mammals
-  matcher: <matcher files=['mammals'], patterns='(?:mammals(?:/|$))', includes=None, excludes='(?:[^/]*\\/Procyonidae(?:/|$))'>
+  matcher: <differencematcher m1=<matcher files=['mammals'], patterns='(?:mammals(?:/|$))', includes=None, excludes=None>, m2=<matcher files=[], patterns=None, includes='(?:[^/]*\\/Procyonidae(?:/|$))', excludes=None>>
   f  mammals/skunk  mammals/skunk
   $ hg debugwalk path:mammals
   matcher: <matcher files=['mammals'], patterns='(?:^mammals(?:/|$))', includes=None, excludes=None>