Patchwork verify: replace _validpath() by matcher

Submitter: via Mercurial-devel
Date: Feb. 2, 2017, 5:46 p.m.
Message ID: <548f852a5d6a2d016e1b.1486057565@martinvonz.mtv.corp.google.com>
Permalink: /patch/18299/
State: Accepted

Comments

via Mercurial-devel - Feb. 2, 2017, 5:46 p.m.
# HG changeset patch
# User Martin von Zweigbergk <martinvonz@google.com>
# Date 1485197335 28800
#      Mon Jan 23 10:48:55 2017 -0800
# Branch stable
# Node ID 548f852a5d6a2d016e1b1002168463437a5bcb95
# Parent  2d6b86cadc1039f7cf72b64c29399bebe02e7dd3
verify: replace _validpath() by matcher

The verifier calls out to _validpath() to check whether it should
verify a given path, and the narrowhg extension overrides _validpath()
to tell the verifier to skip paths outside the narrow scope. In
treemanifest repos, the verifier calls the same method to check
whether it should visit a
directory. However, the decision to visit a directory is different
from the condition that it's a matching path, and narrowhg was working
around it by returning True from its _validpath() override if *either*
was true.

Similar to how one can do "hg files -I foo/bar/ -X foo/" (making the
include pointless), narrowhg can be configured to track the same
paths. In that case match("foo/bar/baz") would be false, but
match.visitdir("foo/bar/baz") turns out to be true, causing verify to
fail. This may seem like a bug in visitdir(), but it's explicitly
documented to be undefined for subdirectories of excluded
directories. When using treemanifests, the walk would not descend into
foo/, so verification would pass. However, when using flat manifests,
there is no recursive directory walk and the file path "foo/bar/baz"
would be passed to _validpath() without "foo/" (actually without the
slash) being passed first. As explained above, _validpath() would
return true for the file path and "hg verify" would fail.
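
The mismatch described above can be reproduced with a toy matcher. This is
only a sketch: ToyMatcher and its prefix-based rules are hypothetical
stand-ins, not Mercurial's real matcher class, and its visitdir() is
deliberately written to be meaningless below an excluded directory, as the
real one is documented to be.

```python
class ToyMatcher:
    """Hypothetical include/exclude matcher (not Mercurial's implementation)."""

    def __init__(self, include, exclude):
        self.include = [p.rstrip('/') for p in include]
        self.exclude = [p.rstrip('/') for p in exclude]

    @staticmethod
    def _under(path, prefix):
        # True if path is prefix itself or lives somewhere below it.
        return path == prefix or path.startswith(prefix + '/')

    def __call__(self, path):
        # File matching: must be included and must not be excluded.
        included = any(self._under(path, p) for p in self.include)
        excluded = any(self._under(path, p) for p in self.exclude)
        return included and not excluded

    def visitdir(self, d):
        # Directory pruning: only an excluded root itself is rejected.
        # The answer for subdirectories of an excluded directory is not
        # meaningful, because a recursive walk would never reach them.
        if d in self.exclude:
            return False
        return any(self._under(d, p) or self._under(p, d)
                   for p in self.include)


# Same shape as "hg files -I foo/bar/ -X foo/".
m = ToyMatcher(include=['foo/bar/'], exclude=['foo/'])

path = 'foo/bar/baz'
file_ok = m(path)                 # False: the path is excluded
dir_ok = m.visitdir(path)         # True: undefined territory
old_style = file_ok or dir_ok     # the "either is true" workaround
walk_descends = m.visitdir('foo')  # False: a tree walk stops here
```

With a recursive walk, visitdir('foo') prunes the subtree before
'foo/bar/baz' is ever asked about. Without that walk, the combined
"either" check is the only filter, and it lets the excluded path through.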

Replacing the _validpath() method by a matcher seems like the obvious
fix. Narrowhg can then pass in its own matcher and not have to
conflate the two matching functions (for dirs and files). I think it
also makes the code clearer.
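
A minimal sketch of the separation the patch introduces, assuming
simplified stand-ins: MatchAll, NarrowMatcher and split_entries() are
hypothetical names for illustration, not Mercurial or narrowhg APIs. The
loop mirrors the patched manifest check, where entries flagged 't' are
directories and are tested with visitdir(), while everything else is
tested as a file path.

```python
class MatchAll:
    """Stand-in for the default matcher used when none is passed."""

    def __call__(self, path):
        return True

    def visitdir(self, d):
        return True


class NarrowMatcher:
    """Hypothetical matcher that tracks a single directory prefix."""

    def __init__(self, prefix):
        self.prefix = prefix.rstrip('/')

    def __call__(self, path):
        # A file matches only if it lives below the tracked prefix.
        return path.startswith(self.prefix + '/')

    def visitdir(self, d):
        # Visit the prefix, anything below it, and its ancestors.
        return (d == self.prefix
                or d.startswith(self.prefix + '/')
                or self.prefix.startswith(d + '/'))


def split_entries(entries, match=None):
    """Sort (fullpath, flag) pairs into subdirectories and files to verify."""
    match = match or MatchAll()
    subdirs, files = [], []
    for fullpath, flag in entries:
        if flag == 't':
            # Directory entry: ask whether the walk should descend.
            if not match.visitdir(fullpath):
                continue
            subdirs.append(fullpath + '/')
        else:
            # File entry: ask whether the file itself matches.
            if not match(fullpath):
                continue
            files.append(fullpath)
    return subdirs, files


entries = [('src', 't'), ('docs', 't'), ('src/main.py', ''), ('README', '')]
everything = split_entries(entries)
narrowed = split_entries(entries, NarrowMatcher('src'))
```

Because each kind of entry gets its own question, the matcher no longer
has to answer "file or directory?" through a single method.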
Yuya Nishihara - Feb. 3, 2017, 1:38 p.m.
On Thu, 02 Feb 2017 09:46:05 -0800, Martin von Zweigbergk via Mercurial-devel wrote:
> # HG changeset patch
> # User Martin von Zweigbergk <martinvonz@google.com>
> # Date 1485197335 28800
> #      Mon Jan 23 10:48:55 2017 -0800
> # Branch stable
> # Node ID 548f852a5d6a2d016e1b1002168463437a5bcb95
> # Parent  2d6b86cadc1039f7cf72b64c29399bebe02e7dd3
> verify: replace _validpath() by matcher

> Replacing the _validpath() method by a matcher seems like the obvious
> fix. Narrowhg can then pass in its own matcher and not have to
> conflate the two matching functions (for dirs and files). I think it
> also makes the code clearer.

Makes sense. Queued, thanks.

Patch

diff -r 2d6b86cadc10 -r 548f852a5d6a mercurial/verify.py
--- a/mercurial/verify.py	Mon Jan 30 18:03:17 2017 -0500
+++ b/mercurial/verify.py	Mon Jan 23 10:48:55 2017 -0800
@@ -18,6 +18,7 @@ 
 from . import (
     error,
     revlog,
+    scmutil,
     util,
 )
 
@@ -32,21 +33,13 @@ 
         f = f.replace('//', '/')
     return f
 
-def _validpath(repo, path):
-    """Returns False if a path should NOT be treated as part of a repo.
-
-    For all in-core cases, this returns True, as we have no way for a
-    path to be mentioned in the history but not actually be
-    relevant. For narrow clones, this is important because many
-    filelogs will be missing, and changelog entries may mention
-    modified files that are outside the narrow scope.
-    """
-    return True
-
 class verifier(object):
-    def __init__(self, repo):
+    # The match argument is always None in hg core, but e.g. the narrowhg
+    # extension will pass in a matcher here.
+    def __init__(self, repo, match=None):
         self.repo = repo.unfiltered()
         self.ui = repo.ui
+        self.match = match or scmutil.matchall(repo)
         self.badrevs = set()
         self.errors = 0
         self.warnings = 0
@@ -170,6 +163,7 @@ 
     def _verifychangelog(self):
         ui = self.ui
         repo = self.repo
+        match = self.match
         cl = repo.changelog
 
         ui.status(_("checking changesets\n"))
@@ -189,7 +183,7 @@ 
                     mflinkrevs.setdefault(changes[0], []).append(i)
                     self.refersmf = True
                 for f in changes[3]:
-                    if _validpath(repo, f):
+                    if match(f):
                         filelinkrevs.setdefault(_normpath(f), []).append(i)
             except Exception as inst:
                 self.refersmf = True
@@ -201,6 +195,7 @@ 
                         progress=None):
         repo = self.repo
         ui = self.ui
+        match = self.match
         mfl = self.repo.manifestlog
         mf = mfl._revlog.dirlog(dir)
 
@@ -243,12 +238,14 @@ 
                     elif f == "/dev/null":  # ignore this in very old repos
                         continue
                     fullpath = dir + _normpath(f)
-                    if not _validpath(repo, fullpath):
-                        continue
                     if fl == 't':
+                        if not match.visitdir(fullpath):
+                            continue
                         subdirnodes.setdefault(fullpath + '/', {}).setdefault(
                             fn, []).append(lr)
                     else:
+                        if not match(fullpath):
+                            continue
                         filenodes.setdefault(fullpath, {}).setdefault(fn, lr)
             except Exception as inst:
                 self.exc(lr, _("reading delta %s") % short(n), inst, label)