Patchwork D7311: utils: move the `dirs` definition in a dedicated module (API)

login
register
mail settings
Submitter phabricator
Date Nov. 8, 2019, 9:32 a.m.
Message ID <differential-rev-PHID-DREV-ta3ow7li2s2lgcov45sr-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/42919/
State Superseded
Headers show

Comments

phabricator - Nov. 8, 2019, 9:32 a.m.
marmoute created this revision.
Herald added a reviewer: durin42.
Herald added a reviewer: martinvonz.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Before this change, the `dirs` class was accessible through the `mercurial.util`
  module. That module is expected to stay free of scm specific content.
  
  This work is part of a refactoring to unify the revlog index and the nodemap.
  This unification prepare the use of a persistent nodemap.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

AFFECTED FILES
  hgext/narrow/narrowcommands.py
  hgext/uncommit.py
  mercurial/cmdutil.py
  mercurial/dirstate.py
  mercurial/manifest.py
  mercurial/match.py
  mercurial/repair.py
  mercurial/util.py
  mercurial/utils/dirstateutil.py

CHANGE DETAILS




To: marmoute, durin42, martinvonz, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 1:11 p.m.
durin42 added a comment.


  Can you fix up the fuzzer in contrib/fuzz/dirstate.cc?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 3:08 p.m.
Alphare added a comment.


  Note that there is a comment in `match.py` about `util.dirs`. It's not that important, but it should be updated.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 3:18 p.m.
marmoute added a comment.


  In D7311#107111 <https://phab.mercurial-scm.org/D7311#107111>, @durin42 wrote:
  
  > Can you fix up the fuzzer in contrib/fuzz/dirstate.cc?
  
  I do not understand what need fixing there. Are you talking about D7314 <https://phab.mercurial-scm.org/D7314> ?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 3:21 p.m.
durin42 added a comment.


  In D7311#107177 <https://phab.mercurial-scm.org/D7311#107177>, @marmoute wrote:
  
  > In D7311#107111 <https://phab.mercurial-scm.org/D7311#107111>, @durin42 wrote:
  >
  >> Can you fix up the fuzzer in contrib/fuzz/dirstate.cc?
  >
  > I do not understand what need fixing there. Are you talking about D7314 <https://phab.mercurial-scm.org/D7314> ?
  
  I was talking about this patch. https://www.mercurial-scm.org/repo/hg/file/tip/contrib/fuzz/dirs.cc#l18 is what I was talking about: some of our fuzzers actually contain a tiny Python script that gets evaluated to drive the fuzzing.
  
  It turns out in this case it's not a problem. :)

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 3:22 p.m.
Alphare added inline comments.

INLINE COMMENTS

> match.py:632
>  
>  # This is basically a reimplementation of util.dirs that stores the children
>  # instead of just a count of them, plus a small optional optimization to avoid

Should be `dirstateutil.dirs`

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 7:05 p.m.
martinvonz added inline comments.

INLINE COMMENTS

> dirstateutil.py:14
> +
> +class dirs(object):
> +    '''a multiset of directory names from a dirstate or manifest'''

Is `pathutil.py` an appropriate home for this instead of creating a new file?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare
Cc: mercurial-devel
phabricator - Nov. 8, 2019, 7:23 p.m.
This revision now requires changes to proceed.
indygreg added a comment.
indygreg requested changes to this revision.


  I agree with Martin that we should put this in `pathutils`.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 12:19 a.m.
marmoute added inline comments.

INLINE COMMENTS

> martinvonz wrote in dirstateutil.py:14
> Is `pathutil.py` an appropriate home for this instead of creating a new file?

I feel like `pathutil.py` is mostly scm agnostic and I would rather keep it this way.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 12:22 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> marmoute wrote in dirstateutil.py:14
> I feel like `pathutil.py` is mostly scm agnostic and I would rather keep it this way.

But how is `dirs` not scm-agnostic? Oh, you mean the ugly `skip` thing? Can we see if we can just remove that instead (move it outside of the class)?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 12:39 a.m.
marmoute added inline comments.

INLINE COMMENTS

> martinvonz wrote in dirstateutil.py:14
> But how is `dirs` not scm-agnostic? Oh, you mean the ugly `skip` thing? Can we see if we can just remove that instead (move it outside of the class)?

It explicitly mention dirstate and manifest. Both are scm related context.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 12:41 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> marmoute wrote in dirstateutil.py:14
> It explicitly mention dirstate and manifest. Both are scm related context.

That's easy to fix, though. Just replace "dirstate or manifest" by "set of file paths".

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 1:32 a.m.
martinvonz added inline comments.

INLINE COMMENTS

> util.py:3494-3499
>  def finddirs(path):
>      pos = path.rfind(b'/')
>      while pos != -1:
>          yield path[:pos]
>          pos = path.rfind(b'/', 0, pos)
>      yield b''

Could you follow up with a patch to move this one as well?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel
phabricator - Nov. 9, 2019, 1:36 a.m.
martinvonz added a comment.


  FYI, I had to resolve some conflicts with Augie's recent patches for removing r'' prefixes and for consecutive slashes. I'm pretty sure I did it right, but take a look if you want.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7311/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7311

To: marmoute, durin42, martinvonz, #hg-reviewers, Alphare, indygreg
Cc: indygreg, mercurial-devel

Patch

diff --git a/mercurial/utils/dirstateutil.py b/mercurial/utils/dirstateutil.py
new file mode 100644
--- /dev/null
+++ b/mercurial/utils/dirstateutil.py
@@ -0,0 +1,59 @@ 
+from __future__ import absolute_import
+
+from .. import (
+    error,
+    policy,
+    pycompat,
+    util,
+)
+
+rustdirs = policy.importrust(r'dirstate', r'Dirs')
+parsers = policy.importmod(r'parsers')
+
+
+class dirs(object):
+    '''a multiset of directory names from a dirstate or manifest'''
+
+    def __init__(self, map, skip=None):
+        self._dirs = {}
+        addpath = self.addpath
+        if isinstance(map, dict) and skip is not None:
+            for f, s in pycompat.iteritems(map):
+                if s[0] != skip:
+                    addpath(f)
+        elif skip is not None:
+            raise error.ProgrammingError(
+                b"skip character is only supported with a dict source"
+            )
+        else:
+            for f in map:
+                addpath(f)
+
+    def addpath(self, path):
+        dirs = self._dirs
+        for base in util.finddirs(path):
+            if base in dirs:
+                dirs[base] += 1
+                return
+            dirs[base] = 1
+
+    def delpath(self, path):
+        dirs = self._dirs
+        for base in util.finddirs(path):
+            if dirs[base] > 1:
+                dirs[base] -= 1
+                return
+            del dirs[base]
+
+    def __iter__(self):
+        return iter(self._dirs)
+
+    def __contains__(self, d):
+        return d in self._dirs
+
+
+if util.safehasattr(parsers, 'dirs'):
+    dirs = parsers.dirs
+
+if rustdirs is not None:
+    dirs = rustdirs
diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -57,11 +57,8 @@ 
     stringutil,
 )
 
-rustdirs = policy.importrust(r'dirstate', r'Dirs')
-
 base85 = policy.importmod(r'base85')
 osutil = policy.importmod(r'osutil')
-parsers = policy.importmod(r'parsers')
 
 b85decode = base85.b85decode
 b85encode = base85.b85encode
@@ -3494,54 +3491,6 @@ 
     f.flush()
 
 
-class dirs(object):
-    '''a multiset of directory names from a dirstate or manifest'''
-
-    def __init__(self, map, skip=None):
-        self._dirs = {}
-        addpath = self.addpath
-        if isinstance(map, dict) and skip is not None:
-            for f, s in pycompat.iteritems(map):
-                if s[0] != skip:
-                    addpath(f)
-        elif skip is not None:
-            raise error.ProgrammingError(
-                b"skip character is only supported with a dict source"
-            )
-        else:
-            for f in map:
-                addpath(f)
-
-    def addpath(self, path):
-        dirs = self._dirs
-        for base in finddirs(path):
-            if base in dirs:
-                dirs[base] += 1
-                return
-            dirs[base] = 1
-
-    def delpath(self, path):
-        dirs = self._dirs
-        for base in finddirs(path):
-            if dirs[base] > 1:
-                dirs[base] -= 1
-                return
-            del dirs[base]
-
-    def __iter__(self):
-        return iter(self._dirs)
-
-    def __contains__(self, d):
-        return d in self._dirs
-
-
-if safehasattr(parsers, 'dirs'):
-    dirs = parsers.dirs
-
-if rustdirs is not None:
-    dirs = rustdirs
-
-
 def finddirs(path):
     pos = path.rfind(b'/')
     while pos != -1:
diff --git a/mercurial/repair.py b/mercurial/repair.py
--- a/mercurial/repair.py
+++ b/mercurial/repair.py
@@ -28,7 +28,10 @@ 
     pycompat,
     util,
 )
-from .utils import stringutil
+from .utils import (
+    dirstateutil,
+    stringutil,
+)
 
 
 def backupbundle(
@@ -476,7 +479,7 @@ 
         if b'treemanifest' in repo.requirements:
             # This logic is safe if treemanifest isn't enabled, but also
             # pointless, so we skip it if treemanifest isn't enabled.
-            for dir in util.dirs(seenfiles):
+            for dir in dirstateutil.dirs(seenfiles):
                 i = b'meta/%s/00manifest.i' % dir
                 d = b'meta/%s/00manifest.d' % dir
 
diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -22,7 +22,10 @@ 
     pycompat,
     util,
 )
-from .utils import stringutil
+from .utils import (
+    dirstateutil,
+    stringutil,
+)
 
 rustmod = policy.importrust(r'filepatterns')
 
@@ -595,7 +598,7 @@ 
 
     @propertycache
     def _dirs(self):
-        return set(util.dirs(self._fileset))
+        return set(dirstateutil.dirs(self._fileset))
 
     def visitdir(self, dir):
         dir = normalizerootdir(dir, b'visitdir')
@@ -760,7 +763,7 @@ 
 
     @propertycache
     def _dirs(self):
-        return set(util.dirs(self._fileset))
+        return set(dirstateutil.dirs(self._fileset))
 
     def visitdir(self, dir):
         dir = normalizerootdir(dir, b'visitdir')
@@ -1507,8 +1510,8 @@ 
     p = set()
     # Add the parents as non-recursive/exact directories, since they must be
     # scanned to get to either the roots or the other exact directories.
-    p.update(util.dirs(d))
-    p.update(util.dirs(r))
+    p.update(dirstateutil.dirs(d))
+    p.update(dirstateutil.dirs(r))
 
     # FIXME: all uses of this function convert these to sets, do so before
     # returning.
diff --git a/mercurial/manifest.py b/mercurial/manifest.py
--- a/mercurial/manifest.py
+++ b/mercurial/manifest.py
@@ -32,6 +32,7 @@ 
     repository,
     util as interfaceutil,
 )
+from .utils import dirstateutil
 
 parsers = policy.importmod(r'parsers')
 propertycache = util.propertycache
@@ -494,7 +495,7 @@ 
 
     @propertycache
     def _dirs(self):
-        return util.dirs(self)
+        return dirstateutil.dirs(self)
 
     def dirs(self):
         return self._dirs
@@ -1104,7 +1105,7 @@ 
 
     @propertycache
     def _alldirs(self):
-        return util.dirs(self)
+        return dirstateutil.dirs(self)
 
     def dirs(self):
         return self._alldirs
diff --git a/mercurial/dirstate.py b/mercurial/dirstate.py
--- a/mercurial/dirstate.py
+++ b/mercurial/dirstate.py
@@ -36,6 +36,8 @@ 
     util as interfaceutil,
 )
 
+from .utils import dirstateutil
+
 parsers = policy.importmod(r'parsers')
 rustmod = policy.importrust(r'dirstate')
 
@@ -1504,11 +1506,11 @@ 
 
     @propertycache
     def _dirs(self):
-        return util.dirs(self._map, b'r')
+        return dirstateutil.dirs(self._map, b'r')
 
     @propertycache
     def _alldirs(self):
-        return util.dirs(self._map)
+        return dirstateutil.dirs(self._map)
 
     def _opendirstatefile(self):
         fp, mode = txnutil.trypending(self._root, self._opener, self._filename)
diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -58,6 +58,7 @@ 
 
 from .utils import (
     dateutil,
+    dirstateutil,
     stringutil,
 )
 
@@ -2606,7 +2607,7 @@ 
     progress.complete()
 
     # warn about failure to delete explicit files/dirs
-    deleteddirs = util.dirs(deleted)
+    deleteddirs = dirstateutil.dirs(deleted)
     files = m.files()
     progress = ui.makeprogress(
         _(b'deleting'), total=len(files), unit=_(b'files')
diff --git a/hgext/uncommit.py b/hgext/uncommit.py
--- a/hgext/uncommit.py
+++ b/hgext/uncommit.py
@@ -33,8 +33,8 @@ 
     registrar,
     rewriteutil,
     scmutil,
-    util,
 )
+from mercurial.utils import dirstateutil
 
 cmdtable = {}
 command = registrar.command(cmdtable)
@@ -185,7 +185,7 @@ 
             # if not everything tracked in that directory can be
             # uncommitted.
             if badfiles:
-                badfiles -= {f for f in util.dirs(eligible)}
+                badfiles -= {f for f in dirstateutil.dirs(eligible)}
 
             for f in sorted(badfiles):
                 if f in s.clean:
diff --git a/hgext/narrow/narrowcommands.py b/hgext/narrow/narrowcommands.py
--- a/hgext/narrow/narrowcommands.py
+++ b/hgext/narrow/narrowcommands.py
@@ -31,6 +31,7 @@ 
     wireprototypes,
 )
 from mercurial.interfaces import repository
+from mercurial.utils import dirstateutil
 
 table = {}
 command = registrar.command(table)
@@ -277,7 +278,7 @@ 
                     todelete.append(f)
             elif f.startswith(b'meta/'):
                 dir = f[5:-13]
-                dirs = sorted(util.dirs({dir})) + [dir]
+                dirs = sorted(dirstateutil.dirs({dir})) + [dir]
                 include = True
                 for d in dirs:
                     visit = newmatch.visitdir(d)