Patchwork [2,of,3,VFS] vfs: add walk

Submitter Katsunori FUJIWARA
Date April 11, 2015, 2:17 p.m.
Message ID <e848285bccb2733268ce.1428761866@juju>
Permalink /patch/8616/
State Accepted

Comments

Katsunori FUJIWARA - April 11, 2015, 2:17 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1428760804 -32400
#      Sat Apr 11 23:00:04 2015 +0900
# Node ID e848285bccb2733268cea52ff5ba94705f8780e5
# Parent  2e5a665f2c4fc7b26b69af765337d45bc9ad4e07
vfs: add walk

To strip the "path prefix" (= "the root of vfs") part from each
"dirpath" yielded by "os.walk()" correctly, the "path prefix" should
end with "os.sep", but this isn't easy to ensure, because:

  - examination by "path.endswith(os.sep)" isn't portable

    Some problematic encodings use 0x5c (= "os.sep" on Windows) as the
    tail byte of some multi-byte characters.

  - "os.path.join(path, '')" isn't portable

    With Python 2.7.9, this invocation doesn't add "os.sep" at the end
    of a UNC path (see issue4557 for details).
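
The first problem can be sketched as follows. This is only an
illustration: it assumes cp932 (Shift_JIS) as one example of a
"problematic encoding" (the message above does not name one), and the
variable names are hypothetical.

```python
# Byte-level check vs. character-level check for a trailing separator.
# Assumption: cp932 is used as the example encoding.
SEP = b"\\"  # 0x5c, the byte value of os.sep on Windows

# U+8868 encodes to the two bytes 0x95 0x5c in cp932, so its tail
# byte equals the separator byte although it is not a separator.
name = "\u8868".encode("cp932")
path = b"z:\\" + name

# The byte-level test claims that the path already ends with os.sep ...
looks_terminated = path.endswith(SEP)

# ... but after decoding, the last character is not a separator.
decoded_ends_with_sep = path.decode("cp932").endswith("\\")
```

Here "looks_terminated" is true while "decoded_ends_with_sep" is
false, so a prefix built on the byte-level test would be one separator
too short.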

Python 2.7.9 also changed the behavior of "os.path.normpath()" (see *)
and "os.path.splitdrive()" for UNC paths.

    vfs root        normpath       splitdrive          os.sep required
    =============== ============== =================== ============
    z:\             z:\            z: + \              no
    z:\foo          z:\foo         z: + \foo           yes
    z:\foo\         z:\foo         z: + \foo           yes

    [before Python 2.7.9]
    \\foo\bar       \\foo\bar      '' + \\foo\bar      yes
    \\foo\bar\      \\foo\bar  (*) '' + \\foo\bar      yes
    \\foo\bar\baz   \\foo\bar\baz  '' + \\foo\bar\baz  yes
    \\foo\bar\baz\  \\foo\bar\baz  '' + \\foo\bar\baz  yes

    [Python 2.7.9]
    \\foo\bar       \\foo\bar      \\foo\bar + ''      yes
    \\foo\bar\      \\foo\bar\ (*) \\foo\bar + \       no
    \\foo\bar\baz   \\foo\bar\baz  \\foo\bar + \baz    yes
    \\foo\bar\baz\  \\foo\bar\baz  \\foo\bar + \baz    yes
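
The "os.sep required" column can be reproduced with a small check on
the "path" part returned by "splitdrive()". The sketch below is not
Mercurial's code: "needs_sep" and "as_prefix" are hypothetical names,
"ntpath" is used so that Windows path rules apply on any platform, and
it assumes a Python whose "splitdrive()" handles UNC paths as in the
"[Python 2.7.9]" rows.

```python
import ntpath


def needs_sep(normalized_root):
    """Return True if a separator must be appended to use the root as a prefix.

    The argument is assumed to be normalized already.
    """
    _drive, rest = ntpath.splitdrive(normalized_root)
    # Only a root whose "path" part is exactly one separator
    # (e.g. "z:\") already ends with a separator.
    return rest != ntpath.sep


def as_prefix(normalized_root):
    """Return the root in a form that is safe to strip from child paths."""
    if needs_sep(normalized_root):
        return normalized_root + ntpath.sep
    return normalized_root
```

For example, "as_prefix" leaves "z:\" untouched and turns "z:\foo"
into "z:\foo\".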

If it is ensured that the "normpath()"-ed vfs root is passed to
"splitdrive()", adding "os.sep" is required only when the "path" part
of the "splitdrive()" result isn't "os.sep" itself. This is exactly
what "pathutil.normasprefix()" examines.

This patch applies "os.path.normpath()" to "self.join(None)"
explicitly, because the vfs root isn't guaranteed to be normalized
already: vfs itself is constructed with "realpath=False" (= skipping
normalization in "vfs.__init__()") in many code paths.

This normalization should be much cheaper than subsequent file I/O for
directory traversal.
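
The overall approach can be tried outside Mercurial with a standalone
sketch. The names "walk_relative" and "demo" are hypothetical, and
unlike the patch this version walks from the normalized root itself
instead of from "self.join(path)".

```python
import os
import tempfile


def walk_relative(base, onerror=None):
    """Yield (dirpath, dirs, files) with dirpath relative to base."""
    root = os.path.normpath(base)
    _drive, rest = os.path.splitdrive(root)
    # Append a separator unless the root is already a bare separator.
    prefix = root if rest == os.sep else root + os.sep
    prefixlen = len(prefix)
    for dirpath, dirs, files in os.walk(root, onerror=onerror):
        # For the root itself the slice starts past the end of the
        # string, so the relative dirpath is empty.
        yield dirpath[prefixlen:], dirs, files


def demo():
    """Walk a temporary tree and return the sorted relative dirpaths."""
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "a", "b"))
        with open(os.path.join(base, "a", "f.txt"), "w") as fp:
            fp.write("x")
        # A trailing separator is passed on purpose: normpath() removes it.
        return sorted(d for d, _dirs, _files in walk_relative(base + os.sep))
```

"demo()" returns the empty string for the root, followed by "a" and
"a" joined with "b" by "os.sep".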

Patch

diff --git a/mercurial/scmutil.py b/mercurial/scmutil.py
--- a/mercurial/scmutil.py
+++ b/mercurial/scmutil.py
@@ -356,6 +356,22 @@  class abstractvfs(object):
     def utime(self, path=None, t=None):
         return os.utime(self.join(path), t)
 
+    def walk(self, path=None, onerror=None):
+        """Yield (dirpath, dirs, files) tuple for each directory under path
+
+        ``dirpath`` is relative to the root of this vfs. This uses
+        ``os.sep`` as the path separator, even if you specify a POSIX
+        style ``path``.
+
+        "The root of this vfs" is represented as empty ``dirpath``.
+        """
+        root = os.path.normpath(self.join(None))
+        # when dirpath == root, dirpath[prefixlen:] becomes empty
+        # because len(dirpath) <= prefixlen.
+        prefixlen = len(pathutil.normasprefix(root))
+        for dirpath, dirs, files in os.walk(self.join(path), onerror=onerror):
+            yield (dirpath[prefixlen:], dirs, files)
+
 class vfs(abstractvfs):
     '''Operate files relative to a base directory