Patchwork [5,of,5,STABLE,V2] subrepo: normalize path in the specific way for problematic encodings

Submitter Katsunori FUJIWARA
Date May 8, 2014, 10:31 a.m.
Message ID <d7a028c6e5ee5d9c5d3a.1399545080@juju>
Permalink /patch/4670/
State Accepted
Commit 8dd17b19e722209693c786f1c3c318a1dab05086

Comments

Katsunori FUJIWARA - May 8, 2014, 10:31 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1399543380 -32400
#      Thu May 08 19:03:00 2014 +0900
# Branch stable
# Node ID d7a028c6e5ee5d9c5d3adf92f83be357154b8200
# Parent  4e9712d2233f2c4af0f97740fde9afae05917569
subrepo: normalize path in the specific way for problematic encodings

Before this patch, "reporelpath()" uses "rstrip(os.sep)" to trim the
trailing "os.sep" from the "parent.root" path.

But this doesn't work correctly with some problematic encodings on
Windows, because some multi-byte characters in such encodings have
'\\' (0x5c) as their trailing byte.
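
The collision, and its effect on the old slicing logic, can be
reproduced outside Mercurial. The following is a minimal sketch, not
code from this patch: it assumes cp932 as the problematic encoding,
hard-codes the Windows separator as a byte string so that it runs on
any platform, and uses made-up paths. KATAKANA LETTER SO (U+30BD)
encodes to 0x83 0x5c in cp932, so a directory name ending with it also
ends with the separator byte.

```python
# Illustration only: SEP, parent_root and sub_root are hypothetical
# values, not taken from the patch.
SEP = b'\\'  # the Windows separator as a single byte (0x5c)

# U+30BD is 0x83 0x5c in cp932: its trailing byte equals SEP.
so = '\u30bd'.encode('cp932')

parent_root = b'C:\\repo\\' + so
sub_root = parent_root + SEP + b'sub'

# Old reporelpath() logic: rstrip() also eats the tail byte of the
# multi-byte character, so the slice starts one byte too early.
p = parent_root.rstrip(SEP)
old_result = sub_root[len(p) + 1:]

print(old_result)  # b'\\sub'
```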

In such cases, "reporelpath()" leaves an unexpected '\\' at the
beginning of the path returned to callers.

"localrepository.root" might seem never to have a trailing "os.sep",
because it is always normalized by "os.path.realpath()" in
"vfs.__init__()". In fact, it does have one if it is a root (of the
drive): path normalization trims the trailing "os.sep" off
"/foo/bar/", but doesn't trim it off "/".
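
This asymmetry is easy to check with the standard library. The sketch
below uses "posixpath" and "ntpath" directly, rather than "os.path",
so that the result does not depend on the platform it runs on; the
paths are illustrative only.

```python
import ntpath
import posixpath

# A trailing separator is trimmed from ordinary directories...
print(posixpath.normpath('/foo/bar/'))     # /foo/bar
print(ntpath.normpath('C:\\foo\\bar\\'))   # C:\foo\bar

# ...but it is kept when the path is a root.
print(posixpath.normpath('/'))             # /
print(ntpath.normpath('C:\\'))             # C:\
```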

So, just dropping "rstrip(os.sep)" from "reporelpath()" would cause a
regression around issue3033, which was fixed by fccd350acf79.

This patch introduces "pathutil.normasprefix" to normalize the
specified path in a way that is safe for problematic encodings,
without causing a regression around issue3033.
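
To make the trade-off concrete, here is a hedged sketch comparing
three ways of computing the relative path. It re-implements the idea
of "normasprefix" on top of "posixpath" so that it behaves the same on
every platform; the real function uses "os.path" and "os.sep". The
helper names "old", "naive" and "new" are hypothetical, and "naive" is
only an assumption about what dropping "rstrip()" would look like.

```python
import posixpath

SEP = '/'  # stand-in for os.sep in this sketch

def normasprefix(path):
    # Same idea as the patch: append SEP unless path is already a root.
    d, p = posixpath.splitdrive(path)
    if len(p) != len(SEP):
        return path + SEP
    return path

def old(parent, sub):
    # Pre-patch logic: strip trailing separators, then skip one more.
    return sub[len(parent.rstrip(SEP)) + 1:]

def naive(parent, sub):
    # Dropping rstrip() only: wrong when parent is a root.
    return sub[len(parent) + 1:]

def new(parent, sub):
    # Patched logic: slice by the normalized prefix length.
    return sub[len(normasprefix(parent)):]

print(naive('/', '/sub'))        # ub
print(old('/', '/sub'))          # sub
print(new('/', '/sub'))          # sub
print(new('/foo', '/foo/sub'))   # sub
```

Unlike "old", "new" never strips bytes from the end of the parent
path, so a multi-byte character ending in 0x5c is left intact.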
Matt Mackall - May 27, 2014, 9:16 p.m.
On Thu, 2014-05-08 at 19:31 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1399543380 -32400
> #      Thu May 08 19:03:00 2014 +0900
> # Branch stable
> # Node ID d7a028c6e5ee5d9c5d3adf92f83be357154b8200
> # Parent  4e9712d2233f2c4af0f97740fde9afae05917569
> subrepo: normalize path in the specific way for problematic encodings

These are queued for stable, thanks. We might want to
audit/replace/check-code os.sep.

Patch

diff --git a/mercurial/pathutil.py b/mercurial/pathutil.py
--- a/mercurial/pathutil.py
+++ b/mercurial/pathutil.py
@@ -142,3 +142,25 @@  def canonpath(root, cwd, myname, auditor
             name = dirname
 
         raise util.Abort(_("%s not under root '%s'") % (myname, root))
+
+def normasprefix(path):
+    '''normalize the specified path as path prefix
+
+    Returned value can be used safely for "p.startswith(prefix)",
+    "p[len(prefix):]", and so on.
+
+    For efficiency, this expects "path" argument to be already
+    normalized by "os.path.normpath", "os.path.realpath", and so on.
+
+    See also issue3033 for detail about need of this function.
+
+    >>> normasprefix('/foo/bar').replace(os.sep, '/')
+    '/foo/bar/'
+    >>> normasprefix('/').replace(os.sep, '/')
+    '/'
+    '''
+    d, p = os.path.splitdrive(path)
+    if len(p) != len(os.sep):
+        return path + os.sep
+    else:
+        return path
diff --git a/mercurial/subrepo.py b/mercurial/subrepo.py
--- a/mercurial/subrepo.py
+++ b/mercurial/subrepo.py
@@ -276,8 +276,7 @@  def reporelpath(repo):
     parent = repo
     while util.safehasattr(parent, '_subparent'):
         parent = parent._subparent
-    p = parent.root.rstrip(os.sep)
-    return repo.root[len(p) + 1:]
+    return repo.root[len(pathutil.normasprefix(parent.root)):]
 
 def subrelpath(sub):
     """return path to this subrepo as seen from outermost repo"""
diff --git a/tests/test-doctest.py b/tests/test-doctest.py
--- a/tests/test-doctest.py
+++ b/tests/test-doctest.py
@@ -19,6 +19,7 @@  testmod('mercurial.hg')
 testmod('mercurial.hgweb.hgwebdir_mod')
 testmod('mercurial.match')
 testmod('mercurial.minirst')
+testmod('mercurial.pathutil')
 testmod('mercurial.revset')
 testmod('mercurial.store')
 testmod('mercurial.subrepo')