Patchwork posix: fix split() for the case where the path is at the root of the filesystem

login
register
mail settings
Submitter Remy Blank
Date Jan. 6, 2013, 1:50 a.m.
Message ID <c068597242626dbb517e.1357437009@pat.athome>
Download mbox | patch
Permalink /patch/402/
State Superseded
Commit 0d5a22f73a1f8d7b871c51bc8426264e8c629ce4
Headers show

Comments

Remy Blank - Jan. 6, 2013, 1:50 a.m.
# HG changeset patch
# User Remy Blank <remy.blank at pobox.com>
# Date 1357401181 -3600
# Node ID c068597242626dbb517e750a7ef0bed664131517
# Parent  83aa4359c49f67bcb98fb9c7d885ed4ac7443239
posix: fix split() for the case where the path is at the root of the filesystem

posixpath.split() strips '/' from the dirname *unless it is the root*. This
patch reproduces this behavior in posix.split(). The old behavior causes a
crash when creating a file at the root of the repo with localrepo.wfile()
when the repo is at the root of the filesystem.
Mads Kiilerich - Jan. 6, 2013, 2:43 a.m.
Looks ok to me. Just some minor comments that doesn't lead to any 
conclusion ...

Remy Blank wrote, On 01/06/2013 02:50 AM:
> # HG changeset patch
> # User Remy Blank <remy.blank at pobox.com>
> # Date 1357401181 -3600
> # Node ID c068597242626dbb517e750a7ef0bed664131517
> # Parent  83aa4359c49f67bcb98fb9c7d885ed4ac7443239
> posix: fix split() for the case where the path is at the root of the filesystem
>
> posixpath.split() strips '/' from the dirname *unless it is the root*. This
> patch reproduces this behavior in posix.split(). The old behavior causes a
> crash when creating a file at the root of the repo with localrepo.wfile()
> when the repo is at the root of the filesystem.
>
> diff --git a/mercurial/posix.py b/mercurial/posix.py
> --- a/mercurial/posix.py
> +++ b/mercurial/posix.py
> @@ -21,14 +21,32 @@ umask = os.umask(0)
>   os.umask(umask)
>   
>   def split(p):
> -    '''Same as os.path.split, but faster'''
> +    '''Same as os.path.split, but faster
> +

Nice test coverage. A bit verbose for a doctest ... but probably ok.

> +    >>> split('/absolute/path/to/file')
> +    ('/absolute/path/to', 'file')
> +    >>> split('relative/path/to/file')
> +    ('relative/path/to', 'file')
> +    >>> split('file_alone')
> +    ('', 'file_alone')
> +    >>> split('path/to/directory/')
> +    ('path/to/directory', '')
> +    >>> split('/multiple/path//separators')
> +    ('/multiple/path', 'separators')

> +    >>> split('/file_at_root')
> +    ('/', 'file_at_root')

(This is the case that gave ('', 'file_at_root') without this patch. 
That could perhaps be made more clear in the patch description.)

> +    >>> split('///multiple_leading_separators_at_root')
> +    ('///', 'multiple_leading_separators_at_root')

That patch also changes this result from '//' to '///'. I am not sure 
that is an improvement ... but probably not a problem.

> +    >>> split('')
> +    ('', '')

I can not imagine a case where this is relevant, so it should perhaps 
just be left undefined and untested.

> +    '''
>       ht = p.rsplit('/', 1)
>       if len(ht) == 1:
>           return '', p
>       nh = ht[0].rstrip('/')
>       if nh:
>           return nh, ht[1]
> -    return ht
> +    return ht[0] + '/', ht[1]
>   
>   def openhardlinks():
>       '''return true if it is safe to hold open file handles to hardlinks'''
> diff --git a/tests/test-doctest.py b/tests/test-doctest.py
> --- a/tests/test-doctest.py
> +++ b/tests/test-doctest.py
> @@ -6,6 +6,7 @@ import doctest
>   
>   import mercurial.util
>   doctest.testmod(mercurial.util)
> +doctest.testmod(mercurial.util.platform)

This is a clever way to make sure posix.py only is doctested on posix 
systems ... but I tend to think it would be better to be more explicit.

Somewhat related:
We already use posixpath (= os.path on posix systems) on windows too. We 
might soon want to use this posix.split on windows too, so it should 
perhaps be made generally available in util somehow ...

/Mads
Remy Blank - Jan. 6, 2013, 8:02 a.m.
Mads Kiilerich wrote:
> (This is the case that gave ('', 'file_at_root') without this patch. 
> That could perhaps be made more clear in the patch description.)

Will update.

>> +    >>> split('///multiple_leading_separators_at_root')
>> +    ('///', 'multiple_leading_separators_at_root')
> 
> That patch also changes this result from '//' to '///'. I am not sure 
> that is an improvement ... but probably not a problem.

It's what posixpath.split() does, and considering this is supposed to be
a faster drop-in replacement, I would say it's a good thing.

>> +    >>> split('')
>> +    ('', '')
> 
> I can not imagine a case where this is relevant, so it should perhaps 
> just be left undefined and untested.

Again, this just checks that the behavior is identical to
posixpath.split(). Maybe the doctest should be written more explicitly
as such, by iterating over a list of strings and asserting that both
split() and posixpath.split() return the same result?

> This is a clever way to make sure posix.py only is doctested on posix 
> systems ... but I tend to think it would be better to be more explicit.

Do you mean by adding a comment, or by repeating the import logic from
util.py?

if os.name == 'nt':
    import mercurial.windows
    doctest.testmod(mercurial.windows)
else:
    import mercurial.posix
    doctest.testmod(mercurial.posix)

> Somewhat related:
> We already use posixpath (= os.path on posix systems) on windows too. We 
> might soon want to use this posix.split on windows too, so it should 
> perhaps be made generally available in util somehow ...

I assume this would go into a separate patch?

-- Remy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: OpenPGP digital signature
URL: <http://selenic.com/pipermail/mercurial-devel/attachments/20130106/3987ce9e/attachment.pgp>

Patch

diff --git a/mercurial/posix.py b/mercurial/posix.py
--- a/mercurial/posix.py
+++ b/mercurial/posix.py
@@ -21,14 +21,32 @@  umask = os.umask(0)
 os.umask(umask)
 
 def split(p):
-    '''Same as os.path.split, but faster'''
+    '''Same as os.path.split, but faster
+
+    >>> split('/absolute/path/to/file')
+    ('/absolute/path/to', 'file')
+    >>> split('relative/path/to/file')
+    ('relative/path/to', 'file')
+    >>> split('file_alone')
+    ('', 'file_alone')
+    >>> split('path/to/directory/')
+    ('path/to/directory', '')
+    >>> split('/multiple/path//separators')
+    ('/multiple/path', 'separators')
+    >>> split('/file_at_root')
+    ('/', 'file_at_root')
+    >>> split('///multiple_leading_separators_at_root')
+    ('///', 'multiple_leading_separators_at_root')
+    >>> split('')
+    ('', '')
+    '''
     ht = p.rsplit('/', 1)
     if len(ht) == 1:
         return '', p
     nh = ht[0].rstrip('/')
     if nh:
         return nh, ht[1]
-    return ht
+    return ht[0] + '/', ht[1]
 
 def openhardlinks():
     '''return true if it is safe to hold open file handles to hardlinks'''
diff --git a/tests/test-doctest.py b/tests/test-doctest.py
--- a/tests/test-doctest.py
+++ b/tests/test-doctest.py
@@ -6,6 +6,7 @@  import doctest
 
 import mercurial.util
 doctest.testmod(mercurial.util)
+doctest.testmod(mercurial.util.platform)
 
 import mercurial.changelog
 doctest.testmod(mercurial.changelog)