Patchwork py3: add an os.fsencode backport to ease path handling

login
register
mail settings
Submitter Martijn Pieters
Date Oct. 9, 2016, 3:45 p.m.
Message ID <1fef9008cbd2098f058f.1476027916@mjpieters-mbp>
Download mbox | patch
Permalink /patch/17016/
State Accepted
Headers show

Comments

Martijn Pieters - Oct. 9, 2016, 3:45 p.m.
# HG changeset patch
# User Martijn Pieters <mjpieters@fb.com>
# Date 1476027863 -7200
#      Sun Oct 09 17:44:23 2016 +0200
# Node ID 1fef9008cbd2098f058f1458df4d59552da88c16
# Parent  82489cd912f332be976cf432673ad47af0d04cd7
py3: add an os.fsencode backport to ease path handling
Yuya Nishihara - Oct. 9, 2016, 8:51 p.m.
On Sun, 09 Oct 2016 17:45:16 +0200, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters@fb.com>
> # Date 1476027863 -7200
> #      Sun Oct 09 17:44:23 2016 +0200
> # Node ID 1fef9008cbd2098f058f1458df4d59552da88c16
> # Parent  82489cd912f332be976cf432673ad47af0d04cd7
> py3: add an os.fsencode backport to ease path handling
> 
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -34,6 +34,7 @@
>  if ispy3:
>      import builtins
>      import functools
> +    from os import fsencode
>  
>      def sysstr(s):
>          """Return a keyword str to be passed to Python functions such as
> @@ -64,6 +65,34 @@
>      def sysstr(s):
>          return s
>  
> +    # Partial backport from os.py in Python 3
> +    def _fscodec():
> +        encoding = sys.getfilesystemencoding()
> +        if encoding == 'mbcs':
> +            errors = 'strict'
> +        else:
> +            errors = 'surrogateescape'
> +
> +        def fsencode(filename):
> +            """
> +            Encode filename to the filesystem encoding with 'surrogateescape'
> +            error handler, return bytes unchanged. On Windows, use 'strict'
> +            error handler if the file system encoding is 'mbcs' (which is the
> +            default encoding).
> +            """
> +            if isinstance(filename, str):
> +                return filename
> +            elif isinstance(filename, unicode):
> +                return filename.encode(encoding, errors)

It appears that 'surrogateescape' is a Python 3 thing. We have
encoding.fromutf8b/toutf8b() which should do the same thing for 'utf-8', but
I have no idea how 'surrogateescape' works for the other encodings.

https://docs.python.org/3/library/codecs.html
Martijn Pieters - Oct. 10, 2016, 10:48 a.m.
On 9 October 2016 at 22:51, Yuya Nishihara <yuya@tcha.org> wrote:
>
> It appears that 'surrogateescape' is a Python 3 thing. We have
> encoding.fromutf8b/toutf8b() which should do the same thing for 'utf-8',
> but
> I have no idea how 'surrogateescape' works for the other encodings.
>

Oh crumbs. That's true. We could also just use strict or ever not support
unicode in Python 2 *at all*; the number of unicode paths in Py2 in
mercurial should be 0 *anyway*. The function is mostly there to make it
easier to write code that works both on 2 and 3 and the encoding is only
needed in 3 really.
Yuya Nishihara - Oct. 10, 2016, 6:57 p.m.
On Mon, 10 Oct 2016 12:48:32 +0200, Martijn Pieters wrote:
> On 9 October 2016 at 22:51, Yuya Nishihara <yuya@tcha.org> wrote:
> >
> > It appears that 'surrogateescape' is a Python 3 thing. We have
> > encoding.fromutf8b/toutf8b() which should do the same thing for 'utf-8',
> > but
> > I have no idea how 'surrogateescape' works for the other encodings.
> 
> Oh crumbs. That's true. We could also just use strict or ever not support
> unicode in Python 2 *at all*; the number of unicode paths in Py2 in
> mercurial should be 0 *anyway*. The function is mostly there to make it
> easier to write code that works both on 2 and 3 and the encoding is only
> needed in 3 really.

+1 for dropping unicode support on Py2 so we can be sure no unicode paths
are used on Python 2.

Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -34,6 +34,7 @@ 
 if ispy3:
     import builtins
     import functools
+    from os import fsencode
 
     def sysstr(s):
         """Return a keyword str to be passed to Python functions such as
@@ -64,6 +65,34 @@ 
     def sysstr(s):
         return s
 
+    # Partial backport from os.py in Python 3
+    def _fscodec():
+        encoding = sys.getfilesystemencoding()
+        if encoding == 'mbcs':
+            errors = 'strict'
+        else:
+            errors = 'surrogateescape'
+
+        def fsencode(filename):
+            """
+            Encode filename to the filesystem encoding with 'surrogateescape'
+            error handler, return bytes unchanged. On Windows, use 'strict'
+            error handler if the file system encoding is 'mbcs' (which is the
+            default encoding).
+            """
+            if isinstance(filename, str):
+                return filename
+            elif isinstance(filename, unicode):
+                return filename.encode(encoding, errors)
+            else:
+                raise TypeError(
+                    "expect str or unicode, not %s" % type(filename).__name__)
+
+        return fsencode
+
+    fsencode = _fscodec()
+    del _fscodec
+
 stringio = io.StringIO
 empty = _queue.Empty
 queue = _queue.Queue