Patchwork py3: have an utility function to return string

Submitter Pulkit Goyal
Date Sept. 14, 2016, 5:15 p.m.
Message ID <ec133d50af780e84a6a2.1473873327@pulkit-goyal>
Permalink /patch/16629/
State Changes Requested

Comments

Pulkit Goyal - Sept. 14, 2016, 5:15 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1473787789 -19800
#      Tue Sep 13 22:59:49 2016 +0530
# Node ID ec133d50af780e84a6a24825b52d433c10f9cd55
# Parent  85bd31515225e7fdf9bd88edde054db2c74a33f8
py3: have an utility function to return string

There are cases when we need strings and can't use bytes in Python 3,
so we need a utility function for them. I agree that this may not
be the best possible way out. I will be happy if anybody else can suggest
a better approach. We need this function for os.path.join(), __slots__
and a few more things. I added the function to pycompat.py as that module is
not too big to import.
Yuya Nishihara - Sept. 15, 2016, 1:36 p.m.
On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
> py3: have an utility function to return string
> 
> There are cases when we need strings and can't use bytes in Python 3,
> so we need a utility function for them. I agree that this may not
> be the best possible way out. I will be happy if anybody else can suggest
> a better approach. We need this function for os.path.join(),

We should stick to bytes for filesystem API, and translate bytes to unicode
at VFS layer as necessary.

https://www.mercurial-scm.org/wiki/WindowsUTF8Plan

(Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
existing repositories.)

https://docs.python.org/3.6/whatsnew/3.6.html
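To illustrate the "stick to bytes" point: here is a minimal Python 3 sketch showing that os.path already accepts and returns bytes. So os.path.join() on its own is not a reason to convert paths to str. The failure case is mixing bytes and str in one call. The path names are made up for illustration.

```python
import os

# On Python 3, os.path accepts bytes and returns bytes, so calling
# os.path.join() is not by itself a reason to convert paths to str.
root = b'repo'
storepath = os.path.join(root, b'.hg', b'store')
print(storepath)                     # b'repo/.hg/store' on POSIX
print(os.path.basename(storepath))   # b'store'

# What fails is mixing bytes and str in one call.
try:
    os.path.join(root, '.hg')
except TypeError as exc:
    print('TypeError:', exc)
```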

> __slots__

__slots__ can be considered private data, so just use u''.
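A small sketch of the u'' suggestion, under a stated assumption. It assumes, as the suggestion implies, that Python 2.7 accepts ASCII unicode slot names. It shows that u'' slot names work on Python 3 while bytes slot names are rejected there. The class and helper names are hypothetical.

```python
class Node(object):
    # Per Yuya's suggestion: u'' literals are native str on Python 3, and
    # Python 2.7 is assumed to accept ASCII unicode slot names as well.
    __slots__ = (u'name', u'parent')

    def __init__(self, name, parent=None):
        self.name = name      # the *values* can still be bytes
        self.parent = parent


def bytes_slots_rejected():
    """Return True if this interpreter refuses bytes slot names."""
    try:
        type('Bad', (object,), {'__slots__': (b'name',)})
    except TypeError:
        return True  # Python 3 raises TypeError for non-str slot names
    return False


n = Node(b'default')
print(hasattr(n, '__dict__'))     # False: only slot attributes exist
print(bytes_slots_rejected())     # True on Python 3
```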

> and few more things.

for instance?

> +# This function converts its arguments to strings
> +# on the basis of python version. Strings in python 3
> +# are unicodes and our transformer converts everything to bytes
> +# in python 3. So we need to decode it to unicodes in
> +# py3.
> +
> +def coverttostr(word):
> +    if sys.version_info[0] < 3:
> +        assert isinstance(word, str), "Not a string in Python 2"
> +        return word
> +    # Checking word is bytes because we have the transformer, else
> +    # raising error
> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
> +    return word.decode(sys.getfilesystemencoding())

Can we assume 'word' was encoded in file-system codec?
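Why the codec question matters: the same bytes decode without error under different codecs but produce different text. Meanwhile, sys.getfilesystemencoding() reflects the platform and configuration, not where the bytes came from. The following is only a small illustration.

```python
import sys

# The same bytes mean different text under different codecs.
data = u'caf\u00e9'.encode('utf-8')   # b'caf\xc3\xa9'
as_utf8 = data.decode('utf-8')         # 'café'
as_latin1 = data.decode('latin-1')     # 'cafÃ©': no error, but wrong text

# The file-system codec is a property of the platform and configuration,
# not of where the bytes came from.
print(sys.getfilesystemencoding())
```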
Christian Ebert - Sept. 15, 2016, 1:37 p.m.
* Pulkit Goyal on Wednesday, September 14, 2016 at 22:45:27 +0530
> py3: have an utility function to return string
> 
> There are cases when we need strings and can't use bytes in Python 3,
> so we need a utility function for them. I agree that this may not
> be the best possible way out. I will be happy if anybody else can suggest
> a better approach. We need this function for os.path.join(), __slots__
> and a few more things. I added the function to pycompat.py as that module is
> not too big to import.
> 
> diff -r 85bd31515225 -r ec133d50af78 mercurial/pycompat.py
> --- a/mercurial/pycompat.py	Sun Aug 21 13:16:21 2016 +0900
> +++ b/mercurial/pycompat.py	Tue Sep 13 22:59:49 2016 +0530
> @@ -164,3 +164,18 @@
>         "SimpleHTTPRequestHandler",
>         "CGIHTTPRequestHandler",
>     ))
> +
> +# This function converts its arguments to strings
> +# on the basis of python version. Strings in python 3
> +# are unicodes and our transformer converts everything to bytes
> +# in python 3. So we need to decode it to unicodes in
> +# py3.
> +
> +def coverttostr(word):

converttostr, I presume?
Pulkit Goyal - Sept. 15, 2016, 6:29 p.m.
On Thu, Sep 15, 2016 at 7:06 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
>> py3: have an utility function to return string
>>
>> There are cases when we need strings and can't use bytes in Python 3,
>> so we need a utility function for them. I agree that this may not
>> be the best possible way out. I will be happy if anybody else can suggest
>> a better approach. We need this function for os.path.join(),
>
> We should stick to bytes for filesystem API, and translate bytes to unicode
> at VFS layer as necessary.
>
> https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>
> (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
> existing repositories.)
>
> https://docs.python.org/3.6/whatsnew/3.6.html
>
>> __slots__
>
> __slots__ can be considered private data, so just use u''.
>
>> and few more things.
>
> for instance?
This function was motivated by Gregory's reply to
https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/086704.html
Unfortunately, I see that he replied only to me, so I pasted it here:
https://bpaste.net/show/ab0d3ea39749

I am going through the Python documentation, and there are things like
__slots__ and is_frozen() which accept str in both py2 and py3. Since str is
not the same type in both versions, I made this function to help in such
cases. If we can use unicode in __slots__ in py2, then that's good.
>
>> +# This function converts its arguments to strings
>> +# on the basis of python version. Strings in python 3
>> +# are unicodes and our transformer converts everything to bytes
>> +# in python 3. So we need to decode it to unicodes in
>> +# py3.
>> +
>> +def coverttostr(word):
>> +    if sys.version_info[0] < 3:
>> +        assert isinstance(word, str), "Not a string in Python 2"
>> +        return word
>> +    # Checking word is bytes because we have the transformer, else
>> +    # raising error
>> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
>> +    return word.decode(sys.getfilesystemencoding())
>
> Can we assume 'word' was encoded in file-system codec?

Yeah, because the transformer added b'' everywhere.
Pulkit Goyal - Sept. 15, 2016, 6:31 p.m.
On Thu, Sep 15, 2016 at 7:07 PM, Christian Ebert <blacktrash@gmx.net> wrote:
> * Pulkit Goyal on Wednesday, September 14, 2016 at 22:45:27 +0530
>> py3: have an utility function to return string
>>
>> There are cases when we need strings and can't use bytes in Python 3,
>> so we need a utility function for them. I agree that this may not
>> be the best possible way out. I will be happy if anybody else can suggest
>> a better approach. We need this function for os.path.join(), __slots__
>> and a few more things. I added the function to pycompat.py as that module is
>> not too big to import.
>>
>> diff -r 85bd31515225 -r ec133d50af78 mercurial/pycompat.py
>> --- a/mercurial/pycompat.py   Sun Aug 21 13:16:21 2016 +0900
>> +++ b/mercurial/pycompat.py   Tue Sep 13 22:59:49 2016 +0530
>> @@ -164,3 +164,18 @@
>>         "SimpleHTTPRequestHandler",
>>         "CGIHTTPRequestHandler",
>>     ))
>> +
>> +# This function converts its arguments to strings
>> +# on the basis of python version. Strings in python 3
>> +# are unicodes and our transformer converts everything to bytes
>> +# in python 3. So we need to decode it to unicodes in
>> +# py3.
>> +
>> +def coverttostr(word):
>
> converttostr, I presume?

Yeah, that's a typo; it should be converttostr.
Pierre-Yves David - Sept. 16, 2016, 10:09 a.m.
On 09/15/2016 03:36 PM, Yuya Nishihara wrote:
> On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
>> py3: have an utility function to return string
>>
>> There are cases when we need strings and can't use bytes in Python 3,
>> so we need a utility function for them. I agree that this may not
>> be the best possible way out. I will be happy if anybody else can suggest
>> a better approach. We need this function for os.path.join(),
>
> We should stick to bytes for filesystem API, and translate bytes to unicode
> at VFS layer as necessary.
>
> https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>
> (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
> existing repositories.)
>
> https://docs.python.org/3.6/whatsnew/3.6.html
>
>> __slots__
>
> __slots__ can be considered private data, so just use u''.
>
>> and few more things.
>
> for instance?
>
>> +# This function converts its arguments to strings
>> +# on the basis of python version. Strings in python 3
>> +# are unicodes and our transformer converts everything to bytes
>> +# in python 3. So we need to decode it to unicodes in
>> +# py3.
>> +
>> +def coverttostr(word):

Any reason this comment is not a Python docstring?

>> +    if sys.version_info[0] < 3:
>> +        assert isinstance(word, str), "Not a string in Python 2"
>> +        return word
>> +    # Checking word is bytes because we have the transformer, else
>> +    # raising error
>> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
>> +    return word.decode(sys.getfilesystemencoding())
>
> Can we assume 'word' was encoded in file-system codec?

On what kind of strings is this going to be used? If we intend to use this
on Mercurial internal identifiers only, we can probably assume (and
actually enforce) ASCII to keep things simple.
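The ASCII-only variant suggested here could look like the sketch below. It also answers the docstring question by using one. The function name is hypothetical and the sketch is not the patch's code. Decoding as ASCII makes non-ASCII input fail loudly instead of depending on locale or file-system settings.

```python
import sys


def identtostr(word):
    """Return the bytes identifier *word* as the native str type.

    Hypothetical helper for Mercurial-internal identifiers only: it
    enforces ASCII, so the result never depends on locale or
    file-system settings.
    """
    if sys.version_info[0] < 3:
        return word  # bytes already is the native str on Python 2
    return word.decode('ascii')  # UnicodeDecodeError on non-ASCII input


print(identtostr(b'_cache'))  # _cache
```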

Cheers,
Martijn Pieters - Sept. 16, 2016, 10:27 a.m.
On 16 September 2016 at 11:09, Pierre-Yves David
<pierre-yves.david@ens-lyon.org> wrote:
>>> +    return word.decode(sys.getfilesystemencoding())
>>
>>
>> Can we assume 'word' was encoded in file-system codec?

No, this is being used for *source code literals*, so
getfilesystemencoding is the wrong codec here. Probably the function
should be given an encoding='utf8' default instead, so you can specify
a different codec.

>
> On what kind of strings is this going to be used? If we intend to use this on
> Mercurial internal identifiers only, we can probably assume (and actually
> enforce) ASCII to keep things simple.

If this is only going to be used for Python identifiers in strings
(e.g. the string(s) __slots__ accepts) then ASCII is fine, especially
because we need to keep the code working in both Python 2 and 3, and Python 2
only accepts ASCII for identifiers.
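Here is a sketch of the `encoding='utf8'` default suggested above, using the corrected spelling converttostr. The type check is made explicit instead of using assert, because `python -O` strips asserts. That change, and the choice of TypeError, are this sketch's own assumptions rather than anything agreed in the thread.

```python
import sys


def converttostr(word, encoding='utf-8'):
    """Return the bytes *word* as the native str type.

    On Python 3 the bytes are decoded with *encoding*. The utf-8 default
    suits source-code literals; callers whose bytes came from elsewhere
    pass the matching codec explicitly.
    """
    if not isinstance(word, bytes):
        # An explicit check instead of assert, which "python -O" strips.
        raise TypeError('expected bytes, got %s' % type(word).__name__)
    if sys.version_info[0] < 3:
        return word
    return word.decode(encoding)
```

For example, `converttostr(b'caf\xe9', encoding='latin-1')` decodes latin-1 input, while plain `converttostr(b'slots')` relies on the utf-8 default.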
Martijn Pieters - Sept. 16, 2016, 10:34 a.m.
And having properly read Gregory's email, I see he intended his patch
to be used for *paths* in Python 3, and Pulkit is re-using this for
*Python identifiers in __slots__*.

This at least explains why getfilesystemencoding was used; it is the
right choice for the first use case, not the second.
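For the path use case, where the file-system codec is right, Python 3's standard library already provides os.fsdecode() and os.fsencode(). Unlike a bare `.decode(sys.getfilesystemencoding())`, they also apply the file-system error handler. The wrapper names below are hypothetical; this is a sketch, not Gregory's patch.

```python
import os
import sys


def pathtostr(path):
    """Decode a bytes *path* with the file-system codec (Python 3)."""
    # os.fsdecode() pairs sys.getfilesystemencoding() with the matching
    # error handler, so on POSIX even undecodable bytes round-trip.
    return os.fsdecode(path)


def strtopath(path):
    """Inverse of pathtostr()."""
    return os.fsencode(path)


print(sys.getfilesystemencoding())   # platform-dependent, e.g. utf-8
p = b'dir/file.txt'
print(strtopath(pathtostr(p)) == p)  # True
```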

Yuya Nishihara - Sept. 16, 2016, 1:46 p.m.
On Thu, 15 Sep 2016 23:59:59 +0530, Pulkit Goyal wrote:
> On Thu, Sep 15, 2016 at 7:06 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
> >> py3: have an utility function to return string
> >>
> >> There are cases when we need strings and can't use bytes in Python 3,
> >> so we need a utility function for them. I agree that this may not
> >> be the best possible way out. I will be happy if anybody else can suggest
> >> a better approach. We need this function for os.path.join(),
> >
> > We should stick to bytes for filesystem API, and translate bytes to unicode
> > at VFS layer as necessary.
> >
> > https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
> >
> > (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
> > existing repositories.)
> >
> > https://docs.python.org/3.6/whatsnew/3.6.html
> >
> >> __slots__
> >
> > __slots__ can be considered private data, so just use u''.
> >
> >> and few more things.
> >
> > for instance?
> This function was motivated by Gregory's reply to
> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/086704.html
> Unfortunately, I see that he replied only to me, so I pasted it here:
> https://bpaste.net/show/ab0d3ea39749
> 
> I am going through the Python documentation, and there are things like
> __slots__ and is_frozen() which accept str in both py2 and py3. Since str is
> not the same type in both versions, I made this function to help in such
> cases. If we can use unicode in __slots__ in py2, then that's good.

Python 2.6-2.7 accepts both str and unicode in general, but mixing them is a
disaster, so we've avoided unicode whenever possible. Unfortunately, Python 3
solved that problem by forcing us to use unicode (named str) everywhere, which
doesn't work in Mercurial because we need to process binary data (including
unix paths) transparently. All inputs and outputs (except for the future
Windows file API) should be bytes.

So, if is_frozen() of Py3 doesn't take bytes and Py2 doesn't take unicode,
we'll need a compatibility function like you proposed.
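One possible shape for such a compatibility function selects the implementation once, at import time, instead of testing sys.version_info on every call as the patch does. In this sketch, `nativestr` is a hypothetical name. getattr() stands in for is_frozen() because it is another builtin that needs a str name on Python 3. The ASCII decode assumes identifier-like input, as discussed earlier in the thread.

```python
import sys

if sys.version_info[0] >= 3:
    def nativestr(s):
        """Return bytes *s* as str, for APIs that reject bytes on Python 3."""
        return s.decode('ascii')  # assumes identifier-like ASCII input
else:
    def nativestr(s):
        """Python 2: bytes is already the native str type."""
        return s


class Config(object):
    pass


cfg = Config()
cfg.verbose = True
# getattr() on Python 3 needs a str attribute name, not bytes.
print(getattr(cfg, nativestr(b'verbose')))  # True
```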

> >> +# This function converts its arguments to strings
> >> +# on the basis of python version. Strings in python 3
> >> +# are unicodes and our transformer converts everything to bytes
> >> +# in python 3. So we need to decode it to unicodes in
> >> +# py3.
> >> +
> >> +def coverttostr(word):
> >> +    if sys.version_info[0] < 3:
> >> +        assert isinstance(word, str), "Not a string in Python 2"
> >> +        return word
> >> +    # Checking word is bytes because we have the transformer, else
> >> +    # raising error
> >> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
> >> +    return word.decode(sys.getfilesystemencoding())
> >
> > Can we assume 'word' was encoded in file-system codec?
> 
> Yeah, because the transformer added b'' everywhere.

As Martijn said, that depends on how 'word' was encoded. Python sources would
be latin1 or utf-8 in most cases, but a string read from the external world is
different: we assume it is encoded in encoding.encoding.
Pulkit Goyal - Oct. 2, 2016, 1:06 a.m.
On Fri, Sep 16, 2016 at 7:16 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Thu, 15 Sep 2016 23:59:59 +0530, Pulkit Goyal wrote:
>> On Thu, Sep 15, 2016 at 7:06 PM, Yuya Nishihara <yuya@tcha.org> wrote:
>> > On Wed, 14 Sep 2016 22:45:27 +0530, Pulkit Goyal wrote:
>> >> py3: have an utility function to return string
>> >>
>> >> There are cases when we need strings and can't use bytes in Python 3,
>> >> so we need a utility function for them. I agree that this may not
>> >> be the best possible way out. I will be happy if anybody else can suggest
>> >> a better approach. We need this function for os.path.join(),
>> >
>> > We should stick to bytes for filesystem API, and translate bytes to unicode
>> > at VFS layer as necessary.
>> >
>> > https://www.mercurial-scm.org/wiki/WindowsUTF8Plan
>> >
>> > (Also, we'll have to disable PEP 528 and 529 on Python 3.6, which will break
>> > existing repositories.)
>> >
>> > https://docs.python.org/3.6/whatsnew/3.6.html
>> >
>> >> __slots__
>> >
>> > __slots__ can be considered private data, so just use u''.
>> >
>> >> and few more things.
>> >
>> > for instance?
>> This function was motivated by Gregory's reply to
>> https://www.mercurial-scm.org/pipermail/mercurial-devel/2016-August/086704.html
>> Unfortunately, I see that he replied only to me, so I pasted it here:
>> https://bpaste.net/show/ab0d3ea39749
>>
>> I am going through the Python documentation, and there are things like
>> __slots__ and is_frozen() which accept str in both py2 and py3. Since str is
>> not the same type in both versions, I made this function to help in such
>> cases. If we can use unicode in __slots__ in py2, then that's good.
>
> Python 2.6-2.7 accepts both str and unicode in general, but mixing them is a
> disaster, so we've avoided unicode whenever possible. Unfortunately, Python 3
> solved that problem by forcing us to use unicode (named str) everywhere, which
> doesn't work in Mercurial because we need to process binary data (including
> unix paths) transparently. All inputs and outputs (except for the future
> Windows file API) should be bytes.
>
> So, if is_frozen() of Py3 doesn't take bytes and Py2 doesn't take unicode,
> we'll need a compatibility function like you proposed.
>
>> >> +# This function converts its arguments to strings
>> >> +# on the basis of python version. Strings in python 3
>> >> +# are unicodes and our transformer converts everything to bytes
>> >> +# in python 3. So we need to decode it to unicodes in
>> >> +# py3.
>> >> +
>> >> +def coverttostr(word):
>> >> +    if sys.version_info[0] < 3:
>> >> +        assert isinstance(word, str), "Not a string in Python 2"
>> >> +        return word
>> >> +    # Checking word is bytes because we have the transformer, else
>> >> +    # raising error
>> >> +    assert isinstance(word, bytes), "Should be bytes because of transformer"
>> >> +    return word.decode(sys.getfilesystemencoding())
>> >
>> > Can we assume 'word' was encoded in file-system codec?
>>
>> Yeah, because the transformer added b'' everywhere.
>
> As Martijn said, that depends on how 'word' was encoded. Python sources would
> be latin1 or utf-8 in most cases, but a string read from the external world is
> different: we assume it is encoded in encoding.encoding.

Is encoding.encoding public or private? Can I convert it to unicode?
Yuya Nishihara - Oct. 2, 2016, 3:55 a.m.
On Sun, 2 Oct 2016 06:36:35 +0530, Pulkit Goyal wrote:
> Is encoding.encoding public or private? Can I convert it to unicode?

No. It's read/written freely. We could cache a unicode variant internally if
that matters, but we would need a setter function to invalidate the cache.
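Here is a minimal sketch of the cached-variant idea, using hypothetical names (`_encoding`, `setencoding`, `sysencoding`). The encoding name is stored as bytes, as the source transformer would make it, and a str copy is computed lazily. The limitation Yuya points out is visible in the grep output below: direct assignments such as `encoding.encoding = ...` would bypass the setter and leave the cache stale.

```python
# Hypothetical module state: the encoding name is kept as bytes (as the
# py3 source transformer would make it), plus a cached str copy.
_encoding = b'ascii'
_sysencoding = None  # cache; None means "needs recomputing"


def setencoding(name):
    """Change the encoding and invalidate the cached str variant."""
    global _encoding, _sysencoding
    _encoding = name
    _sysencoding = None


def sysencoding():
    """Return the encoding name as a native str, computing it lazily."""
    global _sysencoding
    if _sysencoding is None:
        _sysencoding = _encoding.decode('ascii')
    return _sysencoding
```

For example, after `setencoding(b'latin-1')`, `u'caf\u00e9'.encode(sysencoding())` yields `b'caf\xe9'`.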

% grep encoding.encoding **/*.py
hgext/convert/convcmd.py:                # tolocal() because the encoding.encoding convert()
hgext/convert/convcmd.py:    orig_encoding = encoding.encoding
hgext/convert/convcmd.py:    encoding.encoding = 'UTF-8'
hgext/convert/cvs.py:        self.encoding = encoding.encoding
hgext/convert/gnuarch.py:        self.encoding = encoding.encoding
hgext/highlight/__init__.py:    mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
hgext/highlight/__init__.py:    mt = ''.join(tmpl('mimetype', encoding=encoding.encoding))
hgext/highlight/highlight.py:    text = text.decode(encoding.encoding, 'replace')
hgext/highlight/highlight.py:    coloriter = (s.encode(encoding.encoding, 'replace')
hgext/win32mbcs.py:By default, win32mbcs uses encoding.encoding decided by Mercurial.
hgext/win32mbcs.py:    _encoding = ui.config('win32mbcs', 'encoding', encoding.encoding)
hgext/zeroconf/__init__.py:            return name.encode(encoding.encoding)
mercurial/commands.py:    ('', 'encoding', encoding.encoding, _('set the charset encoding'),
mercurial/commands.py:    ('', 'encodingmode', encoding.encodingmode,
mercurial/commands.py:    fm.write('encoding', _("checking encoding (%s)...\n"), encoding.encoding)
mercurial/commandserver.py:        self.cresult.write(encoding.encoding)
mercurial/commandserver.py:        hellomsg += 'encoding: ' + encoding.encoding
mercurial/dispatch.py:                reason = reason.encode(encoding.encoding, 'replace')
mercurial/dispatch.py:        encoding.encoding = options["encoding"]
mercurial/dispatch.py:        encoding.encodingmode = options["encodingmode"]
mercurial/encoding.py:    >>> encoding.encoding = 'utf-8'
mercurial/encoding.py:    >>> t = u.encode(encoding.encoding)
mercurial/hgweb/hgweb_mod.py:            'encoding': encoding.encoding,
mercurial/hgweb/hgweb_mod.py:        encoding.encoding = rctx.config('web', 'encoding', encoding.encoding)
mercurial/hgweb/hgweb_mod.py:            ctype = tmpl('mimetype', encoding=encoding.encoding)
mercurial/hgweb/hgwebdir_mod.py:        encoding.encoding = self.ui.config('web', 'encoding',
mercurial/hgweb/hgwebdir_mod.py:                                           encoding.encoding)
mercurial/hgweb/hgwebdir_mod.py:            ctype = tmpl('mimetype', encoding=encoding.encoding)
mercurial/hgweb/hgwebdir_mod.py:            "encoding": encoding.encoding,
mercurial/hgweb/webcommands.py:        mt += '; charset="%s"' % encoding.encoding
mercurial/i18n.py:            _msgcache[message] = u.encode(encoding.encoding, "replace")
mercurial/mail.py:                 encoding.encoding.lower(), 'utf-8']
mercurial/mail.py:        for ics in (encoding.encoding, encoding.fallbackencoding):
mercurial/mail.py:        dom = dom.decode(encoding.encoding).encode('idna')
mercurial/minirst.py:    >>> encoding.encoding = 'latin1'
mercurial/minirst.py:    >>> encoding.encoding = 'shiftjis'
mercurial/minirst.py:    utext = text.decode(encoding.encoding)
mercurial/minirst.py:    return utext.encode(encoding.encoding)
mercurial/templatefilters.py:                uctext = unicode(text[start:], encoding.encoding)
mercurial/templatefilters.py:                yield (uctext[:w].encode(encoding.encoding),
mercurial/templatefilters.py:                       uctext[w:].encode(encoding.encoding))
mercurial/templatefilters.py:    text = unicode(text, encoding.encoding, 'replace')
mercurial/util.py:    line = line.decode(encoding.encoding, encoding.encodingmode)
mercurial/util.py:    initindent = initindent.decode(encoding.encoding, encoding.encodingmode)
mercurial/util.py:    hangindent = hangindent.decode(encoding.encoding, encoding.encodingmode)
mercurial/util.py:    return wrapper.fill(line).encode(encoding.encoding)
tests/test-context.py:    encoding.encoding = enc

Patch

diff -r 85bd31515225 -r ec133d50af78 mercurial/pycompat.py
--- a/mercurial/pycompat.py	Sun Aug 21 13:16:21 2016 +0900
+++ b/mercurial/pycompat.py	Tue Sep 13 22:59:49 2016 +0530
@@ -164,3 +164,18 @@ 
         "SimpleHTTPRequestHandler",
         "CGIHTTPRequestHandler",
     ))
+
+# This function converts its arguments to strings
+# on the basis of python version. Strings in python 3
+# are unicodes and our transformer converts everything to bytes
+# in python 3. So we need to decode it to unicodes in
+# py3.
+
+def coverttostr(word):
+    if sys.version_info[0] < 3:
+        assert isinstance(word, str), "Not a string in Python 2"
+        return word
+    # Checking word is bytes because we have the transformer, else
+    # raising error
+    assert isinstance(word, bytes), "Should be bytes because of transformer"
+    return word.decode(sys.getfilesystemencoding())