Patchwork [01,of,10] py3: use unicode literals in encoding.py

Submitter Katsunori FUJIWARA
Date Aug. 3, 2016, 3:18 p.m.
Message ID <uy44dank4.wl%foozy@lares.dti.ne.jp>
Permalink /patch/16065/
State Not Applicable

Comments

Katsunori FUJIWARA - Aug. 3, 2016, 3:18 p.m.
At Wed, 3 Aug 2016 13:33:12 +0100,
Jun Wu wrote:
> 
> I think we may want special handling things like os.environ in the
> transformer instead. IIUC the decision about using the transformer approach
> is to reduce the need of these kinds of fixups.

As a part of enabling demandimport on Python 3.x, I'm working to omit
code transformation for demandimport.py with a change like the one in
the Patch section at the end of this thread.



If (almost) all operations on string literals in the target source
code require unicode on Python 3.x, skipping the transformation avoids
adding an explicit 'u' prefix to existing string literals.

For example, all operations on string literals in demandimport.py
involve the APIs below, which accept only unicode (str) on Python
3.x.

  - manipulate module name
    split(), formatting with "%s", __contains__(), and so on
  - access to attributes by name
  - access to values in os.environ
  - access to values in sys.builtin_module_names
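
To illustrate the list above, here is a minimal sketch of how Python 3
reacts when these APIs receive bytes instead of str. The helper
rejects_bytes() is hypothetical and exists only for this illustration;
the calls it wraps are ordinary standard library behavior.

```python
import os
import sys


def rejects_bytes(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


# Module names are str, so splitting one on a bytes separator fails.
split_fails = rejects_bytes('mercurial.util'.split, b'.')

# Attribute names must be str.
getattr_fails = rejects_bytes(getattr, sys, b'path')

# os.environ only accepts str keys.
environ_fails = rejects_bytes(os.environ.get, b'HGDEMANDIMPORT')

# sys.builtin_module_names holds str, so a bytes name never matches.
found_as_str = 'sys' in sys.builtin_module_names
found_as_bytes = b'sys' in sys.builtin_module_names
```

The first three cases fail loudly with TypeError, while the membership
test fails silently, which is arguably the more dangerous of the two.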

pycompat.py and i18n.py also seem to work without the transformation.
At a glance, pure/osutil.py may too (though a few explicit 'b'
prefixes might be needed).

What do you think of skipping the transformation this way?


> Excerpts from Pulkit Goyal's message of 2016-08-03 01:57:23 +0530:
> > # HG changeset patch
> > # User Pulkit Goyal <7895pulkit@gmail.com>
> > # Date 1470161385 -19800
> > #      Tue Aug 02 23:39:45 2016 +0530
> > # Node ID c03543a126719097a1a61c8e5ef5fcb222262315
> > # Parent  73ff159923c1f05899c27238409ca398342d9ae0
> > py3: use unicode literals in encoding.py
> > 
> > The custom module loader adds a b'' everywhere and hence making everything bytes. There are some instances
> > where we need to have unicodes. This patch deals with such instances in encoding.py. Moreover this patch also
> > updates the output of test-check-py3-compat.t at some places which was left unchanged.
> > 
> > This series of patches is work of Gregory Szorc and are taken from https://hg.mozilla.org/users/gszorc_mozilla.com/hg/shortlog/py3 .
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Pulkit Goyal - Aug. 3, 2016, 3:27 p.m.
> diff -r cf6739a27b8f mercurial/__init__.py
> --- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
> +++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
> @@ -310,6 +310,10 @@
>          The added header has the form ``HG<VERSION>``. That is a literal
>          ``HG`` with 2 binary bytes indicating the transformation version.
>          """
> +        _notransform = set([
> +            'mercurial.demandimport',
> +        ])
> +
>          def get_data(self, path):
>              data = super(hgloader, self).get_data(path)
>
> @@ -336,9 +340,10 @@
>
>          def source_to_code(self, data, path):
>              """Perform token transformation before compilation."""
> -            buf = io.BytesIO(data)
> -            tokens = tokenize.tokenize(buf.readline)
> -            data = tokenize.untokenize(replacetokens(list(tokens)))
> +            if self.name not in self._notransform:
> +                buf = io.BytesIO(data)
> +                tokens = tokenize.tokenize(buf.readline)
> +                data = tokenize.untokenize(replacetokens(list(tokens)))
>              # Python's built-in importer strips frames from exceptions raised
>              # for this code. Unfortunately, that mechanism isn't extensible
>              # and our frame will be blamed for the import failure. There
>
>
> If (almost) all of operations with string literal in target source
> code requires unicode-ness on Python 3.x, this omitting can reduce
> adding explicit 'u' prefix to existing string literals.
>
> For example, all operations with string literal in demandimport.py are
> related to APIs below, which accept only unicode (as str) on Python
> 3.x.
>
>   - manipulate module name
>     split(), formatting with "%s", __contains__(), and so on
>   - access to attributes by name
>   - access to values in os.environ
>   - access to values in sys.builtin_module_names
>
> pycompat.py and i18n.py also seem to work with this omitting. At short
> glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> prefix might be needed, though)
>
> How about this omitting ?

I think this is good, as it increases flexibility in using the module
loader: we can skip files where the loader is causing problems.
Gregory Szorc - Aug. 3, 2016, 3:31 p.m.
> On Aug 3, 2016, at 08:18, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote:
> 
> At Wed, 3 Aug 2016 13:33:12 +0100,
> Jun Wu wrote:
>> 
>> I think we may want special handling things like os.environ in the
>> transformer instead. IIUC the decision about using the transformer approach
>> is to reduce the need of these kinds of fixups.
> 
> As a part of enabling demandimport on Python 3.x, I'm working to omit
> code transformation for demandimport.py by changes like below:
> 
> diff -r cf6739a27b8f mercurial/__init__.py
> --- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
> +++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
> @@ -310,6 +310,10 @@
>         The added header has the form ``HG<VERSION>``. That is a literal
>         ``HG`` with 2 binary bytes indicating the transformation version.
>         """
> +        _notransform = set([
> +            'mercurial.demandimport',
> +        ])
> +
>         def get_data(self, path):
>             data = super(hgloader, self).get_data(path)
> 
> @@ -336,9 +340,10 @@
> 
>         def source_to_code(self, data, path):
>             """Perform token transformation before compilation."""
> -            buf = io.BytesIO(data)
> -            tokens = tokenize.tokenize(buf.readline)
> -            data = tokenize.untokenize(replacetokens(list(tokens)))
> +            if self.name not in self._notransform:
> +                buf = io.BytesIO(data)
> +                tokens = tokenize.tokenize(buf.readline)
> +                data = tokenize.untokenize(replacetokens(list(tokens)))
>             # Python's built-in importer strips frames from exceptions raised
>             # for this code. Unfortunately, that mechanism isn't extensible
>             # and our frame will be blamed for the import failure. There
> 
> 
> If (almost) all of operations with string literal in target source
> code requires unicode-ness on Python 3.x, this omitting can reduce
> adding explicit 'u' prefix to existing string literals.
> 
> For example, all operations with string literal in demandimport.py are
> related to APIs below, which accept only unicode (as str) on Python
> 3.x.
> 
>  - manipulate module name
>    split(), formatting with "%s", __contains__(), and so on
>  - access to attributes by name
>  - access to values in os.environ
>  - access to values in sys.builtin_module_names
> 
> pycompat.py and i18n.py also seem to work with this omitting. At short
> glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> prefix might be needed, though)
> 
> How about this omitting ?

I can go either way. On one hand, not doing the transformation is ideal: the transformation is a giant hack to make porting more manageable. On the other, consistency is also good. Having to remember which modules are transformed and which aren't could be painful.

I like the idea of something in the file that would tell the loader not to transform. And I think we have something already: "from __future__ import unicode_literals." Although that would use Unicode types everywhere, which isn't wanted when interfacing with certain Python APIs. So maybe we could throw a special comment at the top of the file? "# hgnotransform" or some such.
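
A minimal sketch of the marker-comment idea, assuming the loader scans
the first few lines of the source for the marker. The names
wants_transform(), add_bytes_prefix() and load_source(), as well as the
five-line limit, are hypothetical; add_bytes_prefix() is only a
simplified stand-in for the real transformer in mercurial/__init__.py.

```python
import io
import tokenize

# Marker text taken from the suggestion above; not an existing convention.
MARKER = b'# hgnotransform'


def wants_transform(data, maxlines=5):
    """Return False if one of the first few lines is the opt-out marker."""
    for line in data.splitlines()[:maxlines]:
        if line.strip() == MARKER:
            return False
    return True


def add_bytes_prefix(data):
    """Prefix every unprefixed string literal with b (simplified)."""
    out = []
    for tok in tokenize.tokenize(io.BytesIO(data).readline):
        # Literals that already carry a prefix (u'', b'', r'') start with
        # a letter rather than a quote and are left alone.
        if tok.type == tokenize.STRING and tok.string[0] in '\'"':
            tok = tok._replace(string='b' + tok.string)
        out.append(tok)
    return tokenize.untokenize(out)


def load_source(data):
    """Return the source bytes the loader would hand to the compiler."""
    return add_bytes_prefix(data) if wants_transform(data) else data


plain = b"x = 'a'\n"
marked = b"# hgnotransform\nx = 'a'\n"
transformed = load_source(plain)
untouched = load_source(marked)
```

With this shape the opt-out is visible in the file itself, so there is
no separate list of module names to keep in sync.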

Katsunori FUJIWARA - Aug. 3, 2016, 5:25 p.m.
At Wed, 3 Aug 2016 08:31:26 -0700,
Gregory Szorc wrote:
> 
> > On Aug 3, 2016, at 08:18, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote:
> > 
> > At Wed, 3 Aug 2016 13:33:12 +0100,
> > Jun Wu wrote:
> >> 
> >> I think we may want special handling things like os.environ in the
> >> transformer instead. IIUC the decision about using the transformer approach
> >> is to reduce the need of these kinds of fixups.
> > 
> > As a part of enabling demandimport on Python 3.x, I'm working to omit
> > code transformation for demandimport.py by changes like below:
> > 
> > diff -r cf6739a27b8f mercurial/__init__.py
> > --- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
> > +++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
> > @@ -310,6 +310,10 @@
> >         The added header has the form ``HG<VERSION>``. That is a literal
> >         ``HG`` with 2 binary bytes indicating the transformation version.
> >         """
> > +        _notransform = set([
> > +            'mercurial.demandimport',
> > +        ])
> > +
> >         def get_data(self, path):
> >             data = super(hgloader, self).get_data(path)
> > 
> > @@ -336,9 +340,10 @@
> > 
> >         def source_to_code(self, data, path):
> >             """Perform token transformation before compilation."""
> > -            buf = io.BytesIO(data)
> > -            tokens = tokenize.tokenize(buf.readline)
> > -            data = tokenize.untokenize(replacetokens(list(tokens)))
> > +            if self.name not in self._notransform:
> > +                buf = io.BytesIO(data)
> > +                tokens = tokenize.tokenize(buf.readline)
> > +                data = tokenize.untokenize(replacetokens(list(tokens)))
> >             # Python's built-in importer strips frames from exceptions raised
> >             # for this code. Unfortunately, that mechanism isn't extensible
> >             # and our frame will be blamed for the import failure. There
> > 
> > 
> > If (almost) all of operations with string literal in target source
> > code requires unicode-ness on Python 3.x, this omitting can reduce
> > adding explicit 'u' prefix to existing string literals.
> > 
> > For example, all operations with string literal in demandimport.py are
> > related to APIs below, which accept only unicode (as str) on Python
> > 3.x.
> > 
> >  - manipulate module name
> >    split(), formatting with "%s", __contains__(), and so on
> >  - access to attributes by name
> >  - access to values in os.environ
> >  - access to values in sys.builtin_module_names
> > 
> > pycompat.py and i18n.py also seem to work with this omitting. At short
> > glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> > prefix might be needed, though)
> > 
> > How about this omitting ?
> 
> I can go both ways. On one hand, not doing the transformation is
> ideal: the transforming is a giant hack to make porting more
> manageable. On the other, consistency is also good. Having to
> remember which modules are transformed and which aren't could be
> painful.
> 
> I like the idea of something in the file that would tell the loader
> not to transform. And I think we have something already: "from
> __future__ import unicode_literals." Although that would use Unicode
> types everywhere, which isn't wanted when interfacing with certain
> Python APIs. So maybe we could throw a special comment at the top of
> the file? "# hgnotransform" or some such.
> 

Yeah, marking on the file side is better than a blacklist (whitelist?)!

I'll try to work in that direction.



----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Aug. 4, 2016, 12:42 p.m.
On Thu, 04 Aug 2016 00:18:19 +0900, FUJIWARA Katsunori wrote:
> If (almost) all of operations with string literal in target source
> code requires unicode-ness on Python 3.x, this omitting can reduce
> adding explicit 'u' prefix to existing string literals.
> 
> For example, all operations with string literal in demandimport.py are
> related to APIs below, which accept only unicode (as str) on Python
> 3.x.
> 
>   - manipulate module name
>     split(), formatting with "%s", __contains__(), and so on
>   - access to attributes by name
>   - access to values in os.environ
>   - access to values in sys.builtin_module_names
> 
> pycompat.py and i18n.py also seem to work with this omitting. At short
> glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> prefix might be needed, though)

i18n.setdatapath() would need b'locale', and pure.osutil.posixfile() would
need b'r', b'w', etc. because u'r' != b'r' on Python 3.
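
A small illustration of the point above. The function
opened_for_reading() is hypothetical and only stands in for code that
compares a mode argument against a bytes literal.

```python
def opened_for_reading(mode):
    """Hypothetical check written against a bytes mode string."""
    return mode == b'r'


# On Python 3, str and bytes never compare equal.
same = (u'r' == b'r')

with_bytes = opened_for_reading(b'r')
with_str = opened_for_reading(u'r')
```

No exception is raised in the str case; the comparison is simply false,
so a caller passing the wrong type takes the wrong branch silently.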

I don't think it's a good idea to switch the b''/u'' requirement per file.
Jun Wu - Aug. 4, 2016, 12:59 p.m.
Excerpts from Yuya Nishihara's message of 2016-08-04 21:42:22 +0900:
> I don't think it's good idea to switch b''/u'' requirement per file.

+1. This may lead us to split code into different files even when that
is otherwise unnecessary.

I'd like a smarter transformer handling cases like os.environ automatically.

If we are going to have switches in file comments, I think it's better to
make it possible at chunk-level instead of just file-level.
Pierre-Yves David - Aug. 4, 2016, 1:11 p.m.
On 08/04/2016 02:59 PM, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-08-04 21:42:22 +0900:
>> I don't think it's good idea to switch b''/u'' requirement per file.
>
> +1. This may lead us to split code into different files while it's actually
> unnecessary.

Yep, behavior that differs from one file to another will create fun
situations when moving code around. I think we should avoid that.

> I'd like a smarter transformer handling cases like os.environ automatically.

From previous discussion (and the environ/environb platform madness), I
was under the impression that we needed a utility wrapper to access
environ anyway. Could the complexity live there instead?

> If we are going to have switches in file comments, I think it's better to
> make it possible at chunk-level instead of just file-level.

At that point I feel we are probably better off using prefixes.
Martijn Pieters - Aug. 4, 2016, 2:30 p.m.
On 3 August 2016 at 16:31, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> I like the idea of something in the file that would tell the loader not to transform. And I think we have something already: "from __future__ import unicode_literals." Although that would use Unicode types everywhere, which isn't wanted when interfacing with certain Python APIs. So maybe we could throw a special comment at the top of the file? "# hgnotransform" or some such.

I like this better too; either per file or per chunk (although that'd
require start and end markers, not that nice either). At least it'd be
clear where transformation isn't being applied *in the file itself*.

If we combined this with a custom os.environ wrapper, that wrapper
could live in a module that had transformations disabled, localising
the issue.
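
One possible shape for such a wrapper, as a sketch only. The name
getenv_bytes() is hypothetical; os.environb, os.supports_bytes_environ,
os.fsencode() and os.fsdecode() are standard library APIs on Python 3.

```python
import os


def getenv_bytes(name, default=None):
    """Look up an environment variable by bytes name and return bytes."""
    if os.supports_bytes_environ:
        # POSIX: the environment is natively bytes.
        return os.environb.get(name, default)
    # Windows has no os.environb, so go through the str API and encode.
    value = os.environ.get(os.fsdecode(name))
    return default if value is None else os.fsencode(value)


os.environ['HGSKETCH_EXAMPLE'] = 'value'
present = getenv_bytes(b'HGSKETCH_EXAMPLE')
missing = getenv_bytes(b'HGSKETCH_NOT_SET', b'fallback')
```

Callers would then deal only in bytes, and the str handling stays
confined to this one function.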
Gregory Szorc - Aug. 4, 2016, 4:23 p.m.
> On Aug 4, 2016, at 06:11, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> 
> 
>> On 08/04/2016 02:59 PM, Jun Wu wrote:
>> Excerpts from Yuya Nishihara's message of 2016-08-04 21:42:22 +0900:
>>> I don't think it's good idea to switch b''/u'' requirement per file.
>> 
>> +1. This may lead us to split code into different files while it's actually
>> unnecessary.
> 
> yep, keeping different behavior from one file to another will create fun situation when moving code around. I think we should avoid that.
> 
>> I'd like a smarter transformer handling cases like os.environ automatically.
> 
> from previous discussion (and environ/environb plateform madness) I was under the impression we needed an utility wrapper to access environ anyway. Could the complexity lives here instead?

Don't we already hang the environ off ui somewhere? That seems like the ideal place to make these changes.

A number of the bytes/str issues we're seeing now occur at *import time*. It's a best practice for module imports to be side-effect free. It is scope bloat, but refactoring this code so it is called after import, by some kind of global hginit(), would be a step in the right direction IMO. We could then pass some kind of context object around instead of relying on global state. Remember: global variables are usually considered evil, and os.environ, sys.argv, pwd, etc. are all global variables.
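
A rough sketch of that direction, assuming a context object that
captures the global state once, after import. The Context class, the
hginit() signature and demandimport_enabled() are all hypothetical and
do not exist in Mercurial in this form; the HGDEMANDIMPORT check is
used here only as an illustration.

```python
import os
import sys


class Context(object):
    """Snapshot of process-global state, passed around explicitly."""

    def __init__(self, environ, argv, cwd):
        self.environ = environ
        self.argv = argv
        self.cwd = cwd


def hginit(environ=None, argv=None, cwd=None):
    """Capture global state once, at startup rather than at import time."""
    return Context(
        environ=dict(os.environ if environ is None else environ),
        argv=list(sys.argv if argv is None else argv),
        cwd=os.getcwd() if cwd is None else cwd,
    )


def demandimport_enabled(ctx):
    """Decide from the context, not from os.environ, whether to enable it."""
    return ctx.environ.get('HGDEMANDIMPORT') != 'disable'


ctx = hginit(environ={'HGDEMANDIMPORT': 'disable'}, argv=['hg'], cwd='/')
enabled = demandimport_enabled(ctx)
```

Because the state is injected, tests can build a Context directly
instead of patching os.environ or sys.argv.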

> 
>> If we are going to have switches in file comments, I think it's better to
>> make it possible at chunk-level instead of just file-level.
> 
> At that point I feel like we are probably better off using prefix.
> 
> -- 
> Pierre-Yves David
Katsunori FUJIWARA - Aug. 4, 2016, 11:16 p.m.
At Thu, 4 Aug 2016 21:42:22 +0900,
Yuya Nishihara wrote:
> 
> On Thu, 04 Aug 2016 00:18:19 +0900, FUJIWARA Katsunori wrote:
> > If (almost) all of operations with string literal in target source
> > code requires unicode-ness on Python 3.x, this omitting can reduce
> > adding explicit 'u' prefix to existing string literals.
> > 
> > For example, all operations with string literal in demandimport.py are
> > related to APIs below, which accept only unicode (as str) on Python
> > 3.x.
> > 
> >   - manipulate module name
> >     split(), formatting with "%s", __contains__(), and so on
> >   - access to attributes by name
> >   - access to values in os.environ
> >   - access to values in sys.builtin_module_names
> > 
> > pycompat.py and i18n.py also seem to work with this omitting. At short
> > glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> > prefix might be needed, though)
> 
> i18n.setdatapath() would need b'locale', and pure.osutil.posixfile() would
> need b'r', b'w', etc. because u'r' != b'r' on Python 3.

I think that 'locale' in i18n.setdatapath() should be Python's native
'str' at runtime because:

  (1) i18n.setdatapath() is invoked with the os.path.dirname() of one
      of the following in util.py:

      - sys.executable => str
      - __file__ => str

  (2) os.path.join() doesn't accept u'' and b'' at once on Python 3.x,
      as mentioned in 02/10 of this series
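
Point (2) can be checked with a short sketch. The helper
join_rejects_mix() and the example paths are hypothetical.

```python
import os


def join_rejects_mix(first, second):
    """Return True if os.path.join() refuses to combine the two parts."""
    try:
        os.path.join(first, second)
    except TypeError:
        return True
    return False


mixed = join_rejects_mix('/usr/lib/hg', b'locale')
all_str = join_rejects_mix('/usr/lib/hg', 'locale')
all_bytes = join_rejects_mix(b'/usr/lib/hg', b'locale')
```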

What am I overlooking that expects 'locale' to be bytes on every
Python version?

> I don't think it's good idea to switch b''/u'' requirement per file.
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Aug. 5, 2016, 2:23 p.m.
On Fri, 05 Aug 2016 08:16:50 +0900, FUJIWARA Katsunori wrote:
> At Thu, 4 Aug 2016 21:42:22 +0900,
> Yuya Nishihara wrote:
> > 
> > On Thu, 04 Aug 2016 00:18:19 +0900, FUJIWARA Katsunori wrote:
> > > If (almost) all of operations with string literal in target source
> > > code requires unicode-ness on Python 3.x, this omitting can reduce
> > > adding explicit 'u' prefix to existing string literals.
> > > 
> > > For example, all operations with string literal in demandimport.py are
> > > related to APIs below, which accept only unicode (as str) on Python
> > > 3.x.
> > > 
> > >   - manipulate module name
> > >     split(), formatting with "%s", __contains__(), and so on
> > >   - access to attributes by name
> > >   - access to values in os.environ
> > >   - access to values in sys.builtin_module_names
> > > 
> > > pycompat.py and i18n.py also seem to work with this omitting. At short
> > > glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> > > prefix might be needed, though)
> > 
> > i18n.setdatapath() would need b'locale', and pure.osutil.posixfile() would
> > need b'r', b'w', etc. because u'r' != b'r' on Python 3.
> 
> I think that 'locale' in i18n.setdatapath() should be 'str' of Python
> at runtime because:
> 
>   (1) i18n.setdatapath() is invoked with os.path.dirname()-ed one of
>       below in util.py:
> 
>       - sys.executable => str
>       - __file__ => str
> 
>   (2) os.path.join() doesn't accept u'' and b'' at once on Python 3.x,
>       as mentioned in 02/10 of this series

Since both i18n.setdatapath() and util.datapath are public, they should be
bytes-API. Otherwise, we would have to remember which function/constant is
which.
Katsunori FUJIWARA - Aug. 5, 2016, 4:15 p.m.
At Fri, 5 Aug 2016 23:23:39 +0900,
Yuya Nishihara wrote:
> 
> On Fri, 05 Aug 2016 08:16:50 +0900, FUJIWARA Katsunori wrote:
> > At Thu, 4 Aug 2016 21:42:22 +0900,
> > Yuya Nishihara wrote:
> > > 
> > > On Thu, 04 Aug 2016 00:18:19 +0900, FUJIWARA Katsunori wrote:
> > > > If (almost) all of operations with string literal in target source
> > > > code requires unicode-ness on Python 3.x, this omitting can reduce
> > > > adding explicit 'u' prefix to existing string literals.
> > > > 
> > > > For example, all operations with string literal in demandimport.py are
> > > > related to APIs below, which accept only unicode (as str) on Python
> > > > 3.x.
> > > > 
> > > >   - manipulate module name
> > > >     split(), formatting with "%s", __contains__(), and so on
> > > >   - access to attributes by name
> > > >   - access to values in os.environ
> > > >   - access to values in sys.builtin_module_names
> > > > 
> > > > pycompat.py and i18n.py also seem to work with this omitting. At short
> > > > glance, maybe, pure/osutil.py does, too ? (a few extra explicit 'b'
> > > > prefix might be needed, though)
> > > 
> > > i18n.setdatapath() would need b'locale', and pure.osutil.posixfile() would
> > > need b'r', b'w', etc. because u'r' != b'r' on Python 3.
> > 
> > I think that 'locale' in i18n.setdatapath() should be 'str' of Python
> > at runtime because:
> > 
> >   (1) i18n.setdatapath() is invoked with os.path.dirname()-ed one of
> >       below in util.py:
> > 
> >       - sys.executable => str
> >       - __file__ => str
> > 
> >   (2) os.path.join() doesn't accept u'' and b'' at once on Python 3.x,
> >       as mentioned in 02/10 of this series
> 
> Since both i18n.setdatapath() and util.datapath are public, they should be
> bytes-API. Otherwise, we would have to remember which function/constant is
> which.
> 

Oh, I overlooked that util.datapath is also used outside the
invocation of i18n.setdatapath(). Thank you for pointing that out!

Then, for consistency, we should add code to util.py that ensures
datapath is bytes before changing i18n.setdatapath(), shouldn't we?

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - Aug. 6, 2016, 3:04 a.m.
On Sat, 06 Aug 2016 01:15:35 +0900, FUJIWARA Katsunori wrote:
> At Fri, 5 Aug 2016 23:23:39 +0900,
> Yuya Nishihara wrote:
> > On Fri, 05 Aug 2016 08:16:50 +0900, FUJIWARA Katsunori wrote:
> > > I think that 'locale' in i18n.setdatapath() should be 'str' of Python
> > > at runtime because:
> > > 
> > >   (1) i18n.setdatapath() is invoked with os.path.dirname()-ed one of
> > >       below in util.py:
> > > 
> > >       - sys.executable => str
> > >       - __file__ => str
> > > 
> > >   (2) os.path.join() doesn't accept u'' and b'' at once on Python 3.x,
> > >       as mentioned in 02/10 of this series
> > 
> > Since both i18n.setdatapath() and util.datapath are public, they should be
> > bytes-API. Otherwise, we would have to remember which function/constant is
> > which.
> > 
> 
> Oh, I overlooked that util.datapath is used also by other than
> invocation of i18n.setdatapath(). Thank you for pointing it out!
> 
> Then, for consistency, we should add the code to ensure bytes-ness of
> datapath in util.py before changing i18n.setdatapath(), shouldn't we ?

Maybe we'll need utility functions to work around Py3's unicode-oriented
APIs. I don't know how we can reliably convert a unicode executable path
back to bytes.
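
One candidate for such a utility is the standard library pair
os.fsdecode() / os.fsencode(). The sketch below assumes a POSIX system,
where Python uses the surrogateescape error handler for filesystem
names; behavior on Windows differs, so this is not a complete answer to
the question above. The example path is made up.

```python
import os

# A path containing a byte that is not valid UTF-8.
raw = b'/opt/hg-\xff/bin/hg'

# Python 3 would expose such a path as str, e.g. via sys.executable.
text = os.fsdecode(raw)

# Encoding it back recovers the original bytes on POSIX.
recovered = os.fsencode(text)
```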

Patch

diff -r cf6739a27b8f mercurial/__init__.py
--- a/mercurial/__init__.py     Wed Aug 03 22:34:54 2016 +0900
+++ b/mercurial/__init__.py     Wed Aug 03 22:47:17 2016 +0900
@@ -310,6 +310,10 @@ 
         The added header has the form ``HG<VERSION>``. That is a literal
         ``HG`` with 2 binary bytes indicating the transformation version.
         """
+        _notransform = set([
+            'mercurial.demandimport',
+        ])
+
         def get_data(self, path):
             data = super(hgloader, self).get_data(path)

@@ -336,9 +340,10 @@ 

         def source_to_code(self, data, path):
             """Perform token transformation before compilation."""
-            buf = io.BytesIO(data)
-            tokens = tokenize.tokenize(buf.readline)
-            data = tokenize.untokenize(replacetokens(list(tokens)))
+            if self.name not in self._notransform:
+                buf = io.BytesIO(data)
+                tokens = tokenize.tokenize(buf.readline)
+                data = tokenize.untokenize(replacetokens(list(tokens)))
             # Python's built-in importer strips frames from exceptions raised
             # for this code. Unfortunately, that mechanism isn't extensible
             # and our frame will be blamed for the import failure. There