Patchwork [8,of,8] py3: have bytes version of sys.argv

login
register
mail settings
Submitter Pulkit Goyal
Date Nov. 5, 2016, 11:16 p.m.
Message ID <b5fc4e71286dd4f33336.1478387785@pulkit-goyal>
Download mbox | patch
Permalink /patch/17360/
State Accepted
Headers show

Comments

Pulkit Goyal - Nov. 5, 2016, 11:16 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1478387186 -19800
#      Sun Nov 06 04:36:26 2016 +0530
# Node ID b5fc4e71286dd4f33336e4f38e0b9fb17f51f1e3
# Parent  6eed3ee0df425da61d03bfe024dd082f3176ce5d
py3: have bytes version of sys.argv

sys.argv returns unicodes on Python 3. We need a bytes version for us.
There was also a python bug/feature request which wanted then to implement
one. They rejected and it is quoted in one of the comments that we can use
fsencode() to get a bytes version of sys.argv. Though not sure about its
correctness.

Link to the comment: http://bugs.python.org/issue8776#msg217416

After this patch we will have pycompat.sysargv which will return us bytes
version of sys.argv. If this patch goes in, i will like to make transformer
rewrite sys.argv with pycompat.argv because there are lot of occurences.
Yuya Nishihara - Nov. 6, 2016, 3:34 a.m.
On Sun, 06 Nov 2016 04:46:25 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1478387186 -19800
> #      Sun Nov 06 04:36:26 2016 +0530
> # Node ID b5fc4e71286dd4f33336e4f38e0b9fb17f51f1e3
> # Parent  6eed3ee0df425da61d03bfe024dd082f3176ce5d
> py3: have bytes version of sys.argv
> 
> sys.argv returns unicodes on Python 3. We need a bytes version for us.
> There was also a python bug/feature request which wanted then to implement
> one. They rejected and it is quoted in one of the comments that we can use
> fsencode() to get a bytes version of sys.argv. Though not sure about its
> correctness.
> 
> Link to the comment: http://bugs.python.org/issue8776#msg217416
> 
> After this patch we will have pycompat.sysargv which will return us bytes
> version of sys.argv. If this patch goes in, i will like to make transformer
> rewrite sys.argv with pycompat.argv because there are lot of occurences.
> 
> diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
> --- a/mercurial/pycompat.py	Sun Nov 06 04:17:19 2016 +0530
> +++ b/mercurial/pycompat.py	Sun Nov 06 04:36:26 2016 +0530
> @@ -41,6 +41,7 @@
>      osname = os.name.encode('ascii')
>      ospathsep = os.pathsep.encode('ascii')
>      ossep = os.sep.encode('ascii')
> +    sysargv = list(map(os.fsencode, sys.argv))

Looks good to me. Can you add a comment why we can use os.fsencode() here
(and the weirdness of Python 3 on Unix.) We might need a Windows workaround
because the situation is slightly different, but we wouldn't want to care
for now.
Pulkit Goyal - Nov. 6, 2016, 6:25 p.m.
On Sun, Nov 6, 2016 at 9:04 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Sun, 06 Nov 2016 04:46:25 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pulkit@gmail.com>
>> # Date 1478387186 -19800
>> #      Sun Nov 06 04:36:26 2016 +0530
>> # Node ID b5fc4e71286dd4f33336e4f38e0b9fb17f51f1e3
>> # Parent  6eed3ee0df425da61d03bfe024dd082f3176ce5d
>> py3: have bytes version of sys.argv
>>
>> sys.argv returns unicodes on Python 3. We need a bytes version for us.
>> There was also a python bug/feature request which wanted then to implement
>> one. They rejected and it is quoted in one of the comments that we can use
>> fsencode() to get a bytes version of sys.argv. Though not sure about its
>> correctness.
>>
>> Link to the comment: http://bugs.python.org/issue8776#msg217416
>>
>> After this patch we will have pycompat.sysargv which will return us bytes
>> version of sys.argv. If this patch goes in, i will like to make transformer
>> rewrite sys.argv with pycompat.argv because there are lot of occurences.
>>
>> diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
>> --- a/mercurial/pycompat.py   Sun Nov 06 04:17:19 2016 +0530
>> +++ b/mercurial/pycompat.py   Sun Nov 06 04:36:26 2016 +0530
>> @@ -41,6 +41,7 @@
>>      osname = os.name.encode('ascii')
>>      ospathsep = os.pathsep.encode('ascii')
>>      ossep = os.sep.encode('ascii')
>> +    sysargv = list(map(os.fsencode, sys.argv))
>
> Looks good to me. Can you add a comment why we can use os.fsencode() here
> (and the weirdness of Python 3 on Unix.) We might need a Windows workaround
> because the situation is slightly different, but we wouldn't want to care
> for now.

Well I will resend this patch because I am not sure about its
correctness still. I followed that issue where Victor Stinner, one who
wrote os.environb commented this. There are few doubts/confusions or
maybe I want to just confirm it with MJ once.
Yuya Nishihara - Nov. 7, 2016, 12:56 p.m.
On Sun, 6 Nov 2016 23:55:50 +0530, Pulkit Goyal wrote:
> On Sun, Nov 6, 2016 at 9:04 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sun, 06 Nov 2016 04:46:25 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pulkit@gmail.com>
> >> # Date 1478387186 -19800
> >> #      Sun Nov 06 04:36:26 2016 +0530
> >> # Node ID b5fc4e71286dd4f33336e4f38e0b9fb17f51f1e3
> >> # Parent  6eed3ee0df425da61d03bfe024dd082f3176ce5d
> >> py3: have bytes version of sys.argv
> >>
> >> sys.argv returns unicodes on Python 3. We need a bytes version for us.
> >> There was also a python bug/feature request which wanted then to implement
> >> one. They rejected and it is quoted in one of the comments that we can use
> >> fsencode() to get a bytes version of sys.argv. Though not sure about its
> >> correctness.
> >>
> >> Link to the comment: http://bugs.python.org/issue8776#msg217416
> >>
> >> After this patch we will have pycompat.sysargv which will return us bytes
> >> version of sys.argv. If this patch goes in, i will like to make transformer
> >> rewrite sys.argv with pycompat.argv because there are lot of occurences.
> >>
> >> diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
> >> --- a/mercurial/pycompat.py   Sun Nov 06 04:17:19 2016 +0530
> >> +++ b/mercurial/pycompat.py   Sun Nov 06 04:36:26 2016 +0530
> >> @@ -41,6 +41,7 @@
> >>      osname = os.name.encode('ascii')
> >>      ospathsep = os.pathsep.encode('ascii')
> >>      ossep = os.sep.encode('ascii')
> >> +    sysargv = list(map(os.fsencode, sys.argv))
> >
> > Looks good to me. Can you add a comment why we can use os.fsencode() here
> > (and the weirdness of Python 3 on Unix.) We might need a Windows workaround
> > because the situation is slightly different, but we wouldn't want to care
> > for now.
> 
> Well I will resend this patch because I am not sure about its
> correctness still. I followed that issue where Victor Stinner, one who
> wrote os.environb commented this. There are few doubts/confusions or
> maybe I want to just confirm it with MJ once.

It's generally wrong to assume argv is in filesystem encoding, but I think
that's okay for Python 3 on Unix. They builds "wchar_t argv" from "char argv"
by Py_DecodeLocale(), which would be identical to fsdecode() on Unix.

https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55

On Windows, the native argv appears to be wchar_t, so we'll need a different
hack to simulate the Python 2 (i.e. ANSI Win32 API) behavior.
Yuya Nishihara - Nov. 9, 2016, 12:55 p.m.
(resend with CC the list)

On Mon, 7 Nov 2016 20:40:19 +0530, Pulkit Goyal wrote:
> > On Windows, the native argv appears to be wchar_t, so we'll need a different
> > hack to simulate the Python 2 (i.e. ANSI Win32 API) behavior.
> 
> Well we are not touching the Python 2 behaviour, it remains the same
> with the patch. And it will be great if you can push this one with
> your explanation appended :)

Ok, I'll queue this and send a follow-up patch to add comment.

> After this patch we will have pycompat.sysargv which will return us bytes
> version of sys.argv. If this patch goes in, i will like to make transformer
> rewrite sys.argv with pycompat.argv because there are lot of occurences.

Maybe I said at the sprint, I'm not a big fan of rewriting complex nodes by
the transformer. And some of sys.argv might not work if replaced to bytes argv.

Patch

diff -r 6eed3ee0df42 -r b5fc4e71286d mercurial/pycompat.py
--- a/mercurial/pycompat.py	Sun Nov 06 04:17:19 2016 +0530
+++ b/mercurial/pycompat.py	Sun Nov 06 04:36:26 2016 +0530
@@ -41,6 +41,7 @@ 
     osname = os.name.encode('ascii')
     ospathsep = os.pathsep.encode('ascii')
     ossep = os.sep.encode('ascii')
+    sysargv = list(map(os.fsencode, sys.argv))
 
     def sysstr(s):
         """Return a keyword str to be passed to Python functions such as
@@ -89,6 +90,7 @@ 
     osname = os.name
     ospathsep = os.pathsep
     ossep = os.sep
+    sysargv = sys.argv
 
 stringio = io.StringIO
 empty = _queue.Empty