Patchwork [07,of,11,py3] dispatch: translate argv back to bytes on Python 3

Submitter Augie Fackler
Date Oct. 9, 2016, 2:16 p.m.
Message ID <f2359d649c2164ba5efb.1476022609@augie-macbookair2.roam.corp.google.com>
Permalink /patch/17003/
State Accepted

Comments

Augie Fackler - Oct. 9, 2016, 2:16 p.m.
# HG changeset patch
# User Augie Fackler <augie@google.com>
# Date 1476018603 14400
#      Sun Oct 09 09:10:03 2016 -0400
# Node ID f2359d649c2164ba5efb3c202850064c7d777848
# Parent  88a5fecb60831eea7c44c6d6025ee23513528501
dispatch: translate argv back to bytes on Python 3
Yuya Nishihara - Oct. 11, 2016, 3:20 p.m.
On Sun, 09 Oct 2016 10:16:49 -0400, Augie Fackler wrote:
> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1476018603 14400
> #      Sun Oct 09 09:10:03 2016 -0400
> # Node ID f2359d649c2164ba5efb3c202850064c7d777848
> # Parent  88a5fecb60831eea7c44c6d6025ee23513528501
> dispatch: translate argv back to bytes on Python 3
> 
> diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
> --- a/mercurial/dispatch.py
> +++ b/mercurial/dispatch.py
> @@ -57,7 +57,11 @@ class request(object):
>  
>  def run():
>      "run the command in sys.argv"
> -    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
> +    if sys.version_info > (3,4):
> +        argv = list(map(os.fsencode, sys.argv))
> +    else:
> +        argv = sys.argv

argv may contain arbitrary strings (e.g. commit message, branch, etc.)
unrelated to fsencode. So I think HGENCODING is the best guess here.
Augie Fackler - Oct. 11, 2016, 3:58 p.m.
> On Oct 11, 2016, at 11:20 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> On Sun, 09 Oct 2016 10:16:49 -0400, Augie Fackler wrote:
>> # HG changeset patch
>> # User Augie Fackler <augie@google.com>
>> # Date 1476018603 14400
>> #      Sun Oct 09 09:10:03 2016 -0400
>> # Node ID f2359d649c2164ba5efb3c202850064c7d777848
>> # Parent  88a5fecb60831eea7c44c6d6025ee23513528501
>> dispatch: translate argv back to bytes on Python 3
>> 
>> diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
>> --- a/mercurial/dispatch.py
>> +++ b/mercurial/dispatch.py
>> @@ -57,7 +57,11 @@ class request(object):
>> 
>> def run():
>>     "run the command in sys.argv"
>> -    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
>> +    if sys.version_info > (3,4):
>> +        argv = list(map(os.fsencode, sys.argv))
>> +    else:
>> +        argv = sys.argv
> 
> argv may contain arbitrary strings (e.g. commit message, branch, etc.)
> unrelated to fsencode. So I think HGENCODING is the best guess here.

I’ll have to find the link later, but on Sunday what I found was some documentation that the args were decoded using fsdecode.
Yuya Nishihara - Oct. 12, 2016, 2:22 p.m.
On Tue, 11 Oct 2016 11:58:54 -0400, Augie Fackler wrote:
> > On Oct 11, 2016, at 11:20 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sun, 09 Oct 2016 10:16:49 -0400, Augie Fackler wrote:
> >> # HG changeset patch
> >> # User Augie Fackler <augie@google.com>
> >> # Date 1476018603 14400
> >> #      Sun Oct 09 09:10:03 2016 -0400
> >> # Node ID f2359d649c2164ba5efb3c202850064c7d777848
> >> # Parent  88a5fecb60831eea7c44c6d6025ee23513528501
> >> dispatch: translate argv back to bytes on Python 3
> >> 
> >> diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
> >> --- a/mercurial/dispatch.py
> >> +++ b/mercurial/dispatch.py
> >> @@ -57,7 +57,11 @@ class request(object):
> >> 
> >> def run():
> >>     "run the command in sys.argv"
> >> -    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
> >> +    if sys.version_info > (3,4):
> >> +        argv = list(map(os.fsencode, sys.argv))
> >> +    else:
> >> +        argv = sys.argv
> > 
> > argv may contain arbitrary strings (e.g. commit message, branch, etc.)
> > unrelated to fsencode. So I think HGENCODING is the best guess here.
> 
> I’ll have to find the link later, but on Sunday what I found was some documentation that the args were decoded using fsdecode.

Ugh, I never thought Python 3 would provide no way to get bytes argv on Unix.

On Unix, they appear to call Py_DecodeLocale(), which I think is identical to
os.fsdecode() (though the implementation is different). On Windows, the wchar
argv is passed directly from the OS. I don't know whether that is the same
as what fsencode expects or not.

https://hg.python.org/cpython/file/v3.5.1/Programs/python.c#l55
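
The round trip discussed above can be checked with a small sketch. It assumes a Unix system, where os.fsdecode() uses the filesystem encoding with the surrogateescape error handler; the sample byte strings are made up for illustration and are not taken from the thread.

```python
import os

# Bytes as the OS might hand them to a process on Unix. The last argument
# ends in a byte that is not valid UTF-8, to mimic text in another encoding.
raw_args = [b"hg", b"commit", b"-m", b"caf\xe9"]

# os.fsdecode() maps bytes to str; on Unix, bytes that cannot be decoded
# are preserved as lone surrogates (surrogateescape).
decoded = [os.fsdecode(a) for a in raw_args]

# os.fsencode() reverses that mapping and recovers the original bytes.
recovered = [os.fsencode(s) for s in decoded]

assert recovered == raw_args
```

This only shows that the original bytes can be recovered. It says nothing about which encoding those bytes are in, which is the HGENCODING question raised earlier in the thread.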

Patch

diff --git a/mercurial/dispatch.py b/mercurial/dispatch.py
--- a/mercurial/dispatch.py
+++ b/mercurial/dispatch.py
@@ -57,7 +57,11 @@  class request(object):
 
 def run():
     "run the command in sys.argv"
-    sys.exit((dispatch(request(sys.argv[1:])) or 0) & 255)
+    if sys.version_info > (3,4):
+        argv = list(map(os.fsencode, sys.argv))
+    else:
+        argv = sys.argv
+    sys.exit((dispatch(request(argv[1:])) or 0) & 255)
 
 def _getsimilar(symbols, value):
     sim = lambda x: difflib.SequenceMatcher(None, value, x).ratio()
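
For readers who want to try the idea outside Mercurial, here is a minimal standalone sketch of the conversion the patch performs. The helper name bytes_argv is hypothetical and does not exist in Mercurial, and the sketch checks only the major Python version, which is a simplification of the version test in the patch.

```python
import os
import sys


def bytes_argv(argv=None):
    """Return argv as a list of bytes (hypothetical helper)."""
    if argv is None:
        argv = sys.argv
    if sys.version_info[0] >= 3:
        # Python 3 hands us str; convert each argument back to bytes.
        return [os.fsencode(a) for a in argv]
    # Python 2 already provides byte strings.
    return list(argv)


# Example with an explicit argument list instead of the real sys.argv.
args = bytes_argv(["hg", "log", "-r", "tip"])
```

As in the patch, the caller would then drop the program name with args[1:] before dispatching.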