Patchwork [3,of,3] py3: make a bytes version of getopt.getopt()

login
register
mail settings
Submitter Pulkit Goyal
Date Dec. 6, 2016, 1:56 a.m.
Message ID <ef3b7f10f4ade315a42f.1480989401@pulkit-goyal>
Download mbox | patch
Permalink /patch/17826/
State Accepted
Headers show

Comments

Pulkit Goyal - Dec. 6, 2016, 1:56 a.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1480986396 -19800
#      Tue Dec 06 06:36:36 2016 +0530
# Node ID ef3b7f10f4ade315a42f5f9383a8ad1794bb1f01
# Parent  559b73b5d7b919da68bf2ce5b05dd9677ddc1c2d
py3: make a bytes version of getopt.getopt()

getopt.getopt() deals with unicodes on Python 3 internally and if bytes
arguments are passed, then it will return TypeError. So we have now
pycompat.getoptb() which takes bytes arguments, convert them to unicode, call
getopt.getopt() and then convert the returned value back to bytes and then
return those value. All these decoding and encoding is done using fsdecode()
and fsencode().
All the instances of getopt.getopt() are replaced with pycompat.getoptb().
Mateusz Kwapich - Dec. 6, 2016, 5:09 p.m.
The whole series looks good to me.

Best,
Mateusz
Augie Fackler - Dec. 6, 2016, 9:20 p.m.
On Tue, Dec 06, 2016 at 05:09:19PM +0000, Mateusz Kwapich wrote:
> The whole series looks good to me.

Agreed, queued, thanks.

>
> Best,
> Mateusz
>
> --
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Dec. 7, 2016, 2:07 p.m.
On Tue, 06 Dec 2016 07:26:41 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1480986396 -19800
> #      Tue Dec 06 06:36:36 2016 +0530
> # Node ID ef3b7f10f4ade315a42f5f9383a8ad1794bb1f01
> # Parent  559b73b5d7b919da68bf2ce5b05dd9677ddc1c2d
> py3: make a bytes version of getopt.getopt()

> +    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
> +    # pass bytes there. Passing unicodes will result in unicodes as return
> +    # values which we need to convert again to bytes. This does all these
> +    # decoding and encoding using fsdecode() and fsencode().
> +    def getoptb(args, shortlist, namelist):
> +        args = [fsdecode(a) for a in args]
> +        shortlist = fsdecode(shortlist)
> +        namelist = fsdecode(shortlist)
> +        opts, args = getopt.getopt(args, shortlist, namelist)
> +        opts = [fsencode(a) for a in opts]
> +        args = [fsencode(a) for a in args]

"opts" is a list of (option, value) pairs.

I don't think using fsdecode/fsencode here is appropriate. Encoding conversion
is lossy in general even if no error occurred. There's n:m mapping between
some crazy encodings (read: Shift_JIS variants) and unicode, for example.

Instead, maybe we can use 'latin1' to convince Python3 by abusing unicode as
a fat bytes?
Pulkit Goyal - Dec. 7, 2016, 2:35 p.m.
On Wed, Dec 7, 2016 at 7:37 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Tue, 06 Dec 2016 07:26:41 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pulkit@gmail.com>
>> # Date 1480986396 -19800
>> #      Tue Dec 06 06:36:36 2016 +0530
>> # Node ID ef3b7f10f4ade315a42f5f9383a8ad1794bb1f01
>> # Parent  559b73b5d7b919da68bf2ce5b05dd9677ddc1c2d
>> py3: make a bytes version of getopt.getopt()
>
>> +    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
>> +    # pass bytes there. Passing unicodes will result in unicodes as return
>> +    # values which we need to convert again to bytes. This does all these
>> +    # decoding and encoding using fsdecode() and fsencode().
>> +    def getoptb(args, shortlist, namelist):
>> +        args = [fsdecode(a) for a in args]
>> +        shortlist = fsdecode(shortlist)
>> +        namelist = fsdecode(shortlist)
>> +        opts, args = getopt.getopt(args, shortlist, namelist)
>> +        opts = [fsencode(a) for a in opts]
>> +        args = [fsencode(a) for a in args]
>
> "opts" is a list of (option, value) pairs.

Yeah, drop this patch, I will send a new version of this one.
>
> I don't think using fsdecode/fsencode here is appropriate. Encoding conversion
> is lossy in general even if no error occurred. There's n:m mapping between
> some crazy encodings (read: Shift_JIS variants) and unicode, for example.
>
> Instead, maybe we can use 'latin1' to convince Python3 by abusing unicode as
> a fat bytes?

In that case pycompat.sysstr() is okay for decoding and encoding using
.encode('latin-1') then.
Pulkit Goyal - Dec. 7, 2016, 3:47 p.m.
On Wed, Dec 7, 2016 at 8:05 PM, Pulkit Goyal <7895pulkit@gmail.com> wrote:
> On Wed, Dec 7, 2016 at 7:37 PM, Yuya Nishihara <yuya@tcha.org> wrote:
>> On Tue, 06 Dec 2016 07:26:41 +0530, Pulkit Goyal wrote:
>>> # HG changeset patch
>>> # User Pulkit Goyal <7895pulkit@gmail.com>
>>> # Date 1480986396 -19800
>>> #      Tue Dec 06 06:36:36 2016 +0530
>>> # Node ID ef3b7f10f4ade315a42f5f9383a8ad1794bb1f01
>>> # Parent  559b73b5d7b919da68bf2ce5b05dd9677ddc1c2d
>>> py3: make a bytes version of getopt.getopt()
>>
>>> +    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
>>> +    # pass bytes there. Passing unicodes will result in unicodes as return
>>> +    # values which we need to convert again to bytes. This does all these
>>> +    # decoding and encoding using fsdecode() and fsencode().
>>> +    def getoptb(args, shortlist, namelist):
>>> +        args = [fsdecode(a) for a in args]
>>> +        shortlist = fsdecode(shortlist)
>>> +        namelist = fsdecode(shortlist)
Ah, this is wrong, sorry for this, I just found it, please drop this version.
>>> +        opts, args = getopt.getopt(args, shortlist, namelist)
>>> +        opts = [fsencode(a) for a in opts]
>>> +        args = [fsencode(a) for a in args]
>>
>> "opts" is a list of (option, value) pairs.
>
> Yeah, drop this patch, I will send a new version of this one.
>>
>> I don't think using fsdecode/fsencode here is appropriate. Encoding conversion
>> is lossy in general even if no error occurred. There's n:m mapping between
>> some crazy encodings (read: Shift_JIS variants) and unicode, for example.
>>
>> Instead, maybe we can use 'latin1' to convince Python3 by abusing unicode as
>> a fat bytes?
>
> In that case pycompat.sysstr() is okay for decoding and encoding using
> .encode('latin-1') then.
Yuya Nishihara - Dec. 8, 2016, 3:02 p.m.
On Wed, 7 Dec 2016 20:05:22 +0530, Pulkit Goyal wrote:
> On Wed, Dec 7, 2016 at 7:37 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Tue, 06 Dec 2016 07:26:41 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pulkit@gmail.com>
> >> # Date 1480986396 -19800
> >> #      Tue Dec 06 06:36:36 2016 +0530
> >> # Node ID ef3b7f10f4ade315a42f5f9383a8ad1794bb1f01
> >> # Parent  559b73b5d7b919da68bf2ce5b05dd9677ddc1c2d
> >> py3: make a bytes version of getopt.getopt()
> >
> >> +    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
> >> +    # pass bytes there. Passing unicodes will result in unicodes as return
> >> +    # values which we need to convert again to bytes. This does all these
> >> +    # decoding and encoding using fsdecode() and fsencode().
> >> +    def getoptb(args, shortlist, namelist):
> >> +        args = [fsdecode(a) for a in args]
> >> +        shortlist = fsdecode(shortlist)
> >> +        namelist = fsdecode(shortlist)
> >> +        opts, args = getopt.getopt(args, shortlist, namelist)
> >> +        opts = [fsencode(a) for a in opts]
> >> +        args = [fsencode(a) for a in args]
> >
> > "opts" is a list of (option, value) pairs.
> 
> Yeah, drop this patch, I will send a new version of this one.

Ok, drops this soon. Running tests.

Patch

diff -r 559b73b5d7b9 -r ef3b7f10f4ad mercurial/fancyopts.py
--- a/mercurial/fancyopts.py	Tue Dec 06 06:27:58 2016 +0530
+++ b/mercurial/fancyopts.py	Tue Dec 06 06:36:36 2016 +0530
@@ -7,10 +7,11 @@ 
 
 from __future__ import absolute_import
 
-import getopt
-
 from .i18n import _
-from . import error
+from . import (
+    error,
+    pycompat,
+)
 
 # Set of flags to not apply boolean negation logic on
 nevernegate = set([
@@ -34,13 +35,14 @@ 
         stopindex = args.index('--')
         extraargs = args[stopindex + 1:]
         args = args[:stopindex]
-    opts, parseargs = getopt.getopt(args, options, longoptions)
+    opts, parseargs = pycompat.getoptb(args, options, longoptions)
     args = []
     while parseargs:
         arg = parseargs.pop(0)
         if arg and arg[0] == '-' and len(arg) > 1:
             parseargs.insert(0, arg)
-            topts, newparseargs = getopt.getopt(parseargs, options, longoptions)
+            topts, newparseargs = pycompat.getoptb(parseargs,\
+                                            options, longoptions)
             opts = opts + topts
             parseargs = newparseargs
         else:
@@ -125,7 +127,7 @@ 
     if gnu:
         parse = gnugetopt
     else:
-        parse = getopt.getopt
+        parse = pycompat.getoptb
     opts, args = parse(args, shortlist, namelist)
 
     # transfer result to state
diff -r 559b73b5d7b9 -r ef3b7f10f4ad mercurial/pycompat.py
--- a/mercurial/pycompat.py	Tue Dec 06 06:27:58 2016 +0530
+++ b/mercurial/pycompat.py	Tue Dec 06 06:36:36 2016 +0530
@@ -10,6 +10,7 @@ 
 
 from __future__ import absolute_import
 
+import getopt
 import os
 import sys
 
@@ -87,6 +88,19 @@ 
     setattr = _wrapattrfunc(builtins.setattr)
     xrange = builtins.range
 
+    # getopt.getopt() on Python 3 deals with unicodes internally so we cannot
+    # pass bytes there. Passing unicodes will result in unicodes as return
+    # values which we need to convert again to bytes. This does all these
+    # decoding and encoding using fsdecode() and fsencode().
+    def getoptb(args, shortlist, namelist):
+        args = [fsdecode(a) for a in args]
+        shortlist = fsdecode(shortlist)
+        namelist = fsdecode(shortlist)
+        opts, args = getopt.getopt(args, shortlist, namelist)
+        opts = [fsencode(a) for a in opts]
+        args = [fsencode(a) for a in args]
+        return opts, args
+
 else:
     def sysstr(s):
         return s
@@ -106,6 +120,9 @@ 
     def fsdecode(filename):
         return filename
 
+    def getoptb(args, shortlist, namelist):
+        return getopt.getopt(args, shortlist, namelist)
+
     osname = os.name
     ospathsep = os.pathsep
     ossep = os.sep
diff -r 559b73b5d7b9 -r ef3b7f10f4ad mercurial/statprof.py
--- a/mercurial/statprof.py	Tue Dec 06 06:27:58 2016 +0530
+++ b/mercurial/statprof.py	Tue Dec 06 06:36:36 2016 +0530
@@ -116,6 +116,10 @@ 
 import threading
 import time
 
+from . import (
+    pycompat,
+)
+
 defaultdict = collections.defaultdict
 contextmanager = contextlib.contextmanager
 
@@ -771,7 +775,7 @@ 
 
     # process options
     try:
-        opts, args = getopt.getopt(sys.argv[optstart:], "hl:f:o:p:",
+        opts, args = pycompat.getoptb(sys.argv[optstart:], "hl:f:o:p:",
                                    ["help", "limit=", "file=", "output-file=", "script-path="])
     except getopt.error as msg:
         print(msg)