Patchwork [2,of,3,V2] py3: utility functions to convert keys of kwargs to bytes/unicodes

Submitter Pulkit Goyal
Date Dec. 7, 2016, 6:36 p.m.
Message ID <85d610c83bda09dea239.1481135800@pulkit-goyal>
Permalink /patch/17848/
State Superseded

Comments

Pulkit Goyal - Dec. 7, 2016, 6:36 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1481127783 -19800
#      Wed Dec 07 21:53:03 2016 +0530
# Node ID 85d610c83bda09dea2393c22e415dd9656f5a7f2
# Parent  ced854b9dfaa7298b241ac085627b12ecb796dcd
py3: utility functions to convert keys of kwargs to bytes/unicodes

Keys of keyword arguments need to be str (unicode) on Python 3. We have a lot
of functions where we pass keyword arguments. Utility functions that convert
the keys to unicode before passing, and convert them back to bytes once inside
the function, will be helpful. We now have functions named
pycompat.strkwargs(dic) and pycompat.byteskwargs(dic) to help us.
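
The motivation above can be illustrated with a minimal sketch, assuming Python 3. The two helpers are local copies modelled on the ones in this patch, and `show` and the `rev` option are hypothetical names used only for illustration.

```python
def strkwargs(dic):
    # Convert bytes keys to str; latin-1 maps every byte to one code point.
    return {(k.decode('latin-1') if isinstance(k, bytes) else k): v
            for k, v in dic.items()}


def byteskwargs(dic):
    # Convert str keys back to bytes for internal use.
    return {(k.encode('latin-1') if isinstance(k, str) else k): v
            for k, v in dic.items()}


def show(**opts):
    # Hypothetical command: works with bytes keys internally.
    opts = byteskwargs(opts)
    return opts[b'rev']


opts = {b'rev': 42}

# On Python 3, ** unpacking rejects non-str keys with a TypeError.
try:
    show(**opts)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# Converting the keys first lets the call go through.
result = show(**strkwargs(opts))
```

The conversion happens once at the call boundary, so the body of the called function can keep using bytes keys.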
Pulkit Goyal - Dec. 7, 2016, 6:43 p.m.
This is V1 sandwiched between two V2 patches. I found no better way to
send the series.

Yuya Nishihara - Dec. 8, 2016, 3:02 p.m.
On Thu, 08 Dec 2016 00:06:40 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1481127783 -19800
> #      Wed Dec 07 21:53:03 2016 +0530
> # Node ID 85d610c83bda09dea2393c22e415dd9656f5a7f2
> # Parent  ced854b9dfaa7298b241ac085627b12ecb796dcd
> py3: utility functions to convert keys of kwargs to bytes/unicodes

> --- a/mercurial/pycompat.py	Tue Dec 06 06:36:36 2016 +0530
> +++ b/mercurial/pycompat.py	Wed Dec 07 21:53:03 2016 +0530
> @@ -103,6 +103,22 @@
>          args = [a.encode('latin-1') for a in args]
>          return opts, args
>  
> +    # keys of keyword arguments in Python need to be strings which are unicodes
> +    # Python 3. This function take keyword arguments, convert the keys to str
> +    # if they are in bytes.
> +    def strkwargs(dic):
> +        dic = {(k.decode('latin-1') if isinstance(k, bytes) else k): v
> +                                                    for k, v in dic.items()}
> +        return dic
> +
> +    # keys of keyword arguments need to be unicode while passing into a
> +    # a function. This function helps us to convert those keys back to bytes
> +    # again as we need to deal with bytes.
> +    def byteskwargs(dic):
> +        dic = {(k.encode('latin-1') if isinstance(k, str) else k): v
> +                                                    for k, v in dic.items()}
> +        return dic

I think we can assume the type of keys must be either bytes or unicode, so
we won't need isinstance() checks.

And no dict comprehension. The code must be parseable by Python 2.6.
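
One way to satisfy both review points, sketched here as an assumption rather than as the code of any later revision, is to build the dict from a generator expression of pairs. That syntax is accepted by Python 2.6, and without isinstance() checks every key must have the expected type.

```python
def strkwargs(dic):
    # Assumes every key is bytes; dict() over a generator expression of
    # (key, value) pairs avoids dict-comprehension syntax.
    return dict((k.decode('latin-1'), v) for k, v in dic.items())


def byteskwargs(dic):
    # Assumes every key is str (unicode on Python 3).
    return dict((k.encode('latin-1'), v) for k, v in dic.items())


converted = strkwargs({b'rev': 1, b'force': True})
restored = byteskwargs(converted)
```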
Pulkit Goyal - Dec. 8, 2016, 3:07 p.m.
On Thu, Dec 8, 2016 at 8:32 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Thu, 08 Dec 2016 00:06:40 +0530, Pulkit Goyal wrote:
>> # HG changeset patch
>> # User Pulkit Goyal <7895pulkit@gmail.com>
>> # Date 1481127783 -19800
>> #      Wed Dec 07 21:53:03 2016 +0530
>> # Node ID 85d610c83bda09dea2393c22e415dd9656f5a7f2
>> # Parent  ced854b9dfaa7298b241ac085627b12ecb796dcd
>> py3: utility functions to convert keys of kwargs to bytes/unicodes
>
>> --- a/mercurial/pycompat.py   Tue Dec 06 06:36:36 2016 +0530
>> +++ b/mercurial/pycompat.py   Wed Dec 07 21:53:03 2016 +0530
>> @@ -103,6 +103,22 @@
>>          args = [a.encode('latin-1') for a in args]
>>          return opts, args
>>
>> +    # keys of keyword arguments in Python need to be strings which are unicodes
>> +    # Python 3. This function take keyword arguments, convert the keys to str
>> +    # if they are in bytes.
>> +    def strkwargs(dic):
>> +        dic = {(k.decode('latin-1') if isinstance(k, bytes) else k): v
>> +                                                    for k, v in dic.items()}
>> +        return dic
>> +
>> +    # keys of keyword arguments need to be unicode while passing into a
>> +    # a function. This function helps us to convert those keys back to bytes
>> +    # again as we need to deal with bytes.
>> +    def byteskwargs(dic):
>> +        dic = {(k.encode('latin-1') if isinstance(k, str) else k): v
>> +                                                    for k, v in dic.items()}
>> +        return dic
>
> I think we can assume the type of keys must be either bytes or unicode, so
> we won't need isinstance() checks.

On Python 3, if the passed key is already bytes, then .encode() will raise
an error, so we have to be specific on Python 3.

> And no dict comprehension. The code must be parseable by Python 2.6.

Yeah, I remember. Actually, this code is under an if statement which is
executed only if sys.version >= 3.
Yuya Nishihara - Dec. 8, 2016, 3:22 p.m.
On Thu, 8 Dec 2016 20:37:07 +0530, Pulkit Goyal wrote:
> On Thu, Dec 8, 2016 at 8:32 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Thu, 08 Dec 2016 00:06:40 +0530, Pulkit Goyal wrote:
> >> # HG changeset patch
> >> # User Pulkit Goyal <7895pulkit@gmail.com>
> >> # Date 1481127783 -19800
> >> #      Wed Dec 07 21:53:03 2016 +0530
> >> # Node ID 85d610c83bda09dea2393c22e415dd9656f5a7f2
> >> # Parent  ced854b9dfaa7298b241ac085627b12ecb796dcd
> >> py3: utility functions to convert keys of kwargs to bytes/unicodes
> >
> >> --- a/mercurial/pycompat.py   Tue Dec 06 06:36:36 2016 +0530
> >> +++ b/mercurial/pycompat.py   Wed Dec 07 21:53:03 2016 +0530
> >> @@ -103,6 +103,22 @@
> >>          args = [a.encode('latin-1') for a in args]
> >>          return opts, args
> >>
> >> +    # keys of keyword arguments in Python need to be strings which are unicodes
> >> +    # Python 3. This function take keyword arguments, convert the keys to str
> >> +    # if they are in bytes.
> >> +    def strkwargs(dic):
> >> +        dic = {(k.decode('latin-1') if isinstance(k, bytes) else k): v
> >> +                                                    for k, v in dic.items()}
> >> +        return dic
> >> +
> >> +    # keys of keyword arguments need to be unicode while passing into a
> >> +    # a function. This function helps us to convert those keys back to bytes
> >> +    # again as we need to deal with bytes.
> >> +    def byteskwargs(dic):
> >> +        dic = {(k.encode('latin-1') if isinstance(k, str) else k): v
> >> +                                                    for k, v in dic.items()}
> >> +        return dic
> >
> > I think we can assume the type of keys must be either bytes or unicode, so
> > we won't need isinstance() checks.
> 
> On python 3, if passed value is bytes, then .encode() will result in
> error. So have to specific on Python 3.

IMHO, it's a kind of programming error. Unlike pycompat.getattr(), strkwargs()
isn't a drop-in replacement for a standard function, so we can say strkwargs()
only takes a dict of bytes keys.
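
A small sketch of that contract, assuming Python 3: a strict strkwargs() without isinstance() checks fails loudly when handed a key that is already str, which surfaces the caller's mistake instead of hiding it. The helper `raises_attribute_error` is a hypothetical name used only here.

```python
def strkwargs(dic):
    # Strict variant: assumes every key is bytes.
    return dict((k.decode('latin-1'), v) for k, v in dic.items())


def raises_attribute_error(func, dic):
    # Report whether calling func(dic) raises AttributeError.
    try:
        func(dic)
    except AttributeError:
        return True
    return False


# Correct use: bytes keys are converted.
ok = strkwargs({b'rev': 1})

# Misuse: on Python 3, str has no .decode(), so the error shows up at once.
misuse_detected = raises_attribute_error(strkwargs, {'rev': 1})
```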

> > And no dict comprehension. The code must be parseable by Python 2.6.
> 
> Yeah I remember, actually this code is under a if statement which is
> executed if sys.version >= 3.

Dict comprehension is syntax. Runtime dispatch can't stop parsing of the
ispy3 block.
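
The same effect can be reproduced by analogy on Python 3, as a rough sketch: a Python 2 print statement placed under `if False:` still fails, because the whole source is parsed before any branch is evaluated. The file name `demo.py` is an arbitrary label.

```python
# The guarded branch would never run, but it still has to parse.
source = 'if False:\n    print "never executed"\n'

try:
    compile(source, 'demo.py', 'exec')
    parsed = True
except SyntaxError:
    # Raised while compiling, before the condition is ever evaluated.
    parsed = False
```

A dict comprehension inside the ispy3 block is in the same position on Python 2.6: the interpreter rejects the file while parsing it, regardless of which branch would run.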
Pulkit Goyal - Dec. 8, 2016, 3:27 p.m.
On Thu, Dec 8, 2016 at 8:52 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Thu, 8 Dec 2016 20:37:07 +0530, Pulkit Goyal wrote:
>> On Thu, Dec 8, 2016 at 8:32 PM, Yuya Nishihara <yuya@tcha.org> wrote:
>> > On Thu, 08 Dec 2016 00:06:40 +0530, Pulkit Goyal wrote:
>> >> # HG changeset patch
>> >> # User Pulkit Goyal <7895pulkit@gmail.com>
>> >> # Date 1481127783 -19800
>> >> #      Wed Dec 07 21:53:03 2016 +0530
>> >> # Node ID 85d610c83bda09dea2393c22e415dd9656f5a7f2
>> >> # Parent  ced854b9dfaa7298b241ac085627b12ecb796dcd
>> >> py3: utility functions to convert keys of kwargs to bytes/unicodes
>> >
>> >> --- a/mercurial/pycompat.py   Tue Dec 06 06:36:36 2016 +0530
>> >> +++ b/mercurial/pycompat.py   Wed Dec 07 21:53:03 2016 +0530
>> >> @@ -103,6 +103,22 @@
>> >>          args = [a.encode('latin-1') for a in args]
>> >>          return opts, args
>> >>
>> >> +    # keys of keyword arguments in Python need to be strings which are unicodes
>> >> +    # Python 3. This function take keyword arguments, convert the keys to str
>> >> +    # if they are in bytes.
>> >> +    def strkwargs(dic):
>> >> +        dic = {(k.decode('latin-1') if isinstance(k, bytes) else k): v
>> >> +                                                    for k, v in dic.items()}
>> >> +        return dic
>> >> +
>> >> +    # keys of keyword arguments need to be unicode while passing into a
>> >> +    # a function. This function helps us to convert those keys back to bytes
>> >> +    # again as we need to deal with bytes.
>> >> +    def byteskwargs(dic):
>> >> +        dic = {(k.encode('latin-1') if isinstance(k, str) else k): v
>> >> +                                                    for k, v in dic.items()}
>> >> +        return dic
>> >
>> > I think we can assume the type of keys must be either bytes or unicode, so
>> > we won't need isinstance() checks.
>>
>> On python 3, if passed value is bytes, then .encode() will result in
>> error. So have to specific on Python 3.
>
> IMHO, it's a kind of programming error. Unlike pycompat.getattr(), strkwargs()
> isn't a drop-in replacement for a standard function, so we can say strkwargs()
> only takes a dict of bytes keys.

Okay, I will remove the isinstance() checks. I am wondering whether a dict
whose keys are a mix of bytes and unicode can exist in our codebase,
just to make sure those cases are caught too.
>
>> > And no dict comprehension. The code must be parseable by Python 2.6.
>>
>> Yeah I remember, actually this code is under a if statement which is
>> executed if sys.version >= 3.
>
> Dict comprehension is syntax. Runtime dispatch can't stop parsing of the
> ispy3 block.

Okay I will send a V3.

Patch

diff -r ced854b9dfaa -r 85d610c83bda mercurial/pycompat.py
--- a/mercurial/pycompat.py	Tue Dec 06 06:36:36 2016 +0530
+++ b/mercurial/pycompat.py	Wed Dec 07 21:53:03 2016 +0530
@@ -103,6 +103,22 @@ 
         args = [a.encode('latin-1') for a in args]
         return opts, args
 
+    # Keys of keyword arguments need to be strings, which are unicode on
+    # Python 3. This function takes keyword arguments and converts the keys
+    # to str if they are bytes.
+    def strkwargs(dic):
+        dic = {(k.decode('latin-1') if isinstance(k, bytes) else k): v
+                                                    for k, v in dic.items()}
+        return dic
+
+    # Keys of keyword arguments need to be unicode when passed into a
+    # function. This function converts those keys back to bytes, as we
+    # need to deal with bytes.
+    def byteskwargs(dic):
+        dic = {(k.encode('latin-1') if isinstance(k, str) else k): v
+                                                    for k, v in dic.items()}
+        return dic
+
 else:
     def sysstr(s):
         return s
@@ -125,6 +141,12 @@ 
     def getoptb(args, shortlist, namelist):
         return getopt.getopt(args, shortlist, namelist)
 
+    def strkwargs(dic):
+        return dic
+
+    def byteskwargs(dic):
+        return dic
+
     osname = os.name
     ospathsep = os.pathsep
     ossep = os.sep