Patchwork [2,of,2] py3: handle multiple arguments in .encode() and .decode()

Submitter Pulkit Goyal
Date Oct. 5, 2016, 2:35 p.m.
Message ID <535c77a356a09c0319c9.1475678118@pulkit-goyal>
Permalink /patch/16867/
State Accepted

Comments

Pulkit Goyal - Oct. 5, 2016, 2:35 p.m.
# HG changeset patch
# User Pulkit Goyal <7895pulkit@gmail.com>
# Date 1475596407 -19800
#      Tue Oct 04 21:23:27 2016 +0530
# Node ID 535c77a356a09c0319c9a794bdbec18e9ebb57b2
# Parent  51e49c041614b463953b3973d5b58d8bbdcbbab3
py3: handle multiple arguments in .encode() and .decode()

There is at least one case, and there may be more, where these functions
are called with multiple arguments. Our transformer used to handle only the
first argument, so add a loop to handle any further arguments.
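
For context, the failure this patch targets can be reproduced in isolation on Python 3. This is a minimal sketch, not code from the patch; the helper name try_encode is hypothetical and only used for illustration. It shows that a bytes value in either argument position of .encode() raises TypeError, which is why every literal argument needs the unicode prefix and not just the first one.

```python
# Minimal sketch (Python 3): str.encode() rejects bytes arguments.
# try_encode is a hypothetical helper, not part of Mercurial.
def try_encode(encoding, errors):
    """Return 'abc'.encode(encoding, errors) or the TypeError text."""
    try:
        return 'abc'.encode(encoding, errors)
    except TypeError as exc:
        return 'TypeError: %s' % exc

ok = try_encode('utf-8', 'strict')           # both arguments are str
bad_first = try_encode(b'utf-8', 'strict')   # first argument is bytes
bad_second = try_encode('utf-8', b'strict')  # second argument is bytes
```

Only the call with two str arguments succeeds; the exact wording of the TypeError message differs between Python versions.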
Yuya Nishihara - Oct. 7, 2016, 4:58 a.m.
On Wed, 05 Oct 2016 20:05:18 +0530, Pulkit Goyal wrote:
> # HG changeset patch
> # User Pulkit Goyal <7895pulkit@gmail.com>
> # Date 1475596407 -19800
> #      Tue Oct 04 21:23:27 2016 +0530
> # Node ID 535c77a356a09c0319c9a794bdbec18e9ebb57b2
> # Parent  51e49c041614b463953b3973d5b58d8bbdcbbab3
> py3: handle multiple arguments in .encode() and .decode()
> 
> There is at least one case, and there may be more, where these functions
> are called with multiple arguments. Our transformer used to handle only the
> first argument, so add a loop to handle any further arguments.
> 
> diff -r 51e49c041614 -r 535c77a356a0 mercurial/__init__.py
> --- a/mercurial/__init__.py	Tue Oct 04 20:56:03 2016 +0530
> +++ b/mercurial/__init__.py	Tue Oct 04 21:23:27 2016 +0530
> @@ -278,24 +278,30 @@
>                  # .encode() and .decode() on str/bytes/unicode don't accept
>                  # byte strings on Python 3. Rewrite the token to include the
>                  # unicode literal prefix so the string transformer above doesn't
> -                # add the byte prefix.
> +                # add the byte prefix. The loop helps in handling multiple
> +                # arguments to them.
>                  if (fn in ('encode', 'decode') and
>                      prevtoken.type == token.OP and prevtoken.string == '.'):
>                      # (OP, '.')
>                      # (NAME, 'encode')
>                      # (OP, '(')
>                      # (STRING, 'utf-8')
> +                    # [(OP, ',')]
> +                    # [(STRING, 'ascii')]
>                      # (OP, ')')
> -                    try:
> -                        st = tokens[i + 2]
> -                        if (st.type == token.STRING and
> -                            st.string[0] in ("'", '"')):
> -                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
> -                                                    st.start, st.end, st.line)
> -                            tokens[i + 2] = rt
> -                    except IndexError:
> -                        pass
> -
> +                    j = i
> +                    while (tokens[j + 1].string != ')'):
> +                        try:
> +                            st = tokens[j + 2]
> +                            if (st.type == token.STRING and
> +                                st.string[0] in ("'", '"')):
> +                                rt = tokenize.TokenInfo(st.type,
> +                                    'u%s' % st.string,
> +                                        st.start, st.end, st.line)
> +                                tokens[j + 2] = rt
> +                        except IndexError:
> +                            pass

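IndexError could already be raised by the first tokens[j + 1], which is in
the "while" condition and therefore outside the try block. Since we have
"while", the bound could be checked there instead, as j + 2 < len(tokens).

Also, we'll need to check that a ',' token is present between the arguments.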
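
One way the two suggestions above could be combined is sketched below. This is only an illustration under stated assumptions, not the version that was eventually committed: the function name prefix_codec_args and the helper tokens_of are hypothetical, and tokens[i] is assumed to be the NAME token ('encode' or 'decode') with tokens[i + 1] being the opening parenthesis, as in the patch.

```python
import io
import token
import tokenize

def prefix_codec_args(tokens, i):
    """Add a 'u' prefix to literal string arguments of the call at tokens[i].

    Hypothetical helper: tokens[i] is assumed to be the NAME token
    ('encode' or 'decode') and tokens[i + 1] the '(' token.
    """
    j = i + 2  # index of the first argument
    while j < len(tokens):  # explicit bound instead of catching IndexError
        st = tokens[j]
        if st.type == token.STRING and st.string[0] in ("'", '"'):
            tokens[j] = tokenize.TokenInfo(st.type, 'u%s' % st.string,
                                           st.start, st.end, st.line)
        # Only move on to another argument when a ',' token follows.
        if j + 1 >= len(tokens) or tokens[j + 1].string != ',':
            break
        j += 2
    return tokens

def tokens_of(source):
    """Tokenize a source string into a list of TokenInfo tuples."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))

toks = tokens_of("x.encode('utf-8', 'ignore')\n")
i = next(n for n, t in enumerate(toks)
         if t.type == token.NAME and t.string == 'encode')
prefix_codec_args(toks, i)
strings = [t.string for t in toks if t.type == token.STRING]
# strings is now ["u'utf-8'", "u'ignore'"]
```

The loop stops as soon as an argument is not followed by a comma, so it never walks past the closing parenthesis, and a call with no arguments is left untouched.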

>      # ``replacetoken`` or any mechanism that changes semantics of module
>      # loading is changed. Otherwise cached bytecode may get loaded without
>      # the new transformation mechanisms applied.
> -    BYTECODEHEADER = b'HG\x00\x02'
> +    BYTECODEHEADER = b'HG\x00\x04'

Just curious, why not '\x03'?
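
As background on why the header is bumped at all (this does not address the numbering question), the comment quoted above says that stale cached bytecode must not be loaded once the transformer changes. A rough sketch of such a check follows. The function name strip_header is hypothetical and the logic is an assumption about how a versioned header can be validated, not a copy of Mercurial's loader.

```python
BYTECODEHEADER = b'HG\x00\x04'  # value taken from the patch

def strip_header(data, expected=BYTECODEHEADER):
    """Return cached bytecode without its header.

    Hypothetical helper: raises OSError when the header does not match,
    so a caller can fall back to recompiling from source.
    """
    if data[:len(expected)] != expected:
        raise OSError('bytecode header mismatch')
    return data[len(expected):]

fresh = strip_header(b'HG\x00\x04payload')  # header matches
try:
    strip_header(b'HG\x00\x02payload')      # written by the older transformer
    stale_rejected = False
except OSError:
    stale_rejected = True
```

With a check like this, any change to the header value makes every previously cached file fail validation, regardless of which new value is chosen.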

Patch

diff -r 51e49c041614 -r 535c77a356a0 mercurial/__init__.py
--- a/mercurial/__init__.py	Tue Oct 04 20:56:03 2016 +0530
+++ b/mercurial/__init__.py	Tue Oct 04 21:23:27 2016 +0530
@@ -278,24 +278,30 @@ 
                 # .encode() and .decode() on str/bytes/unicode don't accept
                 # byte strings on Python 3. Rewrite the token to include the
                 # unicode literal prefix so the string transformer above doesn't
-                # add the byte prefix.
+                # add the byte prefix. The loop helps in handling multiple
+                # arguments to them.
                 if (fn in ('encode', 'decode') and
                     prevtoken.type == token.OP and prevtoken.string == '.'):
                     # (OP, '.')
                     # (NAME, 'encode')
                     # (OP, '(')
                     # (STRING, 'utf-8')
+                    # [(OP, ',')]
+                    # [(STRING, 'ascii')]
                     # (OP, ')')
-                    try:
-                        st = tokens[i + 2]
-                        if (st.type == token.STRING and
-                            st.string[0] in ("'", '"')):
-                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
-                                                    st.start, st.end, st.line)
-                            tokens[i + 2] = rt
-                    except IndexError:
-                        pass
-
+                    j = i
+                    while (tokens[j + 1].string != ')'):
+                        try:
+                            st = tokens[j + 2]
+                            if (st.type == token.STRING and
+                                st.string[0] in ("'", '"')):
+                                rt = tokenize.TokenInfo(st.type,
+                                    'u%s' % st.string,
+                                        st.start, st.end, st.line)
+                                tokens[j + 2] = rt
+                        except IndexError:
+                            pass
+                        j = j + 2
             # Emit unmodified token.
             yield t
 
@@ -303,7 +309,7 @@ 
     # ``replacetoken`` or any mechanism that changes semantics of module
     # loading is changed. Otherwise cached bytecode may get loaded without
     # the new transformation mechanisms applied.
-    BYTECODEHEADER = b'HG\x00\x02'
+    BYTECODEHEADER = b'HG\x00\x04'
 
     class hgloader(importlib.machinery.SourceFileLoader):
         """Custom module loader that transforms source code.
diff -r 51e49c041614 -r 535c77a356a0 tests/test-check-py3-compat.t
--- a/tests/test-check-py3-compat.t	Tue Oct 04 20:56:03 2016 +0530
+++ b/tests/test-check-py3-compat.t	Tue Oct 04 21:23:27 2016 +0530
@@ -120,53 +120,52 @@ 
   mercurial/httpconnection.py: error importing: <TypeError> Can't mix strings and bytes in path components (error at i18n.py:*)
   mercurial/httppeer.py: error importing: <TypeError> Can't mix strings and bytes in path components (error at i18n.py:*)
   mercurial/i18n.py: error importing module: <TypeError> bytes expected, not str (line *)
-  mercurial/keepalive.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/localrepo.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/lock.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/mail.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/manifest.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/match.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/mdiff.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/merge.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/minirst.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/namespaces.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/obsolete.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/patch.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/pathutil.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/peer.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/profiling.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/pushkey.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/pvec.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/registrar.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/repair.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/repoview.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/revlog.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/revset.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/scmutil.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/scmwindows.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/similar.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/simplemerge.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/sshpeer.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/sshserver.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/sslutil.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/statichttprepo.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/store.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/streamclone.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/subrepo.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/tagmerge.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/tags.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/templatefilters.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/templatekw.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/templater.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/transaction.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/ui.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/unionrepo.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/url.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/util.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
-  mercurial/verify.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
+  mercurial/keepalive.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/localrepo.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/lock.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/mail.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/manifest.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/match.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/mdiff.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/merge.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/minirst.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/namespaces.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/obsolete.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/patch.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/pathutil.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/peer.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/profiling.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/pushkey.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/pvec.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/registrar.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/repair.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/repoview.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/revlog.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/revset.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/scmutil.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/scmwindows.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/similar.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/simplemerge.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/sshpeer.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/sshserver.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/sslutil.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/statichttprepo.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/store.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/streamclone.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/subrepo.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/tagmerge.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/tags.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/templatefilters.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/templatekw.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/templater.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/transaction.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/ui.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/unionrepo.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/url.py: error importing: <TypeError> int() can't convert non-string with explicit base (error at util.py:*)
+  mercurial/verify.py: error importing module: <TypeError> unorderable types: str() >= tuple() (line *)
   mercurial/win32.py: error importing module: <ImportError> No module named 'msvcrt' (line *)
   mercurial/windows.py: error importing module: <ImportError> No module named 'msvcrt' (line *)
-  mercurial/wireproto.py: error importing: <TypeError> encode() argument 2 must be str, not bytes (error at i18n.py:*)
+  mercurial/wireproto.py: error importing module: <TypeError> unorderable types: str() >= tuple() (line *)
 
 #endif