Patchwork [V2] py3: a second argument to open can't be bytes

login
register
mail settings
Submitter Martijn Pieters
Date Oct. 9, 2016, 12:10 p.m.
Message ID <82489cd912f332be976c.1476015040@mjpieters-mbp>
Download mbox | patch
Permalink /patch/16989/
State Accepted
Headers show

Comments

Martijn Pieters - Oct. 9, 2016, 12:10 p.m.
# HG changeset patch
# User Martijn Pieters <mjpieters@fb.com>
# Date 1476015001 -7200
#      Sun Oct 09 14:10:01 2016 +0200
# Node ID 82489cd912f332be976cf432673ad47af0d04cd7
# Parent  a56076f85aa6aa728457ecc571ff58514bc59896
py3: a second argument to open can't be bytes
Augie Fackler - Oct. 9, 2016, 4:14 p.m.
On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote:
> # HG changeset patch
> # User Martijn Pieters <mjpieters@fb.com>
> # Date 1476015001 -7200
> #      Sun Oct 09 14:10:01 2016 +0200
> # Node ID 82489cd912f332be976cf432673ad47af0d04cd7
> # Parent  a56076f85aa6aa728457ecc571ff58514bc59896
> py3: a second argument to open can't be bytes

Queued, thanks!

>
> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> --- a/mercurial/__init__.py
> +++ b/mercurial/__init__.py
> @@ -305,6 +305,24 @@
>                      except IndexError:
>                          pass
>
> +                # Bare open call (not an attribute on something else)
> +                if (fn == 'open' and not (prevtoken.type == token.OP and
> +                                          prevtoken.string == '.')):
> +                    try:
> +                        # (NAME, 'open')
> +                        # (OP, '(')
> +                        # (NAME|STRING, 'filename')
> +                        # (OP, ',')
> +                        # (NAME|STRING, mode)
> +                        st = tokens[i + 4]
> +                        if (st.type == token.STRING and
> +                                st.string[0] in ("'", '"')):
> +                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
> +                                                    st.start, st.end, st.line)
> +                            tokens[i + 4] = rt
> +                    except IndexError:
> +                        pass
> +
>                  # It changes iteritems to items as iteritems is not
>                  # present in Python 3 world.
>                  if fn == 'iteritems':
> @@ -319,7 +337,7 @@
>      # ``replacetoken`` or any mechanism that changes semantics of module
>      # loading is changed. Otherwise cached bytecode may get loaded without
>      # the new transformation mechanisms applied.
> -    BYTECODEHEADER = b'HG\x00\x04'
> +    BYTECODEHEADER = b'HG\x00\x05'
>
>      class hgloader(importlib.machinery.SourceFileLoader):
>          """Custom module loader that transforms source code.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Pierre-Yves David - Oct. 11, 2016, 4:50 p.m.
On 10/09/2016 06:14 PM, Augie Fackler wrote:
> On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote:
>> # HG changeset patch
>> # User Martijn Pieters <mjpieters@fb.com>
>> # Date 1476015001 -7200
>> #      Sun Oct 09 14:10:01 2016 +0200
>> # Node ID 82489cd912f332be976cf432673ad47af0d04cd7
>> # Parent  a56076f85aa6aa728457ecc571ff58514bc59896
>> py3: a second argument to open can't be bytes
>
> Queued, thanks!

  I'm trying to double review this (to have it move from hg-commited to 
public phase) and this is a bit too obscure for me (I assume there was 
much more context around the py3 table). Martijn can you give me a bit 
more context about what it going on here? What problem are we fixing and 
how is this the right way to do it ?

>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
>> --- a/mercurial/__init__.py
>> +++ b/mercurial/__init__.py
>> @@ -305,6 +305,24 @@
>>                      except IndexError:
>>                          pass
>>
>> +                # Bare open call (not an attribute on something else)
>> +                if (fn == 'open' and not (prevtoken.type == token.OP and
>> +                                          prevtoken.string == '.')):
>> +                    try:
>> +                        # (NAME, 'open')
>> +                        # (OP, '(')
>> +                        # (NAME|STRING, 'filename')
>> +                        # (OP, ',')
>> +                        # (NAME|STRING, mode)
>> +                        st = tokens[i + 4]
>> +                        if (st.type == token.STRING and
>> +                                st.string[0] in ("'", '"')):
>> +                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
>> +                                                    st.start, st.end, st.line)
>> +                            tokens[i + 4] = rt
>> +                    except IndexError:
>> +                        pass
>> +
>>                  # It changes iteritems to items as iteritems is not
>>                  # present in Python 3 world.
>>                  if fn == 'iteritems':
>> @@ -319,7 +337,7 @@
>>      # ``replacetoken`` or any mechanism that changes semantics of module
>>      # loading is changed. Otherwise cached bytecode may get loaded without
>>      # the new transformation mechanisms applied.
>> -    BYTECODEHEADER = b'HG\x00\x04'
>> +    BYTECODEHEADER = b'HG\x00\x05'
>>
>>      class hgloader(importlib.machinery.SourceFileLoader):
>>          """Custom module loader that transforms source code.
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Martijn Pieters - Oct. 11, 2016, 5:19 p.m.
On 11 October 2016 at 17:50, Pierre-Yves David <
pierre-yves.david@ens-lyon.org> wrote:

>  I'm trying to double review this (to have it move from hg-commited to
> public phase) and this is a bit too obscure for me (I assume there was much
> more context around the py3 table). Martijn can you give me a bit more
> context about what it going on here? What problem are we fixing and how is
> this the right way to do it ?
>

This fixes open(filename, 'r'), open(filename, 'w'), etc. calls. In Python
3, that second argument *must* be a string, you can't use bytes.

The fix is the same as used with getattr() (where the second argument must
also always be a string); in the tokenizer, where we detect calls, if there
is something that looks like a call to open (and is not an attribute, so
the previous token is not a "." dot) then make sure that that second
argument is not converted to a `bytes` object instead.

You can compare this to the "if fn in ('getattr', 'setattr', 'hasattr',
'safehasattr'):" block above this change.
Gregory Szorc - Oct. 11, 2016, 9:09 p.m.
> On Oct 11, 2016, at 18:50, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote:
> 
> 
> 
>> On 10/09/2016 06:14 PM, Augie Fackler wrote:
>>> On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote:
>>> # HG changeset patch
>>> # User Martijn Pieters <mjpieters@fb.com>
>>> # Date 1476015001 -7200
>>> #      Sun Oct 09 14:10:01 2016 +0200
>>> # Node ID 82489cd912f332be976cf432673ad47af0d04cd7
>>> # Parent  a56076f85aa6aa728457ecc571ff58514bc59896
>>> py3: a second argument to open can't be bytes
>> 
>> Queued, thanks!
> 
> I'm trying to double review this (to have it move from hg-commited to public phase) and this is a bit too obscure for me (I assume there was much more context around the py3 table). Martijn can you give me a bit more context about what it going on here? What problem are we fixing and how is this the right way to do it ?

The patch looks good to me. (I wrote the original source transformer code.)

> 
>>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
>>> --- a/mercurial/__init__.py
>>> +++ b/mercurial/__init__.py
>>> @@ -305,6 +305,24 @@
>>>                     except IndexError:
>>>                         pass
>>> 
>>> +                # Bare open call (not an attribute on something else)
>>> +                if (fn == 'open' and not (prevtoken.type == token.OP and
>>> +                                          prevtoken.string == '.')):
>>> +                    try:
>>> +                        # (NAME, 'open')
>>> +                        # (OP, '(')
>>> +                        # (NAME|STRING, 'filename')
>>> +                        # (OP, ',')
>>> +                        # (NAME|STRING, mode)
>>> +                        st = tokens[i + 4]
>>> +                        if (st.type == token.STRING and
>>> +                                st.string[0] in ("'", '"')):
>>> +                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
>>> +                                                    st.start, st.end, st.line)
>>> +                            tokens[i + 4] = rt
>>> +                    except IndexError:
>>> +                        pass
>>> +
>>>                 # It changes iteritems to items as iteritems is not
>>>                 # present in Python 3 world.
>>>                 if fn == 'iteritems':
>>> @@ -319,7 +337,7 @@
>>>     # ``replacetoken`` or any mechanism that changes semantics of module
>>>     # loading is changed. Otherwise cached bytecode may get loaded without
>>>     # the new transformation mechanisms applied.
>>> -    BYTECODEHEADER = b'HG\x00\x04'
>>> +    BYTECODEHEADER = b'HG\x00\x05'
>>> 
>>>     class hgloader(importlib.machinery.SourceFileLoader):
>>>         """Custom module loader that transforms source code.
>>> _______________________________________________
>>> Mercurial-devel mailing list
>>> Mercurial-devel@mercurial-scm.org
>>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>> 
> 
> -- 
> Pierre-Yves David
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Oct. 12, 2016, 1 p.m.
On Tue, 11 Oct 2016 18:19:23 +0100, Martijn Pieters wrote:
> On 11 October 2016 at 17:50, Pierre-Yves David <
> pierre-yves.david@ens-lyon.org> wrote:
> 
> >  I'm trying to double review this (to have it move from hg-commited to
> > public phase) and this is a bit too obscure for me (I assume there was much
> > more context around the py3 table). Martijn can you give me a bit more
> > context about what it going on here? What problem are we fixing and how is
> > this the right way to do it ?
> >
> 
> This fixes open(filename, 'r'), open(filename, 'w'), etc. calls. In Python
> 3, that second argument *must* be a string, you can't use bytes.
> 
> The fix is the same as used with getattr() (where the second argument must
> also always be a string); in the tokenizer, where we detect calls, if there
> is something that looks like a call to open (and is not an attribute, so
> the previous token is not a "." dot) then make sure that that second
> argument is not converted to a `bytes` object instead.
> 
> You can compare this to the "if fn in ('getattr', 'setattr', 'hasattr',
> 'safehasattr'):" block above this change.

FWIW, the current transformer will also rewrite open(f('foo')).
Martijn Pieters - Oct. 12, 2016, 9:23 p.m.
On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org> wrote:
> FWIW, the current transformer will also rewrite open(f('foo')).

This then also applies to getattr(f('foo'), ...). We'll need to assert that the 3rd token is a comma, I think.
Pierre-Yves David - Oct. 12, 2016, 10:43 p.m.
On 10/12/2016 11:23 PM, Martijn Pieters wrote:
> On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org
> <mailto:yuya@tcha.org>> wrote:
>> FWIW, the current transformer will also rewrite open(f('foo')).
>
> This then also applies to getattr(f('foo'), ...). We'll need to assert
> that the 3rd token is a comma, I think.

Seems like the whole group of rewrite have a similar flaw. We should fix 
it in a follow up.

My current plan is to accept that changeset with an updated description 
incorporating Martijn explanation. I need to find some time to review 
its few descendants first because of a current flaw/limitation in our 
acceptance system.

Cheers,
Pierre-Yves David - Oct. 13, 2016, 2:52 a.m.
On 10/13/2016 12:43 AM, Pierre-Yves David wrote:
>
>
> On 10/12/2016 11:23 PM, Martijn Pieters wrote:
>> On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org
>> <mailto:yuya@tcha.org>> wrote:
>>> FWIW, the current transformer will also rewrite open(f('foo')).
>>
>> This then also applies to getattr(f('foo'), ...). We'll need to assert
>> that the 3rd token is a comma, I think.
>
> Seems like the whole group of rewrite have a similar flaw. We should fix
> it in a follow up.
>
> My current plan is to accept that changeset with an updated description
> incorporating Martijn explanation. I need to find some time to review
> its few descendants first because of a current flaw/limitation in our
> acceptance system.

I've pushed the updated description.

Fellow reviewers, here is what the accept label were before the repush

> 1f01e3e33336 augie 		py3: a second argument to open can't be bytes (mjpieters)
> 4b255d432d66 augie marmoute 	py3: add an os.fsencode backport to ease path handling (mjpieters)
> f2701c406cd3 marmoute 	test-clone: fix some instability in pooled clone race condition test (augie)
> 5da554404a82 marmoute 	test-clone: discard lock-related messages (augie)
> ff6ad83fed8c yuya 		bisect: minor movement of code handle flag updating state (pierre-yves)
> 2cc6dbf15658 yuya 		bisect: rename 'check_code' to match our naming scheme (pierre-yves)
> 6617d5231051 yuya 		bisect: remove code about "update-flag" in check_state (pierre-yves)
> e110b8c76d7c yuya 		bisect: simplify conditional in 'check_state' (pierre-yves)
> 38000dc76eff yuya 		bisect: move check_state into the bisect module (pierre-yves)
> 49639e29f292 yuya 		bisect: factor commonly update sequence (pierre-yves)
> a075dcc7b99f yuya 		bisect: build a displayer only once (pierre-yves)
> 8d079c0594b3 yuya marmoute 	py3: test to check which commands run (7895pulkit)
> 731c206da50c yuya 		debuginstall: use %d instead of %s for formatting an int (augie)
> aa408c1881c5 yuya 		py3: namedtuple takes unicode (journal ext) (mitrandir)
> b5e774ffce88 yuya 		py3: use raw strings in line continuation (convert ext) (mitrandir)
> a75b6c805d96 yuya 		pycompat: only accept a bytestring filepath in Python 2 (mjpieters)
> 30b2ca4bcf35 yuya 		bisect: extra a small initialisation outside of a loop (pierre-yves)
> 58bc268ab994 yuya 		checkcopies: rename 'ca' to 'base' (pierre-yves)
> c415f3b39839 yuya 		checkcopies: minor change to comment (pierre-yves)
> fe417a0192b1 yuya 		checkcopies: add an inline comment about the '_related' call (pierre-yves)
> 0354b8013257 yuya 		checkcopies: extract the '_related' closure (pierre-yves)

Cheers,

Patch

diff --git a/mercurial/__init__.py b/mercurial/__init__.py
--- a/mercurial/__init__.py
+++ b/mercurial/__init__.py
@@ -305,6 +305,24 @@ 
                     except IndexError:
                         pass
 
+                # Bare open call (not an attribute on something else)
+                if (fn == 'open' and not (prevtoken.type == token.OP and
+                                          prevtoken.string == '.')):
+                    try:
+                        # (NAME, 'open')
+                        # (OP, '(')
+                        # (NAME|STRING, 'filename')
+                        # (OP, ',')
+                        # (NAME|STRING, mode)
+                        st = tokens[i + 4]
+                        if (st.type == token.STRING and
+                                st.string[0] in ("'", '"')):
+                            rt = tokenize.TokenInfo(st.type, 'u%s' % st.string,
+                                                    st.start, st.end, st.line)
+                            tokens[i + 4] = rt
+                    except IndexError:
+                        pass
+
                 # It changes iteritems to items as iteritems is not
                 # present in Python 3 world.
                 if fn == 'iteritems':
@@ -319,7 +337,7 @@ 
     # ``replacetoken`` or any mechanism that changes semantics of module
     # loading is changed. Otherwise cached bytecode may get loaded without
     # the new transformation mechanisms applied.
-    BYTECODEHEADER = b'HG\x00\x04'
+    BYTECODEHEADER = b'HG\x00\x05'
 
     class hgloader(importlib.machinery.SourceFileLoader):
         """Custom module loader that transforms source code.