Submitter | Martijn Pieters |
---|---|
Date | Oct. 9, 2016, 12:10 p.m. |
Message ID | <82489cd912f332be976c.1476015040@mjpieters-mbp> |
Download | mbox | patch |
Permalink | /patch/16989/ |
State | Accepted |
Headers | show |
Comments
On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote: > # HG changeset patch > # User Martijn Pieters <mjpieters@fb.com> > # Date 1476015001 -7200 > # Sun Oct 09 14:10:01 2016 +0200 > # Node ID 82489cd912f332be976cf432673ad47af0d04cd7 > # Parent a56076f85aa6aa728457ecc571ff58514bc59896 > py3: a second argument to open can't be bytes Queued, thanks! > > diff --git a/mercurial/__init__.py b/mercurial/__init__.py > --- a/mercurial/__init__.py > +++ b/mercurial/__init__.py > @@ -305,6 +305,24 @@ > except IndexError: > pass > > + # Bare open call (not an attribute on something else) > + if (fn == 'open' and not (prevtoken.type == token.OP and > + prevtoken.string == '.')): > + try: > + # (NAME, 'open') > + # (OP, '(') > + # (NAME|STRING, 'filename') > + # (OP, ',') > + # (NAME|STRING, mode) > + st = tokens[i + 4] > + if (st.type == token.STRING and > + st.string[0] in ("'", '"')): > + rt = tokenize.TokenInfo(st.type, 'u%s' % st.string, > + st.start, st.end, st.line) > + tokens[i + 4] = rt > + except IndexError: > + pass > + > # It changes iteritems to items as iteritems is not > # present in Python 3 world. > if fn == 'iteritems': > @@ -319,7 +337,7 @@ > # ``replacetoken`` or any mechanism that changes semantics of module > # loading is changed. Otherwise cached bytecode may get loaded without > # the new transformation mechanisms applied. > - BYTECODEHEADER = b'HG\x00\x04' > + BYTECODEHEADER = b'HG\x00\x05' > > class hgloader(importlib.machinery.SourceFileLoader): > """Custom module loader that transforms source code. > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 10/09/2016 06:14 PM, Augie Fackler wrote: > On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote: >> # HG changeset patch >> # User Martijn Pieters <mjpieters@fb.com> >> # Date 1476015001 -7200 >> # Sun Oct 09 14:10:01 2016 +0200 >> # Node ID 82489cd912f332be976cf432673ad47af0d04cd7 >> # Parent a56076f85aa6aa728457ecc571ff58514bc59896 >> py3: a second argument to open can't be bytes > > Queued, thanks! I'm trying to double review this (to have it move from hg-commited to public phase) and this is a bit too obscure for me (I assume there was much more context around the py3 table). Martijn can you give me a bit more context about what it going on here? What problem are we fixing and how is this the right way to do it ? >> diff --git a/mercurial/__init__.py b/mercurial/__init__.py >> --- a/mercurial/__init__.py >> +++ b/mercurial/__init__.py >> @@ -305,6 +305,24 @@ >> except IndexError: >> pass >> >> + # Bare open call (not an attribute on something else) >> + if (fn == 'open' and not (prevtoken.type == token.OP and >> + prevtoken.string == '.')): >> + try: >> + # (NAME, 'open') >> + # (OP, '(') >> + # (NAME|STRING, 'filename') >> + # (OP, ',') >> + # (NAME|STRING, mode) >> + st = tokens[i + 4] >> + if (st.type == token.STRING and >> + st.string[0] in ("'", '"')): >> + rt = tokenize.TokenInfo(st.type, 'u%s' % st.string, >> + st.start, st.end, st.line) >> + tokens[i + 4] = rt >> + except IndexError: >> + pass >> + >> # It changes iteritems to items as iteritems is not >> # present in Python 3 world. >> if fn == 'iteritems': >> @@ -319,7 +337,7 @@ >> # ``replacetoken`` or any mechanism that changes semantics of module >> # loading is changed. Otherwise cached bytecode may get loaded without >> # the new transformation mechanisms applied. >> - BYTECODEHEADER = b'HG\x00\x04' >> + BYTECODEHEADER = b'HG\x00\x05' >> >> class hgloader(importlib.machinery.SourceFileLoader): >> """Custom module loader that transforms source code. >> _______________________________________________ >> Mercurial-devel mailing list >> Mercurial-devel@mercurial-scm.org >> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
On 11 October 2016 at 17:50, Pierre-Yves David < pierre-yves.david@ens-lyon.org> wrote: > I'm trying to double review this (to have it move from hg-commited to > public phase) and this is a bit too obscure for me (I assume there was much > more context around the py3 table). Martijn can you give me a bit more > context about what it going on here? What problem are we fixing and how is > this the right way to do it ? > This fixes open(filename, 'r'), open(filename, 'w'), etc. calls. In Python 3, that second argument *must* be a string, you can't use bytes. The fix is the same as used with getattr() (where the second argument must also always be a string); in the tokenizer, where we detect calls, if there is something that looks like a call to open (and is not an attribute, so the previous token is not a "." dot) then make sure that that second argument is not converted to a `bytes` object instead. You can compare this to the "if fn in ('getattr', 'setattr', 'hasattr', 'safehasattr'):" block above this change.
> On Oct 11, 2016, at 18:50, Pierre-Yves David <pierre-yves.david@ens-lyon.org> wrote: > > > >> On 10/09/2016 06:14 PM, Augie Fackler wrote: >>> On Sun, Oct 09, 2016 at 02:10:40PM +0200, Martijn Pieters wrote: >>> # HG changeset patch >>> # User Martijn Pieters <mjpieters@fb.com> >>> # Date 1476015001 -7200 >>> # Sun Oct 09 14:10:01 2016 +0200 >>> # Node ID 82489cd912f332be976cf432673ad47af0d04cd7 >>> # Parent a56076f85aa6aa728457ecc571ff58514bc59896 >>> py3: a second argument to open can't be bytes >> >> Queued, thanks! > > I'm trying to double review this (to have it move from hg-commited to public phase) and this is a bit too obscure for me (I assume there was much more context around the py3 table). Martijn can you give me a bit more context about what it going on here? What problem are we fixing and how is this the right way to do it ? The patch looks good to me. (I wrote the original source transformer code.) > >>> diff --git a/mercurial/__init__.py b/mercurial/__init__.py >>> --- a/mercurial/__init__.py >>> +++ b/mercurial/__init__.py >>> @@ -305,6 +305,24 @@ >>> except IndexError: >>> pass >>> >>> + # Bare open call (not an attribute on something else) >>> + if (fn == 'open' and not (prevtoken.type == token.OP and >>> + prevtoken.string == '.')): >>> + try: >>> + # (NAME, 'open') >>> + # (OP, '(') >>> + # (NAME|STRING, 'filename') >>> + # (OP, ',') >>> + # (NAME|STRING, mode) >>> + st = tokens[i + 4] >>> + if (st.type == token.STRING and >>> + st.string[0] in ("'", '"')): >>> + rt = tokenize.TokenInfo(st.type, 'u%s' % st.string, >>> + st.start, st.end, st.line) >>> + tokens[i + 4] = rt >>> + except IndexError: >>> + pass >>> + >>> # It changes iteritems to items as iteritems is not >>> # present in Python 3 world. >>> if fn == 'iteritems': >>> @@ -319,7 +337,7 @@ >>> # ``replacetoken`` or any mechanism that changes semantics of module >>> # loading is changed. Otherwise cached bytecode may get loaded without >>> # the new transformation mechanisms applied. >>> - BYTECODEHEADER = b'HG\x00\x04' >>> + BYTECODEHEADER = b'HG\x00\x05' >>> >>> class hgloader(importlib.machinery.SourceFileLoader): >>> """Custom module loader that transforms source code. >>> _______________________________________________ >>> Mercurial-devel mailing list >>> Mercurial-devel@mercurial-scm.org >>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >> _______________________________________________ >> Mercurial-devel mailing list >> Mercurial-devel@mercurial-scm.org >> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >> > > -- > Pierre-Yves David > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On Tue, 11 Oct 2016 18:19:23 +0100, Martijn Pieters wrote: > On 11 October 2016 at 17:50, Pierre-Yves David < > pierre-yves.david@ens-lyon.org> wrote: > > > I'm trying to double review this (to have it move from hg-commited to > > public phase) and this is a bit too obscure for me (I assume there was much > > more context around the py3 table). Martijn can you give me a bit more > > context about what it going on here? What problem are we fixing and how is > > this the right way to do it ? > > > > This fixes open(filename, 'r'), open(filename, 'w'), etc. calls. In Python > 3, that second argument *must* be a string, you can't use bytes. > > The fix is the same as used with getattr() (where the second argument must > also always be a string); in the tokenizer, where we detect calls, if there > is something that looks like a call to open (and is not an attribute, so > the previous token is not a "." dot) then make sure that that second > argument is not converted to a `bytes` object instead. > > You can compare this to the "if fn in ('getattr', 'setattr', 'hasattr', > 'safehasattr'):" block above this change. FWIW, the current transformer will also rewrite open(f('foo')).
On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org> wrote:
> FWIW, the current transformer will also rewrite open(f('foo')).
This then also applies to getattr(f('foo'), ...). We'll need to assert that the 3rd token is a comma, I think.
On 10/12/2016 11:23 PM, Martijn Pieters wrote: > On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org > <mailto:yuya@tcha.org>> wrote: >> FWIW, the current transformer will also rewrite open(f('foo')). > > This then also applies to getattr(f('foo'), ...). We'll need to assert > that the 3rd token is a comma, I think. Seems like the whole group of rewrite have a similar flaw. We should fix it in a follow up. My current plan is to accept that changeset with an updated description incorporating Martijn explanation. I need to find some time to review its few descendants first because of a current flaw/limitation in our acceptance system. Cheers,
On 10/13/2016 12:43 AM, Pierre-Yves David wrote: > > > On 10/12/2016 11:23 PM, Martijn Pieters wrote: >> On 12 Oct 2016, at 14:00, Yuya Nishihara <yuya@tcha.org >> <mailto:yuya@tcha.org>> wrote: >>> FWIW, the current transformer will also rewrite open(f('foo')). >> >> This then also applies to getattr(f('foo'), ...). We'll need to assert >> that the 3rd token is a comma, I think. > > Seems like the whole group of rewrite have a similar flaw. We should fix > it in a follow up. > > My current plan is to accept that changeset with an updated description > incorporating Martijn explanation. I need to find some time to review > its few descendants first because of a current flaw/limitation in our > acceptance system. I've pushed the updated description. Fellow reviewers, here is what the accept label were before the repush > 1f01e3e33336 augie py3: a second argument to open can't be bytes (mjpieters) > 4b255d432d66 augie marmoute py3: add an os.fsencode backport to ease path handling (mjpieters) > f2701c406cd3 marmoute test-clone: fix some instability in pooled clone race condition test (augie) > 5da554404a82 marmoute test-clone: discard lock-related messages (augie) > ff6ad83fed8c yuya bisect: minor movement of code handle flag updating state (pierre-yves) > 2cc6dbf15658 yuya bisect: rename 'check_code' to match our naming scheme (pierre-yves) > 6617d5231051 yuya bisect: remove code about "update-flag" in check_state (pierre-yves) > e110b8c76d7c yuya bisect: simplify conditional in 'check_state' (pierre-yves) > 38000dc76eff yuya bisect: move check_state into the bisect module (pierre-yves) > 49639e29f292 yuya bisect: factor commonly update sequence (pierre-yves) > a075dcc7b99f yuya bisect: build a displayer only once (pierre-yves) > 8d079c0594b3 yuya marmoute py3: test to check which commands run (7895pulkit) > 731c206da50c yuya debuginstall: use %d instead of %s for formatting an int (augie) > aa408c1881c5 yuya py3: namedtuple takes unicode (journal ext) (mitrandir) > b5e774ffce88 yuya py3: use raw strings in line continuation (convert ext) (mitrandir) > a75b6c805d96 yuya pycompat: only accept a bytestring filepath in Python 2 (mjpieters) > 30b2ca4bcf35 yuya bisect: extra a small initialisation outside of a loop (pierre-yves) > 58bc268ab994 yuya checkcopies: rename 'ca' to 'base' (pierre-yves) > c415f3b39839 yuya checkcopies: minor change to comment (pierre-yves) > fe417a0192b1 yuya checkcopies: add an inline comment about the '_related' call (pierre-yves) > 0354b8013257 yuya checkcopies: extract the '_related' closure (pierre-yves) Cheers,
Patch
diff --git a/mercurial/__init__.py b/mercurial/__init__.py --- a/mercurial/__init__.py +++ b/mercurial/__init__.py @@ -305,6 +305,24 @@ except IndexError: pass + # Bare open call (not an attribute on something else) + if (fn == 'open' and not (prevtoken.type == token.OP and + prevtoken.string == '.')): + try: + # (NAME, 'open') + # (OP, '(') + # (NAME|STRING, 'filename') + # (OP, ',') + # (NAME|STRING, mode) + st = tokens[i + 4] + if (st.type == token.STRING and + st.string[0] in ("'", '"')): + rt = tokenize.TokenInfo(st.type, 'u%s' % st.string, + st.start, st.end, st.line) + tokens[i + 4] = rt + except IndexError: + pass + # It changes iteritems to items as iteritems is not # present in Python 3 world. if fn == 'iteritems': @@ -319,7 +337,7 @@ # ``replacetoken`` or any mechanism that changes semantics of module # loading is changed. Otherwise cached bytecode may get loaded without # the new transformation mechanisms applied. - BYTECODEHEADER = b'HG\x00\x04' + BYTECODEHEADER = b'HG\x00\x05' class hgloader(importlib.machinery.SourceFileLoader): """Custom module loader that transforms source code.