Patchwork [5,of,5] check-code: handle py3 open divergence

login
register
mail settings
Submitter timeless
Date May 11, 2016, 4:12 a.m.
Message ID <c5dfd864d81f5d1f435f.1462939970@gcc2-power8.osuosl.org>
Download mbox | patch
Permalink /patch/14991/
State Accepted
Headers show

Comments

timeless - May 11, 2016, 4:12 a.m.
# HG changeset patch
# User timeless <timeless@mozdev.org>
# Date 1462931171 0
#      Wed May 11 01:46:11 2016 +0000
# Node ID c5dfd864d81f5d1f435ff376ea44b3ea82b12575
# Parent  ed2dbfec165a0de79c93d7fa1ea9cb5f36b10c29
# EXP-Topic runtests
# Available At bb://timeless/mercurial-crew
#              hg pull bb://timeless/mercurial-crew -r c5dfd864d81f
check-code: handle py3 open divergence

open() really wants an encoding attribute
Yuya Nishihara - May 15, 2016, 10:42 a.m.
On Wed, 11 May 2016 04:12:50 +0000, timeless wrote:
> # HG changeset patch
> # User timeless <timeless@mozdev.org>
> # Date 1462931171 0
> #      Wed May 11 01:46:11 2016 +0000
> # Node ID c5dfd864d81f5d1f435ff376ea44b3ea82b12575
> # Parent  ed2dbfec165a0de79c93d7fa1ea9cb5f36b10c29
> # EXP-Topic runtests
> # Available At bb://timeless/mercurial-crew
> #              hg pull bb://timeless/mercurial-crew -r c5dfd864d81f
> check-code: handle py3 open divergence
> 
> open() really wants an encoding attribute
> 
> diff -r ed2dbfec165a -r c5dfd864d81f contrib/check-code.py
> --- a/contrib/check-code.py	Wed May 11 01:44:39 2016 +0000
> +++ b/contrib/check-code.py	Wed May 11 01:46:11 2016 +0000
> @@ -26,6 +26,11 @@
>  import os
>  import re
>  import sys
> +if sys.version_info[0] < 3:
> +    opentext = open
> +else:
> +    def opentext(f):
> +        return open(f, encoding='ascii')

Do we want a unicode on Python3 ?

I'm not saying we shouldn't, I just wondered. That would be manageable to
use unicodes in small scripts like check-code.py.
timeless - May 15, 2016, 2:33 p.m.
I don't know. I'm trying to take baby steps. If I or someone else hit a
problem with only ASCII, we can address it then.
On May 15, 2016 3:44 AM, "Yuya Nishihara" <yuya@tcha.org> wrote:

> On Wed, 11 May 2016 04:12:50 +0000, timeless wrote:
> > # HG changeset patch
> > # User timeless <timeless@mozdev.org>
> > # Date 1462931171 0
> > #      Wed May 11 01:46:11 2016 +0000
> > # Node ID c5dfd864d81f5d1f435ff376ea44b3ea82b12575
> > # Parent  ed2dbfec165a0de79c93d7fa1ea9cb5f36b10c29
> > # EXP-Topic runtests
> > # Available At bb://timeless/mercurial-crew
> > #              hg pull bb://timeless/mercurial-crew -r c5dfd864d81f
> > check-code: handle py3 open divergence
> >
> > open() really wants an encoding attribute
> >
> > diff -r ed2dbfec165a -r c5dfd864d81f contrib/check-code.py
> > --- a/contrib/check-code.py   Wed May 11 01:44:39 2016 +0000
> > +++ b/contrib/check-code.py   Wed May 11 01:46:11 2016 +0000
> > @@ -26,6 +26,11 @@
> >  import os
> >  import re
> >  import sys
> > +if sys.version_info[0] < 3:
> > +    opentext = open
> > +else:
> > +    def opentext(f):
> > +        return open(f, encoding='ascii')
>
> Do we want a unicode on Python3 ?
>
> I'm not saying we shouldn't, I just wondered. That would be manageable to
> use unicodes in small scripts like check-code.py.
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - May 16, 2016, 1:06 p.m.
On Sun, 15 May 2016 07:33:26 -0700, timeless wrote:
> I don't know. I'm trying to take baby steps. If I or someone else hit a
> problem with only ASCII, we can address it then.

It doesn't matter if encoding is 'ascii' or not. This patch implies we choose
unicode over bytes for check-code.py, on Python 3. I think that would work
for small scripts, but it scared me a little since we will do differently for
the core modules.

> > > +    def opentext(f):
> > > +        return open(f, encoding='ascii')

If we want to stick to bytes, it should be open(f, 'rb').
Gregory Szorc - May 16, 2016, 3:56 p.m.
> On May 16, 2016, at 06:06, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> On Sun, 15 May 2016 07:33:26 -0700, timeless wrote:
>> I don't know. I'm trying to take baby steps. If I or someone else hit a
>> problem with only ASCII, we can address it then.
> 
> It doesn't matter if encoding is 'ascii' or not. This patch implies we choose
> unicode over bytes for check-code.py, on Python 3. I think that would work
> for small scripts, but it scared me a little since we will do differently for
> the core modules.
> 
>>>> +    def opentext(f):
>>>> +        return open(f, encoding='ascii')
> 
> If we want to stick to bytes, it should be open(f, 'rb').

Python 3 treats source files as utf-8 by default. We should probably do the same. As long as the decoder is in strict mode, ascii should be OK for the immediate future.
Martijn Pieters - May 16, 2016, 4:22 p.m.
Rather than make Python 3 behave as Python 2, why not use `io.open()`
in Python 2? The *exact same behaviour* is available in Python 2, with
the only difference really being that `io.open()` is the built-in
`open()` in Python 3.

On 11 May 2016 at 05:12, timeless <timeless@fmr.im> wrote:
> # HG changeset patch
> # User timeless <timeless@mozdev.org>
> # Date 1462931171 0
> #      Wed May 11 01:46:11 2016 +0000
> # Node ID c5dfd864d81f5d1f435ff376ea44b3ea82b12575
> # Parent  ed2dbfec165a0de79c93d7fa1ea9cb5f36b10c29
> # EXP-Topic runtests
> # Available At bb://timeless/mercurial-crew
> #              hg pull bb://timeless/mercurial-crew -r c5dfd864d81f
> check-code: handle py3 open divergence
>
> open() really wants an encoding attribute
>
> diff -r ed2dbfec165a -r c5dfd864d81f contrib/check-code.py
> --- a/contrib/check-code.py     Wed May 11 01:44:39 2016 +0000
> +++ b/contrib/check-code.py     Wed May 11 01:46:11 2016 +0000
> @@ -26,6 +26,11 @@
>  import os
>  import re
>  import sys
> +if sys.version_info[0] < 3:
> +    opentext = open
> +else:
> +    def opentext(f):
> +        return open(f, encoding='ascii')
>  try:
>      xrange
>  except NameError:
> @@ -493,8 +498,12 @@
>      result = True
>
>      try:
> -        with open(f) as fp:
> -            pre = post = fp.read()
> +        with opentext(f) as fp:
> +            try:
> +                pre = post = fp.read()
> +            except UnicodeDecodeError as e:
> +                print("%s while reading %s" % (e, f))
> +                return result
>      except IOError as e:
>          print("Skipping %s, %s" % (f, str(e).split(':', 1)[0]))
>          return result
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
timeless - May 16, 2016, 6:18 p.m.
So, on the py2 side of things, io.open seems to like to return
<unicode>, whereas our stuff expects <str>.

On the py3 side of things. If I can avoid having to talk about unicode, I will.

On Mon, May 16, 2016 at 9:22 AM, Martijn Pieters <mj@zopatista.com> wrote:
> Rather than make Python 3 behave as Python 2, why not use `io.open()`
> in Python 2? The *exact same behaviour* is available in Python 2, with
> the only difference really being that `io.open()` is the built-in
> `open()` in Python 3.
>
> On 11 May 2016 at 05:12, timeless <timeless@fmr.im> wrote:
>> # HG changeset patch
>> # User timeless <timeless@mozdev.org>
>> # Date 1462931171 0
>> #      Wed May 11 01:46:11 2016 +0000
>> # Node ID c5dfd864d81f5d1f435ff376ea44b3ea82b12575
>> # Parent  ed2dbfec165a0de79c93d7fa1ea9cb5f36b10c29
>> # EXP-Topic runtests
>> # Available At bb://timeless/mercurial-crew
>> #              hg pull bb://timeless/mercurial-crew -r c5dfd864d81f
>> check-code: handle py3 open divergence
>>
>> open() really wants an encoding attribute
>>
>> diff -r ed2dbfec165a -r c5dfd864d81f contrib/check-code.py
>> --- a/contrib/check-code.py     Wed May 11 01:44:39 2016 +0000
>> +++ b/contrib/check-code.py     Wed May 11 01:46:11 2016 +0000
>> @@ -26,6 +26,11 @@
>>  import os
>>  import re
>>  import sys
>> +if sys.version_info[0] < 3:
>> +    opentext = open
>> +else:
>> +    def opentext(f):
>> +        return open(f, encoding='ascii')
>>  try:
>>      xrange
>>  except NameError:
>> @@ -493,8 +498,12 @@
>>      result = True
>>
>>      try:
>> -        with open(f) as fp:
>> -            pre = post = fp.read()
>> +        with opentext(f) as fp:
>> +            try:
>> +                pre = post = fp.read()
>> +            except UnicodeDecodeError as e:
>> +                print("%s while reading %s" % (e, f))
>> +                return result
>>      except IOError as e:
>>          print("Skipping %s, %s" % (f, str(e).split(':', 1)[0]))
>>          return result
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
>
> --
> Martijn Pieters
Martijn Pieters - May 16, 2016, 6:42 p.m.
On 16 May 2016, at 19:18, timeless <timeless@gmail.com> wrote:
> 
> So, on the py2 side of things, io.open seems to like to return
> <unicode>, whereas our stuff expects <str>.
> 
> On the py3 side of things. If I can avoid having to talk about unicode, I will.

But you could use binary mode and have `bytes` on Python 3 and `str` on Python 2. The only disadvantage is that universal line ending translation is gone, so you get `\r\n` if that's the convention in a given file.

--
Martijn
timeless - May 16, 2016, 7:12 p.m.
timeless <timeless@gmail.com> wrote:
> So, on the py2 side of things, io.open seems to like to return
> <unicode>, whereas our stuff expects <str>.
>
> On the py3 side of things. If I can avoid having to talk about unicode, I
> will.

Martijn Pieters <mj@zopatista.com> wrote:
> But you could use binary mode and have `bytes` on Python 3 and `str` on
> Python 2. The only disadvantage is that universal line ending translation is
> gone, so you get `\r\n` if that's the convention in a given file.

Doing that would require rewriting all the r'...' lines as br'...'.

At this point. I'd like to avoid doing that, since there's half a
chance that someone will write a module loader hack which will let us
skip that.

If we get that, then I'm happy to replace the opentext I've written w/
a py2+py3 io.open(...,'rb').

Patch

diff -r ed2dbfec165a -r c5dfd864d81f contrib/check-code.py
--- a/contrib/check-code.py	Wed May 11 01:44:39 2016 +0000
+++ b/contrib/check-code.py	Wed May 11 01:46:11 2016 +0000
@@ -26,6 +26,11 @@ 
 import os
 import re
 import sys
+if sys.version_info[0] < 3:
+    opentext = open
+else:
+    def opentext(f):
+        return open(f, encoding='ascii')
 try:
     xrange
 except NameError:
@@ -493,8 +498,12 @@ 
     result = True
 
     try:
-        with open(f) as fp:
-            pre = post = fp.read()
+        with opentext(f) as fp:
+            try:
+                pre = post = fp.read()
+            except UnicodeDecodeError as e:
+                print("%s while reading %s" % (e, f))
+                return result
     except IOError as e:
         print("Skipping %s, %s" % (f, str(e).split(':', 1)[0]))
         return result