Submitter | timeless |
---|---|
Date | May 18, 2016, 7:36 p.m. |
Message ID | <3e03d6d0a7be640088b7.1463600182@gcc2-power8.osuosl.org> |
Download | mbox | patch |
Permalink | /patch/15171/ |
State | Changes Requested |
Headers | show |
Comments
timeless wrote: > + with opentext(f) as fp: > + header = fp.readline() This doesn't actually fix the problem I'm hitting w/ py3: + Traceback (most recent call last): + File "/home/timeless/hg/crew/tests/../contrib/check-code.py", line 632, in <module> + header = fp.readline() + File "/home/timeless/hg/py3/lib/python3.5/encodings/ascii.py", line 26, in decode + return codecs.ascii_decode(input, self.errors)[0] + UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 949: ordinal not in range(128) The file in question is CONTRIBUTORS. And really, this shouldn't be hard to get right, but, boy does python3 make it hard. There's no reason that I can think of for readline() to go anywhere near position 949. fwiw, I've tried using io.open, and it works fine in standalone testing but breaks in run-tests.
On 18 May 2016 at 20:48, timeless <timeless@gmail.com> wrote: > timeless wrote: >> + with opentext(f) as fp: >> + header = fp.readline() > > This doesn't actually fix the problem I'm hitting w/ py3: > > + Traceback (most recent call last): > + File "/home/timeless/hg/crew/tests/../contrib/check-code.py", > line 632, in <module> > + header = fp.readline() > + File "/home/timeless/hg/py3/lib/python3.5/encodings/ascii.py", > line 26, in decode > + return codecs.ascii_decode(input, self.errors)[0] > + UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in > position 949: ordinal not in range(128) > > The file in question is CONTRIBUTORS. > And really, this shouldn't be hard to get right, but, boy does python3 > make it hard. > There's no reason that I can think of for readline() to go anywhere > near position 949. .readline() works with a buffer, and that buffer is greater than 949 bytes. Without a buffer looking for the next newline separator gets really inefficient real fast. Open the file in binary mode instead; .readline() still works and you can match against bytes. Python doesn't support multi-byte encodings like UTF-16 and UTF-32 for source code anyway. > fwiw, I've tried using io.open, and it works fine in standalone > testing but breaks in run-tests. In Python 3, open() is io.open(). The default value for encoding is taken from the locale, which I think is set to C in run-tests (so the default encoding for opening files is then ASCII).
Patch
diff -r c641b8dfb98c -r 3e03d6d0a7be contrib/check-code.py --- a/contrib/check-code.py Wed May 11 01:46:11 2016 +0000 +++ b/contrib/check-code.py Wed May 11 01:37:20 2016 +0000 @@ -426,6 +426,12 @@ ('web template', r'mercurial/templates/.*\.tmpl', '', webtemplatefilters, webtemplatepats), ] +def joinedre(checks): + return r'|'.join(['(?:%s)' % c for c in checks]) +filechecks = joinedre([c[1] for c in checks]) +filecheckre = re.compile(filechecks) +magicchecks = joinedre([c[2] for c in checks if c[2] != '']) +magiccheckre = re.compile(magicchecks) def _preparepats(): for c in checks: @@ -615,6 +621,19 @@ ret = 0 for f in check: + name = os.path.basename(f) + if '.' in name: + if not filecheckre.search(f): + if options.debug: + print("Ignoring %s" % f) + continue + else: + with opentext(f) as fp: + header = fp.readline() + if not magiccheckre.search(header): + if options.debug: + print("Ignoring %s" % f) + continue if not checkfile(f, maxerr=options.per_file, warnings=options.warnings, blame=options.blame, debug=options.debug, lineno=options.lineno):