Patchwork [v3] check-code: skip unhandled files

login
register
mail settings
Submitter timeless
Date May 18, 2016, 7:36 p.m.
Message ID <3e03d6d0a7be640088b7.1463600182@gcc2-power8.osuosl.org>
Download mbox | patch
Permalink /patch/15171/
State Changes Requested
Headers show

Comments

timeless - May 18, 2016, 7:36 p.m.
# HG changeset patch
# User timeless <timeless@mozdev.org>
# Date 1462930640 0
#      Wed May 11 01:37:20 2016 +0000
# Node ID 3e03d6d0a7be640088b73c1f1ca8c15ad5a4a345
# Parent  c641b8dfb98c2ade6995ba3aa341fe4d7b154827
# Available At bb://timeless/mercurial-crew
#              hg pull bb://timeless/mercurial-crew -r 3e03d6d0a7be
check-code: skip unhandled files

check-code is organized into a list of file patterns for which there
are a list of string patterns in which to look for strings (and then
complain if matches are found).

It also has a list of magic file types that it uses for files without
extensions.

It doesn't make sense to open files for which there are no matching
file patterns. So, we compile the matching file patterns and then
skip anything that doesn't match them.

Similarly, it doesn't make sense to read the entirety of a file just
to check to see if the magic header matches. So for files without an
extension, we can pay the penalty of opening a file, peeking at its
first line, and in the case that the line matches our precompiled
pattern, reopening it later for scanning.
timeless - May 18, 2016, 7:48 p.m.
timeless wrote:
> +            with opentext(f) as fp:
> +                header = fp.readline()

This doesn't actually fix the problem I'm hitting w/ py3:

+  Traceback (most recent call last):
+    File "/home/timeless/hg/crew/tests/../contrib/check-code.py",
line 632, in <module>
+      header = fp.readline()
+    File "/home/timeless/hg/py3/lib/python3.5/encodings/ascii.py",
line 26, in decode
+      return codecs.ascii_decode(input, self.errors)[0]
+  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
position 949: ordinal not in range(128)

The file in question is CONTRIBUTORS.
And really, this shouldn't be hard to get right, but, boy does python3
make it hard.
There's no reason that I can think of for readline() to go anywhere
near position 949.

fwiw, I've tried using io.open, and it works fine in standalone
testing but breaks in run-tests.
Martijn Pieters - May 18, 2016, 8:17 p.m.
On 18 May 2016 at 20:48, timeless <timeless@gmail.com> wrote:
> timeless wrote:
>> +            with opentext(f) as fp:
>> +                header = fp.readline()
>
> This doesn't actually fix the problem I'm hitting w/ py3:
>
> +  Traceback (most recent call last):
> +    File "/home/timeless/hg/crew/tests/../contrib/check-code.py",
> line 632, in <module>
> +      header = fp.readline()
> +    File "/home/timeless/hg/py3/lib/python3.5/encodings/ascii.py",
> line 26, in decode
> +      return codecs.ascii_decode(input, self.errors)[0]
> +  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in
> position 949: ordinal not in range(128)
>
> The file in question is CONTRIBUTORS.
> And really, this shouldn't be hard to get right, but, boy does python3
> make it hard.
> There's no reason that I can think of for readline() to go anywhere
> near position 949.

.readline() works with a buffer, and that buffer is greater than 949
bytes. Without a buffer looking for the next newline separator gets
really inefficient real fast.

Open the file in binary mode instead; .readline() still works and you
can match against bytes. Python doesn't support multi-byte encodings
like UTF-16 and UTF-32 for source code anyway.

> fwiw, I've tried using io.open, and it works fine in standalone
> testing but breaks in run-tests.

In Python 3, open() is io.open(). The default value for encoding is
taken from the locale, which I think is set to C in run-tests (so the
default encoding for opening files is then ASCII).

Patch

diff -r c641b8dfb98c -r 3e03d6d0a7be contrib/check-code.py
--- a/contrib/check-code.py	Wed May 11 01:46:11 2016 +0000
+++ b/contrib/check-code.py	Wed May 11 01:37:20 2016 +0000
@@ -426,6 +426,12 @@ 
     ('web template', r'mercurial/templates/.*\.tmpl', '',
      webtemplatefilters, webtemplatepats),
 ]
+def joinedre(checks):
+    return r'|'.join(['(?:%s)' % c for c in checks])
+filechecks = joinedre([c[1] for c in checks])
+filecheckre = re.compile(filechecks)
+magicchecks = joinedre([c[2] for c in checks if c[2] != ''])
+magiccheckre = re.compile(magicchecks)
 
 def _preparepats():
     for c in checks:
@@ -615,6 +621,19 @@ 
 
     ret = 0
     for f in check:
+        name = os.path.basename(f)
+        if '.' in name:
+            if not filecheckre.search(f):
+                if options.debug:
+                    print("Ignoring %s" % f)
+                continue
+        else:
+            with opentext(f) as fp:
+                header = fp.readline()
+                if not magiccheckre.search(header):
+                    if options.debug:
+                        print("Ignoring %s" % f)
+                    continue
         if not checkfile(f, maxerr=options.per_file, warnings=options.warnings,
                          blame=options.blame, debug=options.debug,
                          lineno=options.lineno):