Patchwork D7225: import-checker: open all source files as utf-8

Submitter phabricator
Date Nov. 5, 2019, 4:58 a.m.
Message ID <differential-rev-PHID-DREV-tbn43ie4fa53uopinjfo-req@mercurial-scm.org>
Permalink /patch/42742/
State Superseded

Comments

phabricator - Nov. 5, 2019, 4:58 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Previously, import-checker.py opened source files in text mode
  without an explicit encoding, so their bytes were decoded with
  the platform's default (locale-dependent) encoding.
  
  When that default was not UTF-8, some byte sequences in some
  files could not be interpreted, and the checker raised an error.
  
  This commit opens the files as UTF-8 unconditionally, which
  makes the error go away.
  
  test-check-module-imports.t now passes on Python 3.5 and 3.6
  with this change.
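
  The difference between the two ways of opening a file can be
  reproduced outside the checker. The following is a minimal sketch,
  not part of the patch: the sample text and the read_with() helper are
  hypothetical, and an ASCII default encoding is simulated by passing
  encoding='ascii' explicitly rather than by changing the locale.

```python
import io
import os
import tempfile

# Hypothetical sample: a source line containing one non-ASCII character.
text = u"# caf\u00e9\nimport os\n"

# Write the sample to disk as UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as out:
    out.write(text.encode("utf-8"))


def read_with(encoding):
    """Read the sample file, decoding it with the given encoding."""
    with io.open(path, "r", encoding=encoding) as src:
        return src.read()


# Stand-in for a locale whose default encoding is ASCII (an assumption
# made for illustration; the real default depends on the environment).
try:
    read_with("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# An explicit UTF-8 open decodes the same bytes regardless of locale.
utf8_text = read_with("utf-8")

os.remove(path)
```

  io.open() accepts the encoding argument on both Python 2 and
  Python 3, which is presumably why the patch uses it instead of the
  builtin open().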

REPOSITORY
  rHG Mercurial

BRANCH
  stable

REVISION DETAIL
  https://phab.mercurial-scm.org/D7225

AFFECTED FILES
  contrib/import-checker.py


To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/contrib/import-checker.py b/contrib/import-checker.py
--- a/contrib/import-checker.py
+++ b/contrib/import-checker.py
@@ -4,6 +4,7 @@ 
 
 import ast
 import collections
+import io
 import os
 import sys
 
@@ -754,7 +755,11 @@ 
             yield src.read(), modname, f, 0
             py = True
     if py or f.endswith('.t'):
-        with open(f, 'r') as src:
+        # Strictly speaking we should sniff for the magic header that denotes
+        # Python source file encoding. But in reality we don't use anything
+        # other than ASCII (mainly) and UTF-8 (in a few exceptions), so
+        # simplicity is fine.
+        with io.open(f, 'r', encoding='utf-8') as src:
             for script, modname, t, line in embedded(f, modname, src):
                 yield script, modname.encode('utf8'), t, line
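
The comment in the patch mentions sniffing for the magic header that declares a Python source file's encoding. For reference, the standard library can do that sniffing on Python 3 via the tokenize module. The sketch below is an illustration only and is not what the patch does: the sample file and its latin-1 cookie are hypothetical, and tokenize.open() is not available on Python 2.

```python
import os
import tempfile
import tokenize

# Hypothetical sample file declaring a non-UTF-8 encoding with a
# PEP 263 coding cookie on its first line.
source = u"# -*- coding: latin-1 -*-\nname = u'caf\u00e9'\n"

fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as out:
    out.write(source.encode("latin-1"))

# detect_encoding() inspects at most the first two lines for a BOM or
# a coding cookie and returns the encoding name it found.
with open(path, "rb") as raw:
    encoding, _ = tokenize.detect_encoding(raw.readline)

# tokenize.open() runs the same detection and returns a read-only
# text-mode file object that decodes with the detected encoding.
with tokenize.open(path) as src:
    decoded = src.read()

os.remove(path)
```

Since the patch notes that the tree only uses ASCII and UTF-8, hard-coding UTF-8 gives the same result there with less machinery.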