Patchwork D5049: hgdemandimport: don't use str.isidentifier()

Submitter phabricator
Date Oct. 13, 2018, 7:28 a.m.
Message ID <differential-rev-PHID-DREV-wqrzmr6timkbib7lhble-req@phab.mercurial-scm.org>
Permalink /patch/35834/
State New

Comments

phabricator - Oct. 13, 2018, 7:28 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  str.isidentifier() doesn't exist in Python 2.7.
  
  The new code is equivalent to what Python was doing up until
  isidentifier() was introduced to Lib/tokenize.py in
  33856de84d1115a18b699e0ca93c3b921bc6a1af.
  
  Strictly speaking, a simple ASCII check is not sufficient: PEP 3131
  defines what a proper identifier is, and it allows non-ASCII
  characters. But we don't plan to use this tokenizer on arbitrary
  source code, so I think we can get away with not conforming to
  PEP 3131.
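
  To illustrate the trade-off described above, here is a minimal sketch
  (not part of the patch) comparing the ASCII membership test with
  str.isidentifier() where that method is available. The helper names
  starts_name_ascii and starts_name_pep3131 are hypothetical and exist
  only for this illustration; the namechars value is the one the patch
  adds to _tokenize().

```python
import string

# Same character set the patch introduces in _tokenize().
NAMECHARS = string.ascii_letters + '_'


def starts_name_ascii(initial):
    """Hypothetical helper mirroring the patched check.

    Assumes `initial` is a single character, as it is in the tokenizer.
    """
    return initial in NAMECHARS


def starts_name_pep3131(initial):
    """Hypothetical helper mirroring the original check.

    Returns None when str.isidentifier() is unavailable (Python 2.7).
    """
    method = getattr(initial, 'isidentifier', None)
    return method() if method is not None else None


# u'\u00e9' is LATIN SMALL LETTER E WITH ACUTE, u'\u03bb' is GREEK SMALL
# LETTER LAMDA. Both can start an identifier under PEP 3131.
samples = [u'a', u'Z', u'_', u'9', u'\u00e9', u'\u03bb']

for ch in samples:
    print(repr(ch), starts_name_ascii(ch), starts_name_pep3131(ch))
```

  On Python 3 the two checks agree for the ASCII samples and differ only
  for the two non-ASCII letters, which the ASCII test rejects as name
  starts.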

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5049

AFFECTED FILES
  hgdemandimport/py3tokenize.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel

Patch

diff --git a/hgdemandimport/py3tokenize.py b/hgdemandimport/py3tokenize.py
--- a/hgdemandimport/py3tokenize.py
+++ b/hgdemandimport/py3tokenize.py
@@ -63,6 +63,7 @@ 
 # * absolute_import added.
 # * Removed re.ASCII.
 # * Various backports to work on Python 2.7.
+# * Dropped support for non-ASCII identifiers (PEP 3131 support).
 
 from __future__ import absolute_import
 
@@ -75,6 +76,7 @@ 
 from itertools import chain
 import itertools as _itertools
 import re
+import string
 from .py3token import *
 
 cookie_re = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')
@@ -524,6 +526,7 @@ 
 
 def _tokenize(readline, encoding):
     lnum = parenlev = continued = 0
+    namechars = string.ascii_letters + '_'
     numchars = '0123456789'
     contstr, needcont = '', 0
     contline = None
@@ -682,7 +685,8 @@ 
                     else:                                  # ordinary string
                         yield TokenInfo(STRING, token, spos, epos, line)
 
-                elif initial.isidentifier():               # ordinary name
+                # This doesn't conform to PEP 3131.
+                elif initial in namechars:                 # ordinary name
                     yield TokenInfo(NAME, token, spos, epos, line)
                 elif initial == '\\':                      # continued stmt
                     continued = 1