Patchwork D2866: contrib: fix a subtle bug in check-code's regex rewriting

Submitter phabricator
Date March 14, 2018, 7:53 p.m.
Message ID <differential-rev-PHID-DREV-v77doeaccfx5u3m4tf7m-req@phab.mercurial-scm.org>
Permalink /patch/29512/
State Superseded

Comments

phabricator - March 14, 2018, 7:53 p.m.
durin42 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  We rewrite `\s` to `[ \t]` when preparing our regular expressions, but
  we previously made no effort to avoid creating nested sets. Python
  used to let this slide without incident, but Python 3.7 wants to
  make sure you meant a literal `[` in a set, and so it warns. This
  appears to be fortunate for us, because `[\s(]` was getting rewritten
  to `[[ \t](]`, which doesn't actually match what we expected. See the
  preceding changes, which turned out to be necessary after
  implementing this fix.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2866

AFFECTED FILES
  contrib/check-code.py

CHANGE DETAILS




To: durin42, #hg-reviewers
Cc: mercurial-devel

phabricator - March 15, 2018, 12:32 p.m.
pulkit accepted this revision.
pulkit added a comment.


  Looks good to me but I am not feeling confident enough to push this. Queued the first three.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D2866

To: durin42, #hg-reviewers, pulkit
Cc: pulkit, mercurial-devel

Patch

diff --git a/contrib/check-code.py b/contrib/check-code.py
--- a/contrib/check-code.py
+++ b/contrib/check-code.py
@@ -542,8 +542,11 @@ 
             for i, pseq in enumerate(pats):
                 # fix-up regexes for multi-line searches
                 p = pseq[0]
-                # \s doesn't match \n
-                p = re.sub(r'(?<!\\)\\s', r'[ \\t]', p)
+                # \s doesn't match \n (done in two steps)
+                # first, we replace \s that appears in a set already
+                p = re.sub(r'\[\\s', r'[ \\t', p)
+                # now we replace other \s instances.
+                p = re.sub(r'(?<!(\\|\[))\\s', r'[ \\t]', p)
                 # [^...] doesn't match newline
                 p = re.sub(r'(?<!\\)\[\^', r'[^\\n', p)