Patchwork D10284: re2: feed unicode string to re2 module when necessary

Submitter phabricator
Date March 28, 2021, 10:53 p.m.
Message ID <differential-rev-PHID-DREV-3kxonrlk3qf4htk6kv53-req@mercurial-scm.org>
Permalink /patch/48600/
State Superseded

Comments

phabricator - March 28, 2021, 10:53 p.m.
marmoute created this revision.
Herald added a reviewer: hg-reviewers.
Herald added a subscriber: mercurial-patches.

REVISION SUMMARY
  My previous tests used the `pyre2` Python project, which wraps the Google RE2
  library in Python as a `re2` module and accepts bytes as input. However, the
  `fb-re2` Python project also wraps the Google RE2 library in Python as a `re2`
  module, and it accepts only unicode on Python 3. So we detect this situation
  and convert things to unicode when necessary.
  
  Hooray…
  
  We should consider using a Rust wrapper for regexp handling. We need regexps
  in Rust anyway, and this would give us more control with fewer variants and
  more sanity.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D10284

AFFECTED FILES
  mercurial/util.py

CHANGE DETAILS




To: marmoute, #hg-reviewers
Cc: mercurial-patches, mercurial-devel

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2172,6 +2172,7 @@ 
         return True
 
 
+_re2_input = lambda x: x
 try:
     import re2  # pytype: disable=import-error
 
@@ -2183,11 +2184,21 @@ 
 class _re(object):
     def _checkre2(self):
         global _re2
+        global _re2_input
         try:
             # check if match works, see issue3964
-            _re2 = bool(re2.match(br'\[([^\[]+)\]', b'[ui]'))
+            check_pattern = br'\[([^\[]+)\]'
+            check_input = b'[ui]'
+            _re2 = bool(re2.match(check_pattern, check_input))
         except ImportError:
             _re2 = False
+        except TypeError:
+            # the `pyre-2` project provides a re2 module that accepts bytes
+            # the `fb-re2` project provides a re2 module that accepts sysstr
+            check_pattern = pycompat.sysstr(check_pattern)
+            check_input = pycompat.sysstr(check_input)
+            _re2 = bool(re2.match(check_pattern, check_input))
+            _re2_input = pycompat.sysstr
 
     def compile(self, pat, flags=0):
         """Compile a regular expression, using re2 if possible
@@ -2203,7 +2214,7 @@ 
             if flags & remod.MULTILINE:
                 pat = b'(?m)' + pat
             try:
-                return re2.compile(pat)
+                return re2.compile(_re2_input(pat))
             except re2.error:
                 pass
         return remod.compile(pat, flags)
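
The probe-and-fallback logic in `_checkre2` can be tried in isolation without either `re2` binding installed. The following is only a minimal sketch, not the code from the patch: `probe_matcher`, `to_sysstr` and `str_only_match` are hypothetical names, the standard `re` module stands in for a bytes-accepting binding, and `to_sysstr` assumes that `pycompat.sysstr` turns bytes into `str` by decoding as latin-1 on Python 3.

```python
import re


def to_sysstr(value):
    """Hypothetical stand-in for pycompat.sysstr (assumed: latin-1 decode)."""
    if isinstance(value, str):
        return value
    return value.decode("latin-1")


def probe_matcher(match):
    """Probe a re2-style ``match`` callable.

    Returns ``(usable, convert)``, where ``convert`` is the function to
    apply to a pattern before handing it to that binding.
    """
    check_pattern = br'\[([^\[]+)\]'
    check_input = b'[ui]'
    try:
        # First attempt: the binding accepts bytes, so no conversion is needed.
        return bool(match(check_pattern, check_input)), (lambda x: x)
    except TypeError:
        # The binding rejected bytes: retry the same probe with str arguments.
        usable = bool(match(to_sysstr(check_pattern), to_sysstr(check_input)))
        return usable, to_sysstr


def str_only_match(pattern, string):
    """Fake binding that rejects bytes, mimicking a str-only `re2` module."""
    if isinstance(pattern, bytes) or isinstance(string, bytes):
        raise TypeError("expected str")
    return re.match(pattern, string)


# Stand-in for a bytes-accepting binding.
bytes_ok, bytes_convert = probe_matcher(re.match)
# Stand-in for a str-only binding.
str_ok, str_convert = probe_matcher(str_only_match)
```

With the bytes-accepting stand-in, the returned converter leaves patterns untouched. With the str-only stand-in, the converter is `to_sysstr`, which plays the role that `_re2_input` plays in `compile` above.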