Patchwork [9,of,9] match: use util.re.escape instead of re.escape

Submitter Siddharth Agarwal
Date July 15, 2014, 11:15 p.m.
Message ID <2d645e993f8cb3d386ae.1405466131@dev1738.prn1.facebook.com>
Permalink /patch/5178/
State Accepted
Commit d516b6de38210dea25aad7cb59ee999c6cbe37fd

Comments

Siddharth Agarwal - July 15, 2014, 11:15 p.m.
# HG changeset patch
# User Siddharth Agarwal <sid0@fb.com>
# Date 1405463690 25200
#      Tue Jul 15 15:34:50 2014 -0700
# Node ID 2d645e993f8cb3d386ae520e7233089316e830f2
# Parent  8ec138de734383da9ab4fd60e4a61054906f50ed
match: use util.re.escape instead of re.escape

For a pathological .hgignore with over 2500 glob lines and over 200000 calls to
re.escape, and with re2 available, this speeds up parsing the .hgignore from
0.75 seconds to 0.20 seconds. This causes e.g. 'hg status' with hgwatchman
enabled to go from 1.02 seconds to 0.47 seconds.
Augie Fackler - July 18, 2014, 2:26 a.m.
On Tue, Jul 15, 2014 at 04:15:31PM -0700, Siddharth Agarwal wrote:
> # HG changeset patch
> # User Siddharth Agarwal <sid0@fb.com>
> # Date 1405463690 25200
> #      Tue Jul 15 15:34:50 2014 -0700
> # Node ID 2d645e993f8cb3d386ae520e7233089316e830f2
> # Parent  8ec138de734383da9ab4fd60e4a61054906f50ed
> match: use util.re.escape instead of re.escape

Series looks sensible and straightforward. Queued.

(I particularly like that (ab)use of @propertycache to return different
functions and have it look like a class method. Clever.)

>
> For a pathological .hgignore with over 2500 glob lines and over 200000 calls to
> re.escape, and with re2 available, this speeds up parsing the .hgignore from
> 0.75 seconds to 0.20 seconds. This causes e.g. 'hg status' with hgwatchman
> enabled to go from 1.02 seconds to 0.47 seconds.
>
> diff --git a/mercurial/match.py b/mercurial/match.py
> --- a/mercurial/match.py
> +++ b/mercurial/match.py
> @@ -247,7 +247,7 @@
>      i, n = 0, len(pat)
>      res = ''
>      group = 0
> -    escape = re.escape
> +    escape = util.re.escape
>      def peek():
>          return i < n and pat[i]
>      while i < n:
> @@ -310,11 +310,11 @@
>      if kind == 're':
>          return pat
>      if kind == 'path':
> -        return '^' + re.escape(pat) + '(?:/|$)'
> +        return '^' + util.re.escape(pat) + '(?:/|$)'
>      if kind == 'relglob':
>          return '(?:|.*/)' + _globre(pat) + globsuffix
>      if kind == 'relpath':
> -        return re.escape(pat) + '(?:/|$)'
> +        return util.re.escape(pat) + '(?:/|$)'
>      if kind == 'relre':
>          if pat.startswith('^'):
>              return pat
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Siddharth Agarwal - July 18, 2014, 3:21 a.m.
On 07/17/2014 07:26 PM, Augie Fackler wrote:
> On Tue, Jul 15, 2014 at 04:15:31PM -0700, Siddharth Agarwal wrote:
>> # HG changeset patch
>> # User Siddharth Agarwal <sid0@fb.com>
>> # Date 1405463690 25200
>> #      Tue Jul 15 15:34:50 2014 -0700
>> # Node ID 2d645e993f8cb3d386ae520e7233089316e830f2
>> # Parent  8ec138de734383da9ab4fd60e4a61054906f50ed
>> match: use util.re.escape instead of re.escape
> Series looks sensible and straightforward. Queued.
>
> (I particularly like that (ab)use of @propertycache to return different
> functions and have it look like a class method. Clever.)

My motivation for it was to avoid an extra function call per invocation 
of util.re.escape -- once the value's been saved as a local, as is done 
in patch 9.



Patch

diff --git a/mercurial/match.py b/mercurial/match.py
--- a/mercurial/match.py
+++ b/mercurial/match.py
@@ -247,7 +247,7 @@ 
     i, n = 0, len(pat)
     res = ''
     group = 0
-    escape = re.escape
+    escape = util.re.escape
     def peek():
         return i < n and pat[i]
     while i < n:
@@ -310,11 +310,11 @@ 
     if kind == 're':
         return pat
     if kind == 'path':
-        return '^' + re.escape(pat) + '(?:/|$)'
+        return '^' + util.re.escape(pat) + '(?:/|$)'
     if kind == 'relglob':
         return '(?:|.*/)' + _globre(pat) + globsuffix
     if kind == 'relpath':
-        return re.escape(pat) + '(?:/|$)'
+        return util.re.escape(pat) + '(?:/|$)'
     if kind == 'relre':
         if pat.startswith('^'):
             return pat
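
The 'path' and 'relpath' branches touched above can be tried in isolation
with the sketch below. It is an illustration only: path_regex is a
hypothetical helper, not a function in match.py, and the stdlib re.escape
stands in for util.re.escape so the snippet runs without Mercurial.

```python
import re


def path_regex(kind, pat, escape=re.escape):
    """Build the regex string for the two pattern kinds changed in the patch.

    `escape` is injectable so a faster implementation could be swapped in.
    """
    if kind == 'path':
        # Rooted: must match from the start, then end or a path separator.
        return '^' + escape(pat) + '(?:/|$)'
    if kind == 'relpath':
        # Unrooted: same suffix rule, no leading anchor.
        return escape(pat) + '(?:/|$)'
    raise ValueError('unsupported kind: %r' % kind)


rooted = re.compile(path_regex('path', 'src/foo.c'))
unrooted = re.compile(path_regex('relpath', 'foo.c'))
```

Escaping matters here because the '.' in 'foo.c' would otherwise match any
character, so a rooted pattern for 'src/foo.c' would also accept
'src/fooxc'.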