Patchwork [4,of,4,V2] hgweb: config option to blacklist some revset functions in hgweb search

Submitter Alexander Plavin
Date Aug. 9, 2013, 6:54 p.m.
Message ID <e495c742bf85e0aef491.1376074498@debian-alexander.dolgopa>
Permalink /patch/2105/
State Changes Requested

Comments

Alexander Plavin - Aug. 9, 2013, 6:54 p.m.
# HG changeset patch
# User Alexander Plavin <alexander@plav.in>
# Date 1374269558 -14400
#      Sat Jul 20 01:32:38 2013 +0400
# Node ID e495c742bf85e0aef4919c94f08effa6effd3695
# Parent  80319cecf93938fb529984f4a2f5c105bcc709b1
hgweb: config option to blacklist some revset functions in hgweb search

This option defaults to ['contains'], as this is a heavy-weight function.
Augie Fackler - Aug. 12, 2013, 2:30 p.m.
On Fri, Aug 09, 2013 at 10:54:58PM +0400, Alexander Plavin wrote:
> # HG changeset patch
> # User Alexander Plavin <alexander@plav.in>
> # Date 1374269558 -14400
> #      Sat Jul 20 01:32:38 2013 +0400
> # Node ID e495c742bf85e0aef4919c94f08effa6effd3695
> # Parent  80319cecf93938fb529984f4a2f5c105bcc709b1
> hgweb: config option to blacklist some revset functions in hgweb search
>
> This option defaults to ['contains'], as this is a heavy-weight function.
>
> diff -r 80319cecf939 -r e495c742bf85 mercurial/help/config.txt
> --- a/mercurial/help/config.txt	Wed Aug 07 01:16:14 2013 +0400
> +++ b/mercurial/help/config.txt	Sat Jul 20 01:32:38 2013 +0400
> @@ -1461,6 +1461,10 @@
>      Whether to require that inbound pushes be transported over SSL to
>      prevent password sniffing. Default is True.
>
> +``revsetblacklist``
> +    List of revset functions which are not allowed in search queries.
> +    Default is 'contains'.

Probably want to blacklist anything that does regexp matches too,
since we're not on re2.

> +
>  ``staticurl``
>      Base URL to use for static files. If unset, static files (e.g. the
>      hgicon.png favicon) will be served by the CGI script itself. Use
> diff -r 80319cecf939 -r e495c742bf85 mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py	Wed Aug 07 01:16:14 2013 +0400
> +++ b/mercurial/hgweb/webcommands.py	Sat Jul 20 01:32:38 2013 +0400
> @@ -211,7 +211,11 @@
>              # can't parse to a tree
>              modename = 'kw'
>          else:
> -            if revset.depth(tree) > 2:
> +            funcsused = revset.funcsused(tree)
> +            blacklist = web.configlist('web', 'revsetblacklist', ['contains'])
> +            blacklist = set(blacklist)
> +
> +            if revset.depth(tree) > 2 and not funcsused & blacklist:
>                  mfunc = revset.match(None, revdef)
>                  try:
>                      # try running against empty subset
> @@ -224,7 +228,7 @@
>                      # can't run the revset query, e.g. some function misspelled
>                      modename = 'kw'
>              else:
> -                # no revset syntax used
> +                # no revset syntax used or blacklisted functions in the query
>                  modename = 'kw'
>
>      searchfunc = searchfuncs[modename]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Alexander Plavin - Aug. 12, 2013, 6:21 p.m.
12.08.2013, 18:30, "Augie Fackler" <raf@durin42.com>:
> On Fri, Aug 09, 2013 at 10:54:58PM +0400, Alexander Plavin wrote:
>
>>  # HG changeset patch
>>  # User Alexander Plavin <alexander@plav.in>
>>  # Date 1374269558 -14400
>>  #      Sat Jul 20 01:32:38 2013 +0400
>>  # Node ID e495c742bf85e0aef4919c94f08effa6effd3695
>>  # Parent  80319cecf93938fb529984f4a2f5c105bcc709b1
>>  hgweb: config option to blacklist some revset functions in hgweb search
>>
>>  This option defaults to ['contains'], as this is a heavy-weight function.
>>
>>  diff -r 80319cecf939 -r e495c742bf85 mercurial/help/config.txt
>>  --- a/mercurial/help/config.txt Wed Aug 07 01:16:14 2013 +0400
>>  +++ b/mercurial/help/config.txt Sat Jul 20 01:32:38 2013 +0400
>>  @@ -1461,6 +1461,10 @@
>>       Whether to require that inbound pushes be transported over SSL to
>>       prevent password sniffing. Default is True.
>>
>>  +``revsetblacklist``
>>  +    List of revset functions which are not allowed in search queries.
>>  +    Default is 'contains'.
>
> Probably want to blacklist anything that does regexp matches too,
> since we're not on re2.

As I understand it, that means blacklisting the grep function and also the 're:' prefix for the other functions? I can see two ways to do this: pass an argument to revset._stringmatcher to switch off the 're:' prefix check, or just replace '\(\s+re:' with '(literal:re:' in the query string. The first method seems more robust, of course. Am I correct here?

Btw, nice library re2, didn't see it before :)

>
>>  +
>>   ``staticurl``
>>       Base URL to use for static files. If unset, static files (e.g. the
>>       hgicon.png favicon) will be served by the CGI script itself. Use
>>  diff -r 80319cecf939 -r e495c742bf85 mercurial/hgweb/webcommands.py
>>  --- a/mercurial/hgweb/webcommands.py Wed Aug 07 01:16:14 2013 +0400
>>  +++ b/mercurial/hgweb/webcommands.py Sat Jul 20 01:32:38 2013 +0400
>>  @@ -211,7 +211,11 @@
>>               # can't parse to a tree
>>               modename = 'kw'
>>           else:
>>  -            if revset.depth(tree) > 2:
>>  +            funcsused = revset.funcsused(tree)
>>  +            blacklist = web.configlist('web', 'revsetblacklist', ['contains'])
>>  +            blacklist = set(blacklist)
>>  +
>>  +            if revset.depth(tree) > 2 and not funcsused & blacklist:
>>                   mfunc = revset.match(None, revdef)
>>                   try:
>>                       # try running against empty subset
>>  @@ -224,7 +228,7 @@
>>                       # can't run the revset query, e.g. some function misspelled
>>                       modename = 'kw'
>>               else:
>>  -                # no revset syntax used
>>  +                # no revset syntax used or blacklisted functions in the query
>>                   modename = 'kw'
>>
>>       searchfunc = searchfuncs[modename]
Augie Fackler - Aug. 12, 2013, 6:22 p.m.
On Mon, Aug 12, 2013 at 2:21 PM, Alexander Plavin <alexander@plav.in> wrote:
>> Probably want to blacklist anything that does regexp matches too,
>> since we're not on re2.
>
> As I understand it, that means blacklisting the grep function and also the 're:' prefix for the other functions? I can see two ways to do this: pass an argument to revset._stringmatcher to switch off the 're:' prefix check, or just replace '\(\s+re:' with '(literal:re:' in the query string. The first method seems more robust, of course. Am I correct here?

sounds right to me. I can't think of other things that take regexps at
the moment.

>
> Btw, nice library re2, didn't see it before :)
Alexander Plavin - Aug. 12, 2013, 6:41 p.m.
12.08.2013, 22:22, "Augie Fackler" <raf@durin42.com>:
> On Mon, Aug 12, 2013 at 2:21 PM, Alexander Plavin <alexander@plav.in> wrote:
>
>>>  Probably want to blacklist anything that does regexp matches too,
>>>  since we're not on re2.
>>  As I understand it, that means blacklisting the grep function and also the 're:' prefix for the other functions? I can see two ways to do this: pass an argument to revset._stringmatcher to switch off the 're:' prefix check, or just replace '\(\s+re:' with '(literal:re:' in the query string. The first method seems more robust, of course. Am I correct here?
>
> sounds right to me. I can't think of other things that take regexps at
> the moment.

And what about " just replacing '\(\s+re:' with '(literal:re:' "? Revset syntax is quite simple, so this should probably work for all cases. 
Just noticed, BitBucket supports 're:' prefix, but they possibly use re2.

>
>>  Btw, nice library re2, didn't see it before :)
Augie Fackler - Aug. 12, 2013, 6:42 p.m.
On Mon, Aug 12, 2013 at 2:41 PM, Alexander Plavin <alexander@plav.in> wrote:
> And what about " just replacing '\(\s+re:' with '(literal:re:' "? Revset syntax is quite simple, so this should probably work for all cases.

That seems fine.

> Just noticed, BitBucket supports 're:' prefix, but they possibly use re2.

Or they're living dangerously.
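
For illustration, the query-string rewrite discussed in this thread could look roughly like the sketch below. This is not code from the patch series. The helper name neutralize_re_prefix is hypothetical, and the pattern is a loosened variant of the one quoted above: it allows zero whitespace and an optional opening quote, since '\(\s+re:' as written only matches when whitespace follows the parenthesis. It relies on the 'literal:' prefix mentioned in the thread to make the rest of the string match literally.

```python
import re

# Hypothetical, loosened variant of the pattern from the thread:
# an opening parenthesis, optional whitespace, an optional quote, then 're:'.
_RE_PREFIX = re.compile(r'''\(\s*(['"]?)re:''')


def neutralize_re_prefix(query):
    """Rewrite a leading 're:' in a function argument to 'literal:re:'.

    Only the first argument of each call is covered; an argument that
    follows a comma is left untouched, which is one reason the
    _stringmatcher approach was called more robust.
    """
    return _RE_PREFIX.sub(lambda m: '(' + m.group(1) + 'literal:re:', query)


if __name__ == '__main__':
    print(neutralize_re_prefix("author('re:^al.x')"))
    print(neutralize_re_prefix("author( re:foo)"))
    print(neutralize_re_prefix("desc(more:x)"))
```

A plain textual substitution like this also rewrites 're:' inside string arguments that merely happen to start that way, so it errs on the side of disabling regular expressions.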

Patch

diff -r 80319cecf939 -r e495c742bf85 mercurial/help/config.txt
--- a/mercurial/help/config.txt	Wed Aug 07 01:16:14 2013 +0400
+++ b/mercurial/help/config.txt	Sat Jul 20 01:32:38 2013 +0400
@@ -1461,6 +1461,10 @@ 
     Whether to require that inbound pushes be transported over SSL to
     prevent password sniffing. Default is True.
 
+``revsetblacklist``
+    List of revset functions which are not allowed in search queries.
+    Default is 'contains'.
+
 ``staticurl``
     Base URL to use for static files. If unset, static files (e.g. the
     hgicon.png favicon) will be served by the CGI script itself. Use
diff -r 80319cecf939 -r e495c742bf85 mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py	Wed Aug 07 01:16:14 2013 +0400
+++ b/mercurial/hgweb/webcommands.py	Sat Jul 20 01:32:38 2013 +0400
@@ -211,7 +211,11 @@ 
             # can't parse to a tree
             modename = 'kw'
         else:
-            if revset.depth(tree) > 2:
+            funcsused = revset.funcsused(tree)
+            blacklist = web.configlist('web', 'revsetblacklist', ['contains'])
+            blacklist = set(blacklist)
+
+            if revset.depth(tree) > 2 and not funcsused & blacklist:
                 mfunc = revset.match(None, revdef)
                 try:
                     # try running against empty subset
@@ -224,7 +228,7 @@ 
                     # can't run the revset query, e.g. some function misspelled
                     modename = 'kw'
             else:
-                # no revset syntax used
+                # no revset syntax used or blacklisted functions in the query
                 modename = 'kw'
 
     searchfunc = searchfuncs[modename]
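
The hunk above relies on revset.funcsused(), which is not part of this patch. As a rough, standalone sketch of the idea, and assuming the parsed revset is a tree of nested tuples in which a call looks like ('func', ('symbol', name), args), the check might work as follows. The names funcs_used and search_mode, and the 'revset' mode name, are illustrative assumptions rather than the actual Mercurial API.

```python
def funcs_used(tree):
    """Collect the names of all functions called in a parsed revset tree.

    Assumes nodes are tuples whose first element is the node type and
    that a function call has the shape ('func', ('symbol', name), args).
    """
    if not isinstance(tree, tuple) or tree[0] in ('string', 'symbol'):
        return set()
    names = set()
    for child in tree[1:]:
        names |= funcs_used(child)
    if tree[0] == 'func':
        names.add(tree[1][1])
    return names


def search_mode(tree, depth, blacklist=('contains',)):
    """Mirror the patched condition: run the query as a revset only if it
    is deep enough and uses no blacklisted function; otherwise fall back
    to keyword search ('kw')."""
    if depth > 2 and not funcs_used(tree) & set(blacklist):
        return 'revset'  # assumed mode name, not shown in the hunk
    return 'kw'


if __name__ == '__main__':
    heavy = ('and',
             ('func', ('symbol', 'contains'), ('string', 'README')),
             ('func', ('symbol', 'author'), ('string', 'alice')))
    light = ('func', ('symbol', 'author'), ('string', 'alice'))
    print(sorted(funcs_used(heavy)))
    print(search_mode(heavy, 3), search_mode(light, 3))
```

Note that in the condition `not funcsused & blacklist`, the `&` binds more tightly than `not`, so the test reads as "the intersection is empty".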