Patchwork [5,of,6,V3] hgweb: blacklist heavyweight revset functions in hgweb search

login
register
mail settings
Submitter Alexander Plavin
Date Aug. 22, 2013, 3:11 p.m.
Message ID <3767921c4b274499fe42.1377184276@debian-alexander.dolgopa>
Download mbox | patch
Permalink /patch/2248/
State Deferred
Headers show

Comments

Alexander Plavin - Aug. 22, 2013, 3:11 p.m.
# HG changeset patch
# User Alexander Plavin <alexander@plav.in>
# Date 1374269558 -14400
#      Sat Jul 20 01:32:38 2013 +0400
# Node ID 3767921c4b274499fe4254bdafef56bba346b088
# Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
hgweb: blacklist heavyweight revset functions in hgweb search

Disallow usage of functions 'contains' and 'grep'.
Katsunori FUJIWARA - Aug. 29, 2013, 6:04 a.m.
At Thu, 22 Aug 2013 19:11:16 +0400,
Alexander Plavin wrote:
> 
> # HG changeset patch
> # User Alexander Plavin <alexander@plav.in>
> # Date 1374269558 -14400
> #      Sat Jul 20 01:32:38 2013 +0400
> # Node ID 3767921c4b274499fe4254bdafef56bba346b088
> # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
> hgweb: blacklist heavyweight revset functions in hgweb search
> 
> Disallow usage of functions 'contains' and 'grep'.
> 
> diff -r 5734dd4b2bd2 -r 3767921c4b27 mercurial/hgweb/webcommands.py
> --- a/mercurial/hgweb/webcommands.py	Wed Aug 07 01:21:31 2013 +0400
> +++ b/mercurial/hgweb/webcommands.py	Sat Jul 20 01:32:38 2013 +0400
> @@ -179,6 +179,10 @@
>          if any_((token, (value or '')[:3]) == ('string', 're:')
>                 for token, value, pos in revset.tokenize(revdef)):
>              return 'kw', query
> +        funcsused = revset.funcsused(tree)
> +        blacklist = set(['contains', 'grep'])
> +        if funcsused & blacklist:
> +            return 'kw', query

IMHO, "blacklist" information of revsets should be managed in
mercurial/revset.py, for ease of maintenance in future (following
newly added predicates, for example).

  
>          mfunc = revset.match(None, revdef)
>          try:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
> 


----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Alexander Plavin - Aug. 31, 2013, 12:43 p.m.
29.08.2013, 10:04, "FUJIWARA Katsunori" <foozy@lares.dti.ne.jp>:
> At Thu, 22 Aug 2013 19:11:16 +0400,
> Alexander Plavin wrote:
>
>>  # HG changeset patch
>>  # User Alexander Plavin <alexander@plav.in>
>>  # Date 1374269558 -14400
>>  #      Sat Jul 20 01:32:38 2013 +0400
>>  # Node ID 3767921c4b274499fe4254bdafef56bba346b088
>>  # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
>>  hgweb: blacklist heavyweight revset functions in hgweb search
>>
>>  Disallow usage of functions 'contains' and 'grep'.
>>
>>  diff -r 5734dd4b2bd2 -r 3767921c4b27 mercurial/hgweb/webcommands.py
>>  --- a/mercurial/hgweb/webcommands.py Wed Aug 07 01:21:31 2013 +0400
>>  +++ b/mercurial/hgweb/webcommands.py Sat Jul 20 01:32:38 2013 +0400
>>  @@ -179,6 +179,10 @@
>>           if any_((token, (value or '')[:3]) == ('string', 're:')
>>                  for token, value, pos in revset.tokenize(revdef)):
>>               return 'kw', query
>>  +        funcsused = revset.funcsused(tree)
>>  +        blacklist = set(['contains', 'grep'])
>>  +        if funcsused & blacklist:
>>  +            return 'kw', query
>
> IMHO, "blacklist" information of revsets should be managed in
> mercurial/revset.py, for ease of maintenance in future (following
> newly added predicates, for example).

Not sure if I am for or against this, so probably neutral :) I would like to hear other opinions, and if it's really more suitable then no problem to move the related code to revset.py.

>
>>           mfunc = revset.match(None, revdef)
>>           try:
>>  _______________________________________________
>>  Mercurial-devel mailing list
>>  Mercurial-devel@selenic.com
>>  http://selenic.com/mailman/listinfo/mercurial-devel
>
> ----------------------------------------------------------------------
> [FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Matt Mackall - Sept. 12, 2013, 3:39 a.m.
On Mon, 2013-09-02 at 22:57 -0500, Kevin Bullock wrote:
> On 1 Sep 2013, at 1:25 AM, Alexander Plavin wrote:
> 
> > 01.09.2013, 08:54, "Kevin Bullock" <kbullock+mercurial@ringworld.org>:
> >> On 22 Aug 2013, at 10:11 AM, Alexander Plavin wrote:
> >> 
> >>>  # HG changeset patch
> >>>  # User Alexander Plavin <alexander@plav.in>
> >>>  # Date 1374269558 -14400
> >>>  #      Sat Jul 20 01:32:38 2013 +0400
> >>>  # Node ID 3767921c4b274499fe4254bdafef56bba346b088
> >>>  # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
> >>>  hgweb: blacklist heavyweight revset functions in hgweb search
> >>> 
> >>>  Disallow usage of functions 'contains' and 'grep'.
> >> 
> >> It will be verbose, but I'd rather have a whitelist of known-safe(-ish) revsets. That way when we add the next (possibly unexpectedly!) compute-intensive revset, we won't be opening our users up to new DoS attacks because we forgot to blacklist it.
> > 
> > Agree, makes sense. And what about moving this part of code to revset.py?
> 
> No strong opinion either way.

Seems a good idea to me.
Alexander Plavin - Sept. 12, 2013, 3:58 p.m.
12.09.2013, 07:39, "Matt Mackall" <mpm@selenic.com>:
> On Mon, 2013-09-02 at 22:57 -0500, Kevin Bullock wrote:
>
>>  On 1 Sep 2013, at 1:25 AM, Alexander Plavin wrote:
>>>  01.09.2013, 08:54, "Kevin Bullock" <kbullock+mercurial@ringworld.org>:
>>>>  On 22 Aug 2013, at 10:11 AM, Alexander Plavin wrote:
>>>>>   # HG changeset patch
>>>>>   # User Alexander Plavin <alexander@plav.in>
>>>>>   # Date 1374269558 -14400
>>>>>   #      Sat Jul 20 01:32:38 2013 +0400
>>>>>   # Node ID 3767921c4b274499fe4254bdafef56bba346b088
>>>>>   # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
>>>>>   hgweb: blacklist heavyweight revset functions in hgweb search
>>>>>
>>>>>   Disallow usage of functions 'contains' and 'grep'.
>>>>  It will be verbose, but I'd rather have a whitelist of known-safe(-ish) revsets. That way when we add the next (possibly unexpectedly!) compute-intensive revset, we won't be opening our users up to new DoS attacks because we forgot to blacklist it.
>>>  Agree, makes sense. And what about moving this part of code to revset.py?
>>  No strong opinion either way.
>
> Seems a good idea to me.

In the more recent version of these patches (namely, those with V5 flag) I'd done this. It would be nice if you looked at them too :)

>
> --
> Mathematics is the supreme nostalgia of our time.
Matt Mackall - Sept. 12, 2013, 11:03 p.m.
On Thu, 2013-09-12 at 19:58 +0400, Alexander Plavin wrote:
> 
> 12.09.2013, 07:39, "Matt Mackall" <mpm@selenic.com>:
> > On Mon, 2013-09-02 at 22:57 -0500, Kevin Bullock wrote:
> >
> >>  On 1 Sep 2013, at 1:25 AM, Alexander Plavin wrote:
> >>>  01.09.2013, 08:54, "Kevin Bullock" <kbullock+mercurial@ringworld.org>:
> >>>>  On 22 Aug 2013, at 10:11 AM, Alexander Plavin wrote:
> >>>>>   # HG changeset patch
> >>>>>   # User Alexander Plavin <alexander@plav.in>
> >>>>>   # Date 1374269558 -14400
> >>>>>   #      Sat Jul 20 01:32:38 2013 +0400
> >>>>>   # Node ID 3767921c4b274499fe4254bdafef56bba346b088
> >>>>>   # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
> >>>>>   hgweb: blacklist heavyweight revset functions in hgweb search
> >>>>>
> >>>>>   Disallow usage of functions 'contains' and 'grep'.
> >>>>  It will be verbose, but I'd rather have a whitelist of known-safe(-ish) revsets. That way when we add the next (possibly unexpectedly!) compute-intensive revset, we won't be opening our users up to new DoS attacks because we forgot to blacklist it.
> >>>  Agree, makes sense. And what about moving this part of code to revset.py?
> >>  No strong opinion either way.
> >
> > Seems a good idea to me.
> 
> In the more recent version of these patches (namely, those with V5 flag) I'd done this. It would be nice if you looked at them too :)

Argh. V5, he says! V5 of WHICH of the eight or so hard-to-distinguish
series currently clogging my inbox??

It's extremely difficult for me to determine what's new and what's
obsolete in my inbox anymore, because I have pages and pages of
patchsets with the same author and similar summaries and no good way to
spot duplicates. It's a maze of twisty little messages, all slightly
different and I don't want to spend all my time sorting through this
mess. And not sorting through this mess means my patch acceptance rate
(and your progress) has suffered for weeks. That your mentor has failed
to throttle[1] you earlier is unfortunate.

I _really_ should just dump absolutely everything from you currently in
my inbox so I can get back on top of things, but that means I would miss
existing review comments to figure out whether your future patches are
acceptable and what's already been addressed.

Recommended reading:

http://mercurial.selenic.com/wiki/ContributingChanges#Flow_control
http://www.selenic.com/inbox

[1] pun intended?
Augie Fackler - Sept. 13, 2013, 12:39 a.m.
On Sep 12, 2013, at 7:03 PM, Matt Mackall <mpm@selenic.com> wrote:

> I _really_ should just dump absolutely everything from you currently in
> my inbox so I can get back on top of things, but that means I would miss
> existing review comments to figure out whether your future patches are
> acceptable and what's already been addressed.

I think Kevin and I may have exacerbated your problem. At this point, I think Kevin and I have gone through everything that came in while you were at BM, so perhaps just declare bankruptcy explicitly and we'll move on?
Alexander Plavin - Sept. 13, 2013, 8:37 a.m.
13.09.2013, 03:03, "Matt Mackall" <mpm@selenic.com>:
> On Thu, 2013-09-12 at 19:58 +0400, Alexander Plavin wrote:
>
>>  12.09.2013, 07:39, "Matt Mackall" <mpm@selenic.com>:
>>>  On Mon, 2013-09-02 at 22:57 -0500, Kevin Bullock wrote:
>>>>   On 1 Sep 2013, at 1:25 AM, Alexander Plavin wrote:
>>>>>   01.09.2013, 08:54, "Kevin Bullock" <kbullock+mercurial@ringworld.org>:
>>>>>>   On 22 Aug 2013, at 10:11 AM, Alexander Plavin wrote:
>>>>>>>    # HG changeset patch
>>>>>>>    # User Alexander Plavin <alexander@plav.in>
>>>>>>>    # Date 1374269558 -14400
>>>>>>>    #      Sat Jul 20 01:32:38 2013 +0400
>>>>>>>    # Node ID 3767921c4b274499fe4254bdafef56bba346b088
>>>>>>>    # Parent  5734dd4b2bd2a859a2ef0be6e0f4485f028abf6e
>>>>>>>    hgweb: blacklist heavyweight revset functions in hgweb search
>>>>>>>
>>>>>>>    Disallow usage of functions 'contains' and 'grep'.
>>>>>>   It will be verbose, but I'd rather have a whitelist of known-safe(-ish) revsets. That way when we add the next (possibly unexpectedly!) compute-intensive revset, we won't be opening our users up to new DoS attacks because we forgot to blacklist it.
>>>>>   Agree, makes sense. And what about moving this part of code to revset.py?
>>>>   No strong opinion either way.
>>>  Seems a good idea to me.
>>  In the more recent version of these patches (namely, those with V5 flag) I'd done this. It would be nice if you looked at them too :)
>
> Argh. V5, he says! V5 of WHICH of the eight or so hard-to-distinguish
> series currently clogging my inbox??
>
> It's extremely difficult for me to determine what's new and what's
> obsolete in my inbox anymore, because I have pages and pages of
> patchsets with the same author and similar summaries and no good way to
> spot duplicates. It's a maze of twisty little messages, all slightly
> different and I don't want to spend all my time sorting through this
> mess. And not sorting through this mess means my patch acceptance rate
> (and your progress) has suffered for weeks. That your mentor has failed
> to throttle[1] you earlier is unfortunate.
>
> I _really_ should just dump absolutely everything from you currently in
> my inbox so I can get back on top of things, but that means I would miss
> existing review comments to figure out whether your future patches are
> acceptable and what's already been addressed.
>
> Recommended reading:
>
> http://mercurial.selenic.com/wiki/ContributingChanges#Flow_control
> http://www.selenic.com/inbox

I've read this already, and my submissions satisfy almost all of the points at #Flow_control. For example, there are only 8 my patches which are marked with 'Action Required' at patchwork, and it doesn't seem a great number. And quick re-sends of newer versions also match the advice given on that page, namely 'Try to respond quickly to feedback before we forget what your patch is about'. So, I don't know where I'm wrong now and how to do this correctly.

>
> [1] pun intended?
>
> --
> Mathematics is the supreme nostalgia of our time.
Matt Mackall - Sept. 13, 2013, 10:15 p.m.
> > Recommended reading:
> >
> > http://mercurial.selenic.com/wiki/ContributingChanges#Flow_control
> > http://www.selenic.com/inbox
> 
> I've read this already, and my submissions satisfy almost all of the
> points at #Flow_control.

Then you've completely failed to understand what's written there. Let's
quote Wikipedia: 

"In data communications, flow control is the process of managing the
rate of data transmission between two nodes to prevent a fast sender
from overwhelming a slow receiver."

Click here:

http://www.selenic.com/inbox

See that second number? That's the number of patches still waiting in my
inbox. See how it's MUCH greater than 100? See the third number? That's
roughly the number of days behind on email I am. Kindly slow way the
hell down so I can catch up. Consider going all the way down to one
unacknowledged patch at a time.

>  For example, there are only 8 my patches which are marked with
> 'Action Required' at patchwork, and it doesn't seem a great number.

Patchwork (as the first page says) is irrelevant. It is of practically
no use to me in particular, since I'm already using my inbox to track
patches. It actually makes what I do MORE WORK by adding a whole second
system to track what's already in my inbox and since I can't do email
reviews from patchwork, I can't ditch my inbox.
Alexander Plavin - Sept. 14, 2013, 8:02 a.m.
14.09.2013, 02:15, "Matt Mackall" <mpm@selenic.com>:
>>>  Recommended reading:
>>>
>>>  http://mercurial.selenic.com/wiki/ContributingChanges#Flow_control
>>>  http://www.selenic.com/inbox
>>  I've read this already, and my submissions satisfy almost all of the
>>  points at #Flow_control.
>
> Then you've completely failed to understand what's written there. Let's
> quote Wikipedia:
>
> "In data communications, flow control is the process of managing the
> rate of data transmission between two nodes to prevent a fast sender
> from overwhelming a slow receiver."
>
> Click here:
>
> http://www.selenic.com/inbox
>
> See that second number? That's the number of patches still waiting in my
> inbox. See how it's MUCH greater than 100? See the third number? That's
> roughly the number of days behind on email I am. Kindly slow way the
> hell down so I can catch up. Consider going all the way down to one
> unacknowledged patch at a time.

By writing "my submissions satisfy almost all of the points" I really meant "almost all", not just "all". Actually, all points are impossible to satisfy: I can't simultaneously reply with new versions of patches quickly AND slow down while the number of patches at http://www.selenic.com/inbox is >100 (as I rememeber, during the last week or more it was >100 all the time, throught I may be wrong here). So, as I can't satisfy all the points, I've chosen a subset of them, and according to your messages here I had to chose another subset. And still not quite sure, what exactly is 'slow down' here? Send no more than N patches (and K emails?) per M days while the amount of patches is >100? Or something else?

By the way, the previous version of http://mercurial.selenic.com/wiki/ContributingChanges (about just a month ago) was much more loyal to contributors.

>
>>   For example, there are only 8 my patches which are marked with
>>  'Action Required' at patchwork, and it doesn't seem a great number.
>
> Patchwork (as the first page says) is irrelevant. It is of practically
> no use to me in particular, since I'm already using my inbox to track
> patches. It actually makes what I do MORE WORK by adding a whole second
> system to track what's already in my inbox and since I can't do email
> reviews from patchwork, I can't ditch my inbox.

The whole situation isn't clear for contributors, as we can see which patches need review at patchwork but can't see those that need review at your inbox. The amount of patches at http://www.selenic.com/inbox is always greater than at patchwork, so I think there are just old versions of patches which don't need reviewing, and only discussion related to them can be of some use.

I'd note that while the whole system of reviewing by email looks like the best way for those who got used to it during years, it's not so for other people who already used special patchtracking systems. Even from this part of your message it seems to me that it would be much better for the whole reviewing process if there was the ability to unite an old and new version of a patch/series in one place, with the discussion together. And yes, I know that you all can name a lot of advantages of plain old email reviewing, here is just a note.

>
> --
> Mathematics is the supreme nostalgia of our time.

I want to say once again here, that it would be better if you told not what NOT to do (like 'don't send many new patches ...'), but what to do actually (like 'send no more than N patches (and K emails?) per M days while the amount of patches is >100'). Now it's not fully clear, at least for me.

Patch

diff -r 5734dd4b2bd2 -r 3767921c4b27 mercurial/hgweb/webcommands.py
--- a/mercurial/hgweb/webcommands.py	Wed Aug 07 01:21:31 2013 +0400
+++ b/mercurial/hgweb/webcommands.py	Sat Jul 20 01:32:38 2013 +0400
@@ -179,6 +179,10 @@ 
         if any_((token, (value or '')[:3]) == ('string', 're:')
                for token, value, pos in revset.tokenize(revdef)):
             return 'kw', query
+        funcsused = revset.funcsused(tree)
+        blacklist = set(['contains', 'grep'])
+        if funcsused & blacklist:
+            return 'kw', query
 
         mfunc = revset.match(None, revdef)
         try: