Patchwork [1,of,2,V2] revset: use smartset minus operator

login
register
mail settings
Submitter Durham Goode
Date Feb. 18, 2016, 7:38 p.m.
Message ID <dc8d72c3df7dfcd7b67f.1455824287@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13255/
State Superseded
Commit d2ac8b57a75d5a28bcf448e46ac94fb62a921c2b
Delegated to: Martin von Zweigbergk
Headers show

Comments

Durham Goode - Feb. 18, 2016, 7:38 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1455824115 28800
#      Thu Feb 18 11:35:15 2016 -0800
# Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c
# Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
revset: use smartset minus operator

Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can
be expensive, since if Y is a single commit then 'not Y' becomes a huge set and
sometimes the query optimizer doesn't account for it well.

This patch changes revsets to use the built in smartset minus operator, which is
often smarter than 'X and not Y'.

On a large repo this saves 2.2 seconds on rebase and histedit because "X:: - X"
becomes almost instant.

Relevant performance numbers from revsetbenchmark.py

revset #13: roots((tip~100::) - (tip~100::tip))
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.001080      0.001107      0.001102      0.001118      0.001121      0.001114      0.001141      0.001123      0.001099      0.001123      0.001137
1) 0.000708  65% 0.000738  66% 0.000735  66% 0.000739  66% 0.000784  69% 0.000780  70% 0.000807  70% 0.000756  67% 0.000727  66% 0.000759  67% 0.000808  71%

revset #14: roots((0::) - (0::tip))
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.131304      0.079168      0.133129      0.076560      0.048179      0.133349      0.049153      0.077097      0.129689      0.076212      0.048543
1) 0.065066  49% 0.036941  46% 0.066063  49% 0.034755  45% 0.048558      0.071091  53% 0.047679      0.034984  45% 0.064572  49% 0.035680  46% 0.048508

revset #22: (not public() - obsolete())
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.000139      0.000133      0.000133      0.000138      0.000134      0.000155      0.000157      0.000152      0.000157      0.000156      0.000153
1) 0.000108  77% 0.000129      0.000129      0.000134      0.000132      0.000127  81% 0.000151      0.000147      0.000127  80% 0.000152      0.000149

revset #25: (20000::) - (20000)
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.050560      0.045513      0.022593      0.043588      0.021909      0.045517      0.021822      0.044660      0.049740      0.044227      0.021819
1) 0.018614  36% 0.000171   0% 0.019659  87% 0.000168   0% 0.015543  70% 0.021069  46% 0.015623  71% 0.000180   0% 0.018658  37% 0.000186   0% 0.015750  72%
Martin von Zweigbergk - Feb. 20, 2016, 7:39 a.m.
On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com> wrote:
> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1455824115 28800
> #      Thu Feb 18 11:35:15 2016 -0800
> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c
> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
> revset: use smartset minus operator
>
> Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can
> be expensive, since if Y is a single commit then 'not Y' becomes a huge set and
> sometimes the query optimizer doesn't account for it well.
>
> This patch changes revsets to use the built in smartset minus operator, which is
> often smarter than 'X and not Y'.

How does this patch compare to my patch in
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/89348/focus=89391?
I did not include the test case changes there, but it only affects the
first one among the ones affected by this patch, so there is no need
to revert that behavior in a second patch like here. It also makes
"(20000::) and not (20000)" faster (which these two patches do not).
What are the disadvantages?
Durham Goode - Feb. 23, 2016, 8:30 p.m.
On 2/19/16, 11:39 PM, "Martin von Zweigbergk" <martinvonz@google.com> wrote:

>On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com> wrote:

>> # HG changeset patch

>> # User Durham Goode <durham@fb.com>

>> # Date 1455824115 28800

>> #      Thu Feb 18 11:35:15 2016 -0800

>> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c

>> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca

>> revset: use smartset minus operator

>>

>> Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can

>> be expensive, since if Y is a single commit then 'not Y' becomes a huge set and

>> sometimes the query optimizer doesn't account for it well.

>>

>> This patch changes revsets to use the built in smartset minus operator, which is

>> often smarter than 'X and not Y'.

>

>How does this patch compare to my patch in

>https://urldefense.proofpoint.com/v2/url?u=http-3A__thread.gmane.org_gmane.comp.version-2Dcontrol.mercurial.devel_89348_focus-3D89391-3F&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=Ic3AKLl02HHMU_yM5NlmBLJc3oZ8MUkGiPj-TnPQQiI&s=NoBhM7NiNWrsbHqoCgMbe-t_MuPSdYkIqB5W2go53mY&e= 

>I did not include the test case changes there, but it only affects the

>first one among the ones affected by this patch, so there is no need

>to revert that behavior in a second patch like here. It also makes

>"(20000::) and not (20000)" faster (which these two patches do not).

>What are the disadvantages?


We could add your patch as well, but it seems like it would have to be a separate patch anyway (one adds minus and avoids the minus->and-not translation, then another patch takes advantage of it for and not).  I did not add the optimization to 'and' because none of the benchmark tests showed any perf improvement.  I suppose "(20000::) and not (20000)" might show an improvement, but that probably doesn't stop these from going.
Martin von Zweigbergk - Feb. 23, 2016, 8:45 p.m.
I think my patch also took care of all the improvements these two patches
address. Are there any cases where my patch would be worse?

On Tue, Feb 23, 2016, 12:30 Durham Goode <durham@fb.com> wrote:

>
>
>
>
>
> On 2/19/16, 11:39 PM, "Martin von Zweigbergk" <martinvonz@google.com>
> wrote:
>
> >On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com> wrote:
> >> # HG changeset patch
> >> # User Durham Goode <durham@fb.com>
> >> # Date 1455824115 28800
> >> #      Thu Feb 18 11:35:15 2016 -0800
> >> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c
> >> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
> >> revset: use smartset minus operator
> >>
> >> Previously, revsets like 'X - Y' were translated to be 'X and not Y'.
> This can
> >> be expensive, since if Y is a single commit then 'not Y' becomes a huge
> set and
> >> sometimes the query optimizer doesn't account for it well.
> >>
> >> This patch changes revsets to use the built in smartset minus operator,
> which is
> >> often smarter than 'X and not Y'.
> >
> >How does this patch compare to my patch in
> >
> https://urldefense.proofpoint.com/v2/url?u=http-3A__thread.gmane.org_gmane.comp.version-2Dcontrol.mercurial.devel_89348_focus-3D89391-3F&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=Ic3AKLl02HHMU_yM5NlmBLJc3oZ8MUkGiPj-TnPQQiI&s=NoBhM7NiNWrsbHqoCgMbe-t_MuPSdYkIqB5W2go53mY&e=
> >I did not include the test case changes there, but it only affects the
> >first one among the ones affected by this patch, so there is no need
> >to revert that behavior in a second patch like here. It also makes
> >"(20000::) and not (20000)" faster (which these two patches do not).
> >What are the disadvantages?
>
> We could add your patch as well, but it seems like it would have to be a
> separate patch anyway (one adds minus and avoids the minus->and-not
> translation, then another patch takes advantage of it for and not).  I did
> not add the optimization to 'and' because none of the benchmark tests
> showed any perf improvement.  I suppose "(20000::) and not (20000)" might
> show an improvement, but that probably doesn't stop these from going.
>
Durham Goode - Feb. 23, 2016, 8:54 p.m.
Your patch routes all minus revsets through the and-not operator optimizer, which seemed inelegant and prone to future maintenance mistakes.

From: Martin von Zweigbergk <martinvonz@google.com<mailto:martinvonz@google.com>>

Date: Tuesday, February 23, 2016 at 12:45 PM
To: Durham Goode <durham@fb.com<mailto:durham@fb.com>>
Cc: "mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>" <mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>>
Subject: Re: [PATCH 1 of 2 V2] revset: use smartset minus operator


I think my patch also took care of all the improvements these two patches address. Are there any cases where my patch would be worse?

On Tue, Feb 23, 2016, 12:30 Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote:





On 2/19/16, 11:39 PM, "Martin von Zweigbergk" <martinvonz@google.com<mailto:martinvonz@google.com>> wrote:

>On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote:

>> # HG changeset patch

>> # User Durham Goode <durham@fb.com<mailto:durham@fb.com>>

>> # Date 1455824115 28800

>> #      Thu Feb 18 11:35:15 2016 -0800

>> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c

>> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca

>> revset: use smartset minus operator

>>

>> Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can

>> be expensive, since if Y is a single commit then 'not Y' becomes a huge set and

>> sometimes the query optimizer doesn't account for it well.

>>

>> This patch changes revsets to use the built in smartset minus operator, which is

>> often smarter than 'X and not Y'.

>

>How does this patch compare to my patch in

>https://urldefense.proofpoint.com/v2/url?u=http-3A__thread.gmane.org_gmane.comp.version-2Dcontrol.mercurial.devel_89348_focus-3D89391-3F&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=Ic3AKLl02HHMU_yM5NlmBLJc3oZ8MUkGiPj-TnPQQiI&s=NoBhM7NiNWrsbHqoCgMbe-t_MuPSdYkIqB5W2go53mY&e=

>I did not include the test case changes there, but it only affects the

>first one among the ones affected by this patch, so there is no need

>to revert that behavior in a second patch like here. It also makes

>"(20000::) and not (20000)" faster (which these two patches do not).

>What are the disadvantages?


We could add your patch as well, but it seems like it would have to be a separate patch anyway (one adds minus and avoids the minus->and-not translation, then another patch takes advantage of it for and not).  I did not add the optimization to 'and' because none of the benchmark tests showed any perf improvement.  I suppose "(20000::) and not (20000)" might show an improvement, but that probably doesn't stop these from going.
Martin von Zweigbergk - Feb. 23, 2016, 9:44 p.m.
Could you elaborate on why? It results in less duplication, which is
usually good for maintainability.

The only thing I disliked about my patch was the name "minusset". I would
perhaps rename that to "difference". Otherwise it seemed simple and clean
to me.

On Tue, Feb 23, 2016, 12:54 Durham Goode <durham@fb.com> wrote:

> Your patch routes all minus revsets through the and-not operator
> optimizer, which seemed inelegant and prone to future maintenance mistakes.
>
> From: Martin von Zweigbergk <martinvonz@google.com>
> Date: Tuesday, February 23, 2016 at 12:45 PM
> To: Durham Goode <durham@fb.com>
> Cc: "mercurial-devel@mercurial-scm.org" <mercurial-devel@mercurial-scm.org
> >
> Subject: Re: [PATCH 1 of 2 V2] revset: use smartset minus operator
>
> I think my patch also took care of all the improvements these two patches
> address. Are there any cases where my patch would be worse?
>
> On Tue, Feb 23, 2016, 12:30 Durham Goode <durham@fb.com> wrote:
>
>>
>>
>>
>>
>>
>> On 2/19/16, 11:39 PM, "Martin von Zweigbergk" <martinvonz@google.com>
>> wrote:
>>
>> >On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com> wrote:
>> >> # HG changeset patch
>> >> # User Durham Goode <durham@fb.com>
>> >> # Date 1455824115 28800
>> >> #      Thu Feb 18 11:35:15 2016 -0800
>> >> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c
>> >> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca
>> >> revset: use smartset minus operator
>> >>
>> >> Previously, revsets like 'X - Y' were translated to be 'X and not Y'.
>> This can
>> >> be expensive, since if Y is a single commit then 'not Y' becomes a
>> huge set and
>> >> sometimes the query optimizer doesn't account for it well.
>> >>
>> >> This patch changes revsets to use the built in smartset minus
>> operator, which is
>> >> often smarter than 'X and not Y'.
>> >
>> >How does this patch compare to my patch in
>> >
>> https://urldefense.proofpoint.com/v2/url?u=http-3A__thread.gmane.org_gmane.comp.version-2Dcontrol.mercurial.devel_89348_focus-3D89391-3F&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=Ic3AKLl02HHMU_yM5NlmBLJc3oZ8MUkGiPj-TnPQQiI&s=NoBhM7NiNWrsbHqoCgMbe-t_MuPSdYkIqB5W2go53mY&e=
>> >I did not include the test case changes there, but it only affects the
>> >first one among the ones affected by this patch, so there is no need
>> >to revert that behavior in a second patch like here. It also makes
>> >"(20000::) and not (20000)" faster (which these two patches do not).
>> >What are the disadvantages?
>>
>> We could add your patch as well, but it seems like it would have to be a
>> separate patch anyway (one adds minus and avoids the minus->and-not
>> translation, then another patch takes advantage of it for and not).  I did
>> not add the optimization to 'and' because none of the benchmark tests
>> showed any perf improvement.  I suppose "(20000::) and not (20000)" might
>> show an improvement, but that probably doesn't stop these from going.
>>
>
Durham Goode - Feb. 23, 2016, 9:54 p.m.
Taking a minus operator, then converting it to "and-not", then converting it back to minus just seems strange.  I can see people screwing up minus operator perf while trying to improve 'and' performance and not even realizing it.  Seems cleaner to just let minus be minus and be done with it.


From: Martin von Zweigbergk <martinvonz@google.com<mailto:martinvonz@google.com>>

Date: Tuesday, February 23, 2016 at 1:44 PM
To: Durham Goode <durham@fb.com<mailto:durham@fb.com>>
Cc: "mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>" <mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>>
Subject: Re: [PATCH 1 of 2 V2] revset: use smartset minus operator


Could you elaborate on why? It results in less duplication, which is usually good for maintainability.

The only thing I disliked about my patch was the name "minusset". I would perhaps rename that to "difference". Otherwise it seemed simple and clean to me.

On Tue, Feb 23, 2016, 12:54 Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote:
Your patch routes all minus revsets through the and-not operator optimizer, which seemed inelegant and prone to future maintenance mistakes.

From: Martin von Zweigbergk <martinvonz@google.com<mailto:martinvonz@google.com>>

Date: Tuesday, February 23, 2016 at 12:45 PM
To: Durham Goode <durham@fb.com<mailto:durham@fb.com>>
Cc: "mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>" <mercurial-devel@mercurial-scm.org<mailto:mercurial-devel@mercurial-scm.org>>
Subject: Re: [PATCH 1 of 2 V2] revset: use smartset minus operator


I think my patch also took care of all the improvements these two patches address. Are there any cases where my patch would be worse?

On Tue, Feb 23, 2016, 12:30 Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote:





On 2/19/16, 11:39 PM, "Martin von Zweigbergk" <martinvonz@google.com<mailto:martinvonz@google.com>> wrote:

>On Thu, Feb 18, 2016 at 11:38 AM, Durham Goode <durham@fb.com<mailto:durham@fb.com>> wrote:

>> # HG changeset patch

>> # User Durham Goode <durham@fb.com<mailto:durham@fb.com>>

>> # Date 1455824115 28800

>> #      Thu Feb 18 11:35:15 2016 -0800

>> # Node ID dc8d72c3df7dfcd7b67fa71d679cc7c3ad90f25c

>> # Parent  a036e1ae1fbe88ab99cb861ebfc2e4da7a3912ca

>> revset: use smartset minus operator

>>

>> Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can

>> be expensive, since if Y is a single commit then 'not Y' becomes a huge set and

>> sometimes the query optimizer doesn't account for it well.

>>

>> This patch changes revsets to use the built in smartset minus operator, which is

>> often smarter than 'X and not Y'.

>

>How does this patch compare to my patch in

>https://urldefense.proofpoint.com/v2/url?u=http-3A__thread.gmane.org_gmane.comp.version-2Dcontrol.mercurial.devel_89348_focus-3D89391-3F&d=CwIBaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=nuarHzhP1wi1T9iURRCj1A&m=Ic3AKLl02HHMU_yM5NlmBLJc3oZ8MUkGiPj-TnPQQiI&s=NoBhM7NiNWrsbHqoCgMbe-t_MuPSdYkIqB5W2go53mY&e=

>I did not include the test case changes there, but it only affects the

>first one among the ones affected by this patch, so there is no need

>to revert that behavior in a second patch like here. It also makes

>"(20000::) and not (20000)" faster (which these two patches do not).

>What are the disadvantages?


We could add your patch as well, but it seems like it would have to be a separate patch anyway (one adds minus and avoids the minus->and-not translation, then another patch takes advantage of it for and not).  I did not add the optimization to 'and' because none of the benchmark tests showed any perf improvement.  I suppose "(20000::) and not (20000)" might show an improvement, but that probably doesn't stop these from going.
Yuya Nishihara - Feb. 24, 2016, 3:21 p.m.
On Tue, 23 Feb 2016 21:54:53 +0000, Durham Goode wrote:
> Taking a minus operator, then converting it to "and-not", then converting it
> back to minus just seems strange.  I can see people screwing up minus
> operator perf while trying to improve 'and' performance and not even
> realizing it.  Seems cleaner to just let minus be minus and be done with it.

I prefer Martin's patch. A minus operator is just a shorthand for "and not". If
we have a fast path for "x and not y", it should be optimized as well.

Optimizing only "op == 'minus'" case would imply that it cannot be applied to
a general "and not" form.
Durham Goode - Feb. 24, 2016, 6:28 p.m.
On 2/24/16, 7:21 AM, "Yuya Nishihara" <youjah@gmail.com on behalf of yuya@tcha.org> wrote:



>On Tue, 23 Feb 2016 21:54:53 +0000, Durham Goode wrote:

>> Taking a minus operator, then converting it to "and-not", then converting it

>> back to minus just seems strange.  I can see people screwing up minus

>> operator perf while trying to improve 'and' performance and not even

>> realizing it.  Seems cleaner to just let minus be minus and be done with it.

>

>I prefer Martin's patch. A minus operator is just a shorthand for "and not". If

>we have a fast path for "x and not y", it should be optimized as well.

>

>Optimizing only "op == 'minus'" case would imply that it cannot be applied to

>a general "and not" form.


Alright, this patch isn't worth this much discussion.  I'll resend with Martin's version.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -436,6 +436,9 @@  def dagrange(repo, subset, x, y):
 def andset(repo, subset, x, y):
     return getset(repo, getset(repo, subset, x), y)
 
+def minusset(repo, subset, x, y):
+    return getset(repo, subset, x) - getset(repo, subset, y)
+
 def orset(repo, subset, *xs):
     assert xs
     if len(xs) == 1:
@@ -2143,6 +2146,7 @@  methods = {
     "string": stringset,
     "symbol": stringset,
     "and": andset,
+    "minus": minusset,
     "or": orset,
     "not": notset,
     "list": listset,
@@ -2163,7 +2167,10 @@  def optimize(x, small):
 
     op = x[0]
     if op == 'minus':
-        return optimize(('and', x[1], ('not', x[2])), small)
+        wa, ta = optimize(x[1], True)
+        wb, tb = optimize(x[2], True)
+
+        return wa, (op, ta, tb)
     elif op == 'only':
         return optimize(('func', ('symbol', 'only'),
                          ('list', x[1], x[2])), small)
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -693,12 +693,11 @@  Test opreand of '%' is optimized recursi
   * optimized:
   (func
     ('symbol', 'only')
-    (and
+    (minus
       (range
         ('symbol', '8')
         ('symbol', '9'))
-      (not
-        ('symbol', '8'))))
+      ('symbol', '8')))
   * set:
   <baseset+ [8, 9]>
   8
@@ -1249,13 +1248,16 @@  check that conversion to only works
     (dagrangepre
       ('symbol', '1')))
   * optimized:
-  (func
-    ('symbol', 'only')
-    (list
-      ('symbol', '3')
+  (minus
+    (func
+      ('symbol', 'ancestors')
+      ('symbol', '3'))
+    (func
+      ('symbol', 'ancestors')
       ('symbol', '1')))
   * set:
-  <baseset+ [3]>
+  <filteredset
+    <generatorset+>>
   3
   $ try --optimize 'ancestors(1) - ancestors(3)'
   (minus
@@ -1266,13 +1268,16 @@  check that conversion to only works
       ('symbol', 'ancestors')
       ('symbol', '3')))
   * optimized:
-  (func
-    ('symbol', 'only')
-    (list
-      ('symbol', '1')
+  (minus
+    (func
+      ('symbol', 'ancestors')
+      ('symbol', '1'))
+    (func
+      ('symbol', 'ancestors')
       ('symbol', '3')))
   * set:
-  <baseset+ []>
+  <filteredset
+    <generatorset+>>
   $ try --optimize 'not ::2 and ::6'
   (and
     (not