Patchwork [V3] revset: use smartset minus operator

login
register
mail settings
Submitter Durham Goode
Date Feb. 24, 2016, 6:48 p.m.
Message ID <36d63242b3ccc2322ad0.1456339698@dev8486.prn1.facebook.com>
Download mbox | patch
Permalink /patch/13364/
State Accepted
Headers show

Comments

Durham Goode - Feb. 24, 2016, 6:48 p.m.
# HG changeset patch
# User Durham Goode <durham@fb.com>
# Date 1456339275 28800
#      Wed Feb 24 10:41:15 2016 -0800
# Node ID 36d63242b3ccc2322ad09dd778374de275323626
# Parent  91a827e760df9d9b3d86692c5aa195a3d6ba2208
revset: use smartset minus operator

Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This can
be expensive, since if Y is a single commit then 'not Y' becomes a huge set and
sometimes the query optimizer doesn't account for it well.

This patch changes revsets to use the built in smartset minus operator, which is
often smarter than 'X and not Y'.

On a large repo this saves 2.2 seconds on rebase and histedit because "X:: - X"
becomes almost instant.

Relevant performance numbers from revsetbenchmark.py

revset #13: roots((tip~100::) - (tip~100::tip))
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.001080      0.001107      0.001102      0.001118      0.001121      0.001114      0.001141      0.001123      0.001099      0.001123      0.001137
1) 0.000708  65% 0.000738  66% 0.000735  66% 0.000739  66% 0.000784  69% 0.000780  70% 0.000807  70% 0.000756  67% 0.000727  66% 0.000759  67% 0.000808  71%

revset #14: roots((0::) - (0::tip))
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.131304      0.079168      0.133129      0.076560      0.048179      0.133349      0.049153      0.077097      0.129689      0.076212      0.048543
1) 0.065066  49% 0.036941  46% 0.066063  49% 0.034755  45% 0.048558      0.071091  53% 0.047679      0.034984  45% 0.064572  49% 0.035680  46% 0.048508

revset #22: (not public() - obsolete())
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.000139      0.000133      0.000133      0.000138      0.000134      0.000155      0.000157      0.000152      0.000157      0.000156      0.000153
1) 0.000108  77% 0.000129      0.000129      0.000134      0.000132      0.000127  81% 0.000151      0.000147      0.000127  80% 0.000152      0.000149

revset #25: (20000::) - (20000)
   plain         min           max           first         last          reverse       rev..rst      rev..ast      sort          sor..rst      sor..ast
0) 0.050560      0.045513      0.022593      0.043588      0.021909      0.045517      0.021822      0.044660      0.049740      0.044227      0.021819
1) 0.018614  36% 0.000171   0% 0.019659  87% 0.000168   0% 0.015543  70% 0.021069  46% 0.015623  71% 0.000180   0% 0.018658  37% 0.000186   0% 0.015750  72%
Martin von Zweigbergk - Feb. 24, 2016, 6:57 p.m.
Pushed to the clowncopter, thanks!

On Wed, Feb 24, 2016 at 10:50 AM Durham Goode <durham@fb.com> wrote:

> # HG changeset patch
> # User Durham Goode <durham@fb.com>
> # Date 1456339275 28800
> #      Wed Feb 24 10:41:15 2016 -0800
> # Node ID 36d63242b3ccc2322ad09dd778374de275323626
> # Parent  91a827e760df9d9b3d86692c5aa195a3d6ba2208
> revset: use smartset minus operator
>
> Previously, revsets like 'X - Y' were translated to be 'X and not Y'. This
> can
> be expensive, since if Y is a single commit then 'not Y' becomes a huge
> set and
> sometimes the query optimizer doesn't account for it well.
>
> This patch changes revsets to use the built in smartset minus operator,
> which is
> often smarter than 'X and not Y'.
>
> On a large repo this saves 2.2 seconds on rebase and histedit because "X::
> - X"
> becomes almost instant.
>
> Relevant performance numbers from revsetbenchmark.py
>
> revset #13: roots((tip~100::) - (tip~100::tip))
>    plain         min           max           first         last
> reverse       rev..rst      rev..ast      sort          sor..rst
> sor..ast
> 0) 0.001080      0.001107      0.001102      0.001118      0.001121
> 0.001114      0.001141      0.001123      0.001099      0.001123
> 0.001137
> 1) 0.000708  65% 0.000738  66% 0.000735  66% 0.000739  66% 0.000784  69%
> 0.000780  70% 0.000807  70% 0.000756  67% 0.000727  66% 0.000759  67%
> 0.000808  71%
>
> revset #14: roots((0::) - (0::tip))
>    plain         min           max           first         last
> reverse       rev..rst      rev..ast      sort          sor..rst
> sor..ast
> 0) 0.131304      0.079168      0.133129      0.076560      0.048179
> 0.133349      0.049153      0.077097      0.129689      0.076212
> 0.048543
> 1) 0.065066  49% 0.036941  46% 0.066063  49% 0.034755  45% 0.048558
> 0.071091  53% 0.047679      0.034984  45% 0.064572  49% 0.035680  46%
> 0.048508
>
> revset #22: (not public() - obsolete())
>    plain         min           max           first         last
> reverse       rev..rst      rev..ast      sort          sor..rst
> sor..ast
> 0) 0.000139      0.000133      0.000133      0.000138      0.000134
> 0.000155      0.000157      0.000152      0.000157      0.000156
> 0.000153
> 1) 0.000108  77% 0.000129      0.000129      0.000134      0.000132
> 0.000127  81% 0.000151      0.000147      0.000127  80% 0.000152
> 0.000149
>
> revset #25: (20000::) - (20000)
>    plain         min           max           first         last
> reverse       rev..rst      rev..ast      sort          sor..rst
> sor..ast
> 0) 0.050560      0.045513      0.022593      0.043588      0.021909
> 0.045517      0.021822      0.044660      0.049740      0.044227
> 0.021819
> 1) 0.018614  36% 0.000171   0% 0.019659  87% 0.000168   0% 0.015543  70%
> 0.021069  46% 0.015623  71% 0.000180   0% 0.018658  37% 0.000186   0%
> 0.015750  72%
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -436,6 +436,9 @@ def dagrange(repo, subset, x, y):
>  def andset(repo, subset, x, y):
>      return getset(repo, getset(repo, subset, x), y)
>
> +def differenceset(repo, subset, x, y):
> +    return getset(repo, subset, x) - getset(repo, subset, y)
> +
>  def orset(repo, subset, *xs):
>      assert xs
>      if len(xs) == 1:
> @@ -2145,6 +2148,7 @@ methods = {
>      "and": andset,
>      "or": orset,
>      "not": notset,
> +    "difference": differenceset,
>      "list": listset,
>      "keyvalue": keyvaluepair,
>      "func": func,
> @@ -2205,6 +2209,9 @@ def optimize(x, small):
>          if isonly(tb, ta):
>              return w, ('func', ('symbol', 'only'), ('list', tb[2],
> ta[1][2]))
>
> +        if tb is not None and tb[0] == 'not':
> +            return wa, ('difference', ta, tb[1])
> +
>          if wa > wb:
>              return w, (op, tb, ta)
>          return w, (op, ta, tb)
> diff --git a/tests/test-revset.t b/tests/test-revset.t
> --- a/tests/test-revset.t
> +++ b/tests/test-revset.t
> @@ -693,12 +693,11 @@ Test opreand of '%' is optimized recursi
>    * optimized:
>    (func
>      ('symbol', 'only')
> -    (and
> +    (difference
>        (range
>          ('symbol', '8')
>          ('symbol', '9'))
> -      (not
> -        ('symbol', '8'))))
> +      ('symbol', '8')))
>    * set:
>    <baseset+ [8, 9]>
>    8
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -436,6 +436,9 @@  def dagrange(repo, subset, x, y):
 def andset(repo, subset, x, y):
     return getset(repo, getset(repo, subset, x), y)
 
+def differenceset(repo, subset, x, y):
+    return getset(repo, subset, x) - getset(repo, subset, y)
+
 def orset(repo, subset, *xs):
     assert xs
     if len(xs) == 1:
@@ -2145,6 +2148,7 @@  methods = {
     "and": andset,
     "or": orset,
     "not": notset,
+    "difference": differenceset,
     "list": listset,
     "keyvalue": keyvaluepair,
     "func": func,
@@ -2205,6 +2209,9 @@  def optimize(x, small):
         if isonly(tb, ta):
             return w, ('func', ('symbol', 'only'), ('list', tb[2], ta[1][2]))
 
+        if tb is not None and tb[0] == 'not':
+            return wa, ('difference', ta, tb[1])
+
         if wa > wb:
             return w, (op, tb, ta)
         return w, (op, ta, tb)
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -693,12 +693,11 @@  Test opreand of '%' is optimized recursi
   * optimized:
   (func
     ('symbol', 'only')
-    (and
+    (difference
       (range
         ('symbol', '8')
         ('symbol', '9'))
-      (not
-        ('symbol', '8'))))
+      ('symbol', '8')))
   * set:
   <baseset+ [8, 9]>
   8