Patchwork [7,of,7,V2] revset: optimize out sort() to noop function according to ordering policy

login
register
mail settings
Submitter Yuya Nishihara
Date June 3, 2016, 3:02 p.m.
Message ID <c4c434b91960e089a4fd.1464966150@mimosa>
Download mbox | patch
Permalink /patch/15378/
State Changes Requested
Delegated to: Augie Fackler
Headers show

Comments

Yuya Nishihara - June 3, 2016, 3:02 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1462250172 -32400
#      Tue May 03 13:36:12 2016 +0900
# Node ID c4c434b91960e089a4fd28fae019280f49e4d2ab
# Parent  57640fa6414ce96c8e4946a5f4862ce2688f9571
revset: optimize out sort() to noop function according to ordering policy

See the previous patch for why. Unlike reverse(), we need a noop function
to validate arguments.
via Mercurial-devel - June 10, 2016, 5 p.m.
On Fri, Jun 3, 2016 at 8:02 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1462250172 -32400
> #      Tue May 03 13:36:12 2016 +0900
> # Node ID c4c434b91960e089a4fd28fae019280f49e4d2ab
> # Parent  57640fa6414ce96c8e4946a5f4862ce2688f9571
> revset: optimize out sort() to noop function according to ordering policy
>
> See the previous patch for why. Unlike reverse(), we need a noop function
> to validate arguments.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -1896,6 +1896,12 @@ def sort(repo, subset, x):
>              raise error.ParseError(_("unknown sort key %r") % fk)
>      return baseset([c.rev() for c in ctxs])
>
> +@predicate('_nosort(set[, [-]key...])', safe=True)
> +def _nosort(repo, subset, x):
> +    sort(repo, baseset(), x)  # validate arguments

sort() ends with "return baseset([c.rev() for c in ctxs])". Does that
evaluate the set? I know there's some laziness in revsets, but that
looks suspicious to me. Just checking; you probably know this code
better.

> +    args = getargsdict(x, 'sort', 'set keys')
> +    return getset(repo, subset, args['set'])
> +
>  @predicate('subrepo([pattern])')
>  def subrepo(repo, subset, x):
>      """Changesets that add, modify or remove the given subrepo.  If no subrepo
> @@ -2092,6 +2098,7 @@ methods = {
>  # functions that do 'return getset(repo, subset, x)'.
>  _noopsymbols = set([
>      'present',
> +    '_nosort',
>  ])
>
>  # constants for ordering policy, used in _optimize():
> @@ -2260,6 +2267,8 @@ def _optimize(x, small, order):
>          wa, ta = _optimize(x[2], small, d)
>          if f == 'reverse' and order != _defineorder:
>              return wa, ta
> +        if f == 'sort' and order != _defineorder:
> +            return wa, ('func', ('symbol', '_nosort'), ta)
>          if f in ("author branch closed date desc file grep keyword "
>                   "outgoing user"):
>              w = 10 # slow
> diff --git a/tests/test-revset.t b/tests/test-revset.t
> --- a/tests/test-revset.t
> +++ b/tests/test-revset.t
> @@ -1191,8 +1191,7 @@ Test order of revisions in compound expr
>    (see hg help "revsets.x or y")
>    [255]
>
> - BROKEN: sort() will need different workaround since it modifies the flag
> - of input set:
> + sort() should be optimized out if order doesn't matter:
>
>    $ try --optimize '0:2 & sort(all(), -rev)'
>    (and
> @@ -1214,23 +1213,28 @@ Test order of revisions in compound expr
>        ('symbol', '2')
>        ('string', 'define'))
>      (func
> -      ('symbol', '_reorder')
> -      (func
> -        ('symbol', 'sort')
> -        (list
> -          (func
> -            ('symbol', 'all')
> -            None)
> -          ('string', '-rev')))))
> +      ('symbol', '_nosort')
> +      (list
> +        (func
> +          ('symbol', 'all')
> +          None)
> +        ('string', '-rev'))))
>    * set:
>    <filteredset
> -    <spanset- 0:2>,
> -    <filteredset
> -      <spanset- 0:2>,
> -      <spanset+ 0:9>>>
> +    <spanset+ 0:2>,
> +    <spanset+ 0:9>>
> +  0
> +  1
>    2
> -  1
> -  0
> +
> + invalid argument for sort() to be optimized out:
> +
> +  $ log '0:2 & sort()'
> +  hg: parse error: sort requires one or two arguments
> +  [255]
> +  $ log '0:2 & sort(all(), -invalid)'
> +  hg: parse error: unknown sort key '-invalid'
> +  [255]
>
>   for 'A & f(B)', 'B' should not be affected by the order of 'A':
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - June 11, 2016, 3:59 a.m.
On Fri, 10 Jun 2016 10:00:37 -0700, Martin von Zweigbergk wrote:
> On Fri, Jun 3, 2016 at 8:02 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1462250172 -32400
> > #      Tue May 03 13:36:12 2016 +0900
> > # Node ID c4c434b91960e089a4fd28fae019280f49e4d2ab
> > # Parent  57640fa6414ce96c8e4946a5f4862ce2688f9571
> > revset: optimize out sort() to noop function according to ordering policy
> >
> > See the previous patch for why. Unlike reverse(), we need a noop function
> > to validate arguments.
> >
> > diff --git a/mercurial/revset.py b/mercurial/revset.py
> > --- a/mercurial/revset.py
> > +++ b/mercurial/revset.py
> > @@ -1896,6 +1896,12 @@ def sort(repo, subset, x):
> >              raise error.ParseError(_("unknown sort key %r") % fk)
> >      return baseset([c.rev() for c in ctxs])
> >
> > +@predicate('_nosort(set[, [-]key...])', safe=True)
> > +def _nosort(repo, subset, x):
> > +    sort(repo, baseset(), x)  # validate arguments  
> 
> sort() ends with "return baseset([c.rev() for c in ctxs])". Does that
> evaluate the set? I know there's some laziness in revsets, but that
> looks suspicious to me. Just checking; you probably know this code
> better.

Good point. The line "baseset([c.rev() for c in ctxs])" should have no
problem, but there is another problem. _nosort() evaluates the given set
against an empty set "baseset()", assuming that its cost is nearly zero.
That isn't always true.

I have to factor out the validation code from sort(), and in this way, I can
get rid of _nosort(). I'll send revised patches in V3.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -1896,6 +1896,12 @@  def sort(repo, subset, x):
             raise error.ParseError(_("unknown sort key %r") % fk)
     return baseset([c.rev() for c in ctxs])
 
+@predicate('_nosort(set[, [-]key...])', safe=True)
+def _nosort(repo, subset, x):
+    sort(repo, baseset(), x)  # validate arguments
+    args = getargsdict(x, 'sort', 'set keys')
+    return getset(repo, subset, args['set'])
+
 @predicate('subrepo([pattern])')
 def subrepo(repo, subset, x):
     """Changesets that add, modify or remove the given subrepo.  If no subrepo
@@ -2092,6 +2098,7 @@  methods = {
 # functions that do 'return getset(repo, subset, x)'.
 _noopsymbols = set([
     'present',
+    '_nosort',
 ])
 
 # constants for ordering policy, used in _optimize():
@@ -2260,6 +2267,8 @@  def _optimize(x, small, order):
         wa, ta = _optimize(x[2], small, d)
         if f == 'reverse' and order != _defineorder:
             return wa, ta
+        if f == 'sort' and order != _defineorder:
+            return wa, ('func', ('symbol', '_nosort'), ta)
         if f in ("author branch closed date desc file grep keyword "
                  "outgoing user"):
             w = 10 # slow
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -1191,8 +1191,7 @@  Test order of revisions in compound expr
   (see hg help "revsets.x or y")
   [255]
 
- BROKEN: sort() will need different workaround since it modifies the flag
- of input set:
+ sort() should be optimized out if order doesn't matter:
 
   $ try --optimize '0:2 & sort(all(), -rev)'
   (and
@@ -1214,23 +1213,28 @@  Test order of revisions in compound expr
       ('symbol', '2')
       ('string', 'define'))
     (func
-      ('symbol', '_reorder')
-      (func
-        ('symbol', 'sort')
-        (list
-          (func
-            ('symbol', 'all')
-            None)
-          ('string', '-rev')))))
+      ('symbol', '_nosort')
+      (list
+        (func
+          ('symbol', 'all')
+          None)
+        ('string', '-rev'))))
   * set:
   <filteredset
-    <spanset- 0:2>,
-    <filteredset
-      <spanset- 0:2>,
-      <spanset+ 0:9>>>
+    <spanset+ 0:2>,
+    <spanset+ 0:9>>
+  0
+  1
   2
-  1
-  0
+
+ invalid argument for sort() to be optimized out:
+
+  $ log '0:2 & sort()'
+  hg: parse error: sort requires one or two arguments
+  [255]
+  $ log '0:2 & sort(all(), -invalid)'
+  hg: parse error: unknown sort key '-invalid'
+  [255]
 
  for 'A & f(B)', 'B' should not be affected by the order of 'A':