Patchwork [1,of,6,V3] revset: teach optimize if current expression should define/follow order

login
register
mail settings
Submitter Yuya Nishihara
Date June 17, 2016, 2:45 p.m.
Message ID <defa6a51c4613a4294db.1466174727@mimosa>
Download mbox | patch
Permalink /patch/15534/
State Superseded
Delegated to: Martin von Zweigbergk
Headers show

Comments

Yuya Nishihara - June 17, 2016, 2:45 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1455627736 -32400
#      Tue Feb 16 22:02:16 2016 +0900
# Node ID defa6a51c4613a4294db3a16a3dea7b5cb6025a0
# Parent  d269e7db2f55f812c9ee1efe5626602614174df3
revset: teach optimize if current expression should define/follow order

New flag 'order' will be used to fix a couple of ordering bugs, such as:

 a) 'x & (y | z)' disregards the order of 'x' (issue 5100)
 b) 'x & y:z' is listed from 'y' to 'z'
 c) 'x & y' can be rewritten as 'y & x' if weight(x) > weight(y)

(c) is a hidden bug of optimize(), which could be exposed if we've fixed
(a) and (b) in a different way. (a) and (b) are not bugs of optimize(), but
they can be addressed well in optimize() since optimize() knows the context.

'order' is tri-state. It starts with 'define', and shifts to 'follow' by
'x & y'. It changes back to 'define' on function call 'f(x)' or function-like
operation 'x (f) y' because 'f' may have its own ordering requirement for 'x'
and 'y'. The state 'any' will allow us to avoid extra cost that would be
necessary to constrain ordering where it isn't important, 'not x'. For details,
see also the tests added by the next patch.
via Mercurial-devel - June 21, 2016, 5:47 p.m.
On Fri, Jun 17, 2016 at 7:45 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1455627736 -32400
> #      Tue Feb 16 22:02:16 2016 +0900
> # Node ID defa6a51c4613a4294db3a16a3dea7b5cb6025a0
> # Parent  d269e7db2f55f812c9ee1efe5626602614174df3
> revset: teach optimize if current expression should define/follow order
>
> New flag 'order' will be used to fix a couple of ordering bugs, such as:
>
>  a) 'x & (y | z)' disregards the order of 'x' (issue 5100)
>  b) 'x & y:z' is listed from 'y' to 'z'
>  c) 'x & y' can be rewritten as 'y & x' if weight(x) > weight(y)
>
> (c) is a hidden bug of optimize(), which could be exposed if we've fixed
> (a) and (b) in a different way. (a) and (b) are not bugs of optimize(), but
> they can be addressed well in optimize() since optimize() knows the context.

It would be good to say what (a) and (b) *are* bugs of. I think (a) is
a bug in orset(). It seems easy to fix the bug there. Could you
explain why you didn't do that? Is it not actually easy (maybe I'm
mistaken), or is it too slow, or something else? (b) and (c) look very
similar. Is (b) a bug in rangeset()?

Can we also have the tests from patch 2/6 by themselves before the
other changes? That would make it clearer what is what each new patch
fixes.

>
> 'order' is tri-state. It starts with 'define', and shifts to 'follow' by
> 'x & y'. It changes back to 'define' on function call 'f(x)' or function-like
> operation 'x (f) y' because 'f' may have its own ordering requirement for 'x'
> and 'y'. The state 'any' will allow us to avoid extra cost that would be
> necessary to constrain ordering where it isn't important, 'not x'. For details,
> see also the tests added by the next patch.
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2302,6 +2302,46 @@ methods = {
>      "parentpost": p1,
>  }
>
> +# Constants for ordering requirement, used in _optimize():
> +#
> +# If 'define', any future functions and operations can change the ordering of
> +# the entries in the set. If 'follow', any future functions and operations
> +# should take the ordering specified by the first operand to the '&' operator.

This was quite confusing to me. I think it was mostly the "future"
that tricked me. I saw it as future in terms of evaluation order of
the expression, but it's in terms of the call graph of optimize() or
something, right? Maybe "nested" or "deeper" instead?

> +#
> +# For instance,
> +#
> +#   X & (Y | Z)
> +#   ^   ^^^^^^^
> +#   |   follow
> +#   define
> +#
> +# will be evaluated as 'or(y(x()), z(x()))', where 'x()' can change the order
> +# of the entries in the set, but 'y()', 'z()' and 'or()' shouldn't.
> +#
> +# 'any' means the order doesn't matter. For 'not X', 'x()' may change the
> +# order of the entries in the initial set. So for 'not (x | y)', we don't
> +# have to fix the result of 'or()' to enforce the ordering requirement.
> +# Also, 'not sort(x)' can be optimized to 'not x'.
> +#
> +# Transition of ordering requirement:
> +#
> +# 1. starts with 'define'
> +# 2. shifts to 'follow' by 'x & y'
> +# 3. changes back to 'define' on function call 'f(x)' or function-like
> +#    operation 'x (f) y' because 'f' may have its own ordering requirement
> +#    for 'x' and 'y' (e.g. 'first(x)')
> +#
> +_anyorder = 'any'        # don't care the order
> +_defineorder = 'define'  # should define the order
> +_followorder = 'follow'  # must follow the current order
> +
> +# transition table for 'x & y', from the current expression 'x' to 'y'
> +_tofolloworder = {
> +    _anyorder: _anyorder,
> +    _defineorder: _followorder,
> +    _followorder: _followorder,
> +}
> +
>  def _matchonly(revs, bases):
>      """
>      >>> f = lambda *args: _matchonly(*map(parse, args))
> @@ -2317,7 +2357,11 @@ def _matchonly(revs, bases):
>          and getstring(bases[1][1], _('not a symbol')) == 'ancestors'):
>          return ('list', revs[2], bases[1][2])
>
> -def _optimize(x, small):
> +def _optimize(x, small, order):
> +    """
> +    'order' specifies how the current expression 'x' is ordered (see the
> +    constants defined above)
> +    """
>      if x is None:
>          return 0, x
>
> @@ -2327,30 +2371,32 @@ def _optimize(x, small):
>
>      op = x[0]
>      if op == 'minus':
> -        return _optimize(('and', x[1], ('not', x[2])), small)
> +        return _optimize(('and', x[1], ('not', x[2])), small, order)
>      elif op == 'only':
>          t = ('func', ('symbol', 'only'), ('list', x[1], x[2]))
> -        return _optimize(t, small)
> +        return _optimize(t, small, order)
>      elif op == 'onlypost':
> -        return _optimize(('func', ('symbol', 'only'), x[1]), small)
> +        return _optimize(('func', ('symbol', 'only'), x[1]), small, order)
>      elif op == 'dagrangepre':
> -        return _optimize(('func', ('symbol', 'ancestors'), x[1]), small)
> +        return _optimize(('func', ('symbol', 'ancestors'), x[1]), small, order)
>      elif op == 'dagrangepost':
> -        return _optimize(('func', ('symbol', 'descendants'), x[1]), small)
> +        t = ('func', ('symbol', 'descendants'), x[1])
> +        return _optimize(t, small, order)
>      elif op == 'rangeall':
> -        return _optimize(('range', ('string', '0'), ('string', 'tip')), small)
> +        t = ('range', ('string', '0'), ('string', 'tip'))
> +        return _optimize(t, small, order)
>      elif op == 'rangepre':
> -        return _optimize(('range', ('string', '0'), x[1]), small)
> +        return _optimize(('range', ('string', '0'), x[1]), small, order)
>      elif op == 'rangepost':
> -        return _optimize(('range', x[1], ('string', 'tip')), small)
> +        return _optimize(('range', x[1], ('string', 'tip')), small, order)
>      elif op == 'negate':
>          s = getstring(x[1], _("can't negate that"))
> -        return _optimize(('string', '-' + s), small)
> +        return _optimize(('string', '-' + s), small, order)
>      elif op in 'string symbol negate':
>          return smallbonus, x # single revisions are small
>      elif op == 'and':
> -        wa, ta = _optimize(x[1], True)
> -        wb, tb = _optimize(x[2], True)
> +        wa, ta = _optimize(x[1], True, order)
> +        wb, tb = _optimize(x[2], True, _tofolloworder[order])
>          w = min(wa, wb)
>
>          # (::x and not ::y)/(not ::y and ::x) have a fast path
> @@ -2376,12 +2422,12 @@ def _optimize(x, small):
>              else:
>                  s = '\0'.join(t[1] for w, t in ss)
>                  y = ('func', ('symbol', '_list'), ('string', s))
> -                w, t = _optimize(y, False)
> +                w, t = _optimize(y, False, order)
>              ws.append(w)
>              ts.append(t)
>              del ss[:]
>          for y in x[1:]:
> -            w, t = _optimize(y, False)
> +            w, t = _optimize(y, False, order)
>              if t is not None and (t[0] == 'string' or t[0] == 'symbol'):
>                  ss.append((w, t))
>                  continue
> @@ -2399,34 +2445,34 @@ def _optimize(x, small):
>          # Optimize not public() to _notpublic() because we have a fast version
>          if x[1] == ('func', ('symbol', 'public'), None):
>              newsym = ('func', ('symbol', '_notpublic'), None)
> -            o = _optimize(newsym, not small)
> +            o = _optimize(newsym, not small, order)
>              return o[0], o[1]
>          else:
> -            o = _optimize(x[1], not small)
> +            o = _optimize(x[1], not small, _anyorder)
>              return o[0], (op, o[1])
>      elif op == 'parentpost':
> -        o = _optimize(x[1], small)
> +        o = _optimize(x[1], small, _defineorder)
>          return o[0], (op, o[1])
>      elif op == 'group':
> -        return _optimize(x[1], small)
> +        return _optimize(x[1], small, order)
>      elif op in 'dagrange range parent ancestorspec':
>          if op == 'parent':
>              # x^:y means (x^) : y, not x ^ (:y)
>              post = ('parentpost', x[1])
>              if x[2][0] == 'dagrangepre':
> -                return _optimize(('dagrange', post, x[2][1]), small)
> +                return _optimize(('dagrange', post, x[2][1]), small, order)
>              elif x[2][0] == 'rangepre':
> -                return _optimize(('range', post, x[2][1]), small)
> -
> -        wa, ta = _optimize(x[1], small)
> -        wb, tb = _optimize(x[2], small)
> +                return _optimize(('range', post, x[2][1]), small, order)
> +
> +        wa, ta = _optimize(x[1], small, _defineorder)
> +        wb, tb = _optimize(x[2], small, _defineorder)
>          return wa + wb, (op, ta, tb)
>      elif op == 'list':
> -        ws, ts = zip(*(_optimize(y, small) for y in x[1:]))
> +        ws, ts = zip(*(_optimize(y, small, order) for y in x[1:]))
>          return sum(ws), (op,) + ts
>      elif op == 'func':
>          f = getstring(x[1], _("not a symbol"))
> -        wa, ta = _optimize(x[2], small)
> +        wa, ta = _optimize(x[2], small, _defineorder)
>          if f in ("author branch closed date desc file grep keyword "
>                   "outgoing user"):
>              w = 10 # slow
> @@ -2446,7 +2492,7 @@ def _optimize(x, small):
>      return 1, x
>
>  def optimize(tree):
> -    _weight, newtree = _optimize(tree, small=True)
> +    _weight, newtree = _optimize(tree, small=True, order=_defineorder)
>      return newtree
>
>  # the set of valid characters for the initial letter of symbols in
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - June 22, 2016, 3:40 p.m.
On Tue, 21 Jun 2016 10:47:46 -0700, Martin von Zweigbergk wrote:
> On Fri, Jun 17, 2016 at 7:45 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1455627736 -32400
> > #      Tue Feb 16 22:02:16 2016 +0900
> > # Node ID defa6a51c4613a4294db3a16a3dea7b5cb6025a0
> > # Parent  d269e7db2f55f812c9ee1efe5626602614174df3
> > revset: teach optimize if current expression should define/follow order
> >
> > New flag 'order' will be used to fix a couple of ordering bugs, such as:
> >
> >  a) 'x & (y | z)' disregards the order of 'x' (issue 5100)
> >  b) 'x & y:z' is listed from 'y' to 'z'
> >  c) 'x & y' can be rewritten as 'y & x' if weight(x) > weight(y)
> >
> > (c) is a hidden bug of optimize(), which could be exposed if we've fixed
> > (a) and (b) in a different way. (a) and (b) are not bugs of optimize(), but
> > they can be addressed well in optimize() since optimize() knows the context.  
> 
> It would be good to say what (a) and (b) *are* bugs of.

(a) and (b) are bugs of the revset core. Before this change, there was no
hint if a revset function should enforce its ordering, or take the ordering
specified by the subset argument. These bugs could be addressed by overriding
__and__() of the initial set to take the ordering of the other set:

    class initialset(fullreposet):
        def __and__(self, other):
            # allow other to enforce its ordering
            return other & self

but it would expose (c), which is a hidden bug of optimize(). So either ways,
optimize() have to know the current ordering requirement. Otherwise, it
couldn't rebalance expressions by weights reliably, nor tell how a revset
function should order the entries.

That's what this patch does.

> I think (a) is
> a bug in orset(). It seems easy to fix the bug there. Could you
> explain why you didn't do that? Is it not actually easy (maybe I'm
> mistaken), or is it too slow, or something else?

orset() could be something like this:

    def orset(repo, subset, o, *xs):
        if getstring(o) != _followorder:
            return _orset(repo, subset, *xs)  # original orset
        ss = [getset(repo, fullreposet(subset), x) for x in xs]
        return subset.filter(lambda r: any(r in s for s in ss))

but from my experience around b333ca94403d, breadth-first search wasn't
as fast as depth-first search of balanced addset tree. So I didn't take
this route.

Also, the _reorder() insertion can be applied to both orset() and _list-family
functions.

> (b) and (c) look very similar.
> Is (b) a bug in rangeset()?

No. (a) and (b) are bugs of the same category, but (c) isn't.

> Can we also have the tests from patch 2/6 by themselves before the
> other changes? That would make it clearer what is what each new patch
> fixes.

Okay, will do in V4.

> > +# Constants for ordering requirement, used in _optimize():
> > +#
> > +# If 'define', any future functions and operations can change the ordering of
> > +# the entries in the set. If 'follow', any future functions and operations
> > +# should take the ordering specified by the first operand to the '&' operator.  
> 
> This was quite confusing to me. I think it was mostly the "future"
> that tricked me. I saw it as future in terms of evaluation order of
> the expression, but it's in terms of the call graph of optimize() or
> something, right? Maybe "nested" or "deeper" instead?

"nested" sounds better.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2302,6 +2302,46 @@  methods = {
     "parentpost": p1,
 }
 
+# Constants for ordering requirement, used in _optimize():
+#
+# If 'define', any future functions and operations can change the ordering of
+# the entries in the set. If 'follow', any future functions and operations
+# should take the ordering specified by the first operand to the '&' operator.
+#
+# For instance,
+#
+#   X & (Y | Z)
+#   ^   ^^^^^^^
+#   |   follow
+#   define
+#
+# will be evaluated as 'or(y(x()), z(x()))', where 'x()' can change the order
+# of the entries in the set, but 'y()', 'z()' and 'or()' shouldn't.
+#
+# 'any' means the order doesn't matter. For 'not X', 'x()' may change the
+# order of the entries in the initial set. So for 'not (x | y)', we don't
+# have to fix the result of 'or()' to enforce the ordering requirement.
+# Also, 'not sort(x)' can be optimized to 'not x'.
+#
+# Transition of ordering requirement:
+#
+# 1. starts with 'define'
+# 2. shifts to 'follow' by 'x & y'
+# 3. changes back to 'define' on function call 'f(x)' or function-like
+#    operation 'x (f) y' because 'f' may have its own ordering requirement
+#    for 'x' and 'y' (e.g. 'first(x)')
+#
+_anyorder = 'any'        # don't care the order
+_defineorder = 'define'  # should define the order
+_followorder = 'follow'  # must follow the current order
+
+# transition table for 'x & y', from the current expression 'x' to 'y'
+_tofolloworder = {
+    _anyorder: _anyorder,
+    _defineorder: _followorder,
+    _followorder: _followorder,
+}
+
 def _matchonly(revs, bases):
     """
     >>> f = lambda *args: _matchonly(*map(parse, args))
@@ -2317,7 +2357,11 @@  def _matchonly(revs, bases):
         and getstring(bases[1][1], _('not a symbol')) == 'ancestors'):
         return ('list', revs[2], bases[1][2])
 
-def _optimize(x, small):
+def _optimize(x, small, order):
+    """
+    'order' specifies how the current expression 'x' is ordered (see the
+    constants defined above)
+    """
     if x is None:
         return 0, x
 
@@ -2327,30 +2371,32 @@  def _optimize(x, small):
 
     op = x[0]
     if op == 'minus':
-        return _optimize(('and', x[1], ('not', x[2])), small)
+        return _optimize(('and', x[1], ('not', x[2])), small, order)
     elif op == 'only':
         t = ('func', ('symbol', 'only'), ('list', x[1], x[2]))
-        return _optimize(t, small)
+        return _optimize(t, small, order)
     elif op == 'onlypost':
-        return _optimize(('func', ('symbol', 'only'), x[1]), small)
+        return _optimize(('func', ('symbol', 'only'), x[1]), small, order)
     elif op == 'dagrangepre':
-        return _optimize(('func', ('symbol', 'ancestors'), x[1]), small)
+        return _optimize(('func', ('symbol', 'ancestors'), x[1]), small, order)
     elif op == 'dagrangepost':
-        return _optimize(('func', ('symbol', 'descendants'), x[1]), small)
+        t = ('func', ('symbol', 'descendants'), x[1])
+        return _optimize(t, small, order)
     elif op == 'rangeall':
-        return _optimize(('range', ('string', '0'), ('string', 'tip')), small)
+        t = ('range', ('string', '0'), ('string', 'tip'))
+        return _optimize(t, small, order)
     elif op == 'rangepre':
-        return _optimize(('range', ('string', '0'), x[1]), small)
+        return _optimize(('range', ('string', '0'), x[1]), small, order)
     elif op == 'rangepost':
-        return _optimize(('range', x[1], ('string', 'tip')), small)
+        return _optimize(('range', x[1], ('string', 'tip')), small, order)
     elif op == 'negate':
         s = getstring(x[1], _("can't negate that"))
-        return _optimize(('string', '-' + s), small)
+        return _optimize(('string', '-' + s), small, order)
     elif op in 'string symbol negate':
         return smallbonus, x # single revisions are small
     elif op == 'and':
-        wa, ta = _optimize(x[1], True)
-        wb, tb = _optimize(x[2], True)
+        wa, ta = _optimize(x[1], True, order)
+        wb, tb = _optimize(x[2], True, _tofolloworder[order])
         w = min(wa, wb)
 
         # (::x and not ::y)/(not ::y and ::x) have a fast path
@@ -2376,12 +2422,12 @@  def _optimize(x, small):
             else:
                 s = '\0'.join(t[1] for w, t in ss)
                 y = ('func', ('symbol', '_list'), ('string', s))
-                w, t = _optimize(y, False)
+                w, t = _optimize(y, False, order)
             ws.append(w)
             ts.append(t)
             del ss[:]
         for y in x[1:]:
-            w, t = _optimize(y, False)
+            w, t = _optimize(y, False, order)
             if t is not None and (t[0] == 'string' or t[0] == 'symbol'):
                 ss.append((w, t))
                 continue
@@ -2399,34 +2445,34 @@  def _optimize(x, small):
         # Optimize not public() to _notpublic() because we have a fast version
         if x[1] == ('func', ('symbol', 'public'), None):
             newsym = ('func', ('symbol', '_notpublic'), None)
-            o = _optimize(newsym, not small)
+            o = _optimize(newsym, not small, order)
             return o[0], o[1]
         else:
-            o = _optimize(x[1], not small)
+            o = _optimize(x[1], not small, _anyorder)
             return o[0], (op, o[1])
     elif op == 'parentpost':
-        o = _optimize(x[1], small)
+        o = _optimize(x[1], small, _defineorder)
         return o[0], (op, o[1])
     elif op == 'group':
-        return _optimize(x[1], small)
+        return _optimize(x[1], small, order)
     elif op in 'dagrange range parent ancestorspec':
         if op == 'parent':
             # x^:y means (x^) : y, not x ^ (:y)
             post = ('parentpost', x[1])
             if x[2][0] == 'dagrangepre':
-                return _optimize(('dagrange', post, x[2][1]), small)
+                return _optimize(('dagrange', post, x[2][1]), small, order)
             elif x[2][0] == 'rangepre':
-                return _optimize(('range', post, x[2][1]), small)
-
-        wa, ta = _optimize(x[1], small)
-        wb, tb = _optimize(x[2], small)
+                return _optimize(('range', post, x[2][1]), small, order)
+
+        wa, ta = _optimize(x[1], small, _defineorder)
+        wb, tb = _optimize(x[2], small, _defineorder)
         return wa + wb, (op, ta, tb)
     elif op == 'list':
-        ws, ts = zip(*(_optimize(y, small) for y in x[1:]))
+        ws, ts = zip(*(_optimize(y, small, order) for y in x[1:]))
         return sum(ws), (op,) + ts
     elif op == 'func':
         f = getstring(x[1], _("not a symbol"))
-        wa, ta = _optimize(x[2], small)
+        wa, ta = _optimize(x[2], small, _defineorder)
         if f in ("author branch closed date desc file grep keyword "
                  "outgoing user"):
             w = 10 # slow
@@ -2446,7 +2492,7 @@  def _optimize(x, small):
     return 1, x
 
 def optimize(tree):
-    _weight, newtree = _optimize(tree, small=True)
+    _weight, newtree = _optimize(tree, small=True, order=_defineorder)
     return newtree
 
 # the set of valid characters for the initial letter of symbols in