Patchwork [3,of,7,V4] revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)

login
register
mail settings
Submitter Yuya Nishihara
Date June 23, 2016, 3:59 p.m.
Message ID <bd5e886ceca4257dd843.1466697595@mimosa>
Download mbox | patch
Permalink /patch/15581/
State Accepted
Headers show

Comments

Yuya Nishihara - June 23, 2016, 3:59 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1455629461 -32400
#      Tue Feb 16 22:31:01 2016 +0900
# Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
# Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)

This fixes the order of 'x & (y | z)' and 'x & _list*(...)' by inserting
'_unordered' function.

This is obviously slower than the current buggy version if an input set is
large:

       #0           #1           #2           #3
    0) 0.002968     0.002980     0.002982     0.073042
    1) 0.004513     0.004485     0.012029     0.075261

    #0: 0:4000 & (0:1099 + 1000:2099 + 2000:3099)
    #1: 4000:0 & (0:1099 + 1000:2099 + 2000:3099)
    #2: 10000:0 & (0:1099 + 1000:2099 + 2000:3099)
    #3: file("path:hg") & (0:1099 + 1000:2099 + 2000:3099)

This issue could be addressed by redirecting nested 'or' to new filter
function, but it appears to be slower than the '_unordered()' insertion.

    def unionset(repo, subset, *xs):
        ss = [getset(repo, fullreposet(repo), x) for x in xs]
        return subset.filter(lambda r: any(r in s for s in ss), cache=False)

Changes from V3:
 - got rid of double subset filtering
 - dropped 'sort()', 'reverse()' and 'head()' from the list as their problems
   are slightly different
 - renamed '_reorder(set)' to '_unordered(set)' to clarify the nested set is
   evaluated as if it were 'unordered'
via Mercurial-devel - June 24, 2016, 5:27 p.m.
Sorry to provide comments so late. It has taken me a long time to
understand the current revset bugs. While playing with revsets, I also
wrote some patches. I'll send them out soon.


On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1455629461 -32400
> #      Tue Feb 16 22:31:01 2016 +0900
> # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
> # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
> revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)
>
> This fixes the order of 'x & (y | z)' and 'x & _list*(...)' by inserting
> '_unordered' function.
>
> This is obviously slower than the current buggy version if an input set is
> large:
>
>        #0           #1           #2           #3
>     0) 0.002968     0.002980     0.002982     0.073042
>     1) 0.004513     0.004485     0.012029     0.075261
>
>     #0: 0:4000 & (0:1099 + 1000:2099 + 2000:3099)
>     #1: 4000:0 & (0:1099 + 1000:2099 + 2000:3099)
>     #2: 10000:0 & (0:1099 + 1000:2099 + 2000:3099)
>     #3: file("path:hg") & (0:1099 + 1000:2099 + 2000:3099)
>
> This issue could be addressed by redirecting nested 'or' to new filter
> function, but it appears to be slower than the '_unordered()' insertion.
>
>     def unionset(repo, subset, *xs):
>         ss = [getset(repo, fullreposet(repo), x) for x in xs]
>         return subset.filter(lambda r: any(r in s for s in ss), cache=False)
>
> Changes from V3:
>  - got rid of double subset filtering
>  - dropped 'sort()', 'reverse()' and 'head()' from the list as their problems
>    are slightly different
>  - renamed '_reorder(set)' to '_unordered(set)' to clarify the nested set is
>    evaluated as if it were 'unordered'
>
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -2205,6 +2205,14 @@ def tag(repo, subset, x):
>  def tagged(repo, subset, x):
>      return tag(repo, subset, x)
>
> +# for internal use
> +@predicate('_unordered(set)', safe=True)
> +def _unordered(repo, subset, x):
> +    # Evaluate right-hand-side expression 'set' and reorder the result set
> +    # according to the left-hand-side set 'subset'
> +    rs = getset(repo, fullreposet(repo), x)
> +    return subset & rs
> +
>  @predicate('unstable()', safe=True)
>  def unstable(repo, subset, x):
>      """Non-obsolete changesets with obsolete ancestors.
> @@ -2346,6 +2354,13 @@ followorder = 'follow'  # must follow th
>      followorder: followorder,
>  }
>
> +def _fixunionorder(x, order):
> +    """Wrap 'or' and '_list*()' by '_unordered()' to enforce the ordering
> +    requirement"""
> +    if order == followorder:
> +        return ('func', ('symbol', '_unordered'), x)
> +    return x
> +
>  def _matchonly(revs, bases):
>      """
>      >>> f = lambda *args: _matchonly(*map(parse, args))
> @@ -2444,7 +2459,7 @@ def _optimize(x, small, order):
>          # we can't reorder trees by weight because it would change the order.
>          # ("sort(a + b)" == "sort(b + a)", but "a + b" != "b + a")
>          #   ts = tuple(t for w, t in sorted(zip(ws, ts), key=lambda wt: wt[0]))
> -        return max(ws), (op,) + tuple(ts)
> +        return max(ws), _fixunionorder((op,) + tuple(ts), order)
>      elif op == 'not':
>          # Optimize not public() to _notpublic() because we have a fast version
>          if x[1] == ('func', ('symbol', 'public'), None):
> @@ -2492,7 +2507,10 @@ def _optimize(x, small, order):
>              w = 10 # assume most sorts look at changelog
>          else:
>              w = 1
> -        return w + wa, (op, x[1], ta)
> +        t = (op, x[1], ta)
> +        if f in "_list _intlist _hexlist":
> +            t = _fixunionorder(t, order)

Getting the order right is not an optimization, so I'd really like the
order to be fixed before optimize(), and then optimize() can (as you
do in later patches) remove some unnecessary ordering. I think orset()
and and _*list() should preserve the order of their subset arguments.
Perhaps rename the current orset() to unorderedorset() and make a new
orset() that simply does "return subset & unorderedorset(repo, subset,
*x)". The optimizer would then replace calls to orset() (i.e. "or"
methods) by calls to unorderedorset()  where possible. To be able to
share optimizations between orset() and _*list(), perhaps an early
step could be to replace them by your _unordered applied to the
unordered version of them, so e.g.

      (func
        ('symbol', '_list')
        ('string', '0\x002\x001')))

would become

    (func
      ('symbol', '_unordered')
      (func
        ('symbol', '_unorderedlist')
        ('string', '0\x002\x001')))

which can then possibly be further optimized as you already have done
in later patches. The names are a bit confusing; it sounds like the
outer _unordered wouldn't do anything, but it actually does order it.
I think part of the confusion is that the trees are rendered without
the implicit subset argument. I think I liked _reorder better, or
perhaps we should even call _unorderedlist() _misorderedlist() to make
it clear that there was an order (the subset's) that was ignored.

(Perhaps adding a --unoptimized to debugrevspec would be useful for
checking that optimizations don't break (or fix) anything.)

> +        return w + wa, t
>      return 1, x
>
>  def optimize(tree):
> diff --git a/tests/test-revset.t b/tests/test-revset.t
> --- a/tests/test-revset.t
> +++ b/tests/test-revset.t
> @@ -979,14 +979,17 @@ ordering defined by it.
>        ('symbol', '2')
>        ('symbol', '0'))
>      (func
> -      ('symbol', '_list')
> -      ('string', '0\x001\x002')))
> +      ('symbol', '_unordered')
> +      (func
> +        ('symbol', '_list')
> +        ('string', '0\x001\x002'))))
>    * set:
> -  <baseset [0, 1, 2]>
> +  <filteredset
> +    <spanset- 0:2>,
> +    <baseset [0, 1, 2]>>
> +  2
> +  1
>    0
> -  1
> -  2
> - BROKEN: should be '2 1 0'
>
>   'A + B' should take the ordering of the left expression:
>
> @@ -1006,21 +1009,22 @@ ordering defined by it.
>      (range
>        ('symbol', '2')
>        ('symbol', '0'))
> -    (or
> -      (range
> -        ('symbol', '0')
> -        ('symbol', '1'))
> -      ('symbol', '2')))
> +    (func
> +      ('symbol', '_unordered')
> +      (or
> +        (range
> +          ('symbol', '0')
> +          ('symbol', '1'))
> +        ('symbol', '2'))))
>    * set:
> -  <addset
> -    <filteredset
> +  <filteredset
> +    <spanset- 0:2>,
> +    <addset
>        <spanset+ 0:1>,
> -      <spanset- 0:2>>,
> -    <baseset [2]>>
> +      <baseset [2]>>>
> +  2
> +  1
>    0
> -  1
> -  2
> - BROKEN: should be '2 1 0'
>
>   '_intlist(a b)' should behave like 'a + b':
>
> @@ -1035,15 +1039,17 @@ ordering defined by it.
>    * optimized:
>    (and
>      (func
> -      ('symbol', '_intlist')
> -      ('string', '0\x001\x002'))
> +      ('symbol', '_unordered')
> +      (func
> +        ('symbol', '_intlist')
> +        ('string', '0\x001\x002')))
>      (range
>        ('symbol', '2')
>        ('symbol', '0')))
>    * set:
>    <filteredset
>      <spanset- 0:2>,
> -    <baseset [0, 1, 2]>>
> +    <baseset+ [0, 1, 2]>>
>    2
>    1
>    0
> @@ -1089,14 +1095,17 @@ ordering defined by it.
>        ('symbol', '2')
>        ('symbol', '0'))
>      (func
> -      ('symbol', '_hexlist')
> -      ('string', '*'))) (glob)
> +      ('symbol', '_unordered')
> +      (func
> +        ('symbol', '_hexlist')
> +        ('string', '*')))) (glob)
>    * set:
> -  <baseset [0, 1, 2]>
> +  <filteredset
> +    <spanset- 0:2>,
> +    <baseset [0, 1, 2]>>
> +  2
> +  1
>    0
> -  1
> -  2
> - BROKEN: should be '2 1 0'
>
>    $ trylist --optimize --bin '%ln & 2:0' `hg log -T '{node} ' -r0+2+1`
>    (and
> @@ -1120,6 +1129,67 @@ ordering defined by it.
>    2
>    1
>
> + '_unordered' should be omitted if order doesn't matter:
> +
> +  $ try --optimize '2:0 & not (0 + 1)'
> +  (and
> +    (range
> +      ('symbol', '2')
> +      ('symbol', '0'))
> +    (not
> +      (group
> +        (or
> +          ('symbol', '0')
> +          ('symbol', '1')))))
> +  * optimized:
> +  (difference
> +    (range
> +      ('symbol', '2')
> +      ('symbol', '0'))
> +    (func
> +      ('symbol', '_list')
> +      ('string', '0\x001')))
> +  * set:
> +  <filteredset
> +    <spanset- 0:2>,
> +    <not
> +      <baseset [0, 1]>>>
> +  2
> +
> +  $ try --optimize '2:0 & not (0:2 & (0 + 1))'
> +  (and
> +    (range
> +      ('symbol', '2')
> +      ('symbol', '0'))
> +    (not
> +      (group
> +        (and
> +          (range
> +            ('symbol', '0')
> +            ('symbol', '2'))
> +          (group
> +            (or
> +              ('symbol', '0')
> +              ('symbol', '1')))))))
> +  * optimized:
> +  (difference
> +    (range
> +      ('symbol', '2')
> +      ('symbol', '0'))
> +    (and
> +      (range
> +        ('symbol', '0')
> +        ('symbol', '2'))
> +      (func
> +        ('symbol', '_list')
> +        ('string', '0\x001'))))
> +  * set:
> +  <filteredset
> +    <spanset- 0:2>,
> +    <not
> +      <baseset [0, 1]>>>
> +  2
> +
>   'present()' should do nothing other than suppressing an error:
>
>    $ try --optimize '2:0 & present(0 + 1 + 2)'
> @@ -1316,9 +1386,10 @@ ordering defined by it.
>      <spanset- 0:2>>
>    1
>
> - 'A & B' can be rewritten as 'B & A' by weight, but the ordering rule should
> - be determined before the optimization (i.e. 'B' should take the ordering of
> - 'A'):
> + 'A & B' can be rewritten as 'B & A' by weight, but that's fine as long as
> + the ordering rule is determined before the rewrite; in this example,
> + 'B' follows the order of the initial set, which is the same order as 'A'
> + since 'A' also follows the order:
>
>    $ try --optimize 'contains("glob:*") & (2 + 0 + 1)'
>    (and
> @@ -1333,19 +1404,23 @@ ordering defined by it.
>    * optimized:
>    (and
>      (func
> -      ('symbol', '_list')
> -      ('string', '2\x000\x001'))
> +      ('symbol', '_unordered')
> +      (func
> +        ('symbol', '_list')
> +        ('string', '2\x000\x001')))
>      (func
>        ('symbol', 'contains')
>        ('string', 'glob:*')))
>    * set:
>    <filteredset
> -    <baseset [2, 0, 1]>,
> +    <baseset+ [0, 1, 2]>,
>      <contains 'glob:*'>>
> -  2
>    0
>    1
> - BROKEN: should be '0 1 2'
> +  2
> +
> + and in this example, 'A & B' is rewritten as 'B & A', but 'A' overrides
> + the order appropriately:
>
>    $ try --optimize 'reverse(contains("glob:*")) & (0 + 2 + 1)'
>    (and
> @@ -1362,8 +1437,10 @@ ordering defined by it.
>    * optimized:
>    (and
>      (func
> -      ('symbol', '_list')
> -      ('string', '0\x002\x001'))
> +      ('symbol', '_unordered')
> +      (func
> +        ('symbol', '_list')
> +        ('string', '0\x002\x001')))
>      (func
>        ('symbol', 'reverse')
>        (func
> @@ -1371,12 +1448,11 @@ ordering defined by it.
>          ('string', 'glob:*'))))
>    * set:
>    <filteredset
> -    <baseset [1, 2, 0]>,
> +    <baseset- [0, 1, 2]>,
>      <contains 'glob:*'>>
> +  2
>    1
> -  2
>    0
> - BROKEN: should be '2 1 0'
>
>  test sort revset
>  --------------------------------------------
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - June 25, 2016, 5:58 a.m.
On Fri, 24 Jun 2016 10:27:03 -0700, Martin von Zweigbergk wrote:
> On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > # HG changeset patch
> > # User Yuya Nishihara <yuya@tcha.org>
> > # Date 1455629461 -32400
> > #      Tue Feb 16 22:31:01 2016 +0900
> > # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
> > # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
> > revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)

> Getting the order right is not an optimization, so I'd really like the
> order to be fixed before optimize(), and then optimize() can (as you
> do in later patches) remove some unnecessary ordering.

optimize() does a bunch of tree rewriting stuffs unrelated to the
optimization. Do you mean we should split optimize() into two more parts
so that we can compare the optimized result with the pure result?

In that case, sort() and reverse() should be eliminated at the first step
to get a correct result. It doesn't make sense to rewrite sort to
_unorderedsort at the first step, and then remove it at the second step.

> I think orset()
> and and _*list() should preserve the order of their subset arguments.
> Perhaps rename the current orset() to unorderedorset() and make a new
> orset() that simply does "return subset & unorderedorset(repo, subset,
> *x)". The optimizer would then replace calls to orset() (i.e. "or"
> methods) by calls to unorderedorset()  where possible. To be able to
> share optimizations between orset() and _*list(), perhaps an early
> step could be to replace them by your _unordered applied to the
> unordered version of them, so e.g.
> 
>       (func
>         ('symbol', '_list')
>         ('string', '0\x002\x001')))
> 
> would become
> 
>     (func
>       ('symbol', '_unordered')
>       (func
>         ('symbol', '_unorderedlist')
>         ('string', '0\x002\x001')))
> 
> which can then possibly be further optimized as you already have done
> in later patches.

I just wanted to avoid introducing a set of internal functions. If we need to
resolve ordering before optimize(), we'll need '_list*' and '_unorderedlist*'
pairs anyway. I don't think it's worth remapping them to '_unordered' at
the second stage of optimize().

> The names are a bit confusing; it sounds like the
> outer _unordered wouldn't do anything, but it actually does order it.
> I think part of the confusion is that the trees are rendered without
> the implicit subset argument. I think I liked _reorder better, or
> perhaps we should even call _unorderedlist() _misorderedlist() to make
> it clear that there was an order (the subset's) that was ignored.

I'm okay to add unorderedor(), _unorderdlist*(), etc. and replace _unordered()
by them. That will avoid confusion.
via Mercurial-devel - June 25, 2016, 1:26 p.m.
On Fri, Jun 24, 2016 at 10:58 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Fri, 24 Jun 2016 10:27:03 -0700, Martin von Zweigbergk wrote:
>> On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>> > # HG changeset patch
>> > # User Yuya Nishihara <yuya@tcha.org>
>> > # Date 1455629461 -32400
>> > #      Tue Feb 16 22:31:01 2016 +0900
>> > # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
>> > # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
>> > revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)
>
>> Getting the order right is not an optimization, so I'd really like the
>> order to be fixed before optimize(), and then optimize() can (as you
>> do in later patches) remove some unnecessary ordering.
>
> optimize() does a bunch of tree rewriting stuffs unrelated to the
> optimization. Do you mean we should split optimize() into two more parts
> so that we can compare the optimized result with the pure result?

It seems to me like that would make it clearer if optimize() only did
optimization, and I had assumed that that's how it already was. Since
that is far from the case, that would obviously be another patch if we
even want to do that.

>
> In that case, sort() and reverse() should be eliminated at the first step
> to get a correct result. It doesn't make sense to rewrite sort to
> _unorderedsort at the first step, and then remove it at the second step.

Or make the optimization step aware of both _unordered/_reorder and
sort. Anyway, not in scope, so we can discuss that later.

>
>> I think orset()
>> and and _*list() should preserve the order of their subset arguments.
>> Perhaps rename the current orset() to unorderedorset() and make a new
>> orset() that simply does "return subset & unorderedorset(repo, subset,
>> *x)". The optimizer would then replace calls to orset() (i.e. "or"
>> methods) by calls to unorderedorset()  where possible. To be able to
>> share optimizations between orset() and _*list(), perhaps an early
>> step could be to replace them by your _unordered applied to the
>> unordered version of them, so e.g.
>>
>>       (func
>>         ('symbol', '_list')
>>         ('string', '0\x002\x001')))
>>
>> would become
>>
>>     (func
>>       ('symbol', '_unordered')
>>       (func
>>         ('symbol', '_unorderedlist')
>>         ('string', '0\x002\x001')))
>>
>> which can then possibly be further optimized as you already have done
>> in later patches.
>
> I just wanted to avoid introducing a set of internal functions. If we need to
> resolve ordering before optimize(), we'll need '_list*' and '_unorderedlist*'
> pairs anyway. I don't think it's worth remapping them to '_unordered' at
> the second stage of optimize().
>
>> The names are a bit confusing; it sounds like the
>> outer _unordered wouldn't do anything, but it actually does order it.
>> I think part of the confusion is that the trees are rendered without
>> the implicit subset argument. I think I liked _reorder better, or
>> perhaps we should even call _unorderedlist() _misorderedlist() to make
>> it clear that there was an order (the subset's) that was ignored.
>
> I'm okay to add unorderedor(), _unorderdlist*(), etc. and replace _unordered()
> by them. That will avoid confusion.

Yes, I think that would make it much clearer that they were not
following the subset ordering like the other functions do. Thanks!
Yuya Nishihara - June 26, 2016, 8:33 a.m.
On Sat, 25 Jun 2016 06:26:17 -0700, Martin von Zweigbergk wrote:
> On Fri, Jun 24, 2016 at 10:58 PM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Fri, 24 Jun 2016 10:27:03 -0700, Martin von Zweigbergk wrote:  
> >> On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org> wrote:  
> >> > # HG changeset patch
> >> > # User Yuya Nishihara <yuya@tcha.org>
> >> > # Date 1455629461 -32400
> >> > #      Tue Feb 16 22:31:01 2016 +0900
> >> > # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
> >> > # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
> >> > revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)  
> >  
> >> Getting the order right is not an optimization, so I'd really like the
> >> order to be fixed before optimize(), and then optimize() can (as you
> >> do in later patches) remove some unnecessary ordering.  
> >
> > optimize() does a bunch of tree rewriting stuffs unrelated to the
> > optimization. Do you mean we should split optimize() into two more parts
> > so that we can compare the optimized result with the pure result?  
> 
> It seems to me like that would make it clearer if optimize() only did
> optimization, and I had assumed that that's how it already was. Since
> that is far from the case, that would obviously be another patch if we
> even want to do that.

Sure. I like the idea of splitting optimize(). If it is simple enough, let's
try that as a separate series.

> > I'm okay to add unorderedor(), _unorderdlist*(), etc. and replace _unordered()
> > by them. That will avoid confusion.  
> 
> Yes, I think that would make it much clearer that they were not
> following the subset ordering like the other functions do. Thanks!

will do in V5. BTW, it seems we have opposite definition for these terms.
My definition is:

  ordered:   has special ordering like rangeset() and orset()
  unordered: has no special requirement for ordering, follows the subset

From your comment, I should rename the current _list to _(un)?orderedlist
to mark it a function having special ordering requirement.
Yuya Nishihara - June 27, 2016, 3:52 p.m.
On Sun, 26 Jun 2016 08:00:14 -0700, Martin von Zweigbergk wrote:
> On Sun, Jun 26, 2016 at 1:33 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Sat, 25 Jun 2016 06:26:17 -0700, Martin von Zweigbergk wrote:  
> >> On Fri, Jun 24, 2016 at 10:58 PM, Yuya Nishihara <yuya@tcha.org> wrote:  
> >> > On Fri, 24 Jun 2016 10:27:03 -0700, Martin von Zweigbergk wrote:  
> >> >> On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org> wrote:  
> >> >> > # HG changeset patch
> >> >> > # User Yuya Nishihara <yuya@tcha.org>
> >> >> > # Date 1455629461 -32400
> >> >> > #      Tue Feb 16 22:31:01 2016 +0900
> >> >> > # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
> >> >> > # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
> >> >> > revset: insert _unordered to fix ordering of nested 'or' and '_list*()' (BC)  
> >> >  
> >> >> Getting the order right is not an optimization, so I'd really like the
> >> >> order to be fixed before optimize(), and then optimize() can (as you
> >> >> do in later patches) remove some unnecessary ordering.  
> >> >
> >> > optimize() does a bunch of tree rewriting stuffs unrelated to the
> >> > optimization. Do you mean we should split optimize() into two more parts
> >> > so that we can compare the optimized result with the pure result?  
> >>
> >> It seems to me like that would make it clearer if optimize() only did
> >> optimization, and I had assumed that that's how it already was. Since
> >> that is far from the case, that would obviously be another patch if we
> >> even want to do that.  
> >
> > Sure. I like the idea of splitting optimize(). If it is simple enough, let's
> > try that as a separate series.
> >  
> >> > I'm okay to add unorderedor(), _unorderdlist*(), etc. and replace _unordered()
> >> > by them. That will avoid confusion.  
> >>
> >> Yes, I think that would make it much clearer that they were not
> >> following the subset ordering like the other functions do. Thanks!  
> >
> > will do in V5. BTW, it seems we have opposite definition for these terms.
> > My definition is:
> >
> >   ordered:   has special ordering like rangeset() and orset()
> >   unordered: has no special requirement for ordering, follows the subset
> >
> > From your comment, I should rename the current _list to _(un)?orderedlist
> > to mark it a function having special ordering requirement.  
> 
> My view is this:
> 
> * All predicates define some native order that's visible when used at
> the top level, or when used in unions. For most, it's simply ascending
> by rev number. E.g. rangeset, dagset, sorted have a different order.
> * "x & y" should preserve the order of x

Yeah, my view is different:

* Most predicates are like filters, which have no particular order, and they
  take the order specified by the input, which is ascending by rev number
  by default.
* A few predicates (e.g. range, sort, or) have order, which is applied only
  when the predicate is used at top level, "x" of "x & y".

> So when I used "ordered" or "{un,mis}ordered", I meant whether it
> preserves the order of the subset it's intersected with. I don't think
> we should have any functions that return an unspecified order.

I agree on this point. But I think the order should be specified by the subset
argument, so "subset & x" guarantees the result is stable.

> I believe that's what Pierre-Yves's recent commit for sorting sets in
> ascending order was about. Hopefully we don't have any cases left of
> unspecified order.

IIRC, 69c6e9623bdc was necessary to make repr(baseset(...)) stable.

> Do you agree? I got the feeling earlier that you did not think each
> function should necessarily have a native order, as long as the revset
> module as a whole made sure the result made sense.

If a native order means ascending by rev number, I don't think it's necessary.
I think the important property is that each function take the order specified
by the input set.
via Mercurial-devel - Aug. 2, 2016, 4:37 p.m.
I apologize again for the extremely long delay :-(

On Mon, Jun 27, 2016 at 8:52 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Sun, 26 Jun 2016 08:00:14 -0700, Martin von Zweigbergk wrote:
>> On Sun, Jun 26, 2016 at 1:33 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>> > On Sat, 25 Jun 2016 06:26:17 -0700, Martin von Zweigbergk wrote:
>> >> On Fri, Jun 24, 2016 at 10:58 PM, Yuya Nishihara <yuya@tcha.org>
wrote:
>> >> > On Fri, 24 Jun 2016 10:27:03 -0700, Martin von Zweigbergk wrote:
>> >> >> On Thu, Jun 23, 2016 at 8:59 AM, Yuya Nishihara <yuya@tcha.org>
wrote:
>> >> >> > # HG changeset patch
>> >> >> > # User Yuya Nishihara <yuya@tcha.org>
>> >> >> > # Date 1455629461 -32400
>> >> >> > #      Tue Feb 16 22:31:01 2016 +0900
>> >> >> > # Node ID bd5e886ceca4257dd843f98539a18ef7c31ec8e0
>> >> >> > # Parent  0a1b1af8c9b8aaa0690de2fc0614b531dc03a5c2
>> >> >> > revset: insert _unordered to fix ordering of nested 'or' and
'_list*()' (BC)
>> >> >
>> >> >> Getting the order right is not an optimization, so I'd really like
the
>> >> >> order to be fixed before optimize(), and then optimize() can (as
you
>> >> >> do in later patches) remove some unnecessary ordering.
>> >> >
>> >> > optimize() does a bunch of tree rewriting stuffs unrelated to the
>> >> > optimization. Do you mean we should split optimize() into two more
parts
>> >> > so that we can compare the optimized result with the pure result?
>> >>
>> >> It seems to me like that would make it clearer if optimize() only did
>> >> optimization, and I had assumed that that's how it already was. Since
>> >> that is far from the case, that would obviously be another patch if we
>> >> even want to do that.
>> >
>> > Sure. I like the idea of splitting optimize(). If it is simple enough,
let's
>> > try that as a separate series.
>> >
>> >> > I'm okay to add unorderedor(), _unorderdlist*(), etc. and replace
_unordered()
>> >> > by them. That will avoid confusion.
>> >>
>> >> Yes, I think that would make it much clearer that they were not
>> >> following the subset ordering like the other functions do. Thanks!
>> >
>> > will do in V5. BTW, it seems we have opposite definition for these
terms.
>> > My definition is:
>> >
>> >   ordered:   has special ordering like rangeset() and orset()
>> >   unordered: has no special requirement for ordering, follows the
subset
>> >
>> > From your comment, I should rename the current _list to
_(un)?orderedlist
>> > to mark it a function having special ordering requirement.
>>
>> My view is this:
>>
>> * All predicates define some native order that's visible when used at
>> the top level, or when used in unions. For most, it's simply ascending
>> by rev number. E.g. rangeset, dagset, sorted have a different order.
>> * "x & y" should preserve the order of x
>
> Yeah, my view is different:
>
> * Most predicates are like filters, which have no particular order, and
they
>   take the order specified by the input, which is ascending by rev number
>   by default.
> * A few predicates (e.g. range, sort, or) have order, which is applied
only
>   when the predicate is used at top level, "x" of "x & y".
>
>> So when I used "ordered" or "{un,mis}ordered", I meant whether it
>> preserves the order of the subset it's intersected with. I don't think
>> we should have any functions that return an unspecified order.
>
> I agree on this point. But I think the order should be specified by the
subset
> argument, so "subset & x" guarantees the result is stable.
>
>> I believe that's what Pierre-Yves's recent commit for sorting sets in
>> ascending order was about. Hopefully we don't have any cases left of
>> unspecified order.
>
> IIRC, 69c6e9623bdc was necessary to make repr(baseset(...)) stable.
>
>> Do you agree? I got the feeling earlier that you did not think each
>> function should necessarily have a native order, as long as the revset
>> module as a whole made sure the result made sense.
>
> If a native order means ascending by rev number, I don't think it's
necessary.
> I think the important property is that each function take the order
specified
> by the input set.

It seems we agree how it should work from the user's point of view, but
let's confirm with a few examples:

1. heads()
2. 1+2+0
3. sort(1+2+0)
4. reverse(heads()) & sort(heads())

1) will be in ascending order. 2) will be in the given order (1 2 0). 3)
will be in ascending order. 4) will be in descending order

Still from the user's point of view, it's the ascending "native order" of
heads() that shows up in 1) and 4). If you replace heads() by 5::10, you'd
get that predicate's order, and its reverse, of course.

I think we also agree that the subsets passed to the revset predicates are
just there as an optimization. It would have been simpler to do that
intersection outside the revset predicates, but putting it inside gives the
predicate the opportunity to consider only a subset of history when
producing it's own set (before intersecting subset with it). If we had not
been concerned about performance, sort(1+2+0) would not have received a
subset as input and we would simply not have intersected it with
fullreposet. Then fullreposet.__and__() would not have been special, and it
would not have to ignore its own order as it currently does.

However, we do care about performance, and passing in the subset argument
has been a fruitful way of dealing with it. Unfortunately, blindly
intersecting with the subset removes order (as intersection should), so
some revsets that need a particular order avoid intersecting with the
subset. Those who care less about order (who want the usual ascending
order) do intersect with the subset and rely on the odd trick in
fullreposet.__and__ to sort its argument.

So, I wonder if the order argument you defined should be passed to all
revset predicates, just like the subset is. I saw you did something like
that just for rangeset(), but treating rangeset() differently seems wrong
to me. If the goal is to improve BC for extensions, we can probably do that
in a different way (e.g. in the decorator). I really don't know if this is
the right solution. Hopefully you have more good ideas.
Yuya Nishihara - Aug. 3, 2016, 4:05 p.m.
On Tue, 2 Aug 2016 09:37:49 -0700, Martin von Zweigbergk wrote:
> It seems we agree how it should work from the user's point of view, but
> let's confirm with a few examples:
> 
> 1. heads()
> 2. 1+2+0
> 3. sort(1+2+0)
> 4. reverse(heads()) & sort(heads())
> 
> 1) will be in ascending order. 2) will be in the given order (1 2 0). 3)
> will be in ascending order. 4) will be in descending order

Exactly.

> Still from the user's point of view, it's the ascending "native order" of
> heads() that shows up in 1) and 4). If you replace heads() by 5::10, you'd
> get that predicate's order, and its reverse, of course.

I have slightly different view. When I use heads(), I don't care the order.
heads() (and most other predicates) are neutral for ordering, and default to
the natural order of the revset, which is ascending.

> I think we also agree that the subsets passed to the revset predicates are
> just there as an optimization. It would have been simpler to do that
> intersection outside the revset predicates, but putting it inside gives the
> predicate the opportunity to consider only a subset of history when
> producing it's own set (before intersecting subset with it). If we had not
> been concerned about performance, sort(1+2+0) would not have received a
> subset as input and we would simply not have intersected it with
> fullreposet. Then fullreposet.__and__() would not have been special, and it
> would not have to ignore its own order as it currently does.

I agree the subset argument is implementation detail for optimization.

I don't get fullreposet.__and__() thing. IMHO, currently it does nothing
special. It complies to the abstractsmartset interface, which is __and__()
is ordered by self, not by other.

> However, we do care about performance, and passing in the subset argument
> has been a fruitful way of dealing with it. Unfortunately, blindly
> intersecting with the subset removes order (as intersection should), so
> some revsets that need a particular order avoid intersecting with the
> subset. Those who care less about order (who want the usual ascending
> order) do intersect with the subset and rely on the odd trick in
> fullreposet.__and__ to sort its argument.

Maybe we could handle the order separately if we don't care about performance,
something like: (unordered) set operations + a dominant ordering expression.

> So, I wonder if the order argument you defined should be passed to all
> revset predicates, just like the subset is. I saw you did something like
> that just for rangeset(), but treating rangeset() differently seems wrong
> to me. If the goal is to improve BC for extensions, we can probably do that
> in a different way (e.g. in the decorator). I really don't know if this is
> the right solution. Hopefully you have more good ideas.

I thought about that before. I tend to agree with you in that treating some
functions differently is bad, but I do that way because of two reasons:

 a. most functions don't need "order" flag. I don't want to introduce new
    trick because of a few order-sensitive functions.
 b. "order" flag seems not easy to understand, extension authors would be
    confused about how to handle it.
via Mercurial-devel - Aug. 3, 2016, 9:24 p.m.
On Wed, Aug 3, 2016 at 9:05 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> On Tue, 2 Aug 2016 09:37:49 -0700, Martin von Zweigbergk wrote:
>> It seems we agree how it should work from the user's point of view, but
>> let's confirm with a few examples:
>>
>> 1. heads()
>> 2. 1+2+0
>> 3. sort(1+2+0)
>> 4. reverse(heads()) & sort(heads())
>>
>> 1) will be in ascending order. 2) will be in the given order (1 2 0). 3)
>> will be in ascending order. 4) will be in descending order
>
> Exactly.
>
>> Still from the user's point of view, it's the ascending "native order" of
>> heads() that shows up in 1) and 4). If you replace heads() by 5::10, you'd
>> get that predicate's order, and its reverse, of course.
>
> I have slightly different view. When I use heads(), I don't care the order.
> heads() (and most other predicates) are neutral for ordering, and default to
> the natural order of the revset, which is ascending.

How is the "neutral, but default to ascending" order that you think
they should have different from plain "ascending" that I think they
should have? Any observable difference?

>
>> I think we also agree that the subsets passed to the revset predicates are
>> just there as an optimization. It would have been simpler to do that
>> intersection outside the revset predicates, but putting it inside gives the
>> predicate the opportunity to consider only a subset of history when
>> producing it's own set (before intersecting subset with it). If we had not
>> been concerned about performance, sort(1+2+0) would not have received a
>> subset as input and we would simply not have intersected it with
>> fullreposet. Then fullreposet.__and__() would not have been special, and it
>> would not have to ignore its own order as it currently does.
>
> I agree the subset argument is implementation detail for optimization.
>
> I don't get fullreposet.__and__() thing. IMHO, currently it does nothing
> special. It complies to the abstractsmartset interface, which is __and__()
> is ordered by self, not by other.

It does something very special, IMO:
"other.sort(reverse=self.isdescending())". I think sorting the other
set is wrong. If I'm reading the comment in the function right ("XXX
As fullreposet is also used as bootstrap, this is wrong."),
Pierre-Yves (I assume, but I haven't checked) also thinks it's wrong
for it to sort the other set.

>
>> However, we do care about performance, and passing in the subset argument
>> has been a fruitful way of dealing with it. Unfortunately, blindly
>> intersecting with the subset removes order (as intersection should), so
>> some revsets that need a particular order avoid intersecting with the
>> subset. Those who care less about order (who want the usual ascending
>> order) do intersect with the subset and rely on the odd trick in
>> fullreposet.__and__ to sort its argument.
>
> Maybe we could handle the order separately if we don't care about performance,
> something like: (unordered) set operations + a dominant ordering expression.

IMO, if we did not care about performance, each set would define
*some* order. For most it would be ascending order.

>
>> So, I wonder if the order argument you defined should be passed to all
>> revset predicates, just like the subset is. I saw you did something like
>> that just for rangeset(), but treating rangeset() differently seems wrong
>> to me. If the goal is to improve BC for extensions, we can probably do that
>> in a different way (e.g. in the decorator). I really don't know if this is
>> the right solution. Hopefully you have more good ideas.
>
> I thought about that before. I tend to agree with you in that treating some
> functions differently is bad, but I do that way because of two reasons:
>
>  a. most functions don't need "order" flag. I don't want to introduce new
>     trick because of a few order-sensitive functions.
>  b. "order" flag seems not easy to understand, extension authors would be
>     confused about how to handle it.

Perhaps we could make the @predicate function handle some of it? Maybe
@predicate('...', definesorder=True) or so that makes the revset
engine pass an order argument to it and to honor the order it returns
(maybe it was more complicated, I don't remember). It turns out that
most predicates don't care much about the subset either, so we could
even change the predicate to make the subset argument optional.
Yuya Nishihara - Aug. 4, 2016, 1:34 p.m.
On Wed, 3 Aug 2016 14:24:28 -0700, Martin von Zweigbergk wrote:
> On Wed, Aug 3, 2016 at 9:05 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > On Tue, 2 Aug 2016 09:37:49 -0700, Martin von Zweigbergk wrote:
> >> It seems we agree how it should work from the user's point of view, but
> >> let's confirm with a few examples:
> >>
> >> 1. heads()
> >> 2. 1+2+0
> >> 3. sort(1+2+0)
> >> 4. reverse(heads()) & sort(heads())
> >>
> >> 1) will be in ascending order. 2) will be in the given order (1 2 0). 3)
> >> will be in ascending order. 4) will be in descending order
> >
> > Exactly.
> >
> >> Still from the user's point of view, it's the ascending "native order" of
> >> heads() that shows up in 1) and 4). If you replace heads() by 5::10, you'd
> >> get that predicate's order, and its reverse, of course.
> >
> > I have slightly different view. When I use heads(), I don't care the order.
> > heads() (and most other predicates) are neutral for ordering, and default to
> > the natural order of the revset, which is ascending.
> 
> How is the "neutral, but default to ascending" order that you think
> they should have different from plain "ascending" that I think they
> should have? Any observable difference?

No difference. I just wanted to tell that I took many revset predicates as
set operations.

> >> I think we also agree that the subsets passed to the revset predicates are
> >> just there as an optimization. It would have been simpler to do that
> >> intersection outside the revset predicates, but putting it inside gives the
> >> predicate the opportunity to consider only a subset of history when
> >> producing it's own set (before intersecting subset with it). If we had not
> >> been concerned about performance, sort(1+2+0) would not have received a
> >> subset as input and we would simply not have intersected it with
> >> fullreposet. Then fullreposet.__and__() would not have been special, and it
> >> would not have to ignore its own order as it currently does.
> >
> > I agree the subset argument is implementation detail for optimization.
> >
> > I don't get fullreposet.__and__() thing. IMHO, currently it does nothing
> > special. It complies to the abstractsmartset interface, which is __and__()
> > is ordered by self, not by other.
> 
> It does something very special, IMO:
> "other.sort(reverse=self.isdescending())". I think sorting the other
> set is wrong.

Yeah, mutating the other set seems messy, but that is another issue we have
in the current revset code. Given that fullreposet is a spanset, it's valid
to return the other set in self's order.

> If I'm reading the comment in the function right ("XXX
> As fullreposet is also used as bootstrap, this is wrong."),
> Pierre-Yves (I assume, but I haven't checked) also thinks it's wrong
> for it to sort the other set.

IMO, that was a possible workaround for the ordering issue. We could do that
if a) fullreposet were used only for bootstrapping and if b) all revset
predicates had stable order. Since (a) is wrong because of optimize(), I
don't think it's good idea to make fullreposet.__and__() disagree with its
superclass.

> >> However, we do care about performance, and passing in the subset argument
> >> has been a fruitful way of dealing with it. Unfortunately, blindly
> >> intersecting with the subset removes order (as intersection should), so
> >> some revsets that need a particular order avoid intersecting with the
> >> subset. Those who care less about order (who want the usual ascending
> >> order) do intersect with the subset and rely on the odd trick in
> >> fullreposet.__and__ to sort its argument.
> >
> > Maybe we could handle the order separately if we don't care about performance,
> > something like: (unordered) set operations + a dominant ordering expression.
> 
> IMO, if we did not care about performance, each set would define
> *some* order. For most it would be ascending order.

That's okay, but anyway we would have to pick one order from them.

> >> So, I wonder if the order argument you defined should be passed to all
> >> revset predicates, just like the subset is. I saw you did something like
> >> that just for rangeset(), but treating rangeset() differently seems wrong
> >> to me. If the goal is to improve BC for extensions, we can probably do that
> >> in a different way (e.g. in the decorator). I really don't know if this is
> >> the right solution. Hopefully you have more good ideas.
> >
> > I thought about that before. I tend to agree with you in that treating some
> > functions differently is bad, but I do that way because of two reasons:
> >
> >  a. most functions don't need "order" flag. I don't want to introduce new
> >     trick because of a few order-sensitive functions.
> >  b. "order" flag seems not easy to understand, extension authors would be
> >     confused about how to handle it.
> 
> Perhaps we could make the @predicate function handle some of it? Maybe
> @predicate('...', definesorder=True) or so that makes the revset
> engine pass an order argument to it and to honor the order it returns
> (maybe it was more complicated, I don't remember). It turns out that
> most predicates don't care much about the subset either, so we could
> even change the predicate to make the subset argument optional.

I'm sure it won't be simple, but it appears not that complicated. I'll try.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2205,6 +2205,14 @@  def tag(repo, subset, x):
 def tagged(repo, subset, x):
     return tag(repo, subset, x)
 
+# for internal use
+@predicate('_unordered(set)', safe=True)
+def _unordered(repo, subset, x):
+    # Evaluate right-hand-side expression 'set' and reorder the result set
+    # according to the left-hand-side set 'subset'
+    rs = getset(repo, fullreposet(repo), x)
+    return subset & rs
+
 @predicate('unstable()', safe=True)
 def unstable(repo, subset, x):
     """Non-obsolete changesets with obsolete ancestors.
@@ -2346,6 +2354,13 @@  followorder = 'follow'  # must follow th
     followorder: followorder,
 }
 
+def _fixunionorder(x, order):
+    """Wrap 'or' and '_list*()' by '_unordered()' to enforce the ordering
+    requirement"""
+    if order == followorder:
+        return ('func', ('symbol', '_unordered'), x)
+    return x
+
 def _matchonly(revs, bases):
     """
     >>> f = lambda *args: _matchonly(*map(parse, args))
@@ -2444,7 +2459,7 @@  def _optimize(x, small, order):
         # we can't reorder trees by weight because it would change the order.
         # ("sort(a + b)" == "sort(b + a)", but "a + b" != "b + a")
         #   ts = tuple(t for w, t in sorted(zip(ws, ts), key=lambda wt: wt[0]))
-        return max(ws), (op,) + tuple(ts)
+        return max(ws), _fixunionorder((op,) + tuple(ts), order)
     elif op == 'not':
         # Optimize not public() to _notpublic() because we have a fast version
         if x[1] == ('func', ('symbol', 'public'), None):
@@ -2492,7 +2507,10 @@  def _optimize(x, small, order):
             w = 10 # assume most sorts look at changelog
         else:
             w = 1
-        return w + wa, (op, x[1], ta)
+        t = (op, x[1], ta)
+        if f in "_list _intlist _hexlist":
+            t = _fixunionorder(t, order)
+        return w + wa, t
     return 1, x
 
 def optimize(tree):
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -979,14 +979,17 @@  ordering defined by it.
       ('symbol', '2')
       ('symbol', '0'))
     (func
-      ('symbol', '_list')
-      ('string', '0\x001\x002')))
+      ('symbol', '_unordered')
+      (func
+        ('symbol', '_list')
+        ('string', '0\x001\x002'))))
   * set:
-  <baseset [0, 1, 2]>
+  <filteredset
+    <spanset- 0:2>,
+    <baseset [0, 1, 2]>>
+  2
+  1
   0
-  1
-  2
- BROKEN: should be '2 1 0'
 
  'A + B' should take the ordering of the left expression:
 
@@ -1006,21 +1009,22 @@  ordering defined by it.
     (range
       ('symbol', '2')
       ('symbol', '0'))
-    (or
-      (range
-        ('symbol', '0')
-        ('symbol', '1'))
-      ('symbol', '2')))
+    (func
+      ('symbol', '_unordered')
+      (or
+        (range
+          ('symbol', '0')
+          ('symbol', '1'))
+        ('symbol', '2'))))
   * set:
-  <addset
-    <filteredset
+  <filteredset
+    <spanset- 0:2>,
+    <addset
       <spanset+ 0:1>,
-      <spanset- 0:2>>,
-    <baseset [2]>>
+      <baseset [2]>>>
+  2
+  1
   0
-  1
-  2
- BROKEN: should be '2 1 0'
 
  '_intlist(a b)' should behave like 'a + b':
 
@@ -1035,15 +1039,17 @@  ordering defined by it.
   * optimized:
   (and
     (func
-      ('symbol', '_intlist')
-      ('string', '0\x001\x002'))
+      ('symbol', '_unordered')
+      (func
+        ('symbol', '_intlist')
+        ('string', '0\x001\x002')))
     (range
       ('symbol', '2')
       ('symbol', '0')))
   * set:
   <filteredset
     <spanset- 0:2>,
-    <baseset [0, 1, 2]>>
+    <baseset+ [0, 1, 2]>>
   2
   1
   0
@@ -1089,14 +1095,17 @@  ordering defined by it.
       ('symbol', '2')
       ('symbol', '0'))
     (func
-      ('symbol', '_hexlist')
-      ('string', '*'))) (glob)
+      ('symbol', '_unordered')
+      (func
+        ('symbol', '_hexlist')
+        ('string', '*')))) (glob)
   * set:
-  <baseset [0, 1, 2]>
+  <filteredset
+    <spanset- 0:2>,
+    <baseset [0, 1, 2]>>
+  2
+  1
   0
-  1
-  2
- BROKEN: should be '2 1 0'
 
   $ trylist --optimize --bin '%ln & 2:0' `hg log -T '{node} ' -r0+2+1`
   (and
@@ -1120,6 +1129,67 @@  ordering defined by it.
   2
   1
 
+ '_unordered' should be omitted if order doesn't matter:
+
+  $ try --optimize '2:0 & not (0 + 1)'
+  (and
+    (range
+      ('symbol', '2')
+      ('symbol', '0'))
+    (not
+      (group
+        (or
+          ('symbol', '0')
+          ('symbol', '1')))))
+  * optimized:
+  (difference
+    (range
+      ('symbol', '2')
+      ('symbol', '0'))
+    (func
+      ('symbol', '_list')
+      ('string', '0\x001')))
+  * set:
+  <filteredset
+    <spanset- 0:2>,
+    <not
+      <baseset [0, 1]>>>
+  2
+
+  $ try --optimize '2:0 & not (0:2 & (0 + 1))'
+  (and
+    (range
+      ('symbol', '2')
+      ('symbol', '0'))
+    (not
+      (group
+        (and
+          (range
+            ('symbol', '0')
+            ('symbol', '2'))
+          (group
+            (or
+              ('symbol', '0')
+              ('symbol', '1')))))))
+  * optimized:
+  (difference
+    (range
+      ('symbol', '2')
+      ('symbol', '0'))
+    (and
+      (range
+        ('symbol', '0')
+        ('symbol', '2'))
+      (func
+        ('symbol', '_list')
+        ('string', '0\x001'))))
+  * set:
+  <filteredset
+    <spanset- 0:2>,
+    <not
+      <baseset [0, 1]>>>
+  2
+
  'present()' should do nothing other than suppressing an error:
 
   $ try --optimize '2:0 & present(0 + 1 + 2)'
@@ -1316,9 +1386,10 @@  ordering defined by it.
     <spanset- 0:2>>
   1
 
- 'A & B' can be rewritten as 'B & A' by weight, but the ordering rule should
- be determined before the optimization (i.e. 'B' should take the ordering of
- 'A'):
+ 'A & B' can be rewritten as 'B & A' by weight, but that's fine as long as
+ the ordering rule is determined before the rewrite; in this example,
+ 'B' follows the order of the initial set, which is the same order as 'A'
+ since 'A' also follows the order:
 
   $ try --optimize 'contains("glob:*") & (2 + 0 + 1)'
   (and
@@ -1333,19 +1404,23 @@  ordering defined by it.
   * optimized:
   (and
     (func
-      ('symbol', '_list')
-      ('string', '2\x000\x001'))
+      ('symbol', '_unordered')
+      (func
+        ('symbol', '_list')
+        ('string', '2\x000\x001')))
     (func
       ('symbol', 'contains')
       ('string', 'glob:*')))
   * set:
   <filteredset
-    <baseset [2, 0, 1]>,
+    <baseset+ [0, 1, 2]>,
     <contains 'glob:*'>>
-  2
   0
   1
- BROKEN: should be '0 1 2'
+  2
+
+ and in this example, 'A & B' is rewritten as 'B & A', but 'A' overrides
+ the order appropriately:
 
   $ try --optimize 'reverse(contains("glob:*")) & (0 + 2 + 1)'
   (and
@@ -1362,8 +1437,10 @@  ordering defined by it.
   * optimized:
   (and
     (func
-      ('symbol', '_list')
-      ('string', '0\x002\x001'))
+      ('symbol', '_unordered')
+      (func
+        ('symbol', '_list')
+        ('string', '0\x002\x001')))
     (func
       ('symbol', 'reverse')
       (func
@@ -1371,12 +1448,11 @@  ordering defined by it.
         ('string', 'glob:*'))))
   * set:
   <filteredset
-    <baseset [1, 2, 0]>,
+    <baseset- [0, 1, 2]>,
     <contains 'glob:*'>>
+  2
   1
-  2
   0
- BROKEN: should be '2 1 0'
 
 test sort revset
 --------------------------------------------