Patchwork revset: add experimental set subscript operator

login
register
mail settings
Submitter Yuya Nishihara
Date June 30, 2017, 2:58 p.m.
Message ID <b7f6740b0e58c468ecc8.1498834723@mimosa>
Download mbox | patch
Permalink /patch/21849/
State Accepted
Headers show

Comments

Yuya Nishihara - June 30, 2017, 2:58 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1498302078 -32400
#      Sat Jun 24 20:01:18 2017 +0900
# Node ID b7f6740b0e58c468ecc82296fe1ee0800a7e0582
# Parent  fe3419b41a456661a3d35ae963d06a203d4446a2
revset: add experimental set subscript operator

It only supports integer indices as a beginning. See the following post
for further ideas.

https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm

I'm not sure if the current semantics are the most useful and extendable.
We might want the way to select the nth ancestors/descendants in sub graph,
by e.g. 'follow(filename){-2}', which is completely different so the '{n}'
operator wouldn't be usable.
Jun Wu - July 1, 2017, 12:08 a.m.
Excerpts from Yuya Nishihara's message of 2017-06-30 23:58:43 +0900:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1498302078 -32400
> #      Sat Jun 24 20:01:18 2017 +0900
> # Node ID b7f6740b0e58c468ecc82296fe1ee0800a7e0582
> # Parent  fe3419b41a456661a3d35ae963d06a203d4446a2
> revset: add experimental set subscript operator
> 
> It only supports integer indices as a beginning. See the following post
> for further ideas.
> 
> https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm 

Glad to see progress on new operators! The implementation looks good to me.

Maybe this is unnecessary bike-shedding, but I feel "[]" is more natural for
"indexing" purpose.  "{}" is used in templates which might be confusing.

Templates may seem unrelated, but I think it could be embedded in revsets in
the future.  That enables more flexible queries, like "commits whose author
is the same as (antoher revset)'s author".  Template strings need some
function to "bind" to a ctx, like: author(template(REV, "{author}")).

> I'm not sure if the current semantics are the most useful and extendable.
> We might want the way to select the nth ancestors/descendants in sub graph,
> by e.g. 'follow(filename){-2}', which is completely different so the '{n}'
> operator wouldn't be usable.
> 
> diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
> --- a/mercurial/revsetlang.py
> +++ b/mercurial/revsetlang.py
> @@ -21,6 +21,7 @@ from . import (
>  elements = {
>      # token-type: binding-strength, primary, prefix, infix, suffix
>      "(": (21, None, ("group", 1, ")"), ("func", 1, ")"), None),
> +    "{": (21, None, None, ("subscript", 1, "}"), None),
>      "##": (20, None, None, ("_concat", 20), None),
>      "~": (18, None, None, ("ancestor", 18), None),
>      "^": (18, None, None, ("parent", 18), "parentpost"),
> @@ -39,6 +40,7 @@ elements = {
>      "=": (3, None, None, ("keyvalue", 3), None),
>      ",": (2, None, None, ("list", 2), None),
>      ")": (0, None, None, None, None),
> +    "}": (0, None, None, None, None),
>      "symbol": (0, "symbol", None, None, None),
>      "string": (0, "string", None, None, None),
>      "end": (0, None, None, None, None),
> @@ -47,7 +49,7 @@ elements = {
>  keywords = {'and', 'or', 'not'}
>  
>  _quoteletters = {'"', "'"}
> -_simpleopletters = set(pycompat.iterbytestr("():=,-|&+!~^%"))
> +_simpleopletters = set(pycompat.iterbytestr("(){}:=,-|&+!~^%"))
>  
>  # default set of valid characters for the initial letter of symbols
>  _syminitletters = set(pycompat.iterbytestr(
> @@ -310,6 +312,18 @@ def _matchonly(revs, bases):
>      if _isposargs(ta, 1) and _isposargs(tb, 1):
>          return ('list', ta, tb)
>  
> +def _transformsubscript(x, y):
> +    """Rewrite 'x{y}' operator to evaluatable tree"""
> +    # this is pretty basic implementation of 'x{y}' operator, still
> +    # experimental so undocumented. see the wiki for further ideas.
> +    # https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm 
> +    n = getinteger(y, _("set subscript must be an integer"))
> +    if n <= 0:
> +        y = ('symbol', '%s' % -n)
> +        return ('func', ('symbol', 'ancestors'), ('list', x, y, y))
> +    else:
> +        return ('func', ('symbol', 'descendants'), ('list', x, y, y))
> +
>  def _fixops(x):
>      """Rewrite raw parsed tree to resolve ambiguous syntax which cannot be
>      handled well by our simple top-down parser"""
> @@ -377,6 +391,9 @@ def _analyze(x, order):
>          return (op,) + tuple(_analyze(y, order) for y in x[1:])
>      elif op == 'keyvalue':
>          return (op, x[1], _analyze(x[2], order))
> +    elif op == 'subscript':
> +        t = _transformsubscript(x[1], _analyze(x[2], order))
> +        return _analyze(t, order)
>      elif op == 'func':
>          f = getsymbol(x[1])
>          d = defineorder
> diff --git a/tests/test-revset.t b/tests/test-revset.t
> --- a/tests/test-revset.t
> +++ b/tests/test-revset.t
> @@ -499,6 +499,145 @@ keyword arguments
>    hg: parse error: can't use a key-value pair in this context
>    [255]
>  
> +subscript operator:
> +
> +  $ hg debugrevspec -p parsed -p analyzed 'tip{0}'
> +  * parsed:
> +  (subscript
> +    ('symbol', 'tip')
> +    ('symbol', '0'))
> +  * analyzed:
> +  (func
> +    ('symbol', 'ancestors')
> +    (list
> +      ('symbol', 'tip')
> +      ('symbol', '0')
> +      ('symbol', '0'))
> +    define)
> +  9
> +
> +  $ hg debugrevspec -p parsed -p analyzed '.{-1}'
> +  * parsed:
> +  (subscript
> +    ('symbol', '.')
> +    (negate
> +      ('symbol', '1')))
> +  * analyzed:
> +  (func
> +    ('symbol', 'ancestors')
> +    (list
> +      ('symbol', '.')
> +      ('symbol', '1')
> +      ('symbol', '1'))
> +    define)
> +  8
> +
> +  $ hg debugrevspec -p parsed -p analyzed 'roots(:){2}'
> +  * parsed:
> +  (subscript
> +    (func
> +      ('symbol', 'roots')
> +      (rangeall
> +        None))
> +    ('symbol', '2'))
> +  * analyzed:
> +  (func
> +    ('symbol', 'descendants')
> +    (list
> +      (func
> +        ('symbol', 'roots')
> +        (rangeall
> +          None
> +          define)
> +        define)
> +      ('symbol', '2')
> +      ('symbol', '2'))
> +    define)
> +  2
> +  3
> +
> + has the highest binding strength (as function call)
> +
> +  $ hg debugrevspec -p parsed 'tip:tip^{-1}'
> +  * parsed:
> +  (range
> +    ('symbol', 'tip')
> +    (subscript
> +      (parentpost
> +        ('symbol', 'tip'))
> +      (negate
> +        ('symbol', '1'))))
> +  9
> +  8
> +  7
> +  6
> +  5
> +  4
> +
> +  $ hg debugrevspec -p parsed --no-show-revs 'not public(){0}'
> +  * parsed:
> +  (not
> +    (subscript
> +      (func
> +        ('symbol', 'public')
> +        None)
> +      ('symbol', '0')))
> +
> + left-hand side should be optimized recursively
> +
> +  $ hg debugrevspec -p parsed -p analyzed -p optimized --no-show-revs \
> +  > '(not public()){0}'
> +  * parsed:
> +  (subscript
> +    (group
> +      (not
> +        (func
> +          ('symbol', 'public')
> +          None)))
> +    ('symbol', '0'))
> +  * analyzed:
> +  (func
> +    ('symbol', 'ancestors')
> +    (list
> +      (not
> +        (func
> +          ('symbol', 'public')
> +          None
> +          any)
> +        define)
> +      ('symbol', '0')
> +      ('symbol', '0'))
> +    define)
> +  * optimized:
> +  (func
> +    ('symbol', 'ancestors')
> +    (list
> +      (func
> +        ('symbol', '_notpublic')
> +        None
> +        any)
> +      ('symbol', '0')
> +      ('symbol', '0'))
> +    define)
> +
> + parse errors
> +
> +  $ hg debugrevspec '{0}'
> +  hg: parse error at 0: not a prefix: {
> +  [255]
> +  $ hg debugrevspec '.{0'
> +  hg: parse error at 3: unexpected token: end
> +  [255]
> +  $ hg debugrevspec '.}'
> +  hg: parse error at 1: invalid token
> +  [255]
> +  $ hg debugrevspec '.{a}'
> +  hg: parse error: set subscript must be an integer
> +  [255]
> +  $ hg debugrevspec '.{1-2}'
> +  hg: parse error: set subscript must be an integer
> +  [255]
> +
>  parsed tree at stages:
>  
>    $ hg debugrevspec -p all '()'
Matt Harbison - July 1, 2017, 1:42 a.m.
On Fri, 30 Jun 2017 20:08:31 -0400, Jun Wu <quark@fb.com> wrote:

> Excerpts from Yuya Nishihara's message of 2017-06-30 23:58:43 +0900:
>> # HG changeset patch
>> # User Yuya Nishihara <yuya@tcha.org>
>> # Date 1498302078 -32400
>> #      Sat Jun 24 20:01:18 2017 +0900
>> # Node ID b7f6740b0e58c468ecc82296fe1ee0800a7e0582
>> # Parent  fe3419b41a456661a3d35ae963d06a203d4446a2
>> revset: add experimental set subscript operator
>>
>> It only supports integer indices as a beginning. See the following post
>> for further ideas.
>>
>> https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm
>
> Glad to see progress on new operators! The implementation looks good to  
> me.
>
> Maybe this is unnecessary bike-shedding, but I feel "[]" is more natural  
> for
> "indexing" purpose.  "{}" is used in templates which might be confusing.

That was my first thought too, but then I realized that I misunderstood  
the purpose of this.  "Subscript/indexing" makes me think something like  
'parents(.)[1] == p2(.)'.  But:

$ ../hg log -r 'parents(.){0}' -T "{rev}:{node|short}\n"
36963:578172de2e38
$ ../hg log -r 'parents(.){1}' -T "{rev}:{node|short}\n"
36964:c9925ed4a5f4

... which follows the linked message, now that I re-read it.  IDK if that  
kind of indexing is useful for revsets, but it would be for template  
lists.  I'd rather use [] for that, but Yuya commented about that here:

https://bz.mercurial-scm.org/show_bug.cgi?id=5534

> Templates may seem unrelated, but I think it could be embedded in  
> revsets in
> the future.  That enables more flexible queries, like "commits whose  
> author
> is the same as (antoher revset)'s author".  Template strings need some
> function to "bind" to a ctx, like: author(template(REV, "{author}")).
>
>> I'm not sure if the current semantics are the most useful and  
>> extendable.
>> We might want the way to select the nth ancestors/descendants in sub  
>> graph,
>> by e.g. 'follow(filename){-2}', which is completely different so the  
>> '{n}'
>> operator wouldn't be usable.
>>
Yuya Nishihara - July 1, 2017, 2:04 a.m.
On Fri, 30 Jun 2017 21:42:49 -0400, Matt Harbison wrote:
> On Fri, 30 Jun 2017 20:08:31 -0400, Jun Wu <quark@fb.com> wrote:
> 
> > Excerpts from Yuya Nishihara's message of 2017-06-30 23:58:43 +0900:
> >> # HG changeset patch
> >> # User Yuya Nishihara <yuya@tcha.org>
> >> # Date 1498302078 -32400
> >> #      Sat Jun 24 20:01:18 2017 +0900
> >> # Node ID b7f6740b0e58c468ecc82296fe1ee0800a7e0582
> >> # Parent  fe3419b41a456661a3d35ae963d06a203d4446a2
> >> revset: add experimental set subscript operator
> >>
> >> It only supports integer indices as a beginning. See the following post
> >> for further ideas.
> >>
> >> https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm
> >
> > Glad to see progress on new operators! The implementation looks good to  
> > me.
> >
> > Maybe this is unnecessary bike-shedding, but I feel "[]" is more natural  
> > for
> > "indexing" purpose.  "{}" is used in templates which might be confusing.

Yep. That's why I slightly prefer not using "{}".

  $ hg log -T '{revset(".{0}")}\n'
                         ^^^
                   this is not a revset operator, but expanded to ''

> That was my first thought too, but then I realized that I misunderstood  
> the purpose of this.  "Subscript/indexing" makes me think something like  
> 'parents(.)[1] == p2(.)'.  But:
> 
> $ ../hg log -r 'parents(.){0}' -T "{rev}:{node|short}\n"
> 36963:578172de2e38
> $ ../hg log -r 'parents(.){1}' -T "{rev}:{node|short}\n"
> 36964:c9925ed4a5f4
> 
> ... which follows the linked message, now that I re-read it.

That's mpm's point why he chose {} over [].

http://markmail.org/message/sjnnwa43s4eksu62

FWIW, the use of [] can be justified by considering x[n] as a sort of
a pointer arithmetic in DAG structure.

We need APL keyboard.

> IDK if that
> kind of indexing is useful for revsets, but it would be for template  
> lists.  I'd rather use [] for that, but Yuya commented about that here:
> 
> https://bz.mercurial-scm.org/show_bug.cgi?id=5534
> 
> > Templates may seem unrelated, but I think it could be embedded in  
> > revsets in
> > the future.  That enables more flexible queries, like "commits whose  
> > author
> > is the same as (antoher revset)'s author".  Template strings need some
> > function to "bind" to a ctx, like: author(template(REV, "{author}")).
> >
> >> I'm not sure if the current semantics are the most useful and  
> >> extendable.
> >> We might want the way to select the nth ancestors/descendants in sub  
> >> graph,
> >> by e.g. 'follow(filename){-2}', which is completely different so the  
> >> '{n}'
> >> operator wouldn't be usable.
Sean Farley - July 1, 2017, 11:33 p.m.
Yuya Nishihara <yuya@tcha.org> writes:

> On Fri, 30 Jun 2017 21:42:49 -0400, Matt Harbison wrote:
>> On Fri, 30 Jun 2017 20:08:31 -0400, Jun Wu <quark@fb.com> wrote:
>>
>> > Excerpts from Yuya Nishihara's message of 2017-06-30 23:58:43 +0900:
>> >> # HG changeset patch
>> >> # User Yuya Nishihara <yuya@tcha.org>
>> >> # Date 1498302078 -32400
>> >> #      Sat Jun 24 20:01:18 2017 +0900
>> >> # Node ID b7f6740b0e58c468ecc82296fe1ee0800a7e0582
>> >> # Parent  fe3419b41a456661a3d35ae963d06a203d4446a2
>> >> revset: add experimental set subscript operator
>> >>
>> >> It only supports integer indices as a beginning. See the following post
>> >> for further ideas.
>> >>
>> >> https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm
>> >
>> > Glad to see progress on new operators! The implementation looks good to
>> > me.
>> >
>> > Maybe this is unnecessary bike-shedding, but I feel "[]" is more natural
>> > for
>> > "indexing" purpose.  "{}" is used in templates which might be confusing.
>
> Yep. That's why I slightly prefer not using "{}".
>
>   $ hg log -T '{revset(".{0}")}\n'
>                          ^^^
>                    this is not a revset operator, but expanded to ''

Aw man. Yeah, I have the same feeling as you (but have no better ideas
at the moment).

>> That was my first thought too, but then I realized that I misunderstood
>> the purpose of this.  "Subscript/indexing" makes me think something like
>> 'parents(.)[1] == p2(.)'.  But:
>>
>> $ ../hg log -r 'parents(.){0}' -T "{rev}:{node|short}\n"
>> 36963:578172de2e38
>> $ ../hg log -r 'parents(.){1}' -T "{rev}:{node|short}\n"
>> 36964:c9925ed4a5f4
>>
>> ... which follows the linked message, now that I re-read it.

I like separating {} and [] so that foo{4}[1] is the 4th level of
descendants of foo and the [1] would be the index in that list (ordered
by rev num?).

> That's mpm's point why he chose {} over [].
>
> http://markmail.org/message/sjnnwa43s4eksu62
>
> FWIW, the use of [] can be justified by considering x[n] as a sort of
> a pointer arithmetic in DAG structure.
>
> We need APL keyboard.

So, so true.
Jun Wu - July 7, 2017, 1:31 a.m.
Excerpts from Sean Farley's message of 2017-07-01 16:33:47 -0700:
> > Yep. That's why I slightly prefer not using "{}".
> >
> >   $ hg log -T '{revset(".{0}")}\n'
> >                          ^^^
> >                    this is not a revset operator, but expanded to ''
> 
> I like separating {} and [] so that foo{4}[1] is the 4th level of
> descendants of foo and the [1] would be the index in that list (ordered
> by rev num?).

Could it be foo[+4][1] meaning what you want? An explicit "+" means it's an
offset, the "1" without a "+" means it's a normal index.

> > That's mpm's point why he chose {} over [].
> >
> > http://markmail.org/message/sjnnwa43s4eksu62
> >
> > FWIW, the use of [] can be justified by considering x[n] as a sort of
> > a pointer arithmetic in DAG structure.
> >
> > We need APL keyboard.
> 
> So, so true.
Sean Farley - July 7, 2017, 2:04 a.m.
Jun Wu <quark@fb.com> writes:

> Excerpts from Sean Farley's message of 2017-07-01 16:33:47 -0700:
>> > Yep. That's why I slightly prefer not using "{}".
>> >
>> >   $ hg log -T '{revset(".{0}")}\n'
>> >                          ^^^
>> >                    this is not a revset operator, but expanded to ''
>> 
>> I like separating {} and [] so that foo{4}[1] is the 4th level of
>> descendants of foo and the [1] would be the index in that list (ordered
>> by rev num?).
>
> Could it be foo[+4][1] meaning what you want? An explicit "+" means it's an
> offset, the "1" without a "+" means it's a normal index.

Hmmm, I'll have to mull on that. My gut feeling is that it might be too
magical. Though, we're severely limited on characters so it might be the
least-worst option.
Yuya Nishihara - July 7, 2017, 12:25 p.m.
On Thu, 06 Jul 2017 19:04:48 -0700, Sean Farley wrote:
> Jun Wu <quark@fb.com> writes:
> > Excerpts from Sean Farley's message of 2017-07-01 16:33:47 -0700:
> >> > Yep. That's why I slightly prefer not using "{}".
> >> >
> >> >   $ hg log -T '{revset(".{0}")}\n'
> >> >                          ^^^
> >> >                    this is not a revset operator, but expanded to ''
> >> 
> >> I like separating {} and [] so that foo{4}[1] is the 4th level of
> >> descendants of foo and the [1] would be the index in that list (ordered
> >> by rev num?).
> >
> > Could it be foo[+4][1] meaning what you want? An explicit "+" means it's an
> > offset, the "1" without a "+" means it's a normal index.
> 
> Hmmm, I'll have to mull on that. My gut feeling is that it might be too
> magical. Though, we're severely limited on characters so it might be the
> least-worst option.

It's confusing. You might think foo[-1] will return the last element in foo.

If we don't want to use {} but also want to reserve [] for future use, we'll
need a completely different operator, e.g. foo[[4]], foo~[4] foo#[4].
Jun Wu - July 7, 2017, 3:26 p.m.
Excerpts from Yuya Nishihara's message of 2017-07-07 21:25:17 +0900:
> On Thu, 06 Jul 2017 19:04:48 -0700, Sean Farley wrote:
> > Jun Wu <quark@fb.com> writes:
> > > Could it be foo[+4][1] meaning what you want? An explicit "+" means
> > > it's an offset, the "1" without a "+" means it's a normal index.
> > 
> > Hmmm, I'll have to mull on that. My gut feeling is that it might be too
> > magical. Though, we're severely limited on characters so it might be the
> > least-worst option.
> 
> It's confusing. You might think foo[-1] will return the last element in
> foo.
> 
> If we don't want to use {} but also want to reserve [] for future use,
> we'll need a completely different operator, e.g. foo[[4]], foo~[4]
> foo#[4].

Think it a bit more, they look similar: given some order information (a
list, changelog DAG, or obsolete graph), do indexing, or slicing.

It seems the missing bit here is to specify which "order" to use. Suppose
the operator is "[]", that "order" information may be provided inside the
operator ("[1o]", "[-2g]", like what mpm said), or at the operator itself
(use different operators), or outside the operator.

Inside operator may create some syntax inconsistent with other languages.
Different operators may exhaust our operators.

I'm currently leaning towards "outside the operator". That could make "[]"
closer to existing language. It also looks flexible and clean.  For example,
use "#ordertype", like:

  x#changelog[3]:  x~-3 (approximately)
  x#changelog[-3]: x~3 (approximately)
  x#changelog[0:3]: x + x~-1 + x~-2 + x~-3
  x#changelog[1:]: descendants(x) - x

  x#changeloglinear[-3]: x~3 (only consider p1 to make history linear)
  x#changeloglinear[:0]: _firstancestors(x)

  x#obsolete[1]: successors(x)
  x#obsolete[1:]: allsuccessors(x) - x 
  x#obsolete[:]: allprecursors(x) + allsuccessors(x)

  x#plainlist[1]: pick the second element in the list
  x#plainlist[-1]: pick the last element in the list
  x#plainlist[1:3]: pick a slice in the list

We can specify a default order, like "#changelog" or "#plainlist".

Note: the names (changeloglinear etc.) here are just examples, not final.
Yuya Nishihara - July 7, 2017, 4:11 p.m.
On Fri, 7 Jul 2017 08:26:53 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-07-07 21:25:17 +0900:
> > On Thu, 06 Jul 2017 19:04:48 -0700, Sean Farley wrote:
> > > Jun Wu <quark@fb.com> writes:
> > > > Could it be foo[+4][1] meaning what you want? An explicit "+" means
> > > > it's an offset, the "1" without a "+" means it's a normal index.
> > > 
> > > Hmmm, I'll have to mull on that. My gut feeling is that it might be too
> > > magical. Though, we're severely limited on characters so it might be the
> > > least-worst option.
> > 
> > It's confusing. You might think foo[-1] will return the last element in
> > foo.
> > 
> > If we don't want to use {} but also want to reserve [] for future use,
> > we'll need a completely different operator, e.g. foo[[4]], foo~[4]
> > foo#[4].
> 
> Think it a bit more, they look similar: given some order information (a
> list, changelog DAG, or obsolete graph), do indexing, or slicing.

Just a nitpick. They are quite different operations if we take the #op
as an order specifier.

 x[y] (indexing) selects the y-th element from x (inside x)
 x{y} goes to the y-th elements from x (outside x)

But, ...

> It seems the missing bit here is to specify which "order" to use. Suppose
> the operator is "[]", that "order" information may be provided inside the
> operator ("[1o]", "[-2g]", like what mpm said), or at the operator itself
> (use different operators), or outside the operator.
> 
> Inside operator may create some syntax inconsistent with other languages.
> Different operators may exhaust our operators.
> 
> I'm currently leaning towards "outside the operator". That could make "[]"
> closer to existing language. It also looks flexible and clean.  For example,
> use "#ordertype", like:
> 
>   x#changelog[3]:  x~-3 (approximately)
>   x#changelog[-3]: x~3 (approximately)
>   x#changelog[0:3]: x + x~-1 + x~-2 + x~-3
>   x#changelog[1:]: descendants(x) - x
> 
>   x#changeloglinear[-3]: x~3 (only consider p1 to make history linear)
>   x#changeloglinear[:0]: _firstancestors(x)
> 
>   x#obsolete[1]: successors(x)
>   x#obsolete[1:]: allsuccessors(x) - x 
>   x#obsolete[:]: allprecursors(x) + allsuccessors(x)
> 
>   x#plainlist[1]: pick the second element in the list
>   x#plainlist[-1]: pick the last element in the list
>   x#plainlist[1:3]: pick a slice in the list
> 
> We can specify a default order, like "#changelog" or "#plainlist".

We could read 'x#op' as 'x->op()' in C++, so 'x#changelog' could be considered
'ancestors(x) + descendants(x)' operation with depth index.

Perhaps, a drawback of this idea is that the syntax is verbose, and parsing
will be a bit complex.
Jun Wu - July 7, 2017, 5:43 p.m.
Excerpts from Yuya Nishihara's message of 2017-07-08 01:11:50 +0900:
> Just a nitpick. They are quite different operations if we take the #op
> as an order specifier.
> 
>  x[y] (indexing) selects the y-th element from x (inside x)
>  x{y} goes to the y-th elements from x (outside x)
> 
> But, ...

Good catch. Then it makes sense to use different operators.

I wonder if we have some real world examples of "y-th element from x". I
couldn't really think of good convincing examples so it might be an
operation we don't provide intentionally.

> We could read 'x#op' as 'x->op()' in C++, so 'x#changelog' could be considered
> 'ancestors(x) + descendants(x)' operation with depth index.
>
> Perhaps, a drawback of this idea is that the syntax is verbose, and parsing
> will be a bit complex.

Maybe allow them to be shorter, like "#c", "#l", "#o". Then it is only one
character longer than "[1c]", "[1l]", "[1o]". I think the latter also looks
good. The key difference would be whether we want "[]" to be like existing
language or not.

Slightly off-topic, I wonder if we can make "#o", "#l" or "#c" without "[]"
something useful, like "hg log -r 'x#o'" could imply "using the obsolete
graph, "hg log -rf 'x#l'" means "using the linear graph". It feels like an
extra bit of information similar to "istopo". I have been thinking about
"log --dag DAGTYPE". This might be a way to embed the flag into revset, just
like other log flags.
Yuya Nishihara - July 8, 2017, 6:29 a.m.
On Fri, 7 Jul 2017 10:43:39 -0700, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2017-07-08 01:11:50 +0900:
> > Just a nitpick. They are quite different operations if we take the #op
> > as an order specifier.
> > 
> >  x[y] (indexing) selects the y-th element from x (inside x)
> >  x{y} goes to the y-th elements from x (outside x)
> > 
> > But, ...
> 
> Good catch. Then it makes sense to use different operators.
> 
> I wonder if we have some real world examples of "y-th element from x". I
> couldn't really think of good convincing examples so it might be an
> operation we don't provide intentionally.

I don't think of any practical example. first() and last() should be good
enough for most cases.

> > We could read 'x#op' as 'x->op()' in C++, so 'x#changelog' could be considered
> > 'ancestors(x) + descendants(x)' operation with depth index.
> >
> > Perhaps, a drawback of this idea is that the syntax is verbose, and parsing
> > will be a bit complex.
> 
> Maybe allow them to be shorter, like "#c", "#l", "#o". Then it is only one
> character longer than "[1c]", "[1l]", "[1o]". I think the latter also looks
> good. The key difference would be whether we want "[]" to be like existing
> language or not.
> 
> Slightly off-topic, I wonder if we can make "#o", "#l" or "#c" without "[]"
> something useful, like "hg log -r 'x#o'" could imply "using the obsolete
> graph, "hg log -rf 'x#l'" means "using the linear graph". It feels like an
> extra bit of information similar to "istopo". I have been thinking about
> "log --dag DAGTYPE". This might be a way to embed the flag into revset, just
> like other log flags.

No idea how confusing that would be, but it seems interesting. Anyway, if
we introduce 'x#o[n]' ternary, 'x#o' should be supported in some way.
Yuya Nishihara - July 12, 2017, 3:50 p.m.
On Fri, 7 Jul 2017 08:26:53 -0700, Jun Wu wrote:
> I'm currently leaning towards "outside the operator". That could make "[]"
> closer to existing language. It also looks flexible and clean.  For example,
> use "#ordertype", like:
> 
>   x#changelog[3]:  x~-3 (approximately)
>   x#changelog[-3]: x~3 (approximately)
>   x#changelog[0:3]: x + x~-1 + x~-2 + x~-3
>   x#changelog[1:]: descendants(x) - x
> 
>   x#changeloglinear[-3]: x~3 (only consider p1 to make history linear)
>   x#changeloglinear[:0]: _firstancestors(x)
> 
>   x#obsolete[1]: successors(x)
>   x#obsolete[1:]: allsuccessors(x) - x 
>   x#obsolete[:]: allprecursors(x) + allsuccessors(x)
> 
>   x#plainlist[1]: pick the second element in the list
>   x#plainlist[-1]: pick the last element in the list
>   x#plainlist[1:3]: pick a slice in the list

Do we want this experimental feature in 4.3? The patches are mostly ready,
but this is completely new operator, so I'm not sure.

> Note: the names (changeloglinear etc.) here are just examples, not final.

It's called as 'x#generations[n]' in my draft patch, could be abbreviated
as 'x#g[n]' (and also remembered as 'g' for 'graph'.)
Sean Farley - July 12, 2017, 6:30 p.m.
Yuya Nishihara <yuya@tcha.org> writes:

> On Fri, 7 Jul 2017 08:26:53 -0700, Jun Wu wrote:
>> I'm currently leaning towards "outside the operator". That could make "[]"
>> closer to existing language. It also looks flexible and clean.  For example,
>> use "#ordertype", like:
>> 
>>   x#changelog[3]:  x~-3 (approximately)
>>   x#changelog[-3]: x~3 (approximately)
>>   x#changelog[0:3]: x + x~-1 + x~-2 + x~-3
>>   x#changelog[1:]: descendants(x) - x
>> 
>>   x#changeloglinear[-3]: x~3 (only consider p1 to make history linear)
>>   x#changeloglinear[:0]: _firstancestors(x)
>> 
>>   x#obsolete[1]: successors(x)
>>   x#obsolete[1:]: allsuccessors(x) - x 
>>   x#obsolete[:]: allprecursors(x) + allsuccessors(x)
>> 
>>   x#plainlist[1]: pick the second element in the list
>>   x#plainlist[-1]: pick the last element in the list
>>   x#plainlist[1:3]: pick a slice in the list
>
> Do we want this experimental feature in 4.3? The patches are mostly ready,
> but this is completely new operator, so I'm not sure.
>
>> Note: the names (changeloglinear etc.) here are just examples, not final.
>
> It's called as 'x#generations[n]' in my draft patch, could be abbreviated
> as 'x#g[n]' (and also remembered as 'g' for 'graph'.)

Is it worth putting behind a flag? Or can undocumented things (e.g.
wdir()) be "hidden" and changed later?
Jun Wu - July 12, 2017, 6:38 p.m.
Excerpts from Yuya Nishihara's message of 2017-07-13 00:50:33 +0900:
> Do we want this experimental feature in 4.3? The patches are mostly ready,
> but this is completely new operator, so I'm not sure.
> 
> > Note: the names (changeloglinear etc.) here are just examples, not final.
> 
> It's called as 'x#generations[n]' in my draft patch, could be abbreviated
> as 'x#g[n]' (and also remembered as 'g' for 'graph'.)

I'd like to try it as long as we can change them in the future.
Sean Farley - July 12, 2017, 6:48 p.m.
Jun Wu <quark@fb.com> writes:

> Excerpts from Yuya Nishihara's message of 2017-07-13 00:50:33 +0900:
>> Do we want this experimental feature in 4.3? The patches are mostly ready,
>> but this is completely new operator, so I'm not sure.
>> 
>> > Note: the names (changeloglinear etc.) here are just examples, not final.
>> 
>> It's called as 'x#generations[n]' in my draft patch, could be abbreviated
>> as 'x#g[n]' (and also remembered as 'g' for 'graph'.)
>
> I'd like to try it as long as we can change them in the future.

Yeah, I agree. Yuya, let's not document this and try to land as an
experiment. Sound good to others?

Patch

diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
--- a/mercurial/revsetlang.py
+++ b/mercurial/revsetlang.py
@@ -21,6 +21,7 @@  from . import (
 elements = {
     # token-type: binding-strength, primary, prefix, infix, suffix
     "(": (21, None, ("group", 1, ")"), ("func", 1, ")"), None),
+    "{": (21, None, None, ("subscript", 1, "}"), None),
     "##": (20, None, None, ("_concat", 20), None),
     "~": (18, None, None, ("ancestor", 18), None),
     "^": (18, None, None, ("parent", 18), "parentpost"),
@@ -39,6 +40,7 @@  elements = {
     "=": (3, None, None, ("keyvalue", 3), None),
     ",": (2, None, None, ("list", 2), None),
     ")": (0, None, None, None, None),
+    "}": (0, None, None, None, None),
     "symbol": (0, "symbol", None, None, None),
     "string": (0, "string", None, None, None),
     "end": (0, None, None, None, None),
@@ -47,7 +49,7 @@  elements = {
 keywords = {'and', 'or', 'not'}
 
 _quoteletters = {'"', "'"}
-_simpleopletters = set(pycompat.iterbytestr("():=,-|&+!~^%"))
+_simpleopletters = set(pycompat.iterbytestr("(){}:=,-|&+!~^%"))
 
 # default set of valid characters for the initial letter of symbols
 _syminitletters = set(pycompat.iterbytestr(
@@ -310,6 +312,18 @@  def _matchonly(revs, bases):
     if _isposargs(ta, 1) and _isposargs(tb, 1):
         return ('list', ta, tb)
 
+def _transformsubscript(x, y):
+    """Rewrite 'x{y}' operator to evaluatable tree"""
+    # this is pretty basic implementation of 'x{y}' operator, still
+    # experimental so undocumented. see the wiki for further ideas.
+    # https://www.mercurial-scm.org/wiki/RevsetOperatorPlan#ideas_from_mpm
+    n = getinteger(y, _("set subscript must be an integer"))
+    if n <= 0:
+        y = ('symbol', '%s' % -n)
+        return ('func', ('symbol', 'ancestors'), ('list', x, y, y))
+    else:
+        return ('func', ('symbol', 'descendants'), ('list', x, y, y))
+
 def _fixops(x):
     """Rewrite raw parsed tree to resolve ambiguous syntax which cannot be
     handled well by our simple top-down parser"""
@@ -377,6 +391,9 @@  def _analyze(x, order):
         return (op,) + tuple(_analyze(y, order) for y in x[1:])
     elif op == 'keyvalue':
         return (op, x[1], _analyze(x[2], order))
+    elif op == 'subscript':
+        t = _transformsubscript(x[1], _analyze(x[2], order))
+        return _analyze(t, order)
     elif op == 'func':
         f = getsymbol(x[1])
         d = defineorder
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -499,6 +499,145 @@  keyword arguments
   hg: parse error: can't use a key-value pair in this context
   [255]
 
+subscript operator:
+
+  $ hg debugrevspec -p parsed -p analyzed 'tip{0}'
+  * parsed:
+  (subscript
+    ('symbol', 'tip')
+    ('symbol', '0'))
+  * analyzed:
+  (func
+    ('symbol', 'ancestors')
+    (list
+      ('symbol', 'tip')
+      ('symbol', '0')
+      ('symbol', '0'))
+    define)
+  9
+
+  $ hg debugrevspec -p parsed -p analyzed '.{-1}'
+  * parsed:
+  (subscript
+    ('symbol', '.')
+    (negate
+      ('symbol', '1')))
+  * analyzed:
+  (func
+    ('symbol', 'ancestors')
+    (list
+      ('symbol', '.')
+      ('symbol', '1')
+      ('symbol', '1'))
+    define)
+  8
+
+  $ hg debugrevspec -p parsed -p analyzed 'roots(:){2}'
+  * parsed:
+  (subscript
+    (func
+      ('symbol', 'roots')
+      (rangeall
+        None))
+    ('symbol', '2'))
+  * analyzed:
+  (func
+    ('symbol', 'descendants')
+    (list
+      (func
+        ('symbol', 'roots')
+        (rangeall
+          None
+          define)
+        define)
+      ('symbol', '2')
+      ('symbol', '2'))
+    define)
+  2
+  3
+
+ has the highest binding strength (as function call)
+
+  $ hg debugrevspec -p parsed 'tip:tip^{-1}'
+  * parsed:
+  (range
+    ('symbol', 'tip')
+    (subscript
+      (parentpost
+        ('symbol', 'tip'))
+      (negate
+        ('symbol', '1'))))
+  9
+  8
+  7
+  6
+  5
+  4
+
+  $ hg debugrevspec -p parsed --no-show-revs 'not public(){0}'
+  * parsed:
+  (not
+    (subscript
+      (func
+        ('symbol', 'public')
+        None)
+      ('symbol', '0')))
+
+ left-hand side should be optimized recursively
+
+  $ hg debugrevspec -p parsed -p analyzed -p optimized --no-show-revs \
+  > '(not public()){0}'
+  * parsed:
+  (subscript
+    (group
+      (not
+        (func
+          ('symbol', 'public')
+          None)))
+    ('symbol', '0'))
+  * analyzed:
+  (func
+    ('symbol', 'ancestors')
+    (list
+      (not
+        (func
+          ('symbol', 'public')
+          None
+          any)
+        define)
+      ('symbol', '0')
+      ('symbol', '0'))
+    define)
+  * optimized:
+  (func
+    ('symbol', 'ancestors')
+    (list
+      (func
+        ('symbol', '_notpublic')
+        None
+        any)
+      ('symbol', '0')
+      ('symbol', '0'))
+    define)
+
+ parse errors
+
+  $ hg debugrevspec '{0}'
+  hg: parse error at 0: not a prefix: {
+  [255]
+  $ hg debugrevspec '.{0'
+  hg: parse error at 3: unexpected token: end
+  [255]
+  $ hg debugrevspec '.}'
+  hg: parse error at 1: invalid token
+  [255]
+  $ hg debugrevspec '.{a}'
+  hg: parse error: set subscript must be an integer
+  [255]
+  $ hg debugrevspec '.{1-2}'
+  hg: parse error: set subscript must be an integer
+  [255]
+
 parsed tree at stages:
 
   $ hg debugrevspec -p all '()'