Patchwork [3,of,3,V5] revset: skip legacy lookup for revspec wrapped in 'revset(…)'

login
register
mail settings
Submitter Boris Feld
Date April 16, 2018, 6:58 a.m.
Message ID <109ca88347d7b531d4b4.1523861911@FB>
Download mbox | patch
Permalink /patch/31088/
State New
Headers show

Comments

Boris Feld - April 16, 2018, 6:58 a.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1523369212 -7200
#      Tue Apr 10 16:06:52 2018 +0200
# Node ID 109ca88347d7b531d4b48370efcbc4d6e850cf92
# Parent  f363552ced37ae028bbcf2cba1f02ac623385f54
# EXP-Topic noname
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 109ca88347d7
revset: skip legacy lookup for revspec wrapped in 'revset(…)'

Currently, multiple labels can take forms that can be confused with revset
(eg: "rev(0)" is a valid tag). Since we look up for tags before evaluating
revset, this means a tag can shadow a valid revset at any time.

We now enforce the strict revset parsing when wrapped with 'revset(…)'. For
now, This only work on a whole revspec (but can be used within the revset
without effect). This might change in the future if we improve the
implementation.

The feature is undocumented for now, keeping it in the experimental namespace.
In case a better approach to achieve the same goal is found.

The syntax looks like a revset but is not implemented as such for now. Since the
goal is to avoid some preprocessing that happens before revset parsing, we
cannot simply implement it as a revset predicate.

There was other approaches discussed over the mailing-list but they were less
convincing.

Having a configuration flag to disable legacy lookup have been considered but
discarded. There are too many common uses of ambiguous identifier (eg: '+',
'-' or '..') to have the legacy lookup mechanism turned off.

In addition, the  approach can control the parsing of each revset, making
it more flexible. For example, a revset used as the value of an existing
configuration option (eg: pushrev) could enforce its resolution as a revset (by
using the prefix) while user inputs would still use the legacy lookup.

In addition of offering a way to unambiguously input a revset, this prefix
allow skipping the name lookup providing a significant speedup in some case.
Yuya Nishihara - April 16, 2018, 12:24 p.m.
On Mon, 16 Apr 2018 08:58:31 +0200, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1523369212 -7200
> #      Tue Apr 10 16:06:52 2018 +0200
> # Node ID 109ca88347d7b531d4b48370efcbc4d6e850cf92
> # Parent  f363552ced37ae028bbcf2cba1f02ac623385f54
> # EXP-Topic noname
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 109ca88347d7
> revset: skip legacy lookup for revspec wrapped in 'revset(…)'

> @@ -2191,7 +2194,12 @@ def matchany(ui, specs, lookup=None, loc
>          raise error.ParseError(_("empty query"))
>      parsedspecs = []
>      for s in specs:
> -        parsedspecs.append(revsetlang.parse(s, lookup))
> +        lookupthis = lookup
> +        stripped = s.strip()
> +        if (stripped.startswith(prefixrevset)
> +                and stripped.endswith(postfixrevset)):
> +            lookupthis = None
> +        parsedspecs.append(revsetlang.parse(s, lookupthis))

Is it okay to move this hack to revsetlang._parsewith?

@@ -482,6 +485,8 @@ def _parsewith(spec, lookup=None, symini
       ...
     ParseError: ('invalid token', 4)
     """
+    if spec.startswith('revset(') and spec.endswith(')'):
+        lookup = None
     p = parser.parser(elements)
     tree, pos = p.parse(tokenize(spec, lookup=lookup,
                                  syminitletters=syminitletters))

I don't think revset.match*() is the right place to do parsing stuff, and
we'll need a tokenizer to make it more correctly handle variants such as
' revset ( ... )' or '... and revset(...)'.
Boris Feld - April 16, 2018, 5:49 p.m.
On 16/04/2018 14:24, Yuya Nishihara wrote:
> On Mon, 16 Apr 2018 08:58:31 +0200, Boris Feld wrote:
>> # HG changeset patch
>> # User Boris Feld <boris.feld@octobus.net>
>> # Date 1523369212 -7200
>> #      Tue Apr 10 16:06:52 2018 +0200
>> # Node ID 109ca88347d7b531d4b48370efcbc4d6e850cf92
>> # Parent  f363552ced37ae028bbcf2cba1f02ac623385f54
>> # EXP-Topic noname
>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 109ca88347d7
>> revset: skip legacy lookup for revspec wrapped in 'revset(…)'
>> @@ -2191,7 +2194,12 @@ def matchany(ui, specs, lookup=None, loc
>>           raise error.ParseError(_("empty query"))
>>       parsedspecs = []
>>       for s in specs:
>> -        parsedspecs.append(revsetlang.parse(s, lookup))
>> +        lookupthis = lookup
>> +        stripped = s.strip()
>> +        if (stripped.startswith(prefixrevset)
>> +                and stripped.endswith(postfixrevset)):
>> +            lookupthis = None
>> +        parsedspecs.append(revsetlang.parse(s, lookupthis))
> Is it okay to move this hack to revsetlang._parsewith?
>
> @@ -482,6 +485,8 @@ def _parsewith(spec, lookup=None, symini
>         ...
>       ParseError: ('invalid token', 4)
>       """
> +    if spec.startswith('revset(') and spec.endswith(')'):
> +        lookup = None
>       p = parser.parser(elements)
>       tree, pos = p.parse(tokenize(spec, lookup=lookup,
>                                    syminitletters=syminitletters))
>
> I don't think revset.match*() is the right place to do parsing stuff, and
> we'll need a tokenizer to make it more correctly handle variants such as
> ' revset ( ... )' or '... and revset(...)'.
You're are right, moving it lower in the stack makes sense. Would it be 
possible to implement it even lower in revsetlang.tokenize?

We tried preparing a V6 to move the code but we didn't find the queued 
version of the first changeset. Were you waiting for us to move the code 
yourself?
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Boris Feld - April 17, 2018, 10:16 a.m.
On 16/04/2018 19:49, Feld Boris wrote:
> On 16/04/2018 14:24, Yuya Nishihara wrote:
>> On Mon, 16 Apr 2018 08:58:31 +0200, Boris Feld wrote:
>>> # HG changeset patch
>>> # User Boris Feld <boris.feld@octobus.net>
>>> # Date 1523369212 -7200
>>> #      Tue Apr 10 16:06:52 2018 +0200
>>> # Node ID 109ca88347d7b531d4b48370efcbc4d6e850cf92
>>> # Parent  f363552ced37ae028bbcf2cba1f02ac623385f54
>>> # EXP-Topic noname
>>> # Available At https://bitbucket.org/octobus/mercurial-devel/
>>> #              hg pull 
>>> https://bitbucket.org/octobus/mercurial-devel/ -r 109ca88347d7
>>> revset: skip legacy lookup for revspec wrapped in 'revset(…)'
>>> @@ -2191,7 +2194,12 @@ def matchany(ui, specs, lookup=None, loc
>>>           raise error.ParseError(_("empty query"))
>>>       parsedspecs = []
>>>       for s in specs:
>>> -        parsedspecs.append(revsetlang.parse(s, lookup))
>>> +        lookupthis = lookup
>>> +        stripped = s.strip()
>>> +        if (stripped.startswith(prefixrevset)
>>> +                and stripped.endswith(postfixrevset)):
>>> +            lookupthis = None
>>> +        parsedspecs.append(revsetlang.parse(s, lookupthis))
>> Is it okay to move this hack to revsetlang._parsewith?
>>
>> @@ -482,6 +485,8 @@ def _parsewith(spec, lookup=None, symini
>>         ...
>>       ParseError: ('invalid token', 4)
>>       """
>> +    if spec.startswith('revset(') and spec.endswith(')'):
>> +        lookup = None
>>       p = parser.parser(elements)
>>       tree, pos = p.parse(tokenize(spec, lookup=lookup,
>> syminitletters=syminitletters))
>>
>> I don't think revset.match*() is the right place to do parsing stuff, 
>> and
>> we'll need a tokenizer to make it more correctly handle variants such as
>> ' revset ( ... )' or '... and revset(...)'.
> You're are right, moving it lower in the stack makes sense. Would it 
> be possible to implement it even lower in revsetlang.tokenize?
>
> We tried preparing a V6 to move the code but we didn't find the queued 
> version of the first changeset. Were you waiting for us to move the 
> code yourself?
We sent the V6 version with the renamed test file (we hope it's the 
right name) so you can take the second updated changeset.
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - April 17, 2018, 11:06 a.m.
On Mon, 16 Apr 2018 19:49:43 +0200, Feld Boris wrote:
> On 16/04/2018 14:24, Yuya Nishihara wrote:
> > On Mon, 16 Apr 2018 08:58:31 +0200, Boris Feld wrote:
> >> # HG changeset patch
> >> # User Boris Feld <boris.feld@octobus.net>
> >> # Date 1523369212 -7200
> >> #      Tue Apr 10 16:06:52 2018 +0200
> >> # Node ID 109ca88347d7b531d4b48370efcbc4d6e850cf92
> >> # Parent  f363552ced37ae028bbcf2cba1f02ac623385f54
> >> # EXP-Topic noname
> >> # Available At https://bitbucket.org/octobus/mercurial-devel/
> >> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r 109ca88347d7
> >> revset: skip legacy lookup for revspec wrapped in 'revset(…)'
> >> @@ -2191,7 +2194,12 @@ def matchany(ui, specs, lookup=None, loc
> >>           raise error.ParseError(_("empty query"))
> >>       parsedspecs = []
> >>       for s in specs:
> >> -        parsedspecs.append(revsetlang.parse(s, lookup))
> >> +        lookupthis = lookup
> >> +        stripped = s.strip()
> >> +        if (stripped.startswith(prefixrevset)
> >> +                and stripped.endswith(postfixrevset)):
> >> +            lookupthis = None
> >> +        parsedspecs.append(revsetlang.parse(s, lookupthis))
> > Is it okay to move this hack to revsetlang._parsewith?
> >
> > @@ -482,6 +485,8 @@ def _parsewith(spec, lookup=None, symini
> >         ...
> >       ParseError: ('invalid token', 4)
> >       """
> > +    if spec.startswith('revset(') and spec.endswith(')'):
> > +        lookup = None
> >       p = parser.parser(elements)
> >       tree, pos = p.parse(tokenize(spec, lookup=lookup,
> >                                    syminitletters=syminitletters))
> >
> > I don't think revset.match*() is the right place to do parsing stuff, and
> > we'll need a tokenizer to make it more correctly handle variants such as
> > ' revset ( ... )' or '... and revset(...)'.
> You're are right, moving it lower in the stack makes sense. Would it be 
> possible to implement it even lower in revsetlang.tokenize?

Maybe no. We'll need the tokenize() function to handle e.g. 'revset ( ... )'.

  if firsttwotokens(tokenize(s)) == ['revset', '(']:
      return tokenize(s)
  # otherwise, wrap lookup() to count parens and conditionally disable lookup?

> We tried preparing a V6 to move the code but we didn't find the queued 
> version of the first changeset. Were you waiting for us to move the code 
> yourself?

I've already fixed it locally, so will push soon.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -2173,6 +2173,9 @@  def match(ui, spec, lookup=None):
     """Create a matcher for a single revision spec"""
     return matchany(ui, [spec], lookup=None)
 
+prefixrevset = 'revset('
+postfixrevset = ')'
+
 def matchany(ui, specs, lookup=None, localalias=None):
     """Create a matcher that will include any revisions matching one of the
     given specs
@@ -2191,7 +2194,12 @@  def matchany(ui, specs, lookup=None, loc
         raise error.ParseError(_("empty query"))
     parsedspecs = []
     for s in specs:
-        parsedspecs.append(revsetlang.parse(s, lookup))
+        lookupthis = lookup
+        stripped = s.strip()
+        if (stripped.startswith(prefixrevset)
+                and stripped.endswith(postfixrevset)):
+            lookupthis = None
+        parsedspecs.append(revsetlang.parse(s, lookupthis))
     if len(parsedspecs) == 1:
         tree = parsedspecs[0]
     else:
diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
--- a/mercurial/revsetlang.py
+++ b/mercurial/revsetlang.py
@@ -352,6 +352,9 @@  def _analyze(x):
     elif op == 'keyvalue':
         return (op, x[1], _analyze(x[2]))
     elif op == 'func':
+        f = getsymbol(x[1])
+        if f == 'revset':
+            return _analyze(x[2])
         return (op, x[1], _analyze(x[2]))
     raise ValueError('invalid operator %r' % op)
 
diff --git a/tests/test-legacy-lookup.t b/tests/test-legacy-lookup.t
--- a/tests/test-legacy-lookup.t
+++ b/tests/test-legacy-lookup.t
@@ -62,6 +62,12 @@  within a more advances revset
   $ hg log -r 'rev(0) and branch(default)'
   0:a87874c6ec31 first []
 
+with explicit revset resolution
+(still resolved as the label)
+
+  $ hg log -r 'revset(rev(0))'
+  0:a87874c6ec31 first []
+
 some of the above with quote to force its resolution as a label
 
   $ hg log -r ':"rev(0)"'
@@ -91,8 +97,13 @@  Test label with quote in them.
   $ hg log -r '("foo")'
   abort: unknown revision 'foo'!
   [255]
+  $ hg log -r 'revset("foo")'
+  abort: unknown revision 'foo'!
+  [255]
   $ hg log -r '("\"foo\"")'
   2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
+  $ hg log -r 'revset("\"foo\"")'
+  2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
 
 Test label with dash in them.
 
@@ -116,6 +127,9 @@  Test label with + in them.
   $ hg log -r '(foo+bar)'
   abort: unknown revision 'foo'!
   [255]
+  $ hg log -r 'revset(foo+bar)'
+  abort: unknown revision 'foo'!
+  [255]
   $ hg log -r '"foo+bar"'
   4:bbf52b87b370 Added tag foo-bar for changeset a50aae922707 [foo+bar]
   $ hg log -r '("foo+bar")'
@@ -129,6 +143,8 @@  Test tag with numeric version number.
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '(1.2)'
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
+  $ hg log -r 'revset(1.2)'
+  5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '"1.2"'
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '("1.2")'
@@ -157,6 +173,9 @@  Test tag with parenthesis (but not a val
   $ hg log -r '(release_4.1(candidate1))'
   hg: parse error: unknown identifier: release_4.1
   [255]
+  $ hg log -r 'revset(release_4.1(candidate1))'
+  hg: parse error: unknown identifier: release_4.1
+  [255]
   $ hg log -r '"release_4.1(candidate1)"'
   6:db72e24fe069 Added tag 1.2 for changeset ff42fde8edbb [release_4.1(candidate1)]
   $ hg log -r '("release_4.1(candidate1)")'
@@ -182,6 +201,9 @@  Test tag with parenthesis and other func
   $ hg log -r '(release_4.1(arch=x86,arm))'
   hg: parse error: unknown identifier: release_4.1
   [255]
+  $ hg log -r 'revset(release_4.1(arch=x86,arm))'
+  hg: parse error: unknown identifier: release_4.1
+  [255]
   $ hg log -r '"release_4.1(arch=x86,arm)"'
   7:b29b25d7d687 Added tag release_4.1(candidate1) for changeset db72e24fe069 [release_4.1(arch=x86,arm)]
   $ hg log -r '("release_4.1(arch=x86,arm)")'
@@ -208,6 +230,9 @@  Test tag conflicting with revset functio
   $ hg log -r '(secret(team=foo,project=bar))'
   hg: parse error: secret takes no arguments
   [255]
+  $ hg log -r 'revset(secret(team=foo,project=bar))'
+  hg: parse error: secret takes no arguments
+  [255]
   $ hg log -r '"secret(team=foo,project=bar)"'
   8:6b2e2d4ea455 Added tag release_4.1(arch=x86,arm) for changeset b29b25d7d687 [secret(team=foo,project=bar)]
   $ hg log -r '("secret(team=foo,project=bar)")'
@@ -237,6 +262,11 @@  Test tag with space
   ((my little version)
        ^ here)
   [255]
+  $ hg log -r 'revset(my little version)'
+  hg: parse error at 10: unexpected token: symbol
+  (revset(my little version)
+             ^ here)
+  [255]
   $ hg log -r '"my little version"'
   9:269192bf8fc3 Added tag secret(team=foo,project=bar) for changeset 6b2e2d4ea455 [my little version]
   $ hg log -r '("my little version")'
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -2801,3 +2801,43 @@  test multiline revset with errors
   ( . + .^ +
             ^ here)
   [255]
+  $ hg debugrevspec -v 'revset(first(rev(0)))' -p all
+  * parsed:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * expanded:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * concatenated:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * analyzed:
+  (func
+    (symbol 'first')
+    (func
+      (symbol 'rev')
+      (symbol '0')))
+  * optimized:
+  (func
+    (symbol 'first')
+    (func
+      (symbol 'rev')
+      (symbol '0')))
+  * set:
+  <baseset+ [0]>
+  0