Patchwork [2,of,2,V6] revset: skip legacy lookup for revspec wrapped in 'revset(…)'

login
register
mail settings
Submitter Boris Feld
Date April 17, 2018, 10:14 a.m.
Message ID <a8bfeca77f8e59c9f9e8.1523960053@FB>
Download mbox | patch
Permalink /patch/31165/
State New
Headers show

Comments

Boris Feld - April 17, 2018, 10:14 a.m.
# HG changeset patch
# User Boris Feld <boris.feld@octobus.net>
# Date 1523369212 -7200
#      Tue Apr 10 16:06:52 2018 +0200
# Node ID a8bfeca77f8e59c9f9e8552f04ec2f553d59e7fa
# Parent  6501c9e162939da32371fa67da710af506f85140
# EXP-Topic noname
# Available At https://bitbucket.org/octobus/mercurial-devel/
#              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r a8bfeca77f8e
revset: skip legacy lookup for revspec wrapped in 'revset(…)'

Currently, multiple labels can take forms that can be confused with revset
(eg: "rev(0)" is a valid tag). Since we look up for tags before evaluating
revset, this means a tag can shadow a valid revset at any time.

We now enforce the strict revset parsing when wrapped with 'revset(…)'. For
now, This only work on a whole revspec (but can be used within the revset
without effect). This might change in the future if we improve the
implementation.

The feature is undocumented for now, keeping it in the experimental namespace.
In case a better approach to achieve the same goal is found.

The syntax looks like a revset but is not implemented as such for now. Since
the goal is to avoid some preprocessing that happens before revset parsing, we
cannot simply implement it as a revset predicate.

There were other approaches discussed over the mailing-list but they were less
convincing.

Having a configuration flag to disable legacy lookup have been considered but
discarded. There are too many common uses of ambiguous identifier (eg: '+',
'-' or '..') to have the legacy lookup mechanism turned off.

In addition, the approach can control the parsing of each revset, making it
more flexible. For example, a revset used as the value of an existing
configuration option (eg: pushrev) could enforce its resolution as a revset
(by using the prefix) while user inputs would still use the legacy lookup.

In addition to offering a way to unambiguously input a revset, this prefix
allow skipping the name lookup providing a significant speedup in some case.
Boris Feld - April 18, 2018, 9:16 a.m.
The V5 modified in-flight has been taken, sorry for the noise.


On 17/04/2018 12:14, Boris Feld wrote:
> # HG changeset patch
> # User Boris Feld <boris.feld@octobus.net>
> # Date 1523369212 -7200
> #      Tue Apr 10 16:06:52 2018 +0200
> # Node ID a8bfeca77f8e59c9f9e8552f04ec2f553d59e7fa
> # Parent  6501c9e162939da32371fa67da710af506f85140
> # EXP-Topic noname
> # Available At https://bitbucket.org/octobus/mercurial-devel/
> #              hg pull https://bitbucket.org/octobus/mercurial-devel/ -r a8bfeca77f8e
> revset: skip legacy lookup for revspec wrapped in 'revset(…)'
>
> Currently, multiple labels can take forms that can be confused with revset
> (eg: "rev(0)" is a valid tag). Since we look up for tags before evaluating
> revset, this means a tag can shadow a valid revset at any time.
>
> We now enforce the strict revset parsing when wrapped with 'revset(…)'. For
> now, This only work on a whole revspec (but can be used within the revset
> without effect). This might change in the future if we improve the
> implementation.
>
> The feature is undocumented for now, keeping it in the experimental namespace.
> In case a better approach to achieve the same goal is found.
>
> The syntax looks like a revset but is not implemented as such for now. Since
> the goal is to avoid some preprocessing that happens before revset parsing, we
> cannot simply implement it as a revset predicate.
>
> There were other approaches discussed over the mailing-list but they were less
> convincing.
>
> Having a configuration flag to disable legacy lookup have been considered but
> discarded. There are too many common uses of ambiguous identifier (eg: '+',
> '-' or '..') to have the legacy lookup mechanism turned off.
>
> In addition, the approach can control the parsing of each revset, making it
> more flexible. For example, a revset used as the value of an existing
> configuration option (eg: pushrev) could enforce its resolution as a revset
> (by using the prefix) while user inputs would still use the legacy lookup.
>
> In addition to offering a way to unambiguously input a revset, this prefix
> allow skipping the name lookup providing a significant speedup in some case.
>
> diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
> --- a/mercurial/revsetlang.py
> +++ b/mercurial/revsetlang.py
> @@ -68,6 +68,9 @@ symbols = {}
>   # default set of valid characters for non-initial letters of symbols
>   _symletters = _syminitletters | set(pycompat.iterbytestr('-/'))
>   
> +prefixrevset = 'revset('
> +postfixrevset = ')'
> +
>   def tokenize(program, lookup=None, syminitletters=None, symletters=None):
>       '''
>       Parse a revset statement into a stream of tokens
> @@ -95,6 +98,11 @@ def tokenize(program, lookup=None, symin
>       if symletters is None:
>           symletters = _symletters
>   
> +    stripped = program.strip()
> +    if (stripped.startswith(prefixrevset)
> +            and stripped.endswith(postfixrevset)):
> +        lookup = None
> +
>       if program and lookup:
>           # attempt to parse old-style ranges first to deal with
>           # things like old-tag which contain query metacharacters
> @@ -352,6 +360,9 @@ def _analyze(x):
>       elif op == 'keyvalue':
>           return (op, x[1], _analyze(x[2]))
>       elif op == 'func':
> +        f = getsymbol(x[1])
> +        if f == 'revset':
> +            return _analyze(x[2])
>           return (op, x[1], _analyze(x[2]))
>       raise ValueError('invalid operator %r' % op)
>   
> diff --git a/tests/test-revset-legacy-lookup.t b/tests/test-revset-legacy-lookup.t
> --- a/tests/test-revset-legacy-lookup.t
> +++ b/tests/test-revset-legacy-lookup.t
> @@ -1,4 +1,3 @@
> -
>     $ cat >> $HGRCPATH << EOF
>     > [ui]
>     > logtemplate="{rev}:{node|short} {desc} [{tags}]\n"
> @@ -62,6 +61,12 @@ within a more advances revset
>     $ hg log -r 'rev(0) and branch(default)'
>     0:a87874c6ec31 first []
>   
> +with explicit revset resolution
> +(still resolved as the label)
> +
> +  $ hg log -r 'revset(rev(0))'
> +  0:a87874c6ec31 first []
> +
>   some of the above with quote to force its resolution as a label
>   
>     $ hg log -r ':"rev(0)"'
> @@ -91,8 +96,13 @@ Test label with quote in them.
>     $ hg log -r '("foo")'
>     abort: unknown revision 'foo'!
>     [255]
> +  $ hg log -r 'revset("foo")'
> +  abort: unknown revision 'foo'!
> +  [255]
>     $ hg log -r '("\"foo\"")'
>     2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
> +  $ hg log -r 'revset("\"foo\"")'
> +  2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
>   
>   Test label with dash in them.
>   
> @@ -116,6 +126,9 @@ Test label with + in them.
>     $ hg log -r '(foo+bar)'
>     abort: unknown revision 'foo'!
>     [255]
> +  $ hg log -r 'revset(foo+bar)'
> +  abort: unknown revision 'foo'!
> +  [255]
>     $ hg log -r '"foo+bar"'
>     4:bbf52b87b370 Added tag foo-bar for changeset a50aae922707 [foo+bar]
>     $ hg log -r '("foo+bar")'
> @@ -129,6 +142,8 @@ Test tag with numeric version number.
>     5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
>     $ hg log -r '(1.2)'
>     5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
> +  $ hg log -r 'revset(1.2)'
> +  5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
>     $ hg log -r '"1.2"'
>     5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
>     $ hg log -r '("1.2")'
> @@ -157,6 +172,9 @@ Test tag with parenthesis (but not a val
>     $ hg log -r '(release_4.1(candidate1))'
>     hg: parse error: unknown identifier: release_4.1
>     [255]
> +  $ hg log -r 'revset(release_4.1(candidate1))'
> +  hg: parse error: unknown identifier: release_4.1
> +  [255]
>     $ hg log -r '"release_4.1(candidate1)"'
>     6:db72e24fe069 Added tag 1.2 for changeset ff42fde8edbb [release_4.1(candidate1)]
>     $ hg log -r '("release_4.1(candidate1)")'
> @@ -182,6 +200,9 @@ Test tag with parenthesis and other func
>     $ hg log -r '(release_4.1(arch=x86,arm))'
>     hg: parse error: unknown identifier: release_4.1
>     [255]
> +  $ hg log -r 'revset(release_4.1(arch=x86,arm))'
> +  hg: parse error: unknown identifier: release_4.1
> +  [255]
>     $ hg log -r '"release_4.1(arch=x86,arm)"'
>     7:b29b25d7d687 Added tag release_4.1(candidate1) for changeset db72e24fe069 [release_4.1(arch=x86,arm)]
>     $ hg log -r '("release_4.1(arch=x86,arm)")'
> @@ -208,6 +229,9 @@ Test tag conflicting with revset functio
>     $ hg log -r '(secret(team=foo,project=bar))'
>     hg: parse error: secret takes no arguments
>     [255]
> +  $ hg log -r 'revset(secret(team=foo,project=bar))'
> +  hg: parse error: secret takes no arguments
> +  [255]
>     $ hg log -r '"secret(team=foo,project=bar)"'
>     8:6b2e2d4ea455 Added tag release_4.1(arch=x86,arm) for changeset b29b25d7d687 [secret(team=foo,project=bar)]
>     $ hg log -r '("secret(team=foo,project=bar)")'
> @@ -237,6 +261,11 @@ Test tag with space
>     ((my little version)
>          ^ here)
>     [255]
> +  $ hg log -r 'revset(my little version)'
> +  hg: parse error at 10: unexpected token: symbol
> +  (revset(my little version)
> +             ^ here)
> +  [255]
>     $ hg log -r '"my little version"'
>     9:269192bf8fc3 Added tag secret(team=foo,project=bar) for changeset 6b2e2d4ea455 [my little version]
>     $ hg log -r '("my little version")'
> diff --git a/tests/test-revset.t b/tests/test-revset.t
> --- a/tests/test-revset.t
> +++ b/tests/test-revset.t
> @@ -2801,3 +2801,43 @@ test multiline revset with errors
>     ( . + .^ +
>               ^ here)
>     [255]
> +  $ hg debugrevspec -v 'revset(first(rev(0)))' -p all
> +  * parsed:
> +  (func
> +    (symbol 'revset')
> +    (func
> +      (symbol 'first')
> +      (func
> +        (symbol 'rev')
> +        (symbol '0'))))
> +  * expanded:
> +  (func
> +    (symbol 'revset')
> +    (func
> +      (symbol 'first')
> +      (func
> +        (symbol 'rev')
> +        (symbol '0'))))
> +  * concatenated:
> +  (func
> +    (symbol 'revset')
> +    (func
> +      (symbol 'first')
> +      (func
> +        (symbol 'rev')
> +        (symbol '0'))))
> +  * analyzed:
> +  (func
> +    (symbol 'first')
> +    (func
> +      (symbol 'rev')
> +      (symbol '0')))
> +  * optimized:
> +  (func
> +    (symbol 'first')
> +    (func
> +      (symbol 'rev')
> +      (symbol '0')))
> +  * set:
> +  <baseset+ [0]>
> +  0
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
--- a/mercurial/revsetlang.py
+++ b/mercurial/revsetlang.py
@@ -68,6 +68,9 @@  symbols = {}
 # default set of valid characters for non-initial letters of symbols
 _symletters = _syminitletters | set(pycompat.iterbytestr('-/'))
 
+prefixrevset = 'revset('
+postfixrevset = ')'
+
 def tokenize(program, lookup=None, syminitletters=None, symletters=None):
     '''
     Parse a revset statement into a stream of tokens
@@ -95,6 +98,11 @@  def tokenize(program, lookup=None, symin
     if symletters is None:
         symletters = _symletters
 
+    stripped = program.strip()
+    if (stripped.startswith(prefixrevset)
+            and stripped.endswith(postfixrevset)):
+        lookup = None
+
     if program and lookup:
         # attempt to parse old-style ranges first to deal with
         # things like old-tag which contain query metacharacters
@@ -352,6 +360,9 @@  def _analyze(x):
     elif op == 'keyvalue':
         return (op, x[1], _analyze(x[2]))
     elif op == 'func':
+        f = getsymbol(x[1])
+        if f == 'revset':
+            return _analyze(x[2])
         return (op, x[1], _analyze(x[2]))
     raise ValueError('invalid operator %r' % op)
 
diff --git a/tests/test-revset-legacy-lookup.t b/tests/test-revset-legacy-lookup.t
--- a/tests/test-revset-legacy-lookup.t
+++ b/tests/test-revset-legacy-lookup.t
@@ -1,4 +1,3 @@ 
-
   $ cat >> $HGRCPATH << EOF
   > [ui]
   > logtemplate="{rev}:{node|short} {desc} [{tags}]\n"
@@ -62,6 +61,12 @@  within a more advances revset
   $ hg log -r 'rev(0) and branch(default)'
   0:a87874c6ec31 first []
 
+with explicit revset resolution
+(still resolved as the label)
+
+  $ hg log -r 'revset(rev(0))'
+  0:a87874c6ec31 first []
+
 some of the above with quote to force its resolution as a label
 
   $ hg log -r ':"rev(0)"'
@@ -91,8 +96,13 @@  Test label with quote in them.
   $ hg log -r '("foo")'
   abort: unknown revision 'foo'!
   [255]
+  $ hg log -r 'revset("foo")'
+  abort: unknown revision 'foo'!
+  [255]
   $ hg log -r '("\"foo\"")'
   2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
+  $ hg log -r 'revset("\"foo\"")'
+  2:fb616635b18f Added tag rev(0) for changeset 43114e71eddd ["foo"]
 
 Test label with dash in them.
 
@@ -116,6 +126,9 @@  Test label with + in them.
   $ hg log -r '(foo+bar)'
   abort: unknown revision 'foo'!
   [255]
+  $ hg log -r 'revset(foo+bar)'
+  abort: unknown revision 'foo'!
+  [255]
   $ hg log -r '"foo+bar"'
   4:bbf52b87b370 Added tag foo-bar for changeset a50aae922707 [foo+bar]
   $ hg log -r '("foo+bar")'
@@ -129,6 +142,8 @@  Test tag with numeric version number.
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '(1.2)'
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
+  $ hg log -r 'revset(1.2)'
+  5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '"1.2"'
   5:ff42fde8edbb Added tag foo+bar for changeset bbf52b87b370 [1.2]
   $ hg log -r '("1.2")'
@@ -157,6 +172,9 @@  Test tag with parenthesis (but not a val
   $ hg log -r '(release_4.1(candidate1))'
   hg: parse error: unknown identifier: release_4.1
   [255]
+  $ hg log -r 'revset(release_4.1(candidate1))'
+  hg: parse error: unknown identifier: release_4.1
+  [255]
   $ hg log -r '"release_4.1(candidate1)"'
   6:db72e24fe069 Added tag 1.2 for changeset ff42fde8edbb [release_4.1(candidate1)]
   $ hg log -r '("release_4.1(candidate1)")'
@@ -182,6 +200,9 @@  Test tag with parenthesis and other func
   $ hg log -r '(release_4.1(arch=x86,arm))'
   hg: parse error: unknown identifier: release_4.1
   [255]
+  $ hg log -r 'revset(release_4.1(arch=x86,arm))'
+  hg: parse error: unknown identifier: release_4.1
+  [255]
   $ hg log -r '"release_4.1(arch=x86,arm)"'
   7:b29b25d7d687 Added tag release_4.1(candidate1) for changeset db72e24fe069 [release_4.1(arch=x86,arm)]
   $ hg log -r '("release_4.1(arch=x86,arm)")'
@@ -208,6 +229,9 @@  Test tag conflicting with revset functio
   $ hg log -r '(secret(team=foo,project=bar))'
   hg: parse error: secret takes no arguments
   [255]
+  $ hg log -r 'revset(secret(team=foo,project=bar))'
+  hg: parse error: secret takes no arguments
+  [255]
   $ hg log -r '"secret(team=foo,project=bar)"'
   8:6b2e2d4ea455 Added tag release_4.1(arch=x86,arm) for changeset b29b25d7d687 [secret(team=foo,project=bar)]
   $ hg log -r '("secret(team=foo,project=bar)")'
@@ -237,6 +261,11 @@  Test tag with space
   ((my little version)
        ^ here)
   [255]
+  $ hg log -r 'revset(my little version)'
+  hg: parse error at 10: unexpected token: symbol
+  (revset(my little version)
+             ^ here)
+  [255]
   $ hg log -r '"my little version"'
   9:269192bf8fc3 Added tag secret(team=foo,project=bar) for changeset 6b2e2d4ea455 [my little version]
   $ hg log -r '("my little version")'
diff --git a/tests/test-revset.t b/tests/test-revset.t
--- a/tests/test-revset.t
+++ b/tests/test-revset.t
@@ -2801,3 +2801,43 @@  test multiline revset with errors
   ( . + .^ +
             ^ here)
   [255]
+  $ hg debugrevspec -v 'revset(first(rev(0)))' -p all
+  * parsed:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * expanded:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * concatenated:
+  (func
+    (symbol 'revset')
+    (func
+      (symbol 'first')
+      (func
+        (symbol 'rev')
+        (symbol '0'))))
+  * analyzed:
+  (func
+    (symbol 'first')
+    (func
+      (symbol 'rev')
+      (symbol '0')))
+  * optimized:
+  (func
+    (symbol 'first')
+    (func
+      (symbol 'rev')
+      (symbol '0')))
+  * set:
+  <baseset+ [0]>
+  0