Patchwork [4,of,6,V2] revset: introduce "_parsealiasdecl" to parse alias declarations strictly

Submitter Katsunori FUJIWARA
Date Jan. 10, 2015, 2:25 p.m.
Message ID <94f89514bcc23eb6f55b.1420899944@feefifofum>
Permalink /patch/7421/
State Accepted

Comments

Katsunori FUJIWARA - Jan. 10, 2015, 2:25 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1420899491 -32400
#      Sat Jan 10 23:18:11 2015 +0900
# Node ID 94f89514bcc23eb6f55b9919a46a35405cb5bc16
# Parent  39f801397570f104e70f4a0f6472386cfd95608a
revset: introduce "_parsealiasdecl" to parse alias declarations strictly

This patch introduces "_parsealiasdecl" to parse alias declarations
strictly. For example, "_parsealiasdecl" can detect the problems
below, which the current implementation can't.

  - an unclosed parenthesis causes the declaration to be treated as
    an "alias symbol"

    because every declaration not in "func(....)" style is
    recognized as an "alias symbol".

    for example, "foo($1, $2" is treated as an alias symbol.

  - alias symbol/function names aren't checked for whether they are
    valid as symbols

    for example, "foo bar" can be treated as an alias symbol, but of
    course such an invalid symbol can't be referred to in a revset.

  - just splitting the argument list by "," overlooks syntax
    problems in the declaration

    for example, all of the invalid declarations below are overlooked:

    - foo("bar")     => taking one argument named as '"bar"'
    - foo("unclosed) => taking one argument named as '"unclosed'
    - foo(bar::baz)  => taking one argument named as 'bar::baz'
    - foo(bar($1))   => taking one argument named as 'bar($1)'
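
The behavior described above can be reproduced with a minimal sketch.
It assumes the old approach amounts to matching the declaration against
the "funcre" pattern of "revsetalias" and splitting the captured
argument text on ","; "naive_parse_decl" is a hypothetical stand-in
written for illustration, not the actual Mercurial code.

```python
import re

# Same pattern as the ``funcre`` attribute of ``revsetalias``.
funcre = re.compile(r'^([^(]+)\(([^)]+)\)$')

def naive_parse_decl(decl):
    """Hypothetical regex-and-split parser (illustration only).

    Returns ``(name, args)``; ``args`` is None when the declaration
    is treated as a plain alias symbol.
    """
    m = funcre.search(decl)
    if m:
        # "func(....)" style: split the raw argument text on ","
        # without checking that each piece is a valid symbol.
        return m.group(1), [x.strip() for x in m.group(2).split(',')]
    # Everything else falls through as an alias symbol.
    return decl, None

if __name__ == '__main__':
    for decl in ['foo($1, $2', 'foo bar', 'foo("bar")',
                 'foo("unclosed)', 'foo(bar::baz)']:
        print('%-16s => %r' % (decl, naive_parse_decl(decl)))
```

In this sketch the first two declarations come back as alias symbols
(args is None), and the remaining three are accepted as functions
whose single "argument name" is the raw text between the parentheses.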

To keep this patch simple, the current implementation for alias
declarations is replaced by "_parsealiasdecl" in a subsequent
patch. This patch just introduces it.

This patch defines "_parsealiasdecl" not as a method of the
"revsetalias" class but as a function of the "revset" module, because
that makes it easy to test by doctest.

This patch factors out some helper functions for "tree", because:

  - direct access like "if tree[0] == 'func' and len(tree) > 1"
    decreases readability

  - subsequent patches (and also existing code paths, in the future)
    can use them for readability

This patch also factors out "_tokenizealias", because it can also be
used for parsing alias definitions strictly.
Pierre-Yves David - Jan. 14, 2015, 9:56 p.m.
On 01/10/2015 06:25 AM, FUJIWARA Katsunori wrote:
> diff --git a/mercurial/revset.py b/mercurial/revset.py
> --- a/mercurial/revset.py
> +++ b/mercurial/revset.py
> @@ -267,6 +267,40 @@
>           raise error.ParseError(err)
>       return l
>
> +def isvalidsymbol(tree):
> +    """Examine whether specified ``tree`` is valid ``symbol`` or not
> +    """
> +    return tree[0] == 'symbol' and len(tree) > 1
> +
> +def getsymbol(tree):
> +    """Get symbol name from valid ``symbol`` in ``tree``
> +
> +    This assumes that ``tree`` is already examined by ``isvalidsymbol``.
> +    """

Given how cheap `isvalidsymbol` is, we should probably add an
`assert isvalidsymbol(tree)` here. This also applies to the other
getters. I would be happy to take that in a follow-up patch.
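
A minimal sketch of what such a follow-up might look like, assuming
the helper definitions from this patch. The assertion messages are
illustrative additions and are not part of the patch; only two of the
getters are shown.

```python
def isvalidsymbol(tree):
    """Examine whether specified ``tree`` is valid ``symbol`` or not"""
    return tree[0] == 'symbol' and len(tree) > 1

def getsymbol(tree):
    """Get symbol name from valid ``symbol`` in ``tree``"""
    # Suggested guard: fail loudly if the caller skipped the check.
    assert isvalidsymbol(tree), 'not a valid symbol: %r' % (tree,)
    return tree[1]

def isvalidfunc(tree):
    """Examine whether specified ``tree`` is valid ``func`` or not"""
    return tree[0] == 'func' and len(tree) > 1 and isvalidsymbol(tree[1])

def getfuncname(tree):
    """Get function name from valid ``func`` in ``tree``"""
    assert isvalidfunc(tree), 'not a valid func: %r' % (tree,)
    return getsymbol(tree[1])
```

With these guards, `getsymbol(('string', 'foo'))` raises
AssertionError instead of silently returning 'foo'. Note that Python
skips assert statements when run with -O, so the guards act as a
development aid rather than as input validation.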

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -267,6 +267,40 @@ 
         raise error.ParseError(err)
     return l
 
+def isvalidsymbol(tree):
+    """Examine whether specified ``tree`` is valid ``symbol`` or not
+    """
+    return tree[0] == 'symbol' and len(tree) > 1
+
+def getsymbol(tree):
+    """Get symbol name from valid ``symbol`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidsymbol``.
+    """
+    return tree[1]
+
+def isvalidfunc(tree):
+    """Examine whether specified ``tree`` is valid ``func`` or not
+    """
+    return tree[0] == 'func' and len(tree) > 1 and isvalidsymbol(tree[1])
+
+def getfuncname(tree):
+    """Get function name from valid ``func`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidfunc``.
+    """
+    return getsymbol(tree[1])
+
+def getfuncargs(tree):
+    """Get list of function arguments from valid ``func`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidfunc``.
+    """
+    if len(tree) > 2:
+        return getlist(tree[2])
+    else:
+        return []
+
 def getset(repo, subset, x):
     if not x:
         raise error.ParseError(_("missing argument"))
@@ -2081,6 +2115,87 @@ 
         for t in tree:
             _checkaliasarg(t, known)
 
+# the set of valid characters for the initial letter of symbols in
+# alias declarations and definitions
+_aliassyminitletters = set(c for c in [chr(i) for i in xrange(256)]
+                           if c.isalnum() or c in '._@$' or ord(c) > 127)
+
+def _tokenizealias(program, lookup=None):
+    """Parse alias declaration/definition into a stream of tokens
+
+    This allows symbol names to use also ``$`` as an initial letter
+    (for backward compatibility), and callers of this function should
+    examine whether ``$`` is used also for unexpected symbols or not.
+    """
+    return tokenize(program, lookup=lookup,
+                    syminitletters=_aliassyminitletters)
+
+def _parsealiasdecl(decl):
+    """Parse alias declaration ``decl``
+
+    This returns ``(name, tree, args, errorstr)`` tuple:
+
+    - ``name``: of declared alias (may be ``decl`` itself at error)
+    - ``tree``: parse result (or ``None`` at error)
+    - ``args``: list of alias argument names (or None for symbol declaration)
+    - ``errorstr``: detail about detected error (or None)
+
+    >>> _parsealiasdecl('foo')
+    ('foo', ('symbol', 'foo'), None, None)
+    >>> _parsealiasdecl('$foo')
+    ('$foo', None, None, "'$' not for alias arguments")
+    >>> _parsealiasdecl('foo::bar')
+    ('foo::bar', None, None, 'invalid format')
+    >>> _parsealiasdecl('foo bar')
+    ('foo bar', None, None, 'at 4: invalid token')
+    >>> _parsealiasdecl('foo()')
+    ('foo', ('func', ('symbol', 'foo')), [], None)
+    >>> _parsealiasdecl('$foo()')
+    ('$foo()', None, None, "'$' not for alias arguments")
+    >>> _parsealiasdecl('foo($1, $2)')
+    ('foo', ('func', ('symbol', 'foo')), ['$1', '$2'], None)
+    >>> _parsealiasdecl('foo(bar_bar, baz.baz)')
+    ('foo', ('func', ('symbol', 'foo')), ['bar_bar', 'baz.baz'], None)
+    >>> _parsealiasdecl('foo($1, $2, nested($1, $2))')
+    ('foo($1, $2, nested($1, $2))', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo(bar($1, $2))')
+    ('foo(bar($1, $2))', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo("string")')
+    ('foo("string")', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo($1, $2')
+    ('foo($1, $2', None, None, 'at 10: unexpected token: end')
+    >>> _parsealiasdecl('foo("string')
+    ('foo("string', None, None, 'at 5: unterminated string')
+    """
+    p = parser.parser(_tokenizealias, elements)
+    try:
+        tree, pos = p.parse(decl)
+        if (pos != len(decl)):
+            raise error.ParseError(_('invalid token'), pos)
+
+        if isvalidsymbol(tree):
+            # "name = ...." style
+            name = getsymbol(tree)
+            if name.startswith('$'):
+                return (decl, None, None, _("'$' not for alias arguments"))
+            return (name, ('symbol', name), None, None)
+
+        if isvalidfunc(tree):
+            # "name(arg, ....) = ...." style
+            name = getfuncname(tree)
+            if name.startswith('$'):
+                return (decl, None, None, _("'$' not for alias arguments"))
+            args = []
+            for arg in getfuncargs(tree):
+                if not isvalidsymbol(arg):
+                    return (decl, None, None, _("invalid argument list"))
+                args.append(getsymbol(arg))
+            return (name, ('func', ('symbol', name)), args, None)
+
+        return (decl, None, None, _("invalid format"))
+    except error.ParseError, inst:
+        return (decl, None, None, parseerrordetail(inst))
+
 class revsetalias(object):
     funcre = re.compile('^([^(]+)\(([^)]+)\)$')
     args = None