Patchwork [4,of,6] revset: introduce "_parsealiasdecl" to parse alias declarations strictly

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Jan. 8, 2015, 10:37 a.m.
Message ID <9d7f9c153a0af1eb9ea2.1420713432@feefifofum>
Download mbox | patch
Permalink /patch/7382/
State Superseded
Commit 0a7fd54d4e60a098dca99727da1e99a4191d73dc
Headers show

Comments

Katsunori FUJIWARA - Jan. 8, 2015, 10:37 a.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1420712143 -32400
#      Thu Jan 08 19:15:43 2015 +0900
# Node ID 9d7f9c153a0af1eb9ea201e9ca784f50670b3676
# Parent  680a889529a06d4999357c2d2b73e8f9e9735dc7
revset: introduce "_parsealiasdecl" to parse alias declarations strictly

This patch introduces "_parsealiasdecl" to parse alias declarations
strictly. For example, "_parsealiasdecl" can detect problems below,
which current implementation can't.

  - un-closed parenthesis causes being treated as "alias symbol"

    because all of declarations not in "func(....)" style are
    recognized as "alias symbol".

    for example, "foo($1, $2" is treated as the alias symbol.

  - alias symbol/function names aren't examined whether they are valid
    as symbol or not

    for example, "foo bar" can be treated as the alias symbol, but of
    course such invalid symbol can't be referred in revset.

  - just splitting argument list by "," causes overlooking syntax
    problems in the declaration

    for example, all of invalid declarations below are overlooked:

    - foo("bar")     => taking one argument named as '"bar"'
    - foo("unclosed) => taking one argument named as '"unclosed'
    - foo(bar::baz)  => taking one argument named as 'bar::baz'
    - foo(bar($1))   => taking one argument named as 'bar($1)'

To decrease complication of patch, current implementation for alias
declarations is replaced by "_parsealiasdecl" in the subsequent
patch. This patch just introduces it.

This patch defines "_parsealiasdecl" not as a method of "revsetalias"
class but as a one of "revset" module, because of ease of testing by
doctest.

This patch factors some helper functions for "tree" out, because:

  - direct accessing like "if tree[0] == 'func' and len(tree) > 1"
    decreases readability

  - subsequent patch (and also existing code paths, in the future) can
    use them for readability

This patch also factors "_tokenizealias" out, because subsequent patch
uses also it.

Patch

diff --git a/mercurial/revset.py b/mercurial/revset.py
--- a/mercurial/revset.py
+++ b/mercurial/revset.py
@@ -260,6 +260,40 @@ 
         raise error.ParseError(err)
     return l
 
+def isvalidsymbol(tree):
+    """Examine whether specified ``tree`` is valid ``symbol`` or not
+    """
+    return tree[0] == 'symbol' and len(tree) > 1
+
+def getsymbol(tree):
+    """Get symbol name from valid ``symbol`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidsymbol``.
+    """
+    return tree[1]
+
+def isvalidfunc(tree):
+    """Examine whether specified ``tree`` is valid ``func`` or not
+    """
+    return tree[0] == 'func' and len(tree) > 1 and isvalidsymbol(tree[1])
+
+def getfuncname(tree):
+    """Get function name from valid ``func`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidfunc``.
+    """
+    return getsymbol(tree[1])
+
+def getfuncargs(tree):
+    """Get list of function arguments from valid ``func`` in ``tree``
+
+    This assumes that ``tree`` is already examined by ``isvalidfunc``.
+    """
+    if len(tree) > 2:
+        return getlist(tree[2])
+    else:
+        return []
+
 def getset(repo, subset, x):
     if not x:
         raise error.ParseError(_("missing argument"))
@@ -2064,6 +2098,82 @@ 
         for t in tree:
             _checkaliasarg(t, known)
 
+def _tokenizealias(program, lookup=None):
+    """Parse alias declaration/definition into a stream of tokens
+
+    This allows symbol names to use also ``$`` as an initial letter
+    (for backward compatibility), and callers of this function should
+    examine whether ``$`` is used also for unexpected symbols or not.
+    """
+    syminitletter = lambda c: c.isalnum() or c in '._@$' or ord(c) > 127
+    return tokenize(program, lookup=lookup, syminitletter=syminitletter)
+
+def _parsealiasdecl(decl):
+    """Parse alias declaration ``decl``
+
+    This returns ``(name, tree, args, errorstr)`` tuple:
+
+    - ``name``: of declared alias (may be ``decl`` itself at error)
+    - ``tree``: parse result (or ``None`` at error)
+    - ``args``: list of alias argument names (or None for symbol declaration)
+    - ``errorstr``: detail about detected error (or None)
+
+    >>> _parsealiasdecl('foo')
+    ('foo', ('symbol', 'foo'), None, None)
+    >>> _parsealiasdecl('$foo')
+    ('$foo', None, None, "'$' not for alias arguments")
+    >>> _parsealiasdecl('foo::bar')
+    ('foo::bar', None, None, 'invalid format')
+    >>> _parsealiasdecl('foo bar')
+    ('foo bar', None, None, 'at 4: invalid token')
+    >>> _parsealiasdecl('foo()')
+    ('foo', ('func', ('symbol', 'foo')), [], None)
+    >>> _parsealiasdecl('$foo()')
+    ('$foo()', None, None, "'$' not for alias arguments")
+    >>> _parsealiasdecl('foo($1, $2)')
+    ('foo', ('func', ('symbol', 'foo')), ['$1', '$2'], None)
+    >>> _parsealiasdecl('foo(bar_bar, baz.baz)')
+    ('foo', ('func', ('symbol', 'foo')), ['bar_bar', 'baz.baz'], None)
+    >>> _parsealiasdecl('foo($1, $2, nested($1, $2))')
+    ('foo($1, $2, nested($1, $2))', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo(bar($1, $2))')
+    ('foo(bar($1, $2))', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo("string")')
+    ('foo("string")', None, None, 'invalid argument list')
+    >>> _parsealiasdecl('foo($1, $2')
+    ('foo($1, $2', None, None, 'at 10: unexpected token: end')
+    >>> _parsealiasdecl('foo("string')
+    ('foo("string', None, None, 'at 5: unterminated string')
+    """
+    p = parser.parser(_tokenizealias, elements)
+    try:
+        tree, pos = p.parse(decl)
+        if (pos != len(decl)):
+            raise error.ParseError(_('invalid token'), pos)
+
+        if isvalidsymbol(tree):
+            # "name = ...." style
+            name = getsymbol(tree)
+            if name.startswith('$'):
+                return (decl, None, None, _("'$' not for alias arguments"))
+            return (name, ('symbol', name), None, None)
+
+        if isvalidfunc(tree):
+            # "name(arg, ....) = ...." style
+            name = getfuncname(tree)
+            if name.startswith('$'):
+                return (decl, None, None, _("'$' not for alias arguments"))
+            args = []
+            for arg in getfuncargs(tree):
+                if not isvalidsymbol(arg):
+                    return (decl, None, None, _("invalid argument list"))
+                args.append(getsymbol(arg))
+            return (name, ('func', ('symbol', name)), args, None)
+
+        return (decl, None, None, _("invalid format"))
+    except error.ParseError, inst:
+        return (decl, None, None, parseerrordetail(inst))
+
 class revsetalias(object):
     funcre = re.compile('^([^(]+)\(([^)]+)\)$')
     args = None