Patchwork revsetlang: do not pass in non-bytes to parse()

Submitter Yuya Nishihara
Date April 17, 2018, 1:21 p.m.
Message ID <235258eb2600f6d41ce9.1523971279@mimosa>
Permalink /patch/31167/
State Accepted

Comments

Yuya Nishihara - April 17, 2018, 1:21 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1523969998 -32400
#      Tue Apr 17 21:59:58 2018 +0900
# Node ID 235258eb2600f6d41ce9bc5c8ab5a2601b19dce8
# Parent  925707ac2855944b0607bec68986a273fb5321ae
revsetlang: do not pass in non-bytes to parse()

Since parse() isn't a simple function, we shouldn't expect it to raise
TypeError or ValueError for invalid inputs. Before, TypeError was raised
at 'if pos != len(spec)', which was quite late to report an error.

This patch also makes tokenize() detect an invalid object before converting
it to py3-safe bytes.

Spotted while adding the 'revset(...)' hack to _parsewith().
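
To illustrate the fail-fast check described above, here is a minimal standalone
sketch. It is not Mercurial code: ProgrammingError is a hypothetical stand-in
for error.ProgrammingError, and tokenize()/parse() are toy versions that only
split on whitespace. The point is that the type check runs before any
conversion could hide the mistake.

```python
class ProgrammingError(Exception):
    """Hypothetical stand-in for Mercurial's error.ProgrammingError."""


def tokenize(program):
    """Toy tokenizer: split a bytes program on whitespace."""
    # Reject non-bytes up front, before any coercion can mask the bug.
    if not isinstance(program, bytes):
        raise ProgrammingError('revset statement must be bytes, got %r'
                               % (program,))
    return program.split()


def parse(spec):
    """Toy parser: in this sketch, parsing is just tokenizing."""
    return tokenize(spec)
```

With this guard, parse(u'a') fails immediately with a message naming the
offending object, instead of failing later somewhere inside the parser.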
Augie Fackler - April 18, 2018, 5:42 p.m.
On Tue, Apr 17, 2018 at 10:21:19PM +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1523969998 -32400
> #      Tue Apr 17 21:59:58 2018 +0900
> # Node ID 235258eb2600f6d41ce9bc5c8ab5a2601b19dce8
> # Parent  925707ac2855944b0607bec68986a273fb5321ae
> revsetlang: do not pass in non-bytes to parse()

queued, thanks

Patch

diff --git a/mercurial/revsetlang.py b/mercurial/revsetlang.py
--- a/mercurial/revsetlang.py
+++ b/mercurial/revsetlang.py
@@ -89,6 +89,9 @@  def tokenize(program, lookup=None, symin
     [('symbol', '@', 0), ('::', None, 1), ('end', None, 3)]
 
     '''
+    if not isinstance(program, bytes):
+        raise error.ProgrammingError('revset statement must be bytes, got %r'
+                                     % program)
     program = pycompat.bytestr(program)
     if syminitletters is None:
         syminitletters = _syminitletters
@@ -581,6 +584,8 @@  def _formatargtype(c, arg):
     elif c == 's':
         return _quote(arg)
     elif c == 'r':
+        if not isinstance(arg, bytes):
+            raise TypeError
         parse(arg) # make sure syntax errors are confined
         return '(%s)' % arg
     elif c == 'n':
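
The second hunk raises TypeError before parse() is called for the 'r' format
type. A rough sketch of how that could play out, assuming the caller turns
TypeError and ValueError into a user-facing parse error (that translation, and
the names ParseError, check_syntax, format_arg, and format_spec, are
assumptions made for illustration, not the actual revsetlang API):

```python
class ParseError(Exception):
    """Hypothetical stand-in for a user-facing parse error."""


def check_syntax(spec):
    """Toy stand-in for parse(): only checks that parentheses balance."""
    depth = 0
    for ch in bytearray(spec):  # bytearray yields ints on py2 and py3
        if ch == ord('('):
            depth += 1
        elif ch == ord(')'):
            depth -= 1
            if depth < 0:
                raise ParseError('unbalanced parenthesis')
    if depth != 0:
        raise ParseError('unbalanced parenthesis')


def format_arg(c, arg):
    """Format one argument; only the 'r' (revset) type is sketched here."""
    if c == 'r':
        # Check the type explicitly instead of relying on the parser
        # to raise TypeError somewhere along the way.
        if not isinstance(arg, bytes):
            raise TypeError
        check_syntax(arg)  # make sure syntax errors are confined
        return b'(%s)' % arg
    raise ValueError('unsupported format type: %r' % (c,))


def format_spec(c, arg):
    """Assumed caller: report bad argument types as a parse error."""
    try:
        return format_arg(c, arg)
    except (TypeError, ValueError):
        raise ParseError('invalid argument for revspec')
```

For example, format_spec('r', b'a + b') returns b'(a + b)', while
format_spec('r', 123) raises ParseError without ever reaching the syntax check.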