Patchwork [3,of,3,STABLE] check-code: detect incorrect format operation on the string not including "%"

login
register
mail settings
Submitter Katsunori FUJIWARA
Date April 13, 2014, 1:32 p.m.
Message ID <49c3352c68df5e29ad05.1397395939@feefifofum>
Download mbox | patch
Permalink /patch/4315/
State Superseded
Headers show

Comments

Katsunori FUJIWARA - April 13, 2014, 1:32 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1397394948 -32400
#      Sun Apr 13 22:15:48 2014 +0900
# Branch stable
# Node ID 49c3352c68df5e29ad05db9d2087bc73e2d84372
# Parent  98b776640d7939e3a8d6ac6601d5c7cd173c0539
check-code: detect incorrect format operation on the string not including "%"

This patch introduces new patterns to detect incorrect format
operations on the string not including any "%" specifiers: it causes
TypeError at runtime.

Patterns newly introduced by this patch treat format operations in
"format-string % value" style.

This patch uses "[ xnopq]" instead of "[^%']" to match against
'characters other than "%" in string' correctly.

For example, Python interpreter (and programmers) can recognize two
strings 'preceding string' and '%s format' in the code below:

    func('preceding string', '%s format')

But regexp matching may (mis-)recognize that string ', ' (not
including any "%" specifiers) and formatting operation "%s" (from '%s
format') on it in this code unexpectedly.

After normalizing by "repquote()", strings in Python code doesn't
contain characters other than ' ', 'x', 'n', 'o', 'p', 'q' or '%'. For
example, the example code above is normalized like as below:

    func('xxxxxxxxx xxxxxx', '%x xxxxxx')

Then , "'[ xnopq]+'" can avoid unexpected matching against ', ', but
"'[^%']+'" still can't. This is because of using "[ xnopq]" instead of
"[^%']" in this patch.

"[^%']'" is simple enough for "(format-string) % value" style treated
in preceding patch, because anchoring to "(" and ")" avoids unexpected
matching described above.

Newly added patterns mis-recognize that format operation '%x' is
applied on string ' ' in the example code below (after normalizing by
"repquote()"), because "[ xnopq]" matches against it: in the example
above, comma between strings prevents from mis-recognizing.

    print 'xxxxxxxxx xxxxxx' '%x xxxxxx'

Changing normalization result for ' ' in "repquote()" from ' ' to 'x'
(or so) can resolve this problem, but it breaks examinations for rst
syntax.

This patch chooses to show hint information "(or use '+' for string
concatenation to avoid ambiguous cases)", because this is very rare
case and current Mercurial implementation doesn't have such code.
Katsunori FUJIWARA - April 13, 2014, 1:35 p.m.
At Sun, 13 Apr 2014 22:32:19 +0900,
FUJIWARA Katsunori wrote:
> 
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1397394948 -32400
> #      Sun Apr 13 22:15:48 2014 +0900
> # Branch stable
> # Node ID 49c3352c68df5e29ad05db9d2087bc73e2d84372
> # Parent  98b776640d7939e3a8d6ac6601d5c7cd173c0539
> check-code: detect incorrect format operation on the string not including "%"
> 
> This patch introduces new patterns to detect incorrect format
> operations on the string not including any "%" specifiers: it causes
> TypeError at runtime.
> 
> Patterns newly introduced by this patch treat format operations in
> "format-string % value" style.

This patch may have to be posted for "default", because this
doesn't fix any existing problems.

But I chose to post this as one of series for "stable", because
preceding patch (#2) seems to make it easier to understand.




> This patch uses "[ xnopq]" instead of "[^%']" to match against
> 'characters other than "%" in string' correctly.
> 
> For example, Python interpreter (and programmers) can recognize two
> strings 'preceding string' and '%s format' in the code below:
> 
>     func('preceding string', '%s format')
> 
> But regexp matching may (mis-)recognize that string ', ' (not
> including any "%" specifiers) and formatting operation "%s" (from '%s
> format') on it in this code unexpectedly.
> 
> After normalizing by "repquote()", strings in Python code doesn't
> contain characters other than ' ', 'x', 'n', 'o', 'p', 'q' or '%'. For
> example, the example code above is normalized like as below:
> 
>     func('xxxxxxxxx xxxxxx', '%x xxxxxx')
> 
> Then , "'[ xnopq]+'" can avoid unexpected matching against ', ', but
> "'[^%']+'" still can't. This is because of using "[ xnopq]" instead of
> "[^%']" in this patch.
> 
> "[^%']'" is simple enough for "(format-string) % value" style treated
> in preceding patch, because anchoring to "(" and ")" avoids unexpected
> matching described above.
> 
> Newly added patterns mis-recognize that format operation '%x' is
> applied on string ' ' in the example code below (after normalizing by
> "repquote()"), because "[ xnopq]" matches against it: in the example
> above, comma between strings prevents from mis-recognizing.
> 
>     print 'xxxxxxxxx xxxxxx' '%x xxxxxx'
> 
> Changing normalization result for ' ' in "repquote()" from ' ' to 'x'
> (or so) can resolve this problem, but it breaks examinations for rst
> syntax.
> 
> This patch chooses to show hint information "(or use '+' for string
> concatenation to avoid ambiguous cases)", because this is very rare
> case and current Mercurial implementation doesn't have such code.
> 
> diff --git a/contrib/check-code.py b/contrib/check-code.py
> --- a/contrib/check-code.py
> +++ b/contrib/check-code.py
> @@ -204,6 +204,12 @@
>       "don't apply % on non-format string"),
>      (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
>       "don't apply % on non-format string"),
> +    (r'[^"\t ][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%',
> +     "don't apply % on non-format string\n" +
> +     " (or use '+' for string concatenation to avoid ambiguous cases)"),
> +    (r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%",
> +     "don't apply % on non-format string\n" +
> +     " (or use '+' for string concatenation to avoid ambiguous cases)"),
>      (r'(\w|\)),\w', "missing whitespace after ,"),
>      (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
>      (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -279,3 +279,40 @@
>     > print ("%% doesn't work as percent") % v
>     don't apply % on non-format string
>    [1]
> +
> +  $ cat > ./invalid-formatting2.py <<EOF
> +  > print "no percent character" % v
> +  > print "no " "percent " "character" % v
> +  > print "no " + "percent " + "character" % v
> +  > 
> +  > print "%% doesn't work as percent" % v
> +  > 
> +  > print "%s " "ambiguous " "case" % v
> +  > print "ambiguous case, %% and %s" % v
> +  > print "ambiguous case (", " %Y-%M-%D"
> +  > 
> +  > print "    ambiguous case"   " %Y-%M-%D"
> +  > print "non-ambiguous case" + " %Y-%M-%D"
> +  > EOF
> +  $ "$check_code" ./invalid-formatting2.py
> +  ./invalid-formatting2.py:1:
> +   > print "no percent character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:2:
> +   > print "no " "percent " "character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:3:
> +   > print "no " + "percent " + "character" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:5:
> +   > print "%% doesn't work as percent" % v
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  ./invalid-formatting2.py:11:
> +   > print "    ambiguous case"   " %Y-%M-%D"
> +   don't apply % on non-format string
> +   (or use '+' for string concatenation to avoid ambiguous cases)
> +  [1]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/contrib/check-code.py b/contrib/check-code.py
--- a/contrib/check-code.py
+++ b/contrib/check-code.py
@@ -204,6 +204,12 @@ 
      "don't apply % on non-format string"),
     (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
      "don't apply % on non-format string"),
+    (r'[^"\t ][ \t\n]*(?:"(?:[ xnopq]|%%)+"[ \t\n+]*)+%',
+     "don't apply % on non-format string\n" +
+     " (or use '+' for string concatenation to avoid ambiguous cases)"),
+    (r"[^'\t ][ \t\n]*(?:'(?:[ xnopq]|%%)+'[ \t\n+]*)+%",
+     "don't apply % on non-format string\n" +
+     " (or use '+' for string concatenation to avoid ambiguous cases)"),
     (r'(\w|\)),\w', "missing whitespace after ,"),
     (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
     (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
diff --git a/tests/test-check-code.t b/tests/test-check-code.t
--- a/tests/test-check-code.t
+++ b/tests/test-check-code.t
@@ -279,3 +279,40 @@ 
    > print ("%% doesn't work as percent") % v
    don't apply % on non-format string
   [1]
+
+  $ cat > ./invalid-formatting2.py <<EOF
+  > print "no percent character" % v
+  > print "no " "percent " "character" % v
+  > print "no " + "percent " + "character" % v
+  > 
+  > print "%% doesn't work as percent" % v
+  > 
+  > print "%s " "ambiguous " "case" % v
+  > print "ambiguous case, %% and %s" % v
+  > print "ambiguous case (", " %Y-%M-%D"
+  > 
+  > print "    ambiguous case"   " %Y-%M-%D"
+  > print "non-ambiguous case" + " %Y-%M-%D"
+  > EOF
+  $ "$check_code" ./invalid-formatting2.py
+  ./invalid-formatting2.py:1:
+   > print "no percent character" % v
+   don't apply % on non-format string
+   (or use '+' for string concatenation to avoid ambiguous cases)
+  ./invalid-formatting2.py:2:
+   > print "no " "percent " "character" % v
+   don't apply % on non-format string
+   (or use '+' for string concatenation to avoid ambiguous cases)
+  ./invalid-formatting2.py:3:
+   > print "no " + "percent " + "character" % v
+   don't apply % on non-format string
+   (or use '+' for string concatenation to avoid ambiguous cases)
+  ./invalid-formatting2.py:5:
+   > print "%% doesn't work as percent" % v
+   don't apply % on non-format string
+   (or use '+' for string concatenation to avoid ambiguous cases)
+  ./invalid-formatting2.py:11:
+   > print "    ambiguous case"   " %Y-%M-%D"
+   don't apply % on non-format string
+   (or use '+' for string concatenation to avoid ambiguous cases)
+  [1]