Patchwork [4,of,5,STABLE,V2] check-code: detect incorrect format operation on the string not including "%"

login
register
mail settings
Submitter Katsunori FUJIWARA
Date April 15, 2014, 6:12 p.m.
Message ID <565620d1b409dcd0c46c.1397585552@juju>
Download mbox | patch
Permalink /patch/4376/
State Deferred
Headers show

Comments

Katsunori FUJIWARA - April 15, 2014, 6:12 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1397585100 -32400
#      Wed Apr 16 03:05:00 2014 +0900
# Branch stable
# Node ID 565620d1b409dcd0c46c462e1ad66c622b2a8b85
# Parent  90d3ab077aa7a5192eeb0cdaf9d9e4228fbb7cb8
check-code: detect incorrect format operation on the string not including "%"

This patch introduces new patterns to detect incorrect format
operations on the string not including any "%" specifiers: it causes
TypeError at runtime.

Patterns newly introduced by this patch treat format operations in the
style below. This mainly focuses on format operations applied on the
value returned from "_()".

    ("format string") % v

Newly introduced patterns base on the ones for "% inside _()"
detection, and should avoid known problems fixed in preceding patches.

  - the format string may consist of multiple string components
    concatenated implicitly (by whitespaces) or explicitly (by "+")

        ("format"   " string") % v
        ("format" + " string") % v

  - the format string and mapping operator may not be placed in the
    same line

        (("format string")
         % v)

  - there may be some leading whitespaces before the format string

        (
         "format string") % v

To aware "%" in strings, this patch prevents "repquote()" from
normalizing "%" into another character.
Siddharth Agarwal - April 17, 2014, 4:19 a.m.
On 04/15/2014 11:12 AM, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1397585100 -32400
> #      Wed Apr 16 03:05:00 2014 +0900
> # Branch stable
> # Node ID 565620d1b409dcd0c46c462e1ad66c622b2a8b85
> # Parent  90d3ab077aa7a5192eeb0cdaf9d9e4228fbb7cb8
> check-code: detect incorrect format operation on the string not including "%"

The first three patches look good, but it might be time to break out an 
actual parser for these instead.

>
> This patch introduces new patterns to detect incorrect format
> operations on the string not including any "%" specifiers: it causes
> TypeError at runtime.
>
> Patterns newly introduced by this patch treat format operations in the
> style below. This mainly focuses on format operations applied on the
> value returned from "_()".
>
>      ("format string") % v
>
> Newly introduced patterns base on the ones for "% inside _()"
> detection, and should avoid known problems fixed in preceding patches.
>
>    - the format string may consist of multiple string components
>      concatenated implicitly (by whitespaces) or explicitly (by "+")
>
>          ("format"   " string") % v
>          ("format" + " string") % v
>
>    - the format string and mapping operator may not be placed in the
>      same line
>
>          (("format string")
>           % v)
>
>    - there may be some leading whitespaces before the format string
>
>          (
>           "format string") % v
>
> To aware "%" in strings, this patch prevents "repquote()" from
> normalizing "%" into another character.
>
> diff --git a/contrib/check-code.py b/contrib/check-code.py
> --- a/contrib/check-code.py
> +++ b/contrib/check-code.py
> @@ -44,7 +44,7 @@
>           if i > 255:
>               return 'u'
>           c = chr(i)
> -        if c in ' \n':
> +        if c in ' \n%':
>               return c
>           if c.isalpha():
>               return 'x'
> @@ -200,6 +200,10 @@
>       (r'\S;\s*\n', "semicolon"),
>       (r'[^_]_\([ \t\n]*(?:"[^"]+"[ \t\n+]*)+%', "don't use % inside _()"),
>       (r"[^_]_\([ \t\n]*(?:'[^']+'[ \t\n+]*)+%", "don't use % inside _()"),
> +    (r'\([ \t\n]*(?:"(?:[^%"]|%%)+"[ \t\n+]*)+\)[ \t\n]*%',
> +     "don't apply % on non-format string"),
> +    (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
> +     "don't apply % on non-format string"),
>       (r'(\w|\)),\w', "missing whitespace after ,"),
>       (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
>       (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
> diff --git a/tests/test-check-code.t b/tests/test-check-code.t
> --- a/tests/test-check-code.t
> +++ b/tests/test-check-code.t
> @@ -280,3 +280,37 @@
>      > print _(
>      don't use % inside _()
>     [1]
> +
> +  $ cat > ./invalid-formatting.py <<EOF
> +  > print ("no percent character") % v
> +  > print ("no "
> +  >        "percent"
> +  >        "character") % v
> +  > print (
> +  >        "no " +
> +  >        "percent" +
> +  >        "character") % v
> +  > print (("no percent character")
> +  >        % v)
> +  > print ("%% doesn't work as percent") % v
> +  >
> +  > print ("ambiguous case, %% and %s") % v
> +  > print ("ambiguous case (" " %Y-%M-%D")
> +  > EOF
> +  $ "$check_code" ./invalid-formatting.py
> +  ./invalid-formatting.py:1:
> +   > print ("no percent character") % v
> +   don't apply % on non-format string
> +  ./invalid-formatting.py:2:
> +   > print ("no "
> +   don't apply % on non-format string
> +  ./invalid-formatting.py:5:
> +   > print (
> +   don't apply % on non-format string
> +  ./invalid-formatting.py:9:
> +   > print (("no percent character")
> +   don't apply % on non-format string
> +  ./invalid-formatting.py:11:
> +   > print ("%% doesn't work as percent") % v
> +   don't apply % on non-format string
> +  [1]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/contrib/check-code.py b/contrib/check-code.py
--- a/contrib/check-code.py
+++ b/contrib/check-code.py
@@ -44,7 +44,7 @@ 
         if i > 255:
             return 'u'
         c = chr(i)
-        if c in ' \n':
+        if c in ' \n%':
             return c
         if c.isalpha():
             return 'x'
@@ -200,6 +200,10 @@ 
     (r'\S;\s*\n', "semicolon"),
     (r'[^_]_\([ \t\n]*(?:"[^"]+"[ \t\n+]*)+%', "don't use % inside _()"),
     (r"[^_]_\([ \t\n]*(?:'[^']+'[ \t\n+]*)+%", "don't use % inside _()"),
+    (r'\([ \t\n]*(?:"(?:[^%"]|%%)+"[ \t\n+]*)+\)[ \t\n]*%',
+     "don't apply % on non-format string"),
+    (r"\([ \t\n]*(?:'(?:[^%']|%%)+'[ \t\n+]*)+\)[ \t\n]*%",
+     "don't apply % on non-format string"),
     (r'(\w|\)),\w', "missing whitespace after ,"),
     (r'(\w|\))[+/*\-<>]\w', "missing whitespace in expression"),
     (r'^\s+(\w|\.)+=\w[^,()\n]*$', "missing whitespace in assignment"),
diff --git a/tests/test-check-code.t b/tests/test-check-code.t
--- a/tests/test-check-code.t
+++ b/tests/test-check-code.t
@@ -280,3 +280,37 @@ 
    > print _(
    don't use % inside _()
   [1]
+
+  $ cat > ./invalid-formatting.py <<EOF
+  > print ("no percent character") % v
+  > print ("no "
+  >        "percent"
+  >        "character") % v
+  > print (
+  >        "no " +
+  >        "percent" +
+  >        "character") % v
+  > print (("no percent character")
+  >        % v)
+  > print ("%% doesn't work as percent") % v
+  > 
+  > print ("ambiguous case, %% and %s") % v
+  > print ("ambiguous case (" " %Y-%M-%D")
+  > EOF
+  $ "$check_code" ./invalid-formatting.py
+  ./invalid-formatting.py:1:
+   > print ("no percent character") % v
+   don't apply % on non-format string
+  ./invalid-formatting.py:2:
+   > print ("no "
+   don't apply % on non-format string
+  ./invalid-formatting.py:5:
+   > print (
+   don't apply % on non-format string
+  ./invalid-formatting.py:9:
+   > print (("no percent character")
+   don't apply % on non-format string
+  ./invalid-formatting.py:11:
+   > print ("%% doesn't work as percent") % v
+   don't apply % on non-format string
+  [1]