Patchwork [5,of,6] check-code: detect "missing _() in ui message" more exactly

login
register
mail settings
Submitter Katsunori FUJIWARA
Date May 31, 2016, 12:15 p.m.
Message ID <d01e8eeec80bcd1057fc.1464696939@feefifofum>
Download mbox | patch
Permalink /patch/15280/
State Changes Requested
Headers show

Comments

Katsunori FUJIWARA - May 31, 2016, 12:15 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1464696150 -32400
#      Tue May 31 21:02:30 2016 +0900
# Node ID d01e8eeec80bcd1057fce6d70dba942bc92100a5
# Parent  353c034224f9e213ae82594eaecf40cb49638e19
check-code: detect "missing _() in ui message" more exactly

Before this patch, "missing _() in ui message" rule overlooks
translatable message, which starts with other than alphabet.

To detect "missing _() in ui message" more exactly, this patch
improves the regexp with assumptions below.

  - sequence consisting of below might precede "translatable message"
    in same string token

    - formatting string, which starts with '%'
    - escaped character, which starts with 'b' (as replacement of '\\'), or
    - characters other than '%', 'b' and 'x' (as replacement of alphabet)

  - any string tokens might precede a string token, which contains
    "translatable message"

This patch also applies "()" instead of "_()" on messages below to
hide false-positives:

  - messages for ui.debug() or debug commands/tools
    - contrib/debugshell.py
    - hgext/win32mbcs.py (ui.write() is used, though)
    - mercurial/commands.py
      - _debugchangegroup
      - debugindex
      - debugrevlog
      - debugrevspec
      - debugtemplate

  - untranslatable messages
    - doc/gendoc.py (ReST specific text)
    - hgext/hgk.py (permission string)
    - hgext/keyword.py (text written into configuration file)
    - mercurial/cmdutil.py (formatting strings for JSON)
timeless - May 31, 2016, 6:57 p.m.
FUJIWARA Katsunori wrote:
> This patch also applies "()" instead of "_()" on messages below to
> hide false-positives:

I'd really rather have a function for this. See the other thread where
you wanted to remove `_()`.

It's also best as a distinct commit.
Yuya Nishihara - June 1, 2016, 2:56 p.m.
On Tue, 31 May 2016 14:57:52 -0400, timeless wrote:
> FUJIWARA Katsunori wrote:
> > This patch also applies "()" instead of "_()" on messages below to
> > hide false-positives:
> 
> I'd really rather have a function for this. See the other thread where
> you wanted to remove `_()`.

I don't get it. Why do we need a function for py3k? Can you elaborate?

> +    (r'ui\.(status|progress|write|note|warn)\('
> +     # any strings might precede a string, which contains translatable message
> +     '[ \t\n]*(([\'"]|\'\'\'|""")[^\'"]*([\'"]|\'\'\'|""")[ \t\n]+)*'
> +     # sequence consisting of below might precede translatable message
> +     # - formatting string ("% 10s", "%05d", "% -3.2f", "%*s" and so on)
> +     # - escaped character ("\\", "\n", "\0", and so on)
> +     # - character other than '%', 'b' (as '\') and 'x' (as alphabet)
> +     #   (this can't use [^...] style, because _preparepats() forcibly
> +     #    adds '\n' to such pattern )
> +     '([\'"]|\'\'\'|""")'
> +     '((%([ n]?[PM]?([0-9p]+|A))?x)|b[bnx]|[ \nnpqAPMo])*x',

r'' prefix is necessary for all string fragments. You might be interested in
extended format (?x).

https://docs.python.org/2.7/library/re.html#re.X
timeless - June 1, 2016, 4:25 p.m.
Yuya Nishihara wrote:
> timeless wrote:
> > FUJIWARA Katsunori wrote:
> > > This patch also applies "()" instead of "_()" on messages below to
> > > hide false-positives:
> >
> > I'd really rather have a function for this. See the other thread where
> > you wanted to remove `_()`.
>
> I don't get it. Why do we need a function for py3k? Can you elaborate?

While we like to think about localized strings as Unicode and unlocalized
strings as bytes, at the end of the day our output stream (stdout/stderr)
can only be one or the other. It's going to be Unicode, having some bytes
sent to the thing that tries to generate output in Unicode means that in
one code path we're stuck doing a conversion. We can do it there (in warn),
I suppose, but I don't think we are today, and in some ways it's easier to
just have all callers provide the same encoding.
Katsunori FUJIWARA - June 1, 2016, 6:54 p.m.
At Wed, 1 Jun 2016 23:56:10 +0900,
Yuya Nishihara wrote:
> 
> On Tue, 31 May 2016 14:57:52 -0400, timeless wrote:
> > FUJIWARA Katsunori wrote:
> > > This patch also applies "()" instead of "_()" on messages below to
> > > hide false-positives:
> > 
> > I'd really rather have a function for this. See the other thread where
> > you wanted to remove `_()`.
> 
> I don't get it. Why do we need a function for py3k? Can you elaborate?
> 
> > +    (r'ui\.(status|progress|write|note|warn)\('
> > +     # any strings might precede a string, which contains translatable message
> > +     '[ \t\n]*(([\'"]|\'\'\'|""")[^\'"]*([\'"]|\'\'\'|""")[ \t\n]+)*'
> > +     # sequence consisting of below might precede translatable message
> > +     # - formatting string ("% 10s", "%05d", "% -3.2f", "%*s" and so on)
> > +     # - escaped character ("\\", "\n", "\0", and so on)
> > +     # - character other than '%', 'b' (as '\') and 'x' (as alphabet)
> > +     #   (this can't use [^...] style, because _preparepats() forcibly
> > +     #    adds '\n' to such pattern )
> > +     '([\'"]|\'\'\'|""")'
> > +     '((%([ n]?[PM]?([0-9p]+|A))?x)|b[bnx]|[ \nnpqAPMo])*x',
> 
> r'' prefix is necessary for all string fragments. You might be interested in
> extended format (?x).
> 
> https://docs.python.org/2.7/library/re.html#re.X

I'll post revised one.

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp
Yuya Nishihara - June 2, 2016, 1:36 p.m.
On Wed, 1 Jun 2016 12:25:09 -0400, timeless wrote:
> Yuya Nishihara wrote:
> > timeless wrote:  
> > > FUJIWARA Katsunori wrote:  
> > > > This patch also applies "()" instead of "_()" on messages below to
> > > > hide false-positives:  
> > >
> > > I'd really rather have a function for this. See the other thread where
> > > you wanted to remove `_()`.  
> >
> > I don't get it. Why do we need a function for py3k? Can you elaborate?  
> 
> While we like to think about localized strings as Unicode and unlocalized
> strings as bytes,

Why do they have to be different? I think it should be simpler if they are
both bytes, (or unicodes if we want.)

> at the end of the day our output stream (stdout/stderr)
> can only be one or the other. It's going to be Unicode, having some bytes
> sent to the thing that tries to generate output in Unicode means that in
> one code path we're stuck doing a conversion. We can do it there (in warn),
> I suppose, but I don't think we are today, and in some ways it's easier to
> just have all callers provide the same encoding.

I'm not sure if I can get what you mean, but if you're going to make the
underlying stdout/err streams accept unicodes, that is wrong. We need raw
streams because we have to process binary data such as reading/writing patches
from/to stdio. Encoding conversion is lossy even if it raises no error.

So, ui.fout and ui.write() should accept bytes. We could make ui.warn(),
.note(), etc. accept unicodes, but it would be source of trouble to mix bytes
and unicodes in weakly-typed language. Also, from my experience, many
developers don't understand unicode nor character encoding.
timeless - June 2, 2016, 5:29 p.m.
On Thu, Jun 2, 2016 at 9:36 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> Why do they have to be different? I think it should be simpler if they are
> both bytes, (or unicodes if we want.)

They shouldn't be different.

today, in py3, _() takes one sort of thing and returns a different thing.

(It's possible it's actually taking Unicode and returning bytes, I
haven't looked at the types much recently.)

It's also possible that indygreg will save me.

> I'm not sure if I can get what you mean, but if you're going to make the
> underlying stdout/err streams accept unicodes, that is wrong. We need raw
> streams because we have to process binary data such as reading/writing patches
> from/to stdio. Encoding conversion is lossy even if it raises no error.

I understand this.

All I care about is not having mixed types. I don't care if we only
use bytes or only use Unicode.
I object to having two APIs that are siblings taking different types
and trying to write to the same stream.

> So, ui.fout and ui.write() should accept bytes. We could make ui.warn(),
> .note(), etc. accept unicodes,

> but it would be source of trouble to mix bytes
> and unicodes in weakly-typed language.

this is the main thrust of my complaint. We should not use different
types for things, but by using _() for some calls and not _() for
other calls, the result is that we are mixing bytes and Unicode.

note: it's possible this is moot by indygreg's work. I need to apply
it and check... That'd be really nice.
Yuya Nishihara - June 3, 2016, 3:27 p.m.
On Thu, 2 Jun 2016 13:29:54 -0400, timeless wrote:
> On Thu, Jun 2, 2016 at 9:36 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > but it would be source of trouble to mix bytes
> > and unicodes in weakly-typed language.  
> 
> this is the main thrust of my complaint. We should not use different
> types for things, but by using _() for some calls and not _() for
> other calls, the result is that we are mixing bytes and Unicode.

Perhaps both input and output of _() can be bytes.

> note: it's possible this is moot by indygreg's work. I need to apply
> it and check... That'd be really nice.

I hope so. IIRC, he's trying to rewrite '' to b'' automagically, and work
around non-trivial cases manually.
timeless - June 3, 2016, 4:23 p.m.
*sigh*, at this point I can't figure out what the _() story was doing.
I'm just going to give up and hope that indygreg can save me quickly.

Patch

diff --git a/contrib/check-code.py b/contrib/check-code.py
--- a/contrib/check-code.py
+++ b/contrib/check-code.py
@@ -329,7 +329,17 @@  pypats = [
     # rules depending on implementation of repquote()
     (r' x+[xpqo%APM][\'"]\n\s+[\'"]x',
      'string join across lines with no space'),
-    (r'ui\.(status|progress|write|note|warn)\([\'\"]x',
+    (r'ui\.(status|progress|write|note|warn)\('
+     # any strings might precede a string, which contains translatable message
+     '[ \t\n]*(([\'"]|\'\'\'|""")[^\'"]*([\'"]|\'\'\'|""")[ \t\n]+)*'
+     # sequence consisting of below might precede translatable message
+     # - formatting string ("% 10s", "%05d", "% -3.2f", "%*s" and so on)
+     # - escaped character ("\\", "\n", "\0", and so on)
+     # - character other than '%', 'b' (as '\') and 'x' (as alphabet)
+     #   (this can't use [^...] style, because _preparepats() forcibly
+     #    adds '\n' to such pattern )
+     '([\'"]|\'\'\'|""")'
+     '((%([ n]?[PM]?([0-9p]+|A))?x)|b[bnx]|[ \nnpqAPMo])*x',
      "missing _() in ui message (use () to hide false-positives)"),
   ],
   # warnings
diff --git a/contrib/debugshell.py b/contrib/debugshell.py
--- a/contrib/debugshell.py
+++ b/contrib/debugshell.py
@@ -52,7 +52,7 @@  def debugshell(ui, repo, **opts):
         with demandimport.deactivated():
             __import__(pdbmap[debugger])
     except ImportError:
-        ui.warn("%s debugger specified but %s module was not found\n"
+        ui.warn(("%s debugger specified but %s module was not found\n")
                 % (debugger, pdbmap[debugger]))
         debugger = 'pdb'
 
diff --git a/doc/gendoc.py b/doc/gendoc.py
--- a/doc/gendoc.py
+++ b/doc/gendoc.py
@@ -117,11 +117,11 @@  def showdoc(ui):
     ui.write(_("This section contains help for extensions that are "
                "distributed together with Mercurial. Help for other "
                "extensions is available in the help system."))
-    ui.write("\n\n"
+    ui.write(("\n\n"
              ".. contents::\n"
              "   :class: htmlonly\n"
              "   :local:\n"
-             "   :depth: 1\n\n")
+             "   :depth: 1\n\n"))
 
     for extensionname in sorted(allextensionnames()):
         mod = extensions.load(ui, extensionname, None)
diff --git a/hgext/hgk.py b/hgext/hgk.py
--- a/hgext/hgk.py
+++ b/hgext/hgk.py
@@ -81,13 +81,13 @@  def difftree(ui, repo, node1=None, node2
 
         for f in modified:
             # TODO get file permissions
-            ui.write(":100664 100664 %s %s M\t%s\t%s\n" %
+            ui.write((":100664 100664 %s %s M\t%s\t%s\n") %
                      (short(mmap[f]), short(mmap2[f]), f, f))
         for f in added:
-            ui.write(":000000 100664 %s %s N\t%s\t%s\n" %
+            ui.write((":000000 100664 %s %s N\t%s\t%s\n") %
                      (empty, short(mmap2[f]), f, f))
         for f in removed:
-            ui.write(":100664 000000 %s %s D\t%s\t%s\n" %
+            ui.write((":100664 000000 %s %s D\t%s\t%s\n") %
                      (short(mmap[f]), empty, f, f))
     ##
 
diff --git a/hgext/keyword.py b/hgext/keyword.py
--- a/hgext/keyword.py
+++ b/hgext/keyword.py
@@ -455,7 +455,7 @@  def demo(ui, repo, *args, **opts):
 
     uisetup(ui)
     reposetup(ui, repo)
-    ui.write('[extensions]\nkeyword =\n')
+    ui.write(('[extensions]\nkeyword =\n'))
     demoitems('keyword', ui.configitems('keyword'))
     demoitems('keywordset', ui.configitems('keywordset'))
     demoitems('keywordmaps', kwmaps.iteritems())
diff --git a/hgext/win32mbcs.py b/hgext/win32mbcs.py
--- a/hgext/win32mbcs.py
+++ b/hgext/win32mbcs.py
@@ -192,5 +192,5 @@  def extsetup(ui):
         # command line options is not yet applied when
         # extensions.loadall() is called.
         if '--debug' in sys.argv:
-            ui.write("[win32mbcs] activated with encoding: %s\n"
+            ui.write(("[win32mbcs] activated with encoding: %s\n")
                      % _encoding)
diff --git a/mercurial/cmdutil.py b/mercurial/cmdutil.py
--- a/mercurial/cmdutil.py
+++ b/mercurial/cmdutil.py
@@ -1405,24 +1405,24 @@  class jsonchangeset(changeset_printer):
             self.ui.write(",\n {")
 
         if self.ui.quiet:
-            self.ui.write('\n  "rev": %s' % jrev)
-            self.ui.write(',\n  "node": %s' % jnode)
+            self.ui.write(('\n  "rev": %s') % jrev)
+            self.ui.write((',\n  "node": %s') % jnode)
             self.ui.write('\n }')
             return
 
-        self.ui.write('\n  "rev": %s' % jrev)
-        self.ui.write(',\n  "node": %s' % jnode)
-        self.ui.write(',\n  "branch": "%s"' % j(ctx.branch()))
-        self.ui.write(',\n  "phase": "%s"' % ctx.phasestr())
-        self.ui.write(',\n  "user": "%s"' % j(ctx.user()))
-        self.ui.write(',\n  "date": [%d, %d]' % ctx.date())
-        self.ui.write(',\n  "desc": "%s"' % j(ctx.description()))
-
-        self.ui.write(',\n  "bookmarks": [%s]' %
+        self.ui.write(('\n  "rev": %s') % jrev)
+        self.ui.write((',\n  "node": %s') % jnode)
+        self.ui.write((',\n  "branch": "%s"') % j(ctx.branch()))
+        self.ui.write((',\n  "phase": "%s"') % ctx.phasestr())
+        self.ui.write((',\n  "user": "%s"') % j(ctx.user()))
+        self.ui.write((',\n  "date": [%d, %d]') % ctx.date())
+        self.ui.write((',\n  "desc": "%s"') % j(ctx.description()))
+
+        self.ui.write((',\n  "bookmarks": [%s]') %
                       ", ".join('"%s"' % j(b) for b in ctx.bookmarks()))
-        self.ui.write(',\n  "tags": [%s]' %
+        self.ui.write((',\n  "tags": [%s]') %
                       ", ".join('"%s"' % j(t) for t in ctx.tags()))
-        self.ui.write(',\n  "parents": [%s]' %
+        self.ui.write((',\n  "parents": [%s]') %
                       ", ".join('"%s"' % c.hex() for c in ctx.parents()))
 
         if self.ui.debugflag:
@@ -1430,26 +1430,26 @@  class jsonchangeset(changeset_printer):
                 jmanifestnode = 'null'
             else:
                 jmanifestnode = '"%s"' % hex(ctx.manifestnode())
-            self.ui.write(',\n  "manifest": %s' % jmanifestnode)
-
-            self.ui.write(',\n  "extra": {%s}' %
+            self.ui.write((',\n  "manifest": %s') % jmanifestnode)
+
+            self.ui.write((',\n  "extra": {%s}') %
                           ", ".join('"%s": "%s"' % (j(k), j(v))
                                     for k, v in ctx.extra().items()))
 
             files = ctx.p1().status(ctx)
-            self.ui.write(',\n  "modified": [%s]' %
+            self.ui.write((',\n  "modified": [%s]') %
                           ", ".join('"%s"' % j(f) for f in files[0]))
-            self.ui.write(',\n  "added": [%s]' %
+            self.ui.write((',\n  "added": [%s]') %
                           ", ".join('"%s"' % j(f) for f in files[1]))
-            self.ui.write(',\n  "removed": [%s]' %
+            self.ui.write((',\n  "removed": [%s]') %
                           ", ".join('"%s"' % j(f) for f in files[2]))
 
         elif self.ui.verbose:
-            self.ui.write(',\n  "files": [%s]' %
+            self.ui.write((',\n  "files": [%s]') %
                           ", ".join('"%s"' % j(f) for f in ctx.files()))
 
             if copies:
-                self.ui.write(',\n  "copies": {%s}' %
+                self.ui.write((',\n  "copies": {%s}') %
                               ", ".join('"%s": "%s"' % (j(k), j(v))
                                                         for k, v in copies))
 
@@ -1463,12 +1463,13 @@  class jsonchangeset(changeset_printer):
                 self.ui.pushbuffer()
                 diffordiffstat(self.ui, self.repo, diffopts, prev, node,
                                match=matchfn, stat=True)
-                self.ui.write(',\n  "diffstat": "%s"' % j(self.ui.popbuffer()))
+                self.ui.write((',\n  "diffstat": "%s"')
+                              % j(self.ui.popbuffer()))
             if diff:
                 self.ui.pushbuffer()
                 diffordiffstat(self.ui, self.repo, diffopts, prev, node,
                                match=matchfn, stat=False)
-                self.ui.write(',\n  "diff": "%s"' % j(self.ui.popbuffer()))
+                self.ui.write((',\n  "diff": "%s"') % j(self.ui.popbuffer()))
 
         self.ui.write("\n }")
 
diff --git a/mercurial/commands.py b/mercurial/commands.py
--- a/mercurial/commands.py
+++ b/mercurial/commands.py
@@ -2095,7 +2095,7 @@  def debugbundle(ui, bundlepath, all=None
 def _debugchangegroup(ui, gen, all=None, indent=0, **opts):
     indent_string = ' ' * indent
     if all:
-        ui.write("%sformat: id, p1, p2, cset, delta base, len(delta)\n"
+        ui.write(("%sformat: id, p1, p2, cset, delta base, len(delta)\n")
                  % indent_string)
 
         def showchunks(named):
@@ -2562,12 +2562,12 @@  def debugindex(ui, repo, file_=None, **o
         break
 
     if format == 0:
-        ui.write("   rev    offset  length " + basehdr + " linkrev"
-                 " %s %s p2\n" % ("nodeid".ljust(idlen), "p1".ljust(idlen)))
+        ui.write(("   rev    offset  length " + basehdr + " linkrev"
+                 " %s %s p2\n") % ("nodeid".ljust(idlen), "p1".ljust(idlen)))
     elif format == 1:
-        ui.write("   rev flag   offset   length"
+        ui.write(("   rev flag   offset   length"
                  "     size " + basehdr + "   link     p1     p2"
-                 " %s\n" % "nodeid".rjust(idlen))
+                 " %s\n") % "nodeid".rjust(idlen))
 
     for i in r:
         node = r.node(i)
@@ -3329,8 +3329,8 @@  def debugrevlog(ui, repo, file_=None, **
 
     if opts.get("dump"):
         numrevs = len(r)
-        ui.write("# rev p1rev p2rev start   end deltastart base   p1   p2"
-                 " rawsize totalsize compression heads chainlen\n")
+        ui.write(("# rev p1rev p2rev start   end deltastart base   p1   p2"
+                 " rawsize totalsize compression heads chainlen\n"))
         ts = 0
         heads = set()
 
@@ -3519,18 +3519,19 @@  def debugrevspec(ui, repo, expr, **opts)
         ui.note(revset.prettyformat(tree), "\n")
         newtree = revset.expandaliases(ui, tree)
         if newtree != tree:
-            ui.note("* expanded:\n", revset.prettyformat(newtree), "\n")
+            ui.note(("* expanded:\n"), revset.prettyformat(newtree), "\n")
         tree = newtree
         newtree = revset.foldconcat(tree)
         if newtree != tree:
-            ui.note("* concatenated:\n", revset.prettyformat(newtree), "\n")
+            ui.note(("* concatenated:\n"), revset.prettyformat(newtree), "\n")
         if opts["optimize"]:
             optimizedtree = revset.optimize(newtree)
-            ui.note("* optimized:\n", revset.prettyformat(optimizedtree), "\n")
+            ui.note(("* optimized:\n"),
+                    revset.prettyformat(optimizedtree), "\n")
     func = revset.match(ui, expr, repo)
     revs = func(repo)
     if ui.verbose:
-        ui.note("* set:\n", revset.prettyformatset(revs), "\n")
+        ui.note(("* set:\n"), revset.prettyformatset(revs), "\n")
     for c in revs:
         ui.write("%s\n" % c)
 
@@ -3685,7 +3686,7 @@  def debugtemplate(ui, repo, tmpl, **opts
         ui.note(templater.prettyformat(tree), '\n')
         newtree = templater.expandaliases(tree, aliases)
         if newtree != tree:
-            ui.note("* expanded:\n", templater.prettyformat(newtree), '\n')
+            ui.note(("* expanded:\n"), templater.prettyformat(newtree), '\n')
 
     mapfile = None
     if revs is None: