Patchwork [STABLE] templatefilters: try round-trip utf-8 conversion by json filter (issue4933)

Submitter Yuya Nishihara
Date Nov. 4, 2015, 3:29 p.m.
Message ID <26bb878bb7fafdb208c7.1446650957@mimosa>
Permalink /patch/11290/
State Accepted

Comments

Yuya Nishihara - Nov. 4, 2015, 3:29 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1446648495 -32400
#      Wed Nov 04 23:48:15 2015 +0900
# Branch stable
# Node ID 26bb878bb7fafdb208c7279da22ee415212b4dc6
# Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
templatefilters: try round-trip utf-8 conversion by json filter (issue4933)

Because a JSON string is known to be Unicode, we should try round-trip
conversion for the localstr type. This patch tests for localstr explicitly
because encoding.fromlocal() may raise Abort for an undecodable str, which is
probably not what we want. Maybe we can later refactor the json filter to use
the encoding module more.

Note that "{desc|json}" still can't round-trip, because showdescription()
modifies a localstr object.
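
The difference between the two branches of the filter can be sketched outside
Mercurial. This is a minimal Python 3 illustration, not the patched code:
LocalStr, tolocal() and json_filter() are hypothetical stand-ins that only
mimic the idea of encoding.localstr (lossy local bytes that remember their
original UTF-8), and json.dumps() stands in for jsonescape().

```python
import json


class LocalStr(bytes):
    """Hypothetical stand-in for encoding.localstr (not Mercurial's class).

    Holds the lossy local-encoding bytes and remembers the original UTF-8.
    """

    def __new__(cls, utf8, local):
        self = super().__new__(cls, local)
        self._utf8 = utf8
        return self


def tolocal(utf8, local_encoding):
    """Convert UTF-8 bytes to the local encoding, keeping the original
    UTF-8 alongside whenever the conversion is lossy."""
    text = utf8.decode("utf-8")
    try:
        return text.encode(local_encoding)
    except UnicodeEncodeError:
        return LocalStr(utf8, text.encode(local_encoding, "replace"))


def json_filter(obj, local_encoding="ascii"):
    """Render a byte string as a JSON string literal."""
    if isinstance(obj, LocalStr):
        # The original UTF-8 is known, so the value can round-trip.
        text = obj._utf8.decode("utf-8")
    else:
        # Plain bytes: never abort, replace undecodable bytes instead.
        text = obj.decode(local_encoding, "replace")
    return json.dumps(text)


branch = tolocal(b"\xc3\xa9", "ascii")
print(bytes(branch))             # b'?'
print(json_filter(branch))       # "\u00e9"
print(json_filter(b"\xc3\xa9"))  # "\ufffd\ufffd"
```

The last two lines correspond to the two cases exercised by the new tests: a
value that carries its UTF-8 original round-trips, while raw bytes that cannot
be decoded in the local encoding become one U+FFFD per byte.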
Matt Mackall - Nov. 4, 2015, 8:03 p.m.
On Thu, 2015-11-05 at 00:29 +0900, Yuya Nishihara wrote:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1446648495 -32400
> #      Wed Nov 04 23:48:15 2015 +0900
> # Branch stable
> # Node ID 26bb878bb7fafdb208c7279da22ee415212b4dc6
> # Parent  58b7f3e93bbab749ab16c09df12aae5ba7880708
> templatefilters: try round-trip utf-8 conversion by json filter (issue4933)

Queued for stable, thanks.

> +json filter should not abort if it can't decode bytes:
> +(not sure the current behavior is right; we might want to use utf-8b encoding?)
> +
> +  $ HGENCODING=ascii hg log -T "{'`cat utf-8`'|json}\n" -l1
> +  "\ufffd\ufffd"

Yep.

-- 
Mathematics is the supreme nostalgia of our time.

Patch

diff --git a/mercurial/templatefilters.py b/mercurial/templatefilters.py
--- a/mercurial/templatefilters.py
+++ b/mercurial/templatefilters.py
@@ -197,7 +197,11 @@  def json(obj):
         return {None: 'null', False: 'false', True: 'true'}[obj]
     elif isinstance(obj, int) or isinstance(obj, float):
         return str(obj)
+    elif isinstance(obj, encoding.localstr):
+        u = encoding.fromlocal(obj).decode('utf-8')  # can round-trip
+        return '"%s"' % jsonescape(u)
     elif isinstance(obj, str):
+        # no encoding.fromlocal() because it may abort if obj can't be decoded
         u = unicode(obj, encoding.encoding, 'replace')
         return '"%s"' % jsonescape(u)
     elif isinstance(obj, unicode):
diff --git a/tests/test-command-template.t b/tests/test-command-template.t
--- a/tests/test-command-template.t
+++ b/tests/test-command-template.t
@@ -3479,3 +3479,26 @@  Test broken string escapes:
   $ hg log -T "\\xy" -R a
   hg: parse error: invalid \x escape
   [255]
+
+Set up repository for non-ascii encoding tests:
+
+  $ hg init nonascii
+  $ cd nonascii
+  $ python <<EOF
+  > open('utf-8', 'w').write('\xc3\xa9')
+  > EOF
+  $ HGENCODING=utf-8 hg branch -q `cat utf-8`
+  $ HGENCODING=utf-8 hg ci -qAm 'non-ascii branch' utf-8
+
+json filter should try round-trip conversion to utf-8:
+
+  $ HGENCODING=ascii hg log -T "{branch|json}\n" -r0
+  "\u00e9"
+
+json filter should not abort if it can't decode bytes:
+(not sure the current behavior is right; we might want to use utf-8b encoding?)
+
+  $ HGENCODING=ascii hg log -T "{'`cat utf-8`'|json}\n" -l1
+  "\ufffd\ufffd"
+
+  $ cd ..
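
For comparison with the "\ufffd\ufffd" result above, the utf-8b idea raised in
the test comment can be sketched with Python 3's "surrogateescape" error
handler, which follows the same approach of mapping each undecodable byte to a
lone surrogate so the bytes can be recovered later. This only illustrates the
concept and is not what the patch or Mercurial's encoding module does;
escape_undecodable() is a hypothetical helper.

```python
import json


def escape_undecodable(raw, local_encoding="ascii"):
    """Decode bytes, mapping undecodable bytes to lone surrogates
    (U+DC80..U+DCFF) instead of replacing them with U+FFFD."""
    return raw.decode(local_encoding, "surrogateescape")


raw = b"\xc3\xa9"
text = escape_undecodable(raw)
print(json.dumps(text))  # "\udcc3\udca9"

# Unlike the 'replace' handler, the original bytes can be restored.
print(text.encode("ascii", "surrogateescape") == raw)  # True
```

The trade-off is that the JSON output then contains unpaired surrogate
escapes, which a consumer has to be prepared to accept and map back to bytes.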