Patchwork [Bug,4926] New: Cannot roundtrip multibyte utf8 char with the templater json escaper

Submitter mercurial-bugs@selenic.com
Date Nov. 2, 2015, 1:36 p.m.
Message ID <bug-4926-285@https.bz.mercurial-scm.org/>
Permalink /patch/11257/
State Changes Requested

Comments

mercurial-bugs@selenic.com - Nov. 2, 2015, 1:36 p.m.
https://bz.mercurial-scm.org/show_bug.cgi?id=4926

            Bug ID: 4926
           Summary: Cannot roundtrip multibyte utf8 char with the
                    templater json escaper
           Product: Mercurial
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: templater
          Assignee: bugzilla@selenic.com
          Reporter: pierre-yves.david@ens-lyon.org
                CC: mercurial-devel@selenic.com

Adding the diff below to the fuzz testing of templatefilters.jsonescape
produces a failure for strings like '\xc2\x80':

+  AssertionError: ('\xc2\x80', '\xc3\x82\xc2\x80', u'\xc2\x80')

What seems to happen is that the bytes '\xc2\x80' (a single Unicode character
encoded as UTF-8) are escaped (using the '\u####' syntax) as two Unicode
characters. The JSON decoder reads them as two distinct Unicode characters.
Trying to retrieve the byte version (re-encoding to UTF-8) then fails to
produce the same bytes as the input.
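
The mechanism described above can be reproduced in isolation with the sketch
below. It is written for Python 3, and escape_bytewise is a hypothetical
stand-in that escapes every byte as its own '\u00XX' sequence. It is assumed
to mimic the behaviour described here and is not the actual
templatefilters.jsonescape code.

```python
import json


def escape_bytewise(data: bytes) -> str:
    """Hypothetical stand-in: escape each byte as a separate \\u00XX sequence."""
    return "".join("\\u%04x" % b for b in data)


# U+0080 is a single Unicode character that takes two bytes in UTF-8.
original = "\u0080".encode("utf-8")  # b'\xc2\x80'

# The JSON decoder sees two escapes, so it yields two code points
# (U+00C2 and U+0080) instead of one.
decoded = json.loads('"' + escape_bytewise(original) + '"')

# Re-encoding those two code points as UTF-8 gives four bytes, not two.
roundtrip = decoded.encode("utf-8")

print(repr(original), repr(roundtrip), repr(decoded))
```

The three printed values correspond to the (text, result, uni) tuple in the
AssertionError above.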

Our test may be faulty here (retrieving the bytes should maybe be achieved
another way, but I have no idea what that other way would be).
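
One possible "other way", offered only as a sketch: if the escaper really
emits one '\u00XX' escape per input byte, every decoded code point is below
U+0100, so encoding the decoded string as Latin-1 maps each code point back
to its original byte. The helpers escape_bytewise and recover_bytes below are
hypothetical names for illustration (Python 3), not Mercurial APIs, and the
approach only holds under that per-byte assumption.

```python
import json


def escape_bytewise(data: bytes) -> str:
    """Hypothetical stand-in: escape each byte as a separate \\u00XX sequence."""
    return "".join("\\u%04x" % b for b in data)


def recover_bytes(decoded: str) -> bytes:
    """Map each code point below U+0100 back to the byte of the same value."""
    return decoded.encode("latin-1")


original = "\u0080".encode("utf-8")  # b'\xc2\x80'
decoded = json.loads('"' + escape_bytewise(original) + '"')
recovered = recover_bytes(decoded)

print(recovered == original)
```

Whether consumers of the JSON output should be expected to perform this extra
step is a separate question. Decoding the input as UTF-8 before escaping
would avoid it.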

Patch

diff --git a/tests/test-template-engine.t b/tests/test-template-engine.t
--- a/tests/test-template-engine.t
+++ b/tests/test-template-engine.t
@@ -51,10 +51,12 @@  Fuzzing the unicode escaper to ensure it
   >>> from hypothesishelpers import *
   >>> import mercurial.templatefilters as tf
   >>> import json
   >>> @check(st.text().map(lambda s: s.encode('utf-8')))
   ... def testtfescapeproducesvalidjson(text):
-  ...     json.loads('"' + tf.jsonescape(text) + '"')
+  ...     uni = json.loads('"' + tf.jsonescape(text) + '"')
+  ...     result = uni.encode('utf-8')
+  ...     assert text == result, (text, result, uni)

 #endif

   $ cd ..