Patchwork [Bug,4926] New: Cannot roundtrip multibyte utf8 char with the templater json escaper

Date Nov. 2, 2015, 1:36 p.m.
Permalink /patch/11257/
State Changes Requested

Comments - Nov. 2, 2015, 1:36 p.m.

            Bug ID: 4926
           Summary: Cannot roundtrip multibyte utf8 char with the
                    templater json escaper
           Product: Mercurial
           Version: unspecified
          Hardware: PC
                OS: Linux
            Status: UNCONFIRMED
          Severity: bug
          Priority: wish
         Component: templater

Adding the diff below to the fuzz testing of templatefilters.jsonescape
produces a failure for strings like '\xc2\x80':

+  AssertionError: ('\xc2\x80', '\xc3\x82\xc2\x80', u'\xc2\x80')

What seems to happen is that the '\xc2\x80' bytes (a single unicode char,
U+0080) are encoded (using the '\u####' syntax) as two unicode chars. The JSON
decoder reads that as two different unicode chars. Trying to retrieve the byte
version (re-encoding to utf8) then fails to produce the same bytes as the input.
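
The suspected mechanism can be reproduced outside Mercurial with a small
sketch. The function naive_bytewise_escape below is a hypothetical stand-in
that escapes each non-ASCII byte as its own '\u00XX' sequence; it is not the
real templatefilters.jsonescape, only a model of the behaviour described above
(written for Python 3).

```python
import json


def naive_bytewise_escape(data: bytes) -> str:
    """Hypothetical model: escape each byte on its own, ignoring UTF-8 sequences."""
    out = []
    for b in data:
        ch = chr(b)
        if 0x20 <= b < 0x80 and ch not in '"\\':
            out.append(ch)               # printable ASCII passes through
        else:
            out.append('\\u%04x' % b)    # every other byte becomes \u00XX
    return ''.join(out)


original = b'\xc2\x80'                   # UTF-8 encoding of the single char U+0080
escaped = naive_bytewise_escape(original)
decoded = json.loads('"' + escaped + '"')
roundtrip = decoded.encode('utf-8')

print(repr(escaped))     # two escapes: one per input byte
print(len(decoded))      # 2 code points (U+00C2 and U+0080), not 1
print(roundtrip)         # b'\xc3\x82\xc2\x80', matching the AssertionError
```

Each of the two code points is re-encoded to two UTF-8 bytes, which gives the
four-byte result seen in the assertion.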

Our test may be faulty here (retrieving the bytes should maybe be achieved
another way, but I have no idea what that other way would be).
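
One candidate for that "other way", offered only as a sketch: if the escaper
really does emit one '\u00XX' escape per input byte, every decoded code point
is below U+0100, so re-encoding with latin-1 instead of utf-8 maps each code
point back to its original byte. The bytewise_escape and recover_bytes helpers
below are hypothetical and assume that byte-wise behaviour; they say nothing
about whether the escaper's output is the JSON a consumer would want.

```python
import json


def bytewise_escape(data: bytes) -> str:
    # Hypothetical model of the suspected behaviour: one \u00XX per byte.
    return ''.join('\\u%04x' % b for b in data)


def recover_bytes(escaped: str) -> bytes:
    # Decode the JSON string, then map each code point < 0x100 back to one byte.
    decoded = json.loads('"' + escaped + '"')
    return decoded.encode('latin-1')


original = '\u00e9\u0080'.encode('utf-8')   # b'\xc3\xa9\xc2\x80'
assert recover_bytes(bytewise_escape(original)) == original
```

This only makes the test pass; a JSON consumer would still see two chars where
the input had one, so the escaper itself may be the better place for a fix.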


diff --git a/tests/test-template-engine.t b/tests/test-template-engine.t
--- a/tests/test-template-engine.t
+++ b/tests/test-template-engine.t
@@ -51,10 +51,12 @@  Fuzzing the unicode escaper to ensure it
   >>> from hypothesishelpers import *
   >>> import mercurial.templatefilters as tf
   >>> import json
   >>> @check(st.text().map(lambda s: s.encode('utf-8')))
   ... def testtfescapeproducesvalidjson(text):
-  ...     json.loads('"' + tf.jsonescape(text) + '"')
+  ...     uni = json.loads('"' + tf.jsonescape(text) + '"')
+  ...     result = uni.encode('utf-8')
+  ...     assert text == result, (text, result, uni)


   $ cd ..