Patchwork [1 of 4 V2] encoding: change jsonmap to a list indexed by code point

Submitter Yuya Nishihara
Date Feb. 9, 2016, 3:40 p.m.
Message ID <a5b995cfb26de1e1f90d.1455032416@mimosa>
Permalink /patch/13060/
State Accepted

Comments

Yuya Nishihara - Feb. 9, 2016, 3:40 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1454150494 -32400
#      Sat Jan 30 19:41:34 2016 +0900
# Node ID a5b995cfb26de1e1f90db90382e3332e227acf2e
# Parent  54c896f8bb799e98f1bfe0a4eb087a72b7e7345d
encoding: change jsonmap to a list indexed by code point

This is slightly faster, and it makes paranoid escaping more convenient to implement.

  $ python -m timeit \
  -s 'from mercurial import encoding; data = str(bytearray(xrange(128)))' \
  'encoding.jsonescape(data)'

  original:   100000 loops, best of 3: 15.1 usec per loop
  this patch: 100000 loops, best of 3: 13.7 usec per loop
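For readers on Python 3, the change can be sketched as follows. This sketch assumes the input is already UTF-8 bytes, so Mercurial's `toutf8b()` conversion is omitted. The function names are illustrative and are not Mercurial's API. Iterating over a Python 3 `bytes` object yields integers, and those integers index the list directly. That is the job `bytearray()` does in the Python 2 patch. The dict-keyed variant is included only so the two lookups can be compared.

```python
# Sketch (Python 3) of the dict-keyed vs. list-indexed escape tables.
# Assumption: input is already UTF-8 bytes; toutf8b() is not reproduced here.

def build_jsonmap():
    """Escape table indexed by byte value 0..255, mirroring the patch."""
    table = [b"\\u%04x" % x for x in range(32)]        # control characters
    table.extend(bytes([x]) for x in range(32, 256))   # pass through unchanged
    table[0x7f] = b"\\u007f"  # DEL
    table[0x09] = b"\\t"
    table[0x0a] = b"\\n"
    table[0x22] = b'\\"'
    table[0x5c] = b"\\\\"
    table[0x08] = b"\\b"
    table[0x0c] = b"\\f"
    table[0x0d] = b"\\r"
    return table

JSONMAP = build_jsonmap()

def jsonescape(data: bytes) -> bytes:
    # Iterating over bytes yields ints, which index the list directly.
    return b"".join(JSONMAP[x] for x in data)

# The older approach, for comparison: a dict keyed by one-byte strings.
DICTMAP = {bytes([x]): JSONMAP[x] for x in range(256)}

def jsonescape_dict(data: bytes) -> bytes:
    # Slicing yields length-1 bytes objects usable as dict keys.
    return b"".join(DICTMAP[data[i:i + 1]] for i in range(len(data)))
```

To compare the two, time each one with the standard `timeit` module, for example `timeit.timeit(lambda: jsonescape(data), number=100000)` with `data = bytes(range(128))`. Timings depend on the interpreter, and none are claimed here.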

Patch

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -378,7 +378,7 @@  class normcasespecs(object):
     upper = 1
     other = 0
 
-_jsonmap = {}
+_jsonmap = []
 
 def jsonescape(s):
     '''returns a string suitable for JSON
@@ -408,21 +408,18 @@  def jsonescape(s):
     '''
 
     if not _jsonmap:
-        for x in xrange(32):
-            _jsonmap[chr(x)] = "\\u%04x" % x
-        for x in xrange(32, 256):
-            c = chr(x)
-            _jsonmap[c] = c
-        _jsonmap['\x7f'] = '\\u007f'
-        _jsonmap['\t'] = '\\t'
-        _jsonmap['\n'] = '\\n'
-        _jsonmap['\"'] = '\\"'
-        _jsonmap['\\'] = '\\\\'
-        _jsonmap['\b'] = '\\b'
-        _jsonmap['\f'] = '\\f'
-        _jsonmap['\r'] = '\\r'
+        _jsonmap.extend("\\u%04x" % x for x in xrange(32))
+        _jsonmap.extend(chr(x) for x in xrange(32, 256))
+        _jsonmap[0x7f] = '\\u007f'
+        _jsonmap[0x09] = '\\t'
+        _jsonmap[0x0a] = '\\n'
+        _jsonmap[0x22] = '\\"'
+        _jsonmap[0x5c] = '\\\\'
+        _jsonmap[0x08] = '\\b'
+        _jsonmap[0x0c] = '\\f'
+        _jsonmap[0x0d] = '\\r'
 
-    return ''.join(_jsonmap[c] for c in toutf8b(s))
+    return ''.join(_jsonmap[x] for x in bytearray(toutf8b(s)))
 
 _utf8len = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 4]
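The commit message says the list form makes paranoid escaping more convenient. One plausible reading is that a stricter table can be built by copying the list and overriding a few entries by code point. The following is a hypothetical Python 3 sketch of that idea. The `paranoid` flag, `build_paranoid_map()`, and the choice of `<`, `>`, `&` and `'` as extra characters to escape are all illustrative assumptions. They are not taken from the remaining patches in this series.

```python
# Hypothetical sketch: deriving a stricter escape table from the list-based one.
# Names and the set of extra escaped characters are assumptions for illustration.

def build_jsonmap():
    """Base escape table indexed by byte value 0..255."""
    table = [b"\\u%04x" % x for x in range(32)]
    table.extend(bytes([x]) for x in range(32, 256))
    for code, esc in ((0x7f, b"\\u007f"), (0x09, b"\\t"), (0x0a, b"\\n"),
                      (0x22, b'\\"'), (0x5c, b"\\\\"), (0x08, b"\\b"),
                      (0x0c, b"\\f"), (0x0d, b"\\r")):
        table[code] = esc
    return table

def build_paranoid_map(base):
    table = list(base)  # copy, so the base table is left unchanged
    for code in b"<>&'":  # iterating bytes yields the code points directly
        table[code] = b"\\u%04x" % code
    return table

JSONMAP = build_jsonmap()
PARANOIDMAP = build_paranoid_map(JSONMAP)

def jsonescape(data: bytes, paranoid: bool = False) -> bytes:
    table = PARANOIDMAP if paranoid else JSONMAP
    return b"".join(table[x] for x in data)
```

With a dict keyed by characters, the same override is possible, but each entry must be looked up by a one-character key. The list lets the overrides be written as plain integer indices.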