Patchwork [5 of 5] encoding: add fast path of from/toutf8b() for ASCII strings

Submitter Yuya Nishihara
Date Aug. 18, 2017, 2:14 p.m.
Message ID <04fd41cda7ed93b480f5.1503065653@mimosa>
Permalink /patch/23117/
State Accepted

Comments

Yuya Nishihara - Aug. 18, 2017, 2:14 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1492920538 -32400
#      Sun Apr 23 13:08:58 2017 +0900
# Node ID 04fd41cda7ed93b480f59f62cd4d4278d365dd0e
# Parent  a359d30ebd4a3d7a26717d85564716290bc028c9
encoding: add fast path of from/toutf8b() for ASCII strings

See the previous patch for why.

The added test may not make much sense, because an ASCII string can never
contain "\xed" and is always valid UTF-8.
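A minimal Python sketch (not Mercurial code) of the invariant the fast path relies on. Every byte of an ASCII string is below 0x80, so the 0xED lead byte that begins a UTF-8-encoded U+Dxxx surrogate escape cannot appear, and the bytes decode as UTF-8 to the same characters. `is_ascii_bytes` is a hypothetical stand-in for the `isasciistr()` helper the diff calls.

```python
def is_ascii_bytes(s: bytes) -> bool:
    """Hypothetical stand-in for isasciistr(): True if every byte is < 0x80."""
    return all(b < 0x80 for b in s)


def check_ascii_invariants(s: bytes) -> None:
    assert is_ascii_bytes(s)
    # 0xED (237) is >= 0x80, so it can never occur in an ASCII string ...
    assert b"\xed" not in s
    # ... and ASCII bytes are already valid UTF-8 that round-trips unchanged.
    assert s.decode("utf-8").encode("utf-8") == s


check_ascii_invariants(b"\0" * 100)      # same input as the added test
check_ascii_invariants(b"hello, world")
```

A string containing a surrogate escape such as b"\xed\xb3\xbf" fails `is_ascii_bytes`, so it still falls through to the slower checks.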

  (with mercurial repo)
  $ export HGRCPATH=/dev/null HGPLAIN=
  $ hg log --time --config experimental.stabilization=all -Tjson > /dev/null

  (original)
  time: real 6.830 secs (user 6.740+0.000 sys 0.080+0.000)
  time: real 6.690 secs (user 6.650+0.000 sys 0.040+0.000)
  time: real 6.700 secs (user 6.640+0.000 sys 0.060+0.000)

  (fast jsonescape)
  time: real 5.630 secs (user 5.550+0.000 sys 0.070+0.000)
  time: real 5.700 secs (user 5.650+0.000 sys 0.050+0.000)
  time: real 5.690 secs (user 5.640+0.000 sys 0.050+0.000)

  (this patch)
  time: real 5.190 secs (user 5.120+0.000 sys 0.070+0.000)
  time: real 5.230 secs (user 5.170+0.000 sys 0.050+0.000)
  time: real 5.220 secs (user 5.150+0.000 sys 0.070+0.000)
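To put the timings in perspective, the short Python snippet below averages the three "real" times of each run set and computes the relative saving. It uses only the numbers listed above.

```python
from statistics import mean

# "real" wall-clock seconds copied from the three runs of each configuration
runs = {
    "original": [6.830, 6.690, 6.700],
    "fast jsonescape": [5.630, 5.700, 5.690],
    "this patch": [5.190, 5.230, 5.220],
}
avg = {name: mean(times) for name, times in runs.items()}


def saving(before: str, after: str) -> float:
    """Percentage reduction in mean real time going from `before` to `after`."""
    return 100 * (avg[before] - avg[after]) / avg[before]


for name, t in avg.items():
    print(f"{name:16} {t:.3f} s")
print(f"this patch vs fast jsonescape: {saving('fast jsonescape', 'this patch'):.1f}%")
print(f"this patch vs original:        {saving('original', 'this patch'):.1f}%")
```

The means come out to about 6.740 s, 5.673 s and 5.213 s, so this patch saves roughly 8.1% on top of the fast jsonescape change and roughly 22.7% relative to the original.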
Augie Fackler - Aug. 25, 2017, 3:54 p.m.
On Fri, Aug 18, 2017 at 11:14:13PM +0900, Yuya Nishihara wrote:

queued, thanks

Patch

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -494,6 +494,8 @@  def toutf8b(s):
     internal surrogate encoding as a UTF-8 string.)
     '''
 
+    if not isinstance(s, localstr) and isasciistr(s):
+        return s
     if "\xed" not in s:
         if isinstance(s, localstr):
             return s._utf8
@@ -544,6 +546,8 @@  def fromutf8b(s):
     True
     '''
 
+    if isasciistr(s):
+        return s
     # fast path - look for uDxxx prefixes in s
     if "\xed" not in s:
         return s
diff --git a/tests/test-encoding-func.py b/tests/test-encoding-func.py
--- a/tests/test-encoding-func.py
+++ b/tests/test-encoding-func.py
@@ -34,6 +34,12 @@  class LocalEncodingTest(unittest.TestCas
         self.assertTrue(s is encoding.tolocal(s))
         self.assertTrue(s is encoding.fromlocal(s))
 
+class Utf8bEncodingTest(unittest.TestCase):
+    def testasciifastpath(self):
+        s = b'\0' * 100
+        self.assertTrue(s is encoding.toutf8b(s))
+        self.assertTrue(s is encoding.fromutf8b(s))
+
 if __name__ == '__main__':
     import silenttestrunner
     silenttestrunner.main(__name__)
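The early-return branches added by the diff can be sketched in isolation as below. This is a simplified Python 3 model, not the real encoding.py: `localstr` and `isasciistr` are hypothetical stand-ins, and the slow surrogate-escaping paths are left out. Our reading of the `isinstance(s, localstr)` guard in toutf8b() is that a localstr can be ASCII in the local encoding (for example with "?" substituted for unencodable characters) while its `_utf8` still holds non-ASCII text, so returning `s` unchanged would lose data.

```python
class localstr(bytes):
    """Hypothetical stand-in: local-encoding bytes that remember the UTF-8 original."""

    def __new__(cls, utf8: bytes, local: bytes):
        s = bytes.__new__(cls, local)
        s._utf8 = utf8
        return s


def isasciistr(s: bytes) -> bool:
    """Stand-in for the helper the diff calls: True if every byte is < 0x80."""
    return all(b < 0x80 for b in s)


def toutf8b_fastpaths(s: bytes) -> bytes:
    """Only the early returns of toutf8b() as shown in the diff."""
    if not isinstance(s, localstr) and isasciistr(s):
        return s  # new fast path: plain ASCII is already UTF-8, return the same object
    if b"\xed" not in s and isinstance(s, localstr):
        return s._utf8  # the local bytes may be lossy, so use the stored UTF-8
    raise NotImplementedError("remaining paths are not sketched here")


def fromutf8b_fastpaths(s: bytes) -> bytes:
    """Only the early returns of fromutf8b() as shown in the diff."""
    if isasciistr(s):
        return s  # new fast path
    if b"\xed" not in s:
        return s  # existing fast path: no U+Dxxx escapes present
    raise NotImplementedError("surrogate decoding is not sketched here")


s = b"\0" * 100
assert toutf8b_fastpaths(s) is s and fromutf8b_fastpaths(s) is s
lossy = localstr("caf\u00e9".encode("utf-8"), b"caf?")
assert toutf8b_fastpaths(lossy) == b"caf\xc3\xa9"
```

Returning the very same object, rather than an equal copy, is what the added test checks with `assertTrue(s is ...)`.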