Patchwork [3,of,3] keepalive: rewrite readline()

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 26, 2016, 6:53 p.m.
Message ID <335fa630b5aa1b6d39ca.1482778407@gps-mbp.local>
Download mbox | patch
Permalink /patch/18043/
State Accepted
Headers show

Comments

Gregory Szorc - Dec. 26, 2016, 6:53 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1444258965 25200
#      Wed Oct 07 16:02:45 2015 -0700
# Node ID 335fa630b5aa1b6d39ca1bebaa0155896140b2f1
# Parent  d238f7c120d82a68f39ccb82488370a5191957bb
keepalive: rewrite readline()

The old method was performing string concatenation, which is slower than
collecting raw chunks in a list and joining at the end.
Augie Fackler - Dec. 26, 2016, 8:42 p.m.
> On Dec 26, 2016, at 1:53 PM, Gregory Szorc <gregory.szorc@gmail.com> wrote:
> 
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1444258965 25200
> #      Wed Oct 07 16:02:45 2015 -0700
> # Node ID 335fa630b5aa1b6d39ca1bebaa0155896140b2f1
> # Parent  d238f7c120d82a68f39ccb82488370a5191957bb
> keepalive: rewrite readline()

Sure, why not? Queued.

News on this front: urllib3 is considering making an intentional breaking release and not using httplib anymore, and instead using a new HTTP library written using njs’ fantastic h11[0]. I’m hopeful that it’ll pan out and that we can jettison our tragically large stack of hacks on top of httplib and just vendor urllib3 entirely.

0: http://github.com/njsmith/h11

> 
> The old method was performing string concatenation, which is slower than
> collecting raw chunks in a list and joining at the end.
> 
> diff --git a/mercurial/keepalive.py b/mercurial/keepalive.py
> --- a/mercurial/keepalive.py
> +++ b/mercurial/keepalive.py
> @@ -451,23 +451,41 @@ class HTTPResponse(httplib.HTTPResponse)
> 
>         return ''.join(parts)
> 
>     def readline(self):
> +        # Fast path for a line is already available in read buffer.
>         i = self._rbuf.find('\n')
> -        while i < 0:
> -            new = self._raw_read(self._rbufsize)
> +        if i >= 0:
> +            i += 1
> +            line = self._rbuf[:i]
> +            self._rbuf = self._rbuf[i:]
> +            return line
> +
> +        # No newline in local buffer. Read until we find one.
> +        chunks = [self._rbuf]
> +        i = -1
> +        readsize = self._rbufsize
> +        while True:
> +            new = self._raw_read(readsize)
>             if not new:
>                 break
> +
> +            chunks.append(new)
>             i = new.find('\n')
>             if i >= 0:
> -                i = i + len(self._rbuf)
> -            self._rbuf = self._rbuf + new
> -        if i < 0:
> -            i = len(self._rbuf)
> -        else:
> -            i = i + 1
> -        data, self._rbuf = self._rbuf[:i], self._rbuf[i:]
> -        return data
> +                break
> +
> +        # We either have exhausted the stream or have a newline in chunks[-1].
> +
> +        # EOF
> +        if i == -1:
> +            self._rbuf = ''
> +            return ''.join(chunks)
> +
> +        i += 1
> +        self._rbuf = chunks[-1][i:]
> +        chunks[-1] = chunks[-1][:i]
> +        return ''.join(chunks)
> 
>     def readlines(self, sizehint=0):
>         total = 0
>         list = []
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Patch

diff --git a/mercurial/keepalive.py b/mercurial/keepalive.py
--- a/mercurial/keepalive.py
+++ b/mercurial/keepalive.py
@@ -451,23 +451,41 @@  class HTTPResponse(httplib.HTTPResponse)
 
         return ''.join(parts)
 
     def readline(self):
+        # Fast path for a line is already available in read buffer.
         i = self._rbuf.find('\n')
-        while i < 0:
-            new = self._raw_read(self._rbufsize)
+        if i >= 0:
+            i += 1
+            line = self._rbuf[:i]
+            self._rbuf = self._rbuf[i:]
+            return line
+
+        # No newline in local buffer. Read until we find one.
+        chunks = [self._rbuf]
+        i = -1
+        readsize = self._rbufsize
+        while True:
+            new = self._raw_read(readsize)
             if not new:
                 break
+
+            chunks.append(new)
             i = new.find('\n')
             if i >= 0:
-                i = i + len(self._rbuf)
-            self._rbuf = self._rbuf + new
-        if i < 0:
-            i = len(self._rbuf)
-        else:
-            i = i + 1
-        data, self._rbuf = self._rbuf[:i], self._rbuf[i:]
-        return data
+                break
+
+        # We either have exhausted the stream or have a newline in chunks[-1].
+
+        # EOF
+        if i == -1:
+            self._rbuf = ''
+            return ''.join(chunks)
+
+        i += 1
+        self._rbuf = chunks[-1][i:]
+        chunks[-1] = chunks[-1][:i]
+        return ''.join(chunks)
 
     def readlines(self, sizehint=0):
         total = 0
         list = []