Patchwork [09,of,13] largefiles: getlfile must hit end of HTTP chunked streams to reuse connections

login
register
mail settings
Submitter Mads Kiilerich
Date April 16, 2013, 2:43 a.m.
Message ID <22879b41a70b922fed21.1366080206@mk-desktop>
Download mbox | patch
Permalink /patch/1339/
State Accepted
Commit 0b3b84222a2dab968331c5c7b913bb45a7cbe4ba
Headers show

Comments

Mads Kiilerich - April 16, 2013, 2:43 a.m.
# HG changeset patch
# User Mads Kiilerich <madski@unity3d.com>
# Date 1366079710 -7200
#      Tue Apr 16 04:35:10 2013 +0200
# Node ID 22879b41a70b922fed215dcec5b8886c968658d7
# Parent  0aaf7db2201975bcf9f5d82da481167b4c4c2ad6
largefiles: getlfile must hit end of HTTP chunked streams to reuse connections

We did read the exactly the right number of bytes from the response body. But
if the response came in chunked encoding then that meant that the HTTP layer
still hadn't read the last 0-sized chunk and expected the app layer to read
more data from the stream. The app layer was however happy and sent another
request which had to be sent on another HTTP connection while the old one was
lingering until some other event closed the connection.

Adding an extra read where we expect to hit the end of file makes the HTTP
connection ready for reuse. This thus plugs a real socket leak.

To distinguish HTTP from SSH we look at self's class, just like it is done in
putlfile.
Martin Geisler - April 16, 2013, 6:37 a.m.
Mads Kiilerich <mads@kiilerich.com> writes:

> # HG changeset patch
> # User Mads Kiilerich <madski@unity3d.com>
> # Date 1366079710 -7200
> #      Tue Apr 16 04:35:10 2013 +0200
> # Node ID 22879b41a70b922fed215dcec5b8886c968658d7
> # Parent  0aaf7db2201975bcf9f5d82da481167b4c4c2ad6
> largefiles: getlfile must hit end of HTTP chunked streams to reuse connections
>
> We did read the exactly the right number of bytes from the response body. But
> if the response came in chunked encoding then that meant that the HTTP layer
> still hadn't read the last 0-sized chunk and expected the app layer to read
> more data from the stream. The app layer was however happy and sent another
> request which had to be sent on another HTTP connection while the old one was
> lingering until some other event closed the connection.
>
> Adding an extra read where we expect to hit the end of file makes the HTTP
> connection ready for reuse. This thus plugs a real socket leak.
>
> To distinguish HTTP from SSH we look at self's class, just like it is
> done in putlfile.

Would

  if isinstance(self, httppeer.httppeer)

not be equivalent and the usual way to write this? Or did you write it
like this to mimic putlfile?

> diff --git a/hgext/largefiles/proto.py b/hgext/largefiles/proto.py
> --- a/hgext/largefiles/proto.py
> +++ b/hgext/largefiles/proto.py
> @@ -126,6 +126,13 @@
>              # SSH streams will block if reading more than length
>              for chunk in util.filechunkiter(stream, 128 * 1024, length):
>                  yield chunk
> +            # HTTP streams must hit the end to process the last empty
> +            # chunk of Chunked-Encoding so the connection can be reused.
> +            if issubclass(self.__class__, httppeer.httppeer):
> +                chunk = stream.read(1)
> +                if chunk:
> +                    self._abort(error.ResponseError(_("unexpected response:"),
> +                                                    chunk))
>  
>          @batchable
>          def statlfile(self, sha):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Mads Kiilerich - April 16, 2013, 11:41 a.m.
On 04/16/2013 08:37 AM, Martin Geisler wrote:
> Mads Kiilerich <mads@kiilerich.com> writes:
>
>> # HG changeset patch
>> # User Mads Kiilerich <madski@unity3d.com>
>> # Date 1366079710 -7200
>> #      Tue Apr 16 04:35:10 2013 +0200
>> # Node ID 22879b41a70b922fed215dcec5b8886c968658d7
>> # Parent  0aaf7db2201975bcf9f5d82da481167b4c4c2ad6
>> largefiles: getlfile must hit end of HTTP chunked streams to reuse connections
>>
>> We did read the exactly the right number of bytes from the response body. But
>> if the response came in chunked encoding then that meant that the HTTP layer
>> still hadn't read the last 0-sized chunk and expected the app layer to read
>> more data from the stream. The app layer was however happy and sent another
>> request which had to be sent on another HTTP connection while the old one was
>> lingering until some other event closed the connection.
>>
>> Adding an extra read where we expect to hit the end of file makes the HTTP
>> connection ready for reuse. This thus plugs a real socket leak.
>>
>> To distinguish HTTP from SSH we look at self's class, just like it is
>> done in putlfile.
> Would
>
>    if isinstance(self, httppeer.httppeer)
>
> not be equivalent and the usual way to write this? Or did you write it
> like this to mimic putlfile?

Yes, that would probably be better. It was cut'n'pasted from putlfile 
where the pattern has been used since day 0 and seems to work. I decided 
to change as little as possible and leave that clean-up for another day.

/Mads

Patch

diff --git a/hgext/largefiles/proto.py b/hgext/largefiles/proto.py
--- a/hgext/largefiles/proto.py
+++ b/hgext/largefiles/proto.py
@@ -126,6 +126,13 @@ 
             # SSH streams will block if reading more than length
             for chunk in util.filechunkiter(stream, 128 * 1024, length):
                 yield chunk
+            # HTTP streams must hit the end to process the last empty
+            # chunk of Chunked-Encoding so the connection can be reused.
+            if issubclass(self.__class__, httppeer.httppeer):
+                chunk = stream.read(1)
+                if chunk:
+                    self._abort(error.ResponseError(_("unexpected response:"),
+                                                    chunk))
 
         @batchable
         def statlfile(self, sha):