Patchwork [4,of,4,stable] py3: stop using block-buffered stdout on Python 3

login
register
mail settings
Submitter Manuel Jacob
Date June 5, 2020, 9:44 a.m.
Message ID <92af1f5877d02bd79420.1591350291@tmp>
Download mbox | patch
Permalink /patch/46468/
State Accepted
Headers show

Comments

Manuel Jacob - June 5, 2020, 9:44 a.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1591333053 -7200
#      Fri Jun 05 06:57:33 2020 +0200
# Branch stable
# Node ID 92af1f5877d02bd79420c04b1177accce27a61ce
# Parent  618173e6b7115a93588b7f1a62f9b9b8460ca84b
# EXP-Topic streams
py3: stop using block-buffered stdout on Python 3

The previous code (added in 227ba1afcb65) incorrectly assumed that
sys.stdout.buffer is line-buffered. However the interpreter initializes it
with a block-buffered stream or an unbuffered stream (when the -u option or
the PYTHONUNBUFFERED environment variable is set), never with a line-buffered
stream.

To see the difference with the hg command, you need to do something slow, e.g.
`hg pull https://www.mercurial-scm.org/repo/hg`. On Python 3 on Linux, without
this patch, all output is printed right before the command ends. With this
patch, output is printed almost immediately.

I did some quick and dirty benchmarking, printing on my terminal.


Unfavorable case:
* for 100 times:
  * for 1000 times:
    * call `ui.write(b'x')`
  * call `ui.write(b'\n')`
* ui.flush()

Results:
CPython 2.7 is unaffected (line-buffered): wall 0.231795 comb 0.230000 user 0.230000 sys 0.000000
CPython 3.8 before patch (block-buffered): wall 0.190882 comb 0.190000 user 0.190000 sys 0.000000
CPython 3.8 after patch (unbuffered):      wall 0.654466 comb 0.650000 user 0.480000 sys 0.170000
CPython 3.8 hand-rolled line-buffered [*]: wall 0.218919 comb 0.220000 user 0.220000 sys 0.000000


Favorable case:
* for 100000 times:
  * call `ui.write(b'Testing write performance\n')`

Results:
CPython 2.7 is unaffected (line-buffered): wall 0.717060 comb 0.710000 user 0.510000 sys 0.200000
CPython 3.8 before patch (block-buffered): wall 0.255527 comb 0.260000 user 0.180000 sys 0.080000
CPython 3.8 after patch (unbuffered):      wall 0.616500 comb 0.620000 user 0.450000 sys 0.170000
CPython 3.8 hand-rolled line-buffered [*]: wall 0.811805 comb 0.810000 user 0.570000 sys 0.240000


[*] For this, I implemented a simplified version of what is suggested in the
TODO: stdout was set to block-buffered and ui.write() flushed the stream if
the output contains a newline character.
Yuya Nishihara - June 5, 2020, 12:08 p.m.
On Fri, 05 Jun 2020 11:44:51 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me@manueljacob.de>
> # Date 1591333053 -7200
> #      Fri Jun 05 06:57:33 2020 +0200
> # Branch stable
> # Node ID 92af1f5877d02bd79420c04b1177accce27a61ce
> # Parent  618173e6b7115a93588b7f1a62f9b9b8460ca84b
> # EXP-Topic streams
> py3: stop using block-buffered stdout on Python 3
> 
> The previous code (added in 227ba1afcb65) incorrectly assumed that
> sys.stdout.buffer is line-buffered. However the interpreter initializes it
> with a block-buffered stream or an unbuffered stream (when the -u option or
> the PYTHONUNBUFFERED environment variable is set), never with a line-buffered
> stream.
> 
> To see the difference with the hg command, you need to do something slow, e.g.
> `hg pull https://www.mercurial-scm.org/repo/hg`. On Python 3 on Linux, without
> this patch, all output is printed right before the command ends. With this
> patch, output is printed almost immediately.
> 
> I did some quick and dirty benchmarking, printing on my terminal.
> 
> 
> Unfavorable case:
> * for 100 times:
>   * for 1000 times:
>     * call `ui.write(b'x')`
>   * call `ui.write(b'\n')`
> * ui.flush()
> 
> Results:
> CPython 2.7 is unaffected (line-buffered): wall 0.231795 comb 0.230000 user 0.230000 sys 0.000000
> CPython 3.8 before patch (block-buffered): wall 0.190882 comb 0.190000 user 0.190000 sys 0.000000
> CPython 3.8 after patch (unbuffered):      wall 0.654466 comb 0.650000 user 0.480000 sys 0.170000
> CPython 3.8 hand-rolled line-buffered [*]: wall 0.218919 comb 0.220000 user 0.220000 sys 0.000000

I'm not strongly against this change, but I hesitate to take in this patch
for stable. ui.fout or procutil.stdout is used in various ways and I can't
say the worst-case pattern would never happen.

> Favorable case:
> * for 100000 times:
>   * call `ui.write(b'Testing write performance\n')`
> 
> Results:
> CPython 2.7 is unaffected (line-buffered): wall 0.717060 comb 0.710000 user 0.510000 sys 0.200000
> CPython 3.8 before patch (block-buffered): wall 0.255527 comb 0.260000 user 0.180000 sys 0.080000
> CPython 3.8 after patch (unbuffered):      wall 0.616500 comb 0.620000 user 0.450000 sys 0.170000
> CPython 3.8 hand-rolled line-buffered [*]: wall 0.811805 comb 0.810000 user 0.570000 sys 0.240000

And this "hand-rolled" result makes me feel we'd better off giving up
line-buffered/unbuffered io and inserting flush() (or flush_if_tty())
where it should be.
Manuel Jacob - June 6, 2020, 10:33 p.m.
On 2020-06-05 14:08, Yuya Nishihara wrote:
> On Fri, 05 Jun 2020 11:44:51 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <me@manueljacob.de>
>> # Date 1591333053 -7200
>> #      Fri Jun 05 06:57:33 2020 +0200
>> # Branch stable
>> # Node ID 92af1f5877d02bd79420c04b1177accce27a61ce
>> # Parent  618173e6b7115a93588b7f1a62f9b9b8460ca84b
>> # EXP-Topic streams
>> py3: stop using block-buffered stdout on Python 3
>> 
>> The previous code (added in 227ba1afcb65) incorrectly assumed that
>> sys.stdout.buffer is line-buffered. However the interpreter 
>> initializes it
>> with a block-buffered stream or an unbuffered stream (when the -u 
>> option or
>> the PYTHONUNBUFFERED environment variable is set), never with a 
>> line-buffered
>> stream.
>> 
>> To see the difference with the hg command, you need to do something 
>> slow, e.g.
>> `hg pull https://www.mercurial-scm.org/repo/hg`. On Python 3 on Linux, 
>> without
>> this patch, all output is printed right before the command ends. With 
>> this
>> patch, output is printed almost immediately.
>> 
>> I did some quick and dirty benchmarking, printing on my terminal.
>> 
>> 
>> Unfavorable case:
>> * for 100 times:
>>   * for 1000 times:
>>     * call `ui.write(b'x')`
>>   * call `ui.write(b'\n')`
>> * ui.flush()
>> 
>> Results:
>> CPython 2.7 is unaffected (line-buffered): wall 0.231795 comb 0.230000 
>> user 0.230000 sys 0.000000
>> CPython 3.8 before patch (block-buffered): wall 0.190882 comb 0.190000 
>> user 0.190000 sys 0.000000
>> CPython 3.8 after patch (unbuffered):      wall 0.654466 comb 0.650000 
>> user 0.480000 sys 0.170000
>> CPython 3.8 hand-rolled line-buffered [*]: wall 0.218919 comb 0.220000 
>> user 0.220000 sys 0.000000
> 
> I'm not strongly against this change, but I hesitate to take in this 
> patch
> for stable. ui.fout or procutil.stdout is used in various ways and I 
> can't
> say the worst-case pattern would never happen.

Non-TTY stdout will be unaffected. On TTYs, the output will anyway be 
much faster than the user can read it, whereas withholding status output 
from the user (as is the case before this patch) is IMHO a serious bug, 
especially before a blocking operation like consulting a server. 
Debugging a hanging connection would be very hard for users who are not 
aware of this bug and know how to configure Python to use unbuffered 
I/O.

>> Favorable case:
>> * for 100000 times:
>>   * call `ui.write(b'Testing write performance\n')`
>> 
>> Results:
>> CPython 2.7 is unaffected (line-buffered): wall 0.717060 comb 0.710000 
>> user 0.510000 sys 0.200000
>> CPython 3.8 before patch (block-buffered): wall 0.255527 comb 0.260000 
>> user 0.180000 sys 0.080000
>> CPython 3.8 after patch (unbuffered):      wall 0.616500 comb 0.620000 
>> user 0.450000 sys 0.170000
>> CPython 3.8 hand-rolled line-buffered [*]: wall 0.811805 comb 0.810000 
>> user 0.570000 sys 0.240000
> 
> And this "hand-rolled" result makes me feel we'd better off giving up
> line-buffered/unbuffered io and inserting flush() (or flush_if_tty())
> where it should be.

Do we agree that this is out-of-scope for the stable branch?
Yuya Nishihara - June 7, 2020, 11:56 a.m.
On Sun, 07 Jun 2020 00:33:31 +0200, Manuel Jacob wrote:
> On 2020-06-05 14:08, Yuya Nishihara wrote:
> > On Fri, 05 Jun 2020 11:44:51 +0200, Manuel Jacob wrote:
> >> # HG changeset patch
> >> # User Manuel Jacob <me@manueljacob.de>
> >> # Date 1591333053 -7200
> >> #      Fri Jun 05 06:57:33 2020 +0200
> >> # Branch stable
> >> # Node ID 92af1f5877d02bd79420c04b1177accce27a61ce
> >> # Parent  618173e6b7115a93588b7f1a62f9b9b8460ca84b
> >> # EXP-Topic streams
> >> py3: stop using block-buffered stdout on Python 3

> >> Results:
> >> CPython 2.7 is unaffected (line-buffered): wall 0.231795 comb 0.230000 
> >> user 0.230000 sys 0.000000
> >> CPython 3.8 before patch (block-buffered): wall 0.190882 comb 0.190000 
> >> user 0.190000 sys 0.000000
> >> CPython 3.8 after patch (unbuffered):      wall 0.654466 comb 0.650000 
> >> user 0.480000 sys 0.170000
> >> CPython 3.8 hand-rolled line-buffered [*]: wall 0.218919 comb 0.220000 
> >> user 0.220000 sys 0.000000
> > 
> > I'm not strongly against this change, but I hesitate to take in this 
> > patch
> > for stable. ui.fout or procutil.stdout is used in various ways and I 
> > can't
> > say the worst-case pattern would never happen.
> 
> Non-TTY stdout will be unaffected. On TTYs, the output will anyway be 
> much faster than the user can read it, whereas withholding status output 
> from the user (as is the case before this patch) is IMHO a serious bug, 
> especially before a blocking operation like consulting a server. 
> Debugging a hanging connection would be very hard for users who are not 
> aware of this bug and know how to configure Python to use unbuffered 
> I/O.

This exact issue can be mitigated by your first patch. I think it's safer
to take the mitigation in stable than changing stdout to unbuffered io.
There might be unwanted side effect.

> >> Results:
> >> CPython 2.7 is unaffected (line-buffered): wall 0.717060 comb 0.710000 
> >> user 0.510000 sys 0.200000
> >> CPython 3.8 before patch (block-buffered): wall 0.255527 comb 0.260000 
> >> user 0.180000 sys 0.080000
> >> CPython 3.8 after patch (unbuffered):      wall 0.616500 comb 0.620000 
> >> user 0.450000 sys 0.170000
> >> CPython 3.8 hand-rolled line-buffered [*]: wall 0.811805 comb 0.810000 
> >> user 0.570000 sys 0.240000
> > 
> > And this "hand-rolled" result makes me feel we'd better off giving up
> > line-buffered/unbuffered io and inserting flush() (or flush_if_tty())
> > where it should be.
> 
> Do we agree that this is out-of-scope for the stable branch?

Yep.

Can you test my patch I've just sent. With my testing, flush() insertion
showed better result than using unbuffered io.

Patch

diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -55,12 +55,19 @@  def isatty(fp):
 # on Windows).
 # Python 3 rolls its own standard I/O streams.
 if isatty(stdout):
+    # TODO: Consider mitigating the performance impact of using unbuffered
+    # writes on Python 3 and Windows, e.g. by using fully buffered streams and
+    # ensuring on a higher level that lines are flushed.
     if pycompat.iswindows:
         # Windows doesn't support line buffering
         stdout = os.fdopen(stdout.fileno(), 'wb', 0)
-    elif not pycompat.ispy3:
-        # on Python 3, stdout (sys.stdout.buffer) is already line buffered and
-        # buffering=1 is not handled in binary mode
+    elif pycompat.ispy3:
+        # Python 3 doesn't support line buffering on binary streams. Use the
+        # underlying unbuffered raw stream.
+        # TODO: After Mercurial 5.4, use this code path on Python 3 on Windows.
+        # We shouldn't risk changing working behavior in the stable branch.
+        stdout = stdout.raw
+    else:
         stdout = os.fdopen(stdout.fileno(), 'wb', 1)
 
 if pycompat.iswindows: