Patchwork [V2] procutil: ensure that all stdio file objects are flushed at interpreter exit

login
register
mail settings
Submitter Manuel Jacob
Date July 10, 2020, 5:11 a.m.
Message ID <284a80ed05e7f4724062.1594357864@tmp>
Download mbox | patch
Permalink /patch/46684/
State New
Headers show

Comments

Manuel Jacob - July 10, 2020, 5:11 a.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1594346556 -7200
#      Fri Jul 10 04:02:36 2020 +0200
# Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
# Parent  034feda7f6afcb0ae973569207a8268487793962
# EXP-Topic stdio
procutil: ensure that all stdio file objects are flushed at interpreter exit

Changeset 8403cc54bc83 introduced code that opens a second file object
referring to the stderr file descriptor. This broke tests on Windows. The
reason is that on Windows, sys.stderr is buffered and procutil.stderr closed
the file descriptor when it got garbage collected before sys.stderr had the
chance to flush buffered data.

Possibly, stdout had the same problem for a long time, but we didn’t realize,
as in CI test runs, stdout is not a TTY and therefore no second file object was
opened.
Yuya Nishihara - July 10, 2020, 11:19 a.m.
On Fri, 10 Jul 2020 07:11:04 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me@manueljacob.de>
> # Date 1594346556 -7200
> #      Fri Jul 10 04:02:36 2020 +0200
> # Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
> # Parent  034feda7f6afcb0ae973569207a8268487793962
> # EXP-Topic stdio
> procutil: ensure that all stdio file objects are flushed at interpreter exit
> 
> Changeset 8403cc54bc83 introduced code that opens a second file object
> referring to the stderr file descriptor. This broke tests on Windows. The
> reason is that on Windows, sys.stderr is buffered and procutil.stderr closed
> the file descriptor when it got garbage collected before sys.stderr had the
> chance to flush buffered data.
> 
> Possibly, stdout had the same problem for a long time, but we didn’t realize,
> as in CI test runs, stdout is not a TTY and therefore no second file object was
> opened.

Yep, but IIUC, the major difference is that the Python interpreter needs
sys.stderr to print traceback, etc. So stderr has to remain open after the
procutil module gets GC-ed. On the other hand, stdout wouldn't have such
problem, and nothing should be printed after dispatch.run(), which flushes
the both streams.

Suppose 8403cc54bc83 wouldn't make a real difference in our codebase, I
suggest reverting the change.
Manuel Jacob - July 10, 2020, 11:55 a.m.
On 2020-07-10 13:19, Yuya Nishihara wrote:
> On Fri, 10 Jul 2020 07:11:04 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <me@manueljacob.de>
>> # Date 1594346556 -7200
>> #      Fri Jul 10 04:02:36 2020 +0200
>> # Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
>> # Parent  034feda7f6afcb0ae973569207a8268487793962
>> # EXP-Topic stdio
>> procutil: ensure that all stdio file objects are flushed at 
>> interpreter exit
>> 
>> Changeset 8403cc54bc83 introduced code that opens a second file object
>> referring to the stderr file descriptor. This broke tests on Windows. 
>> The
>> reason is that on Windows, sys.stderr is buffered and procutil.stderr 
>> closed
>> the file descriptor when it got garbage collected before sys.stderr 
>> had the
>> chance to flush buffered data.
>> 
>> Possibly, stdout had the same problem for a long time, but we didn’t 
>> realize,
>> as in CI test runs, stdout is not a TTY and therefore no second file 
>> object was
>> opened.
> 
> Yep, but IIUC, the major difference is that the Python interpreter 
> needs
> sys.stderr to print traceback, etc. So stderr has to remain open after 
> the
> procutil module gets GC-ed. On the other hand, stdout wouldn't have 
> such
> problem, and nothing should be printed after dispatch.run(), which 
> flushes
> the both streams.

There are various scripts, especially in contrib, that use the sys.std* 
streams.

> Suppose 8403cc54bc83 wouldn't make a real difference in our codebase, I
> suggest reverting the change.

There are various small tools using procutils.stderr directly that could 
benefit from it being unbuffered (one was referred to in the changeset 
description of 8403cc54bc83).

Furthermore, it would enable to remove the manual flush of stderr in the 
ui object that works around the fact the procutil.stderr wasn’t 
unbuffered before on Windows (and Python 3).
Yuya Nishihara - July 10, 2020, 1:16 p.m.
On Fri, 10 Jul 2020 13:55:50 +0200, Manuel Jacob wrote:
> On 2020-07-10 13:19, Yuya Nishihara wrote:
> > On Fri, 10 Jul 2020 07:11:04 +0200, Manuel Jacob wrote:
> >> # HG changeset patch
> >> # User Manuel Jacob <me@manueljacob.de>
> >> # Date 1594346556 -7200
> >> #      Fri Jul 10 04:02:36 2020 +0200
> >> # Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
> >> # Parent  034feda7f6afcb0ae973569207a8268487793962
> >> # EXP-Topic stdio
> >> procutil: ensure that all stdio file objects are flushed at 
> >> interpreter exit
> >> 
> >> Changeset 8403cc54bc83 introduced code that opens a second file object
> >> referring to the stderr file descriptor. This broke tests on Windows. 
> >> The
> >> reason is that on Windows, sys.stderr is buffered and procutil.stderr 
> >> closed
> >> the file descriptor when it got garbage collected before sys.stderr 
> >> had the
> >> chance to flush buffered data.
> >> 
> >> Possibly, stdout had the same problem for a long time, but we didn’t 
> >> realize,
> >> as in CI test runs, stdout is not a TTY and therefore no second file 
> >> object was
> >> opened.
> > 
> > Yep, but IIUC, the major difference is that the Python interpreter 
> > needs
> > sys.stderr to print traceback, etc. So stderr has to remain open after 
> > the
> > procutil module gets GC-ed. On the other hand, stdout wouldn't have 
> > such
> > problem, and nothing should be printed after dispatch.run(), which 
> > flushes
> > the both streams.
> 
> There are various scripts, especially in contrib, that use the sys.std* 
> streams.
> 
> > Suppose 8403cc54bc83 wouldn't make a real difference in our codebase, I
> > suggest reverting the change.
> 
> There are various small tools using procutils.stderr directly that could 
> benefit from it being unbuffered (one was referred to in the changeset 
> description of 8403cc54bc83).
> 
> Furthermore, it would enable to remove the manual flush of stderr in the 
> ui object that works around the fact the procutil.stderr wasn’t 
> unbuffered before on Windows (and Python 3).

My feeling is that these improvements are far more trivial compared to the
complexity introduced by atexit.
Manuel Jacob - July 10, 2020, 9:27 p.m.
On 2020-07-10 15:16, Yuya Nishihara wrote:
> On Fri, 10 Jul 2020 13:55:50 +0200, Manuel Jacob wrote:
>> On 2020-07-10 13:19, Yuya Nishihara wrote:
>> > On Fri, 10 Jul 2020 07:11:04 +0200, Manuel Jacob wrote:
>> >> # HG changeset patch
>> >> # User Manuel Jacob <me@manueljacob.de>
>> >> # Date 1594346556 -7200
>> >> #      Fri Jul 10 04:02:36 2020 +0200
>> >> # Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
>> >> # Parent  034feda7f6afcb0ae973569207a8268487793962
>> >> # EXP-Topic stdio
>> >> procutil: ensure that all stdio file objects are flushed at
>> >> interpreter exit
>> >>
>> >> Changeset 8403cc54bc83 introduced code that opens a second file object
>> >> referring to the stderr file descriptor. This broke tests on Windows.
>> >> The
>> >> reason is that on Windows, sys.stderr is buffered and procutil.stderr
>> >> closed
>> >> the file descriptor when it got garbage collected before sys.stderr
>> >> had the
>> >> chance to flush buffered data.
>> >>
>> >> Possibly, stdout had the same problem for a long time, but we didn’t
>> >> realize,
>> >> as in CI test runs, stdout is not a TTY and therefore no second file
>> >> object was
>> >> opened.
>> >
>> > Yep, but IIUC, the major difference is that the Python interpreter
>> > needs
>> > sys.stderr to print traceback, etc. So stderr has to remain open after
>> > the
>> > procutil module gets GC-ed. On the other hand, stdout wouldn't have
>> > such
>> > problem, and nothing should be printed after dispatch.run(), which
>> > flushes
>> > the both streams.
>> 
>> There are various scripts, especially in contrib, that use the 
>> sys.std*
>> streams.
>> 
>> > Suppose 8403cc54bc83 wouldn't make a real difference in our codebase, I
>> > suggest reverting the change.
>> 
>> There are various small tools using procutils.stderr directly that 
>> could
>> benefit from it being unbuffered (one was referred to in the changeset
>> description of 8403cc54bc83).
>> 
>> Furthermore, it would enable to remove the manual flush of stderr in 
>> the
>> ui object that works around the fact the procutil.stderr wasn’t
>> unbuffered before on Windows (and Python 3).
> 
> My feeling is that these improvements are far more trivial compared to 
> the
> complexity introduced by atexit.

I’m uncomfortable with the idea of having a block-buffered 
mercurial.utils.procutils.stderr on Python 3 and Windows. We should 
abstract the platform differences as much as possible. However, it’s 
worse to introduce regressions.

I think that we anyway need to have the complexity introduced by atexit, 
to flush stdout. But as you mentioned, sys.stderr is sometimes used by 
the Python interpreter itself to print stuff; that could even be after 
the atexit hook.

There seems to be some weak consensus that Python 2 support should be 
dropped for 5.6. Therefore, I would be fine with backing out 
8403cc54bc83 (make mercurial.utils.procutil.stderr unbuffered) for now, 
and reapply a Python 3-only solution (that doesn’t need the atexit hook) 
once Python 2 support was dropped.

What do you think? I might decide to send a patch for backing out 
8403cc54bc83 later today. Feel free to go ahead and apply it if you 
think it’s a good idea.
Yuya Nishihara - July 11, 2020, 2:51 a.m.
On Fri, 10 Jul 2020 23:27:39 +0200, Manuel Jacob wrote:
> On 2020-07-10 15:16, Yuya Nishihara wrote:
> > On Fri, 10 Jul 2020 13:55:50 +0200, Manuel Jacob wrote:
> >> On 2020-07-10 13:19, Yuya Nishihara wrote:
> >> > On Fri, 10 Jul 2020 07:11:04 +0200, Manuel Jacob wrote:
> >> >> # HG changeset patch
> >> >> # User Manuel Jacob <me@manueljacob.de>
> >> >> # Date 1594346556 -7200
> >> >> #      Fri Jul 10 04:02:36 2020 +0200
> >> >> # Node ID 284a80ed05e7f4724062c9eeac95933a86257f38
> >> >> # Parent  034feda7f6afcb0ae973569207a8268487793962
> >> >> # EXP-Topic stdio
> >> >> procutil: ensure that all stdio file objects are flushed at
> >> >> interpreter exit
> >> >>
> >> >> Changeset 8403cc54bc83 introduced code that opens a second file object
> >> >> referring to the stderr file descriptor. This broke tests on Windows.
> >> >> The
> >> >> reason is that on Windows, sys.stderr is buffered and procutil.stderr
> >> >> closed
> >> >> the file descriptor when it got garbage collected before sys.stderr
> >> >> had the
> >> >> chance to flush buffered data.
> >> >>
> >> >> Possibly, stdout had the same problem for a long time, but we didn’t
> >> >> realize,
> >> >> as in CI test runs, stdout is not a TTY and therefore no second file
> >> >> object was
> >> >> opened.
> >> >
> >> > Yep, but IIUC, the major difference is that the Python interpreter
> >> > needs
> >> > sys.stderr to print traceback, etc. So stderr has to remain open after
> >> > the
> >> > procutil module gets GC-ed. On the other hand, stdout wouldn't have
> >> > such
> >> > problem, and nothing should be printed after dispatch.run(), which
> >> > flushes
> >> > the both streams.
> >> 
> >> There are various scripts, especially in contrib, that use the 
> >> sys.std*
> >> streams.
> >> 
> >> > Suppose 8403cc54bc83 wouldn't make a real difference in our codebase, I
> >> > suggest reverting the change.
> >> 
> >> There are various small tools using procutils.stderr directly that 
> >> could
> >> benefit from it being unbuffered (one was referred to in the changeset
> >> description of 8403cc54bc83).
> >> 
> >> Furthermore, it would enable to remove the manual flush of stderr in 
> >> the
> >> ui object that works around the fact the procutil.stderr wasn’t
> >> unbuffered before on Windows (and Python 3).
> > 
> > My feeling is that these improvements are far more trivial compared to 
> > the
> > complexity introduced by atexit.
> 
> I’m uncomfortable with the idea of having a block-buffered 
> mercurial.utils.procutils.stderr on Python 3 and Windows. We should 
> abstract the platform differences as much as possible. However, it’s 
> worse to introduce regressions.

I think buffered stderr is acceptable option as we've worked around the
Windows issues for years.

> I think that we anyway need to have the complexity introduced by atexit, 
> to flush stdout. But as you mentioned, sys.stderr is sometimes used by 
> the Python interpreter itself to print stuff; that could even be after 
> the atexit hook.
> 
> There seems to be some weak consensus that Python 2 support should be 
> dropped for 5.6. Therefore, I would be fine with backing out 
> 8403cc54bc83 (make mercurial.utils.procutil.stderr unbuffered) for now, 
> and reapply a Python 3-only solution (that doesn’t need the atexit hook) 
> once Python 2 support was dropped.

Sounds reasonable.

Patch

diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -50,6 +50,27 @@ 
         return False
 
 
+def reopen_stream(orig, mode, buffering):
+    import atexit
+
+    assert not pycompat.ispy3
+    # On Python 2, there's no way to create a file such that the file
+    # descriptor doesn't get closed if the file object gets garbage collected.
+    # Therefore, when reopening an already open stream, the file descriptor
+    # will be closed twice. The error message for that will be swallowed during
+    # interpreter shutdown.
+    new = os.fdopen(orig.fileno(), mode, buffering)
+
+    @atexit.register
+    def flush_streams():
+        # Ensure that one stream gets flushed before the other closes the file
+        # descriptor.
+        new.flush()
+        orig.flush()
+
+    return new
+
+
 class LineBufferedWrapper(object):
     def __init__(self, orig):
         self.orig = orig
@@ -104,7 +125,7 @@ 
         # Therefore we have a wrapper that implements line buffering.
         stdout = make_line_buffered(stdout)
     else:
-        stdout = os.fdopen(stdout.fileno(), 'wb', 1)
+        stdout = reopen_stream(stdout, 'wb', 1)
 
 # stderr should be unbuffered
 if pycompat.ispy3:
@@ -115,7 +136,7 @@ 
     stderr = getattr(stderr, 'raw', stderr)
 elif pycompat.iswindows:
     # On Windows, stderr is buffered at least when connected to a pipe.
-    stderr = os.fdopen(stderr.fileno(), 'wb', 0)
+    stderr = reopen_stream(stderr, 'wb', 0)
 # On other platforms, stderr is always unbuffered.
 
 
diff --git a/tests/test-stdio.py b/tests/test-stdio.py
--- a/tests/test-stdio.py
+++ b/tests/test-stdio.py
@@ -30,6 +30,16 @@ 
 LINE_BUFFERED = b'[written aaa]aaabbb\n[written bbb\\n]'
 FULLY_BUFFERED = b'[written aaa][written bbb\\n]aaabbb\n'
 
+TEST_FLUSH_STDIO_CHILD_SCRIPT = r'''
+import sys
+
+from mercurial import dispatch
+from mercurial.utils import procutil
+
+dispatch.initstdio()
+sys.{stream}.write('test')
+'''
+
 
 @contextlib.contextmanager
 def _closing(fds):
@@ -155,6 +165,29 @@ 
     def test_buffering_stderr_ptys_unbuffered(self):
         self._test_buffering('stderr', _ptys, UNBUFFERED, python_args=['-u'])
 
+    def _test_flush_stdio(self, stream, rwpair_generator):
+        def check_output(stream_receiver):
+            self.assertEqual(_readall(stream_receiver, 1024), b'test')
+
+        self._test(
+            TEST_FLUSH_STDIO_CHILD_SCRIPT.format(stream=stream),
+            stream,
+            rwpair_generator,
+            check_output,
+        )
+
+    def test_flush_stdout_pipes(self):
+        self._test_flush_stdio('stdout', _pipes)
+
+    def test_flush_stdout_ptys(self):
+        self._test_flush_stdio('stdout', _ptys)
+
+    def test_flush_stderr_pipes(self):
+        self._test_flush_stdio('stderr', _pipes)
+
+    def test_flush_stderr_ptys(self):
+        self._test_flush_stdio('stderr', _ptys)
+
 
 if __name__ == '__main__':
     import silenttestrunner