Patchwork [02,of,11,V2] procutil: ensure that all stdio file objects are flushed at interpreter exit

login
register
mail settings
Submitter Manuel Jacob
Date July 12, 2020, 10:41 p.m.
Message ID <a0ddc1349dde0f5849d1.1594593687@tmp>
Download mbox | patch
Permalink /patch/46706/
State Accepted
Headers show

Comments

Manuel Jacob - July 12, 2020, 10:41 p.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1594346556 -7200
#      Fri Jul 10 04:02:36 2020 +0200
# Node ID a0ddc1349dde0f5849d196236c3fa31d49934511
# Parent  2668345a92975b843cab2e766b8fa59181003772
# EXP-Topic stdio
procutil: ensure that all stdio file objects are flushed at interpreter exit

On Python 2 on Unix, we open a second file object referring to the stdout file
descriptor. If this file object gets garbage collected before sys.stdout (which
is very likely), the file descriptor gets closed and sys.stdout has no chance
to flush buffered data. Therefore, we flush them explicitly before the file
objects get garbage collected and the file descriptor gets closed.
Yuya Nishihara - July 13, 2020, 11:27 a.m.
On Mon, 13 Jul 2020 00:41:27 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me@manueljacob.de>
> # Date 1594346556 -7200
> #      Fri Jul 10 04:02:36 2020 +0200
> # Node ID a0ddc1349dde0f5849d196236c3fa31d49934511
> # Parent  2668345a92975b843cab2e766b8fa59181003772
> # EXP-Topic stdio
> procutil: ensure that all stdio file objects are flushed at interpreter exit
> 
> On Python 2 on Unix, we open a second file object referring to the stdout file
> descriptor. If this file object gets garbage collected before sys.stdout (which
> is very likely), the file descriptor gets closed and sys.stdout has no chance
> to flush buffered data. Therefore, we flush them explicitly before the file
> objects get garbage collected and the file descriptor gets closed.
> 
> diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
> --- a/mercurial/utils/procutil.py
> +++ b/mercurial/utils/procutil.py
> @@ -50,6 +50,27 @@
>          return False
>  
>  
> +def reopen_stream(orig, mode, buffering):
> +    import atexit
> +
> +    assert not pycompat.ispy3
> +    # On Python 2, there's no way to create a file such that the file
> +    # descriptor doesn't get closed if the file object gets garbage collected.
> +    # Therefore, when reopening an already open stream, the file descriptor
> +    # will be closed twice. The error message for that will be swallowed during
> +    # interpreter shutdown.
> +    new = os.fdopen(orig.fileno(), mode, buffering)
> +
> +    @atexit.register
> +    def flush_streams():
> +        # Ensure that one stream gets flushed before the other closes the file
> +        # descriptor.
> +        new.flush()
> +        orig.flush()
> +
> +    return new

Didn't we agree that the use of atexit wouldn't buy and we'll instead add
py3-only solution later?

As I said, atexit will increase the complexity of the control flow. All data
should be flushed inside dispatch.run().
Manuel Jacob - July 13, 2020, 3:14 p.m.
On 2020-07-13 13:27, Yuya Nishihara wrote:
> On Mon, 13 Jul 2020 00:41:27 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <me@manueljacob.de>
>> # Date 1594346556 -7200
>> #      Fri Jul 10 04:02:36 2020 +0200
>> # Node ID a0ddc1349dde0f5849d196236c3fa31d49934511
>> # Parent  2668345a92975b843cab2e766b8fa59181003772
>> # EXP-Topic stdio
>> procutil: ensure that all stdio file objects are flushed at 
>> interpreter exit
>> 
>> On Python 2 on Unix, we open a second file object referring to the 
>> stdout file
>> descriptor. If this file object gets garbage collected before 
>> sys.stdout (which
>> is very likely), the file descriptor gets closed and sys.stdout has no 
>> chance
>> to flush buffered data. Therefore, we flush them explicitly before the 
>> file
>> objects get garbage collected and the file descriptor gets closed.
>> 
>> diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
>> --- a/mercurial/utils/procutil.py
>> +++ b/mercurial/utils/procutil.py
>> @@ -50,6 +50,27 @@
>>          return False
>> 
>> 
>> +def reopen_stream(orig, mode, buffering):
>> +    import atexit
>> +
>> +    assert not pycompat.ispy3
>> +    # On Python 2, there's no way to create a file such that the file
>> +    # descriptor doesn't get closed if the file object gets garbage 
>> collected.
>> +    # Therefore, when reopening an already open stream, the file 
>> descriptor
>> +    # will be closed twice. The error message for that will be 
>> swallowed during
>> +    # interpreter shutdown.
>> +    new = os.fdopen(orig.fileno(), mode, buffering)
>> +
>> +    @atexit.register
>> +    def flush_streams():
>> +        # Ensure that one stream gets flushed before the other closes 
>> the file
>> +        # descriptor.
>> +        new.flush()
>> +        orig.flush()
>> +
>> +    return new
> 
> Didn't we agree that the use of atexit wouldn't buy and we'll instead 
> add
> py3-only solution later?
> 
> As I said, atexit will increase the complexity of the control flow. All 
> data
> should be flushed inside dispatch.run().

We talked past each other, then. What I meant is that for stderr it’s 
not worth it because 1) we have other reasons to back it out, 2) stderr 
may be used for tracebacks etc. even after the atexit hook. For stdout 
it’s not as important because stdout is mostly written to through the ui 
object, but still there are some contrib scripts etc. using it directly. 
I don’t see any problem using atexit for this particular use case, but 
since I didn’t invest further time trying to find a case where we need 
it for stdout (and the stderr problem was solved otherwise), I’d be fine 
dropping the patch. On Python 3, we don’t need to do anything. What do 
you think?
Yuya Nishihara - July 14, 2020, 9:54 a.m.
On Mon, 13 Jul 2020 17:14:31 +0200, Manuel Jacob wrote:
> > Didn't we agree that the use of atexit wouldn't buy and we'll instead 
> > add
> > py3-only solution later?
> > 
> > As I said, atexit will increase the complexity of the control flow. All 
> > data
> > should be flushed inside dispatch.run().
> 
> We talked past each other, then. What I meant is that for stderr it’s 
> not worth it because 1) we have other reasons to back it out, 2) stderr 
> may be used for tracebacks etc. even after the atexit hook. For stdout 
> it’s not as important because stdout is mostly written to through the ui 
> object, but still there are some contrib scripts etc. using it directly.

I see. I meant the use of atexit isn't nice at all. Sorry for confusion.

> I don’t see any problem using atexit for this particular use case, but 
> since I didn’t invest further time trying to find a case where we need 
> it for stdout (and the stderr problem was solved otherwise), I’d be fine 
> dropping the patch. On Python 3, we don’t need to do anything. What do 
> you think?

I prefer dropping this patch.

Historically we've replaced atexit() with custom ui.atexit() function for
better control of errors and execution flow, see c13ff31818b0. I don't
remember the exact problem, but doing I/O in atexit() would be undesired.

Patch

diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -50,6 +50,27 @@ 
         return False
 
 
+def reopen_stream(orig, mode, buffering):
+    import atexit
+
+    assert not pycompat.ispy3
+    # On Python 2, there's no way to create a file such that the file
+    # descriptor doesn't get closed if the file object gets garbage collected.
+    # Therefore, when reopening an already open stream, the file descriptor
+    # will be closed twice. The error message for that will be swallowed during
+    # interpreter shutdown.
+    new = os.fdopen(orig.fileno(), mode, buffering)
+
+    @atexit.register
+    def flush_streams():
+        # Ensure that one stream gets flushed before the other closes the file
+        # descriptor.
+        new.flush()
+        orig.flush()
+
+    return new
+
+
 class LineBufferedWrapper(object):
     def __init__(self, orig):
         self.orig = orig
@@ -104,7 +125,7 @@ 
         # Therefore we have a wrapper that implements line buffering.
         stdout = make_line_buffered(stdout)
     else:
-        stdout = os.fdopen(stdout.fileno(), 'wb', 1)
+        stdout = reopen_stream(stdout, 'wb', 1)
 
 
 findexe = platform.findexe
diff --git a/tests/test-stdio.py b/tests/test-stdio.py
--- a/tests/test-stdio.py
+++ b/tests/test-stdio.py
@@ -30,6 +30,16 @@ 
 LINE_BUFFERED = b'[written aaa]aaabbb\n[written bbb\\n]'
 FULLY_BUFFERED = b'[written aaa][written bbb\\n]aaabbb\n'
 
+TEST_FLUSH_STDIO_CHILD_SCRIPT = r'''
+import sys
+
+from mercurial import dispatch
+from mercurial.utils import procutil
+
+dispatch.initstdio()
+sys.{stream}.write('test')
+'''
+
 
 @contextlib.contextmanager
 def _closing(fds):
@@ -143,6 +153,29 @@ 
             test_buffering_stdout_ptys_unbuffered
         )
 
+    def _test_flush_stdio(self, stream, rwpair_generator):
+        def check_output(stream_receiver):
+            self.assertEqual(_readall(stream_receiver, 1024), b'test')
+
+        self._test(
+            TEST_FLUSH_STDIO_CHILD_SCRIPT.format(stream=stream),
+            stream,
+            rwpair_generator,
+            check_output,
+        )
+
+    def test_flush_stdout_pipes(self):
+        self._test_flush_stdio('stdout', _pipes)
+
+    def test_flush_stdout_ptys(self):
+        self._test_flush_stdio('stdout', _ptys)
+
+    def test_flush_stderr_pipes(self):
+        self._test_flush_stdio('stderr', _pipes)
+
+    def test_flush_stderr_ptys(self):
+        self._test_flush_stdio('stderr', _ptys)
+
 
 if __name__ == '__main__':
     import silenttestrunner