Submitter | Manuel Jacob |
---|---|
Date | July 12, 2020, 10:41 p.m. |
Message ID | <a0ddc1349dde0f5849d1.1594593687@tmp> |
Download | mbox | patch |
Permalink | /patch/46706/ |
State | Accepted |
Headers | show |
Comments
On Mon, 13 Jul 2020 00:41:27 +0200, Manuel Jacob wrote: > # HG changeset patch > # User Manuel Jacob <me@manueljacob.de> > # Date 1594346556 -7200 > # Fri Jul 10 04:02:36 2020 +0200 > # Node ID a0ddc1349dde0f5849d196236c3fa31d49934511 > # Parent 2668345a92975b843cab2e766b8fa59181003772 > # EXP-Topic stdio > procutil: ensure that all stdio file objects are flushed at interpreter exit > > On Python 2 on Unix, we open a second file object referring to the stdout file > descriptor. If this file object gets garbage collected before sys.stdout (which > is very likely), the file descriptor gets closed and sys.stdout has no chance > to flush buffered data. Therefore, we flush them explicitly before the file > objects get garbage collected and the file descriptor gets closed. > > diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py > --- a/mercurial/utils/procutil.py > +++ b/mercurial/utils/procutil.py > @@ -50,6 +50,27 @@ > return False > > > +def reopen_stream(orig, mode, buffering): > + import atexit > + > + assert not pycompat.ispy3 > + # On Python 2, there's no way to create a file such that the file > + # descriptor doesn't get closed if the file object gets garbage collected. > + # Therefore, when reopening an already open stream, the file descriptor > + # will be closed twice. The error message for that will be swallowed during > + # interpreter shutdown. > + new = os.fdopen(orig.fileno(), mode, buffering) > + > + @atexit.register > + def flush_streams(): > + # Ensure that one stream gets flushed before the other closes the file > + # descriptor. > + new.flush() > + orig.flush() > + > + return new Didn't we agree that the use of atexit wouldn't buy and we'll instead add py3-only solution later? As I said, atexit will increase the complexity of the control flow. All data should be flushed inside dispatch.run().
On 2020-07-13 13:27, Yuya Nishihara wrote: > On Mon, 13 Jul 2020 00:41:27 +0200, Manuel Jacob wrote: >> # HG changeset patch >> # User Manuel Jacob <me@manueljacob.de> >> # Date 1594346556 -7200 >> # Fri Jul 10 04:02:36 2020 +0200 >> # Node ID a0ddc1349dde0f5849d196236c3fa31d49934511 >> # Parent 2668345a92975b843cab2e766b8fa59181003772 >> # EXP-Topic stdio >> procutil: ensure that all stdio file objects are flushed at >> interpreter exit >> >> On Python 2 on Unix, we open a second file object referring to the >> stdout file >> descriptor. If this file object gets garbage collected before >> sys.stdout (which >> is very likely), the file descriptor gets closed and sys.stdout has no >> chance >> to flush buffered data. Therefore, we flush them explicitly before the >> file >> objects get garbage collected and the file descriptor gets closed. >> >> diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py >> --- a/mercurial/utils/procutil.py >> +++ b/mercurial/utils/procutil.py >> @@ -50,6 +50,27 @@ >> return False >> >> >> +def reopen_stream(orig, mode, buffering): >> + import atexit >> + >> + assert not pycompat.ispy3 >> + # On Python 2, there's no way to create a file such that the file >> + # descriptor doesn't get closed if the file object gets garbage >> collected. >> + # Therefore, when reopening an already open stream, the file >> descriptor >> + # will be closed twice. The error message for that will be >> swallowed during >> + # interpreter shutdown. >> + new = os.fdopen(orig.fileno(), mode, buffering) >> + >> + @atexit.register >> + def flush_streams(): >> + # Ensure that one stream gets flushed before the other closes >> the file >> + # descriptor. >> + new.flush() >> + orig.flush() >> + >> + return new > > Didn't we agree that the use of atexit wouldn't buy and we'll instead > add > py3-only solution later? > > As I said, atexit will increase the complexity of the control flow. All > data > should be flushed inside dispatch.run(). We talked past each other, then. What I meant is that for stderr it’s not worth it because 1) we have other reasons to back it out, 2) stderr may be used for tracebacks etc. even after the atexit hook. For stdout it’s not as important because stdout is mostly written to through the ui object, but still there are some contrib scripts etc. using it directly. I don’t see any problem using atexit for this particular use case, but since I didn’t invest further time trying to find a case where we need it for stdout (and the stderr problem was solved otherwise), I’d be fine dropping the patch. On Python 3, we don’t need to do anything. What do you think?
On Mon, 13 Jul 2020 17:14:31 +0200, Manuel Jacob wrote: > > Didn't we agree that the use of atexit wouldn't buy and we'll instead > > add > > py3-only solution later? > > > > As I said, atexit will increase the complexity of the control flow. All > > data > > should be flushed inside dispatch.run(). > > We talked past each other, then. What I meant is that for stderr it’s > not worth it because 1) we have other reasons to back it out, 2) stderr > may be used for tracebacks etc. even after the atexit hook. For stdout > it’s not as important because stdout is mostly written to through the ui > object, but still there are some contrib scripts etc. using it directly. I see. I meant the use of atexit isn't nice at all. Sorry for confusion. > I don’t see any problem using atexit for this particular use case, but > since I didn’t invest further time trying to find a case where we need > it for stdout (and the stderr problem was solved otherwise), I’d be fine > dropping the patch. On Python 3, we don’t need to do anything. What do > you think? I prefer dropping this patch. Historically we've replaced atexit() with custom ui.atexit() function for better control of errors and execution flow, see c13ff31818b0. I don't remember the exact problem, but doing I/O in atexit() would be undesired.
Patch
diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py --- a/mercurial/utils/procutil.py +++ b/mercurial/utils/procutil.py @@ -50,6 +50,27 @@ return False +def reopen_stream(orig, mode, buffering): + import atexit + + assert not pycompat.ispy3 + # On Python 2, there's no way to create a file such that the file + # descriptor doesn't get closed if the file object gets garbage collected. + # Therefore, when reopening an already open stream, the file descriptor + # will be closed twice. The error message for that will be swallowed during + # interpreter shutdown. + new = os.fdopen(orig.fileno(), mode, buffering) + + @atexit.register + def flush_streams(): + # Ensure that one stream gets flushed before the other closes the file + # descriptor. + new.flush() + orig.flush() + + return new + + class LineBufferedWrapper(object): def __init__(self, orig): self.orig = orig @@ -104,7 +125,7 @@ # Therefore we have a wrapper that implements line buffering. stdout = make_line_buffered(stdout) else: - stdout = os.fdopen(stdout.fileno(), 'wb', 1) + stdout = reopen_stream(stdout, 'wb', 1) findexe = platform.findexe diff --git a/tests/test-stdio.py b/tests/test-stdio.py --- a/tests/test-stdio.py +++ b/tests/test-stdio.py @@ -30,6 +30,16 @@ LINE_BUFFERED = b'[written aaa]aaabbb\n[written bbb\\n]' FULLY_BUFFERED = b'[written aaa][written bbb\\n]aaabbb\n' +TEST_FLUSH_STDIO_CHILD_SCRIPT = r''' +import sys + +from mercurial import dispatch +from mercurial.utils import procutil + +dispatch.initstdio() +sys.{stream}.write('test') +''' + @contextlib.contextmanager def _closing(fds): @@ -143,6 +153,29 @@ test_buffering_stdout_ptys_unbuffered ) + def _test_flush_stdio(self, stream, rwpair_generator): + def check_output(stream_receiver): + self.assertEqual(_readall(stream_receiver, 1024), b'test') + + self._test( + TEST_FLUSH_STDIO_CHILD_SCRIPT.format(stream=stream), + stream, + rwpair_generator, + check_output, + ) + + def test_flush_stdout_pipes(self): + self._test_flush_stdio('stdout', _pipes) + + def test_flush_stdout_ptys(self): + self._test_flush_stdio('stdout', _ptys) + + def test_flush_stderr_pipes(self): + self._test_flush_stdio('stderr', _pipes) + + def test_flush_stderr_ptys(self): + self._test_flush_stdio('stderr', _ptys) + if __name__ == '__main__': import silenttestrunner