Submitter | Manuel Jacob |
---|---|
Date | June 2, 2020, 6:07 p.m. |
Message ID | <ebbc45544673c33ea3be.1591121253@tmp> |
Download | mbox | patch |
Permalink | /patch/46458/ |
State | Superseded |
Headers | show |
Comments
> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: > > # HG changeset patch > # User Manuel Jacob <me@manueljacob.de> > # Date 1591120869 -7200 > # Tue Jun 02 20:01:09 2020 +0200 > # Branch stable > # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a > # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac > py3: always flush ui streams on Python 3 > > On Python 2, we rely on that stdout and stderr are line buffered. Python 3’s > io module doesn’t offer line buffered binary streams. We use the underlying non-buffer thing though for std{err, out} though, or so I thought. Are you observing behavior that this corrects? > > diff --git a/mercurial/ui.py b/mercurial/ui.py > --- a/mercurial/ui.py > +++ b/mercurial/ui.py > @@ -1198,9 +1198,14 @@ class ui(object): > label = opts.get('label', b'') > msg = self.label(msg, label) > dest.write(msg) > + # On Python 2, stdout and stderr are usually line buffered, but > # stderr may be buffered under win32 when redirected to files, > # including stdout. > - if dest is self._ferr and not getattr(self._ferr, 'closed', False): > + # On Python 3, we use the underlying binary buffer, which does not > + # support line buffering. > + if (pycompat.ispy3 or dest is self._ferr) and not getattr( > + self._ferr, 'closed', False > + ): > dest.flush() > except IOError as err: > if dest is self._ferr and err.errno in ( > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 2020-06-03 00:04, Augie Fackler wrote: >> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: >> >> # HG changeset patch >> # User Manuel Jacob <me@manueljacob.de> >> # Date 1591120869 -7200 >> # Tue Jun 02 20:01:09 2020 +0200 >> # Branch stable >> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a >> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac >> py3: always flush ui streams on Python 3 >> >> On Python 2, we rely on that stdout and stderr are line buffered. >> Python 3’s >> io module doesn’t offer line buffered binary streams. > > We use the underlying non-buffer thing though for std{err, out} > though, or so I thought. Are you observing behavior that this > corrects? pycompat says: stdout = sys.stdout.buffer stderr = sys.stderr.buffer They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter name='<stderr>'>, respectively. I observed the error in TortoiseHg, which is a long-running process. To trigger the problem with the hg command, you would need to do something slow that prints on `self._fout`, e.g. `hg pull https://www.mercurial-scm.org/repo/hg --debug`. >> diff --git a/mercurial/ui.py b/mercurial/ui.py >> --- a/mercurial/ui.py >> +++ b/mercurial/ui.py >> @@ -1198,9 +1198,14 @@ class ui(object): >> label = opts.get('label', b'') >> msg = self.label(msg, label) >> dest.write(msg) >> + # On Python 2, stdout and stderr are usually line >> buffered, but >> # stderr may be buffered under win32 when redirected to >> files, >> # including stdout. >> - if dest is self._ferr and not getattr(self._ferr, >> 'closed', False): >> + # On Python 3, we use the underlying binary buffer, which >> does not >> + # support line buffering. >> + if (pycompat.ispy3 or dest is self._ferr) and not >> getattr( >> + self._ferr, 'closed', False >> + ): >> dest.flush() >> except IOError as err: >> if dest is self._ferr and err.errno in ( >> _______________________________________________ >> Mercurial-devel mailing list >> Mercurial-devel@mercurial-scm.org >> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
On 2020-06-03 05:46, Manuel Jacob wrote: > On 2020-06-03 00:04, Augie Fackler wrote: >>> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: >>> >>> # HG changeset patch >>> # User Manuel Jacob <me@manueljacob.de> >>> # Date 1591120869 -7200 >>> # Tue Jun 02 20:01:09 2020 +0200 >>> # Branch stable >>> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a >>> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac >>> py3: always flush ui streams on Python 3 >>> >>> On Python 2, we rely on that stdout and stderr are line buffered. >>> Python 3’s >>> io module doesn’t offer line buffered binary streams. >> >> We use the underlying non-buffer thing though for std{err, out} >> though, or so I thought. Are you observing behavior that this >> corrects? > > pycompat says: > > stdout = sys.stdout.buffer > stderr = sys.stderr.buffer > > They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter > name='<stderr>'>, respectively. > > I observed the error in TortoiseHg, which is a long-running process. > > To trigger the problem with the hg command, you would need to do > something slow that prints on `self._fout`, e.g. `hg pull > https://www.mercurial-scm.org/repo/hg --debug`. Actually, you don’t need to specify --debug. Just run `hg pull https://www.mercurial-scm.org/repo/hg` and observe that without the patch, "pulling from https://www.mercurial-scm.org/repo/hg" will not be immediately printed on your screen. I’ve sent an updated patch with this example in the description and with a correction.
On Wed, 03 Jun 2020 05:46:05 +0200, Manuel Jacob wrote: > On 2020-06-03 00:04, Augie Fackler wrote: > >> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: > >> > >> # HG changeset patch > >> # User Manuel Jacob <me@manueljacob.de> > >> # Date 1591120869 -7200 > >> # Tue Jun 02 20:01:09 2020 +0200 > >> # Branch stable > >> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a > >> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac > >> py3: always flush ui streams on Python 3 > >> > >> On Python 2, we rely on that stdout and stderr are line buffered. > >> Python 3’s > >> io module doesn’t offer line buffered binary streams. > > > > We use the underlying non-buffer thing though for std{err, out} > > though, or so I thought. Are you observing behavior that this > > corrects? > > pycompat says: > > stdout = sys.stdout.buffer > stderr = sys.stderr.buffer > > They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter > name='<stderr>'>, respectively. > > I observed the error in TortoiseHg, which is a long-running process. > > To trigger the problem with the hg command, you would need to do > something slow that prints on `self._fout`, e.g. `hg pull > https://www.mercurial-scm.org/repo/hg --debug`. Denis, do you have any idea? My assumption was sys.stdout.buffer was magically line-buffered, but it appears not at least with Python 3.8.3 on Linux. https://www.mercurial-scm.org/repo/hg/rev/227ba1afcb65 > >> --- a/mercurial/ui.py > >> +++ b/mercurial/ui.py > >> @@ -1198,9 +1198,14 @@ class ui(object): > >> label = opts.get('label', b'') > >> msg = self.label(msg, label) > >> dest.write(msg) > >> + # On Python 2, stdout and stderr are usually line > >> buffered, but > >> # stderr may be buffered under win32 when redirected to > >> files, > >> # including stdout. > >> - if dest is self._ferr and not getattr(self._ferr, > >> 'closed', False): > >> + # On Python 3, we use the underlying binary buffer, which > >> does not > >> + # support line buffering. > >> + if (pycompat.ispy3 or dest is self._ferr) and not > >> getattr( > >> + self._ferr, 'closed', False > >> + ): > >> dest.flush() Flushing stdout per write() sounds bad. If that's the only way to work around Py3 io module, we'll probably need to check if the stdout is a tty or not.
> On Jun 3, 2020, at 07:13, Yuya Nishihara <yuya@tcha.org> wrote: > > On Wed, 03 Jun 2020 05:46:05 +0200, Manuel Jacob wrote: >> On 2020-06-03 00:04, Augie Fackler wrote: >>>> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: >>>> >>>> # HG changeset patch >>>> # User Manuel Jacob <me@manueljacob.de> >>>> # Date 1591120869 -7200 >>>> # Tue Jun 02 20:01:09 2020 +0200 >>>> # Branch stable >>>> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a >>>> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac >>>> py3: always flush ui streams on Python 3 >>>> >>>> On Python 2, we rely on that stdout and stderr are line buffered. >>>> Python 3’s >>>> io module doesn’t offer line buffered binary streams. >>> >>> We use the underlying non-buffer thing though for std{err, out} >>> though, or so I thought. Are you observing behavior that this >>> corrects? >> >> pycompat says: >> >> stdout = sys.stdout.buffer >> stderr = sys.stderr.buffer >> >> They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter >> name='<stderr>'>, respectively. >> >> I observed the error in TortoiseHg, which is a long-running process. >> >> To trigger the problem with the hg command, you would need to do >> something slow that prints on `self._fout`, e.g. `hg pull >> https://www.mercurial-scm.org/repo/hg --debug`. > > Denis, do you have any idea? My assumption was sys.stdout.buffer was magically > line-buffered, but it appears not at least with Python 3.8.3 on Linux. > > https://www.mercurial-scm.org/repo/hg/rev/227ba1afcb65 > >>>> --- a/mercurial/ui.py >>>> +++ b/mercurial/ui.py >>>> @@ -1198,9 +1198,14 @@ class ui(object): >>>> label = opts.get('label', b'') >>>> msg = self.label(msg, label) >>>> dest.write(msg) >>>> + # On Python 2, stdout and stderr are usually line >>>> buffered, but >>>> # stderr may be buffered under win32 when redirected to >>>> files, >>>> # including stdout. >>>> - if dest is self._ferr and not getattr(self._ferr, >>>> 'closed', False): >>>> + # On Python 3, we use the underlying binary buffer, which >>>> does not >>>> + # support line buffering. >>>> + if (pycompat.ispy3 or dest is self._ferr) and not >>>> getattr( >>>> + self._ferr, 'closed', False >>>> + ): >>>> dest.flush() > > Flushing stdout per write() sounds bad. If that's the only way to work > around Py3 io module, we'll probably need to check if the stdout is > a tty or not. Agreed. Maybe we could sniff for a \n in the output stream, but I dunno if that's likely to be a win.
On 2020-06-03 22:40, Augie Fackler wrote: >> On Jun 3, 2020, at 07:13, Yuya Nishihara <yuya@tcha.org> wrote: >> >> On Wed, 03 Jun 2020 05:46:05 +0200, Manuel Jacob wrote: >>> On 2020-06-03 00:04, Augie Fackler wrote: >>>>> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: >>>>> >>>>> # HG changeset patch >>>>> # User Manuel Jacob <me@manueljacob.de> >>>>> # Date 1591120869 -7200 >>>>> # Tue Jun 02 20:01:09 2020 +0200 >>>>> # Branch stable >>>>> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a >>>>> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac >>>>> py3: always flush ui streams on Python 3 >>>>> >>>>> On Python 2, we rely on that stdout and stderr are line buffered. >>>>> Python 3’s >>>>> io module doesn’t offer line buffered binary streams. >>>> >>>> We use the underlying non-buffer thing though for std{err, out} >>>> though, or so I thought. Are you observing behavior that this >>>> corrects? >>> >>> pycompat says: >>> >>> stdout = sys.stdout.buffer >>> stderr = sys.stderr.buffer >>> >>> They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter >>> name='<stderr>'>, respectively. >>> >>> I observed the error in TortoiseHg, which is a long-running process. >>> >>> To trigger the problem with the hg command, you would need to do >>> something slow that prints on `self._fout`, e.g. `hg pull >>> https://www.mercurial-scm.org/repo/hg --debug`. >> >> Denis, do you have any idea? My assumption was sys.stdout.buffer was >> magically >> line-buffered, but it appears not at least with Python 3.8.3 on Linux. >> >> https://www.mercurial-scm.org/repo/hg/rev/227ba1afcb65 >> >>>>> --- a/mercurial/ui.py >>>>> +++ b/mercurial/ui.py >>>>> @@ -1198,9 +1198,14 @@ class ui(object): >>>>> label = opts.get('label', b'') >>>>> msg = self.label(msg, label) >>>>> dest.write(msg) >>>>> + # On Python 2, stdout and stderr are usually line >>>>> buffered, but >>>>> # stderr may be buffered under win32 when redirected to >>>>> files, >>>>> # including stdout. >>>>> - if dest is self._ferr and not getattr(self._ferr, >>>>> 'closed', False): >>>>> + # On Python 3, we use the underlying binary buffer, >>>>> which >>>>> does not >>>>> + # support line buffering. >>>>> + if (pycompat.ispy3 or dest is self._ferr) and not >>>>> getattr( >>>>> + self._ferr, 'closed', False >>>>> + ): >>>>> dest.flush() >> >> Flushing stdout per write() sounds bad. If that's the only way to work >> around Py3 io module, we'll probably need to check if the stdout is >> a tty or not. > > Agreed. Maybe we could sniff for a \n in the output stream, but I > dunno if that's likely to be a win. For status output (in the general sense, not necessarily in the ui.status() sense), flushing the stream would IMHO be fine: the messages should be shown as fast as possible, often end with a '\n' (so the stream would be flushed anyway in case of line buffering) and often use string interpolation (therefore _writenobuf() gets called once per message), and the amount of output is limited (so printing the lines won’t dominate the total time spent). For "data" output (like diffs), flushing the stream can indeed be wasteful: showing single lines immediately is often not as important, ui.write() is often called with small data (not complete lines), and the amount of output can be large. Maybe we can introduce special methods for this kind of data that uses fully buffered writes? Code that uses these methods would need to flush manually. Just sniffing for '\n', as Augie suggested, is pretty much what the io module does when line buffering is enabled (https://github.com/python/cpython/blob/v3.8.3/Modules/_io/textio.c#L1597). Of course, we should do that only if the stream is interactive. I’m concerned that calling isatty() on every write is too expensive, but I need to benchmark this. BTW, there’s ui._isatty() that checks the ui.nontty config. This config is not checked when setting sys.stdout to line-buffered on Python 2 (by the interpreter), so am I right that we should not care about this config? BTW, I’m not sure I understand the comment about stderr on Windows. I couldn't find any further information on this topic. Is it the case that, when stderr is redirected to stdout on windows, Python fails to recognize that stderr is actually interactive and doesn't set line buffering? Is there some command that calls ui.write() many times with small data and doesn’t do much else, to be used as a benchmark?
Yuya Nishihara a écrit : > On Wed, 03 Jun 2020 05:46:05 +0200, Manuel Jacob wrote: > > On 2020-06-03 00:04, Augie Fackler wrote: > > >> On Jun 2, 2020, at 14:07, Manuel Jacob <me@manueljacob.de> wrote: > > >> > > >> # HG changeset patch > > >> # User Manuel Jacob <me@manueljacob.de> > > >> # Date 1591120869 -7200 > > >> # Tue Jun 02 20:01:09 2020 +0200 > > >> # Branch stable > > >> # Node ID ebbc45544673c33ea3beb887ed4d5230b015102a > > >> # Parent 91e509a12dbc4cd6cf2dcb9dae3ed383932132ac > > >> py3: always flush ui streams on Python 3 > > >> > > >> On Python 2, we rely on that stdout and stderr are line buffered. > > >> Python 3’s > > >> io module doesn’t offer line buffered binary streams. > > > > > > We use the underlying non-buffer thing though for std{err, out} > > > though, or so I thought. Are you observing behavior that this > > > corrects? > > > > pycompat says: > > > > stdout = sys.stdout.buffer > > stderr = sys.stderr.buffer > > > > They are <_io.BufferedWriter name='<stdout>'> and <_io.BufferedWriter > > name='<stderr>'>, respectively. > > > > I observed the error in TortoiseHg, which is a long-running process. > > > > To trigger the problem with the hg command, you would need to do > > something slow that prints on `self._fout`, e.g. `hg pull > > https://www.mercurial-scm.org/repo/hg --debug`. > > Denis, do you have any idea? My assumption was sys.stdout.buffer was magically > line-buffered, but it appears not at least with Python 3.8.3 on Linux. sys.stdout is documented to be line-buffered in interactive mode. But looking at the implementation of TextIOWrapper, line buffering is apparently not handled at the underlying buffer level (which we use IIRC): https://github.com/python/cpython/blob/v3.8.3/Lib/_pyio.py#L2210 This would mean that sys.stdout.buffer is not line-buffered. So perhaps flushing using a similar logic as in TextIOWrapper.write(), but at the buffer level, is the way go. > https://www.mercurial-scm.org/repo/hg/rev/227ba1afcb65 > > > >> --- a/mercurial/ui.py > > >> +++ b/mercurial/ui.py > > >> @@ -1198,9 +1198,14 @@ class ui(object): > > >> label = opts.get('label', b'') > > >> msg = self.label(msg, label) > > >> dest.write(msg) > > >> + # On Python 2, stdout and stderr are usually line > > >> buffered, but > > >> # stderr may be buffered under win32 when redirected to > > >> files, > > >> # including stdout. > > >> - if dest is self._ferr and not getattr(self._ferr, > > >> 'closed', False): > > >> + # On Python 3, we use the underlying binary buffer, which > > >> does not > > >> + # support line buffering. > > >> + if (pycompat.ispy3 or dest is self._ferr) and not > > >> getattr( > > >> + self._ferr, 'closed', False > > >> + ): > > >> dest.flush() > > Flushing stdout per write() sounds bad. If that's the only way to work > around Py3 io module, we'll probably need to check if the stdout is > a tty or not.
On Thu, 04 Jun 2020 07:54:14 +0200, Manuel Jacob wrote: > On 2020-06-03 22:40, Augie Fackler wrote: > >> On Jun 3, 2020, at 07:13, Yuya Nishihara <yuya@tcha.org> wrote: > >> Flushing stdout per write() sounds bad. If that's the only way to work > >> around Py3 io module, we'll probably need to check if the stdout is > >> a tty or not. > > > > Agreed. Maybe we could sniff for a \n in the output stream, but I > > dunno if that's likely to be a win. > > For status output (in the general sense, not necessarily in the > ui.status() sense), flushing the stream would IMHO be fine: the messages > should be shown as fast as possible, often end with a '\n' Yes, that should be fine. > For "data" output (like diffs), flushing the stream can indeed be > wasteful: showing single lines immediately is often not as important, > ui.write() is often called with small data (not complete lines), and the > amount of output can be large. Maybe we can introduce special methods > for this kind of data that uses fully buffered writes? Code that uses > these methods would need to flush manually. Actually there's a method for "data" output: - ui.write(), which is a subset of ui._write() - ui.status(), etc. -> ui._write() -> ui._writenobuf() So it's probably okay for ui._writenobuf() to flush() per call. For "data" output, we'll also need to call flush() occasionally so e.g. "hg log -r 'grep(foobar)'" will emit data incrementally. It would be nice if this flush() can be automated, but that's up to the overhead. > Of course, we should do that only if the stream is interactive. I’m > concerned that calling isatty() on every write is too expensive, but I > need to benchmark this. BTW, there’s ui._isatty() that checks the > ui.nontty config. This config is not checked when setting sys.stdout to > line-buffered on Python 2 (by the interpreter), so am I right that we > should not care about this config? That wouldn't matter, but I think ui functions should respect ui.nontty option consistently. > BTW, I’m not sure I understand the comment about stderr on Windows. I > couldn't find any further information on this topic. Is it the case > that, when stderr is redirected to stdout on windows, Python fails to > recognize that stderr is actually interactive and doesn't set line > buffering? Maybe there would be weird behavior in C stdio. > Is there some command that calls ui.write() many times with small data > and doesn’t do much else, to be used as a benchmark? hg files?
On 2020-06-04 14:21, Yuya Nishihara wrote: > On Thu, 04 Jun 2020 07:54:14 +0200, Manuel Jacob wrote: >> On 2020-06-03 22:40, Augie Fackler wrote: >> >> On Jun 3, 2020, at 07:13, Yuya Nishihara <yuya@tcha.org> wrote: >> >> Flushing stdout per write() sounds bad. If that's the only way to work >> >> around Py3 io module, we'll probably need to check if the stdout is >> >> a tty or not. >> > >> > Agreed. Maybe we could sniff for a \n in the output stream, but I >> > dunno if that's likely to be a win. >> >> For status output (in the general sense, not necessarily in the >> ui.status() sense), flushing the stream would IMHO be fine: the >> messages >> should be shown as fast as possible, often end with a '\n' > > Yes, that should be fine. > >> For "data" output (like diffs), flushing the stream can indeed be >> wasteful: showing single lines immediately is often not as important, >> ui.write() is often called with small data (not complete lines), and >> the >> amount of output can be large. Maybe we can introduce special methods >> for this kind of data that uses fully buffered writes? Code that uses >> these methods would need to flush manually. > > Actually there's a method for "data" output: > > - ui.write(), which is a subset of ui._write() > - ui.status(), etc. -> ui._write() -> ui._writenobuf() > > So it's probably okay for ui._writenobuf() to flush() per call. For > "data" > output, we'll also need to call flush() occasionally so e.g. > "hg log -r 'grep(foobar)'" will emit data incrementally. It would be > nice > if this flush() can be automated, but that's up to the overhead. I missed one thing before: By default, ui uses the streams from procutil, which are supposed to be line-buffered or unbuffered. That logic was buggy and gets fixed in the new patch series I just sent. So the higher layers don’t need to flush. As suggested in a comment in the last patch, we could make the streams from procutil buffered and push the responsibility to flush to higher layers. >> Of course, we should do that only if the stream is interactive. I’m >> concerned that calling isatty() on every write is too expensive, but I >> need to benchmark this. BTW, there’s ui._isatty() that checks the >> ui.nontty config. This config is not checked when setting sys.stdout >> to >> line-buffered on Python 2 (by the interpreter), so am I right that we >> should not care about this config? > > That wouldn't matter, but I think ui functions should respect ui.nontty > option consistently. > >> BTW, I’m not sure I understand the comment about stderr on Windows. I >> couldn't find any further information on this topic. Is it the case >> that, when stderr is redirected to stdout on windows, Python fails to >> recognize that stderr is actually interactive and doesn't set line >> buffering? > > Maybe there would be weird behavior in C stdio. Procutil claims "Windows doesn't support line buffering. I didn’t find an good independent source clearly indicating that this is the case, but https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setvbuf?view=vs-2019#remarks suggests that it could be the case. Luckily, on Python 3 we’re not dependent on such implementation details anymore. I think the logic to flush stderr can be replaced by code in procutil that ensures that procutil.stderr is always unbuffered. I’ll send a patch once the current series is through. >> Is there some command that calls ui.write() many times with small data >> and doesn’t do much else, to be used as a benchmark? > > hg files?
Patch
diff --git a/mercurial/ui.py b/mercurial/ui.py --- a/mercurial/ui.py +++ b/mercurial/ui.py @@ -1198,9 +1198,14 @@ class ui(object): label = opts.get('label', b'') msg = self.label(msg, label) dest.write(msg) + # On Python 2, stdout and stderr are usually line buffered, but # stderr may be buffered under win32 when redirected to files, # including stdout. - if dest is self._ferr and not getattr(self._ferr, 'closed', False): + # On Python 3, we use the underlying binary buffer, which does not + # support line buffering. + if (pycompat.ispy3 or dest is self._ferr) and not getattr( + self._ferr, 'closed', False + ): dest.flush() except IOError as err: if dest is self._ferr and err.errno in (