Submitter | Jun Wu |
---|---|
Date | Nov. 14, 2016, 11:35 p.m. |
Message ID | <7e3bb7754d338399dee3.1479166553@x1c> |
Download | mbox | patch |
Permalink | /patch/17569/ |
State | Accepted |
Headers | show |
Comments
On Mon, Nov 14, 2016 at 3:35 PM, Jun Wu <quark@fb.com> wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1479166374 0 > # Mon Nov 14 23:32:54 2016 +0000 > # Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1 > # Parent 038547a14d850f14ecd2671852093dc07848a134 > # Available At https://bitbucket.org/quark-zju/hg-draft > # hg pull https://bitbucket.org/quark-zju/hg-draft -r > 7e3bb7754d33 > util: add iterfile to workaround a fileobj.__iter__ issue with EINTR > > The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset > 45d4cea97b04) is buggy: it cannot handle EINTR correctly. > > In Objects/fileobject.c: > > size_t Py_UniversalNewlineFread(....) { > .... > if (!f->f_univ_newline) > return fread(buf, 1, n, stream); > .... > } > > According to the "fread" man page: > > If an error occurs, or the end of the file is reached, the return value > is a short item count (or zero). > > Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to > return a positive value while errno is set to EINTR and ferror(stream) > changes from zero to non-zero. > > There are multiple "Py_UniversalNewlineFread": "file_read", > "file_readinto", > "file_readlines", "readahead". While the first 3 have code to handle the > EINTR case, the last one "readahead" doesn't: > > static int readahead(PyFileObject *f, Py_ssize_t bufsize) { > .... > chunksize = Py_UniversalNewlineFread( > f->f_buf, bufsize, f->f_fp, (PyObject *)f); > .... > if (chunksize == 0) { > if (ferror(f->f_fp)) { > PyErr_SetFromErrno(PyExc_IOError); > .... > } > } > .... > } > > It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread" > returns a non-zero value. And at the next time "readahead" got executed, if > "Py_UniversalNewlineFread" returns 0, "readahead" would raise a Python > error > without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error". > > The only user of "readahead" is "readahead_get_line_skip". > The only user of "readahead_get_line_skip" is "file_iternext", aka. > "fileobj.__iter__", which should be avoided. > > There are multiple places where the pattern "for x in fp" is used. This > patch adds a "iterfile" method in "util.py" so we can migrate our code from > "for x in fp" to "fox x in util.iterfile(fp)". > Bleh. Is this bug reported upstream? (Not that it helps us much.) This seems like a pretty trivial workaround. So LGTM. > > diff --git a/mercurial/util.py b/mercurial/util.py > --- a/mercurial/util.py > +++ b/mercurial/util.py > @@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han > return wrapper.fill(line).encode(encoding.encoding) > > +def iterfile(fp): > + """like fp.__iter__ but does not have issues with EINTR. Python > 2.7.12 is > + known to have such issues.""" > + return iter(fp.readline, '') > + > def iterlines(iterator): > for chunk in iterator: > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
Excerpts from Gregory Szorc's message of 2016-11-14 15:49:06 -0800: > > without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error". ^^^^^^^^^^^^^^^^^^^ this should be "with an incorrect". Sorry :'( > > The only user of "readahead" is "readahead_get_line_skip". > > The only user of "readahead_get_line_skip" is "file_iternext", aka. > > "fileobj.__iter__", which should be avoided. > > > > There are multiple places where the pattern "for x in fp" is used. This > > patch adds a "iterfile" method in "util.py" so we can migrate our code from > > "for x in fp" to "fox x in util.iterfile(fp)". > > > > Bleh. Is this bug reported upstream? (Not that it helps us much.) I guess no? I'll report it tomorrow. Python 3 seems to be unrelated as the old code seems to have been largely rewritten. > This seems like a pretty trivial workaround. So LGTM. > > > > > diff --git a/mercurial/util.py b/mercurial/util.py > > --- a/mercurial/util.py > > +++ b/mercurial/util.py > > @@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han > > return wrapper.fill(line).encode(encoding.encoding) > > > > +def iterfile(fp): > > + """like fp.__iter__ but does not have issues with EINTR. Python > > 2.7.12 is > > + known to have such issues.""" > > + return iter(fp.readline, '') > > + > > def iterlines(iterator): > > for chunk in iterator: > > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel > >
On 11/14/2016 11:49 PM, Gregory Szorc wrote: > On Mon, Nov 14, 2016 at 3:35 PM, Jun Wu <quark@fb.com > <mailto:quark@fb.com>> wrote: > > # HG changeset patch > # User Jun Wu <quark@fb.com <mailto:quark@fb.com>> > # Date 1479166374 0 > # Mon Nov 14 23:32:54 2016 +0000 > # Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1 > # Parent 038547a14d850f14ecd2671852093dc07848a134 > # Available At https://bitbucket.org/quark-zju/hg-draft > <https://bitbucket.org/quark-zju/hg-draft> > # hg pull https://bitbucket.org/quark-zju/hg-draft > <https://bitbucket.org/quark-zju/hg-draft> -r 7e3bb7754d33 > util: add iterfile to workaround a fileobj.__iter__ issue with EINTR > > The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset > 45d4cea97b04) is buggy: it cannot handle EINTR correctly. > > In Objects/fileobject.c: > > size_t Py_UniversalNewlineFread(....) { > .... > if (!f->f_univ_newline) > return fread(buf, 1, n, stream); > .... > } > > According to the "fread" man page: > > If an error occurs, or the end of the file is reached, the > return value > is a short item count (or zero). > > Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to > return a positive value while errno is set to EINTR and ferror(stream) > changes from zero to non-zero. > > There are multiple "Py_UniversalNewlineFread": "file_read", > "file_readinto", > "file_readlines", "readahead". While the first 3 have code to handle the > EINTR case, the last one "readahead" doesn't: > > static int readahead(PyFileObject *f, Py_ssize_t bufsize) { > .... > chunksize = Py_UniversalNewlineFread( > f->f_buf, bufsize, f->f_fp, (PyObject *)f); > .... > if (chunksize == 0) { > if (ferror(f->f_fp)) { > PyErr_SetFromErrno(PyExc_IOError); > .... > } > } > .... > } > > It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread" > returns a non-zero value. And at the next time "readahead" got > executed, if > "Py_UniversalNewlineFread" returns 0, "readahead" would raise a > Python error > without a incorrect errno - could be 0 - thus "IOError: [Errno 0] > Error". > > The only user of "readahead" is "readahead_get_line_skip". > The only user of "readahead_get_line_skip" is "file_iternext", aka. > "fileobj.__iter__", which should be avoided. > > There are multiple places where the pattern "for x in fp" is used. This > patch adds a "iterfile" method in "util.py" so we can migrate our > code from > "for x in fp" to "fox x in util.iterfile(fp)". > > > Bleh. Is this bug reported upstream? (Not that it helps us much.) > > This seems like a pretty trivial workaround. So LGTM. This seems okay and I trust Greg review here. This is pushed.
Patch
diff --git a/mercurial/util.py b/mercurial/util.py --- a/mercurial/util.py +++ b/mercurial/util.py @@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han return wrapper.fill(line).encode(encoding.encoding) +def iterfile(fp): + """like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is + known to have such issues.""" + return iter(fp.readline, '') + def iterlines(iterator): for chunk in iterator: