Patchwork [1,of,6] util: add iterfile to workaround a fileobj.__iter__ issue with EINTR

login
register
mail settings
Submitter Jun Wu
Date Nov. 14, 2016, 11:35 p.m.
Message ID <7e3bb7754d338399dee3.1479166553@x1c>
Download mbox | patch
Permalink /patch/17569/
State Accepted
Headers show

Comments

Jun Wu - Nov. 14, 2016, 11:35 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1479166374 0
#      Mon Nov 14 23:32:54 2016 +0000
# Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1
# Parent  038547a14d850f14ecd2671852093dc07848a134
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 7e3bb7754d33
util: add iterfile to workaround a fileobj.__iter__ issue with EINTR

The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset
45d4cea97b04) is buggy: it cannot handle EINTR correctly.

In Objects/fileobject.c:

    size_t Py_UniversalNewlineFread(....) {
        ....
        if (!f->f_univ_newline)
            return fread(buf, 1, n, stream);
        ....
    }

According to the "fread" man page:

    If an error occurs, or the end of the file is reached, the return value
    is a short item count (or zero).

Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to
return a positive value while errno is set to EINTR and ferror(stream)
changes from zero to non-zero.

There are multiple "Py_UniversalNewlineFread": "file_read", "file_readinto",
"file_readlines", "readahead". While the first 3 have code to handle the
EINTR case, the last one "readahead" doesn't:

    static int readahead(PyFileObject *f, Py_ssize_t bufsize) {
        ....
        chunksize = Py_UniversalNewlineFread(
            f->f_buf, bufsize, f->f_fp, (PyObject *)f);
        ....
        if (chunksize == 0) {
            if (ferror(f->f_fp)) {
                PyErr_SetFromErrno(PyExc_IOError);
                ....
            }
        }
        ....
    }

It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread"
returns a non-zero value. And at the next time "readahead" got executed, if
"Py_UniversalNewlineFread" returns 0, "readahead" would raise a Python error
without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error".

The only user of "readahead" is "readahead_get_line_skip".
The only user of "readahead_get_line_skip" is "file_iternext", aka.
"fileobj.__iter__", which should be avoided.

There are multiple places where the pattern "for x in fp" is used. This
patch adds a "iterfile" method in "util.py" so we can migrate our code from
"for x in fp" to "fox x in util.iterfile(fp)".
Gregory Szorc - Nov. 14, 2016, 11:49 p.m.
On Mon, Nov 14, 2016 at 3:35 PM, Jun Wu <quark@fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1479166374 0
> #      Mon Nov 14 23:32:54 2016 +0000
> # Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1
> # Parent  038547a14d850f14ecd2671852093dc07848a134
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r
> 7e3bb7754d33
> util: add iterfile to workaround a fileobj.__iter__ issue with EINTR
>
> The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset
> 45d4cea97b04) is buggy: it cannot handle EINTR correctly.
>
> In Objects/fileobject.c:
>
>     size_t Py_UniversalNewlineFread(....) {
>         ....
>         if (!f->f_univ_newline)
>             return fread(buf, 1, n, stream);
>         ....
>     }
>
> According to the "fread" man page:
>
>     If an error occurs, or the end of the file is reached, the return value
>     is a short item count (or zero).
>
> Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to
> return a positive value while errno is set to EINTR and ferror(stream)
> changes from zero to non-zero.
>
> There are multiple "Py_UniversalNewlineFread": "file_read",
> "file_readinto",
> "file_readlines", "readahead". While the first 3 have code to handle the
> EINTR case, the last one "readahead" doesn't:
>
>     static int readahead(PyFileObject *f, Py_ssize_t bufsize) {
>         ....
>         chunksize = Py_UniversalNewlineFread(
>             f->f_buf, bufsize, f->f_fp, (PyObject *)f);
>         ....
>         if (chunksize == 0) {
>             if (ferror(f->f_fp)) {
>                 PyErr_SetFromErrno(PyExc_IOError);
>                 ....
>             }
>         }
>         ....
>     }
>
> It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread"
> returns a non-zero value. And at the next time "readahead" got executed, if
> "Py_UniversalNewlineFread" returns 0, "readahead" would raise a Python
> error
> without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error".
>
> The only user of "readahead" is "readahead_get_line_skip".
> The only user of "readahead_get_line_skip" is "file_iternext", aka.
> "fileobj.__iter__", which should be avoided.
>
> There are multiple places where the pattern "for x in fp" is used. This
> patch adds a "iterfile" method in "util.py" so we can migrate our code from
> "for x in fp" to "fox x in util.iterfile(fp)".
>

Bleh. Is this bug reported upstream? (Not that it helps us much.)

This seems like a pretty trivial workaround. So LGTM.


>
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han
>      return wrapper.fill(line).encode(encoding.encoding)
>
> +def iterfile(fp):
> +    """like fp.__iter__ but does not have issues with EINTR. Python
> 2.7.12 is
> +    known to have such issues."""
> +    return iter(fp.readline, '')
> +
>  def iterlines(iterator):
>      for chunk in iterator:
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Jun Wu - Nov. 14, 2016, 11:58 p.m.
Excerpts from Gregory Szorc's message of 2016-11-14 15:49:06 -0800:
> > without a incorrect errno - could be 0 - thus "IOError: [Errno 0] Error".
   ^^^^^^^^^^^^^^^^^^^ this should be "with an incorrect". Sorry :'(

> > The only user of "readahead" is "readahead_get_line_skip".
> > The only user of "readahead_get_line_skip" is "file_iternext", aka.
> > "fileobj.__iter__", which should be avoided.
> >
> > There are multiple places where the pattern "for x in fp" is used. This
> > patch adds a "iterfile" method in "util.py" so we can migrate our code from
> > "for x in fp" to "fox x in util.iterfile(fp)".
> >
> 
> Bleh. Is this bug reported upstream? (Not that it helps us much.)

I guess no? I'll report it tomorrow. Python 3 seems to be unrelated as the
old code seems to have been largely rewritten.

> This seems like a pretty trivial workaround. So LGTM.
> 
> >
> > diff --git a/mercurial/util.py b/mercurial/util.py
> > --- a/mercurial/util.py
> > +++ b/mercurial/util.py
> > @@ -2191,4 +2191,9 @@ def wrap(line, width, initindent='', han
> >      return wrapper.fill(line).encode(encoding.encoding)
> >
> > +def iterfile(fp):
> > +    """like fp.__iter__ but does not have issues with EINTR. Python
> > 2.7.12 is
> > +    known to have such issues."""
> > +    return iter(fp.readline, '')
> > +
> >  def iterlines(iterator):
> >      for chunk in iterator:
> > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel 
> >
Pierre-Yves David - Nov. 15, 2016, 12:31 a.m.
On 11/14/2016 11:49 PM, Gregory Szorc wrote:
> On Mon, Nov 14, 2016 at 3:35 PM, Jun Wu <quark@fb.com
> <mailto:quark@fb.com>> wrote:
>
>     # HG changeset patch
>     # User Jun Wu <quark@fb.com <mailto:quark@fb.com>>
>     # Date 1479166374 0
>     #      Mon Nov 14 23:32:54 2016 +0000
>     # Node ID 7e3bb7754d338399dee3ee41770b7d6624b81fc1
>     # Parent  038547a14d850f14ecd2671852093dc07848a134
>     # Available At https://bitbucket.org/quark-zju/hg-draft
>     <https://bitbucket.org/quark-zju/hg-draft>
>     #              hg pull https://bitbucket.org/quark-zju/hg-draft
>     <https://bitbucket.org/quark-zju/hg-draft> -r 7e3bb7754d33
>     util: add iterfile to workaround a fileobj.__iter__ issue with EINTR
>
>     The fileobj.__iter__ implementation in Python 2.7.12 (hg changeset
>     45d4cea97b04) is buggy: it cannot handle EINTR correctly.
>
>     In Objects/fileobject.c:
>
>         size_t Py_UniversalNewlineFread(....) {
>             ....
>             if (!f->f_univ_newline)
>                 return fread(buf, 1, n, stream);
>             ....
>         }
>
>     According to the "fread" man page:
>
>         If an error occurs, or the end of the file is reached, the
>     return value
>         is a short item count (or zero).
>
>     Therefore it's possible for "fread" (and "Py_UniversalNewlineFread") to
>     return a positive value while errno is set to EINTR and ferror(stream)
>     changes from zero to non-zero.
>
>     There are multiple "Py_UniversalNewlineFread": "file_read",
>     "file_readinto",
>     "file_readlines", "readahead". While the first 3 have code to handle the
>     EINTR case, the last one "readahead" doesn't:
>
>         static int readahead(PyFileObject *f, Py_ssize_t bufsize) {
>             ....
>             chunksize = Py_UniversalNewlineFread(
>                 f->f_buf, bufsize, f->f_fp, (PyObject *)f);
>             ....
>             if (chunksize == 0) {
>                 if (ferror(f->f_fp)) {
>                     PyErr_SetFromErrno(PyExc_IOError);
>                     ....
>                 }
>             }
>             ....
>         }
>
>     It means "readahead" could ignore EINTR, if "Py_UniversalNewlineFread"
>     returns a non-zero value. And at the next time "readahead" got
>     executed, if
>     "Py_UniversalNewlineFread" returns 0, "readahead" would raise a
>     Python error
>     without a incorrect errno - could be 0 - thus "IOError: [Errno 0]
>     Error".
>
>     The only user of "readahead" is "readahead_get_line_skip".
>     The only user of "readahead_get_line_skip" is "file_iternext", aka.
>     "fileobj.__iter__", which should be avoided.
>
>     There are multiple places where the pattern "for x in fp" is used. This
>     patch adds a "iterfile" method in "util.py" so we can migrate our
>     code from
>     "for x in fp" to "fox x in util.iterfile(fp)".
>
>
> Bleh. Is this bug reported upstream? (Not that it helps us much.)
>
> This seems like a pretty trivial workaround. So LGTM.

This seems okay and I trust Greg review here. This is pushed.

Patch

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -2191,4 +2191,9 @@  def wrap(line, width, initindent='', han
     return wrapper.fill(line).encode(encoding.encoding)
 
+def iterfile(fp):
+    """like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
+    known to have such issues."""
+    return iter(fp.readline, '')
+
 def iterlines(iterator):
     for chunk in iterator: