Patchwork [STABLE] worker: ignore meaningless exit status indication returned by os.waitpid()

login
register
mail settings
Submitter Katsunori FUJIWARA
Date Feb. 23, 2017, 9:05 p.m.
Message ID <d879917b416a305a42ab.1487883929@juju>
Download mbox | patch
Permalink /patch/18752/
State Changes Requested
Headers show

Comments

Katsunori FUJIWARA - Feb. 23, 2017, 9:05 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1487883654 -32400
#      Fri Feb 24 06:00:54 2017 +0900
# Branch stable
# Node ID d879917b416a305a42ab92a6d3ac2121d6830560
# Parent  aa25989b0658dcefbd4c1bce7c389f006f22af30
worker: ignore meaningless exit status indication returned by os.waitpid()

Before this patch, worker implementation assumes that os.waitpid()
with os.WNOHANG returns '(0, 0)' for still running child process. This
is explicitly specified as below in Python API document.

    os.WNOHANG
        The option for waitpid() to return immediately if no child
        process status is available immediately. The function returns
        (0, 0) in this case.

On the other hand, POSIX specification doesn't define the "stat_loc"
value returned by waitpid() with WNOHANG for such child process.

    http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html

CPython implementation for os.waitpid() on POSIX doesn't take any care
of this gap, and this may cause unexpected "exit status indication"
even on POSIX conformance platform.

For example, os.waitpid() with os.WNOHANG returns non-zero "exit
status indication" on FreeBSD. This implies os.kill() with own pid or
sys.exit() with non-zero exit code, even if no child process fails.

To ignore meaningless exit status indication returned by os.waitpid()
with os.WNOHANG, this patch skips assignment to problem[0], if
non-blocking os.waitpid() returns 0 as pid.

This patch also adds "assert not st", to detect this kind of breakage
immediately. In this unexpected case, blocking os.waitpid() with
explicit pid returns '(0, non-zero)' without any exception.
Jun Wu - Feb. 24, 2017, 1:32 a.m.
Looks great to me. Thanks for examining it carefully!

I can confirm it from FreeBSD kernel code [1]:

  sys_wait4  -> kern_wait | sys_wait6
    -> kern_wait6
         -> proc_to_reap # write status

[1]: https://github.com/freebsd/freebsd/blob/127162a8d9ad171b0f45cb8553384d013722a495/sys/kern/kern_exit.c


Excerpts from FUJIWARA Katsunori's message of 2017-02-24 06:05:29 +0900:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1487883654 -32400
> #      Fri Feb 24 06:00:54 2017 +0900
> # Branch stable
> # Node ID d879917b416a305a42ab92a6d3ac2121d6830560
> # Parent  aa25989b0658dcefbd4c1bce7c389f006f22af30
> worker: ignore meaningless exit status indication returned by os.waitpid()
> 
> Before this patch, worker implementation assumes that os.waitpid()
> with os.WNOHANG returns '(0, 0)' for still running child process. This
> is explicitly specified as below in Python API document.
> 
>     os.WNOHANG
>         The option for waitpid() to return immediately if no child
>         process status is available immediately. The function returns
>         (0, 0) in this case.
> 
> On the other hand, POSIX specification doesn't define the "stat_loc"
> value returned by waitpid() with WNOHANG for such child process.
> 
>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html 
> 
> CPython implementation for os.waitpid() on POSIX doesn't take any care
> of this gap, and this may cause unexpected "exit status indication"
> even on POSIX conformance platform.
> 
> For example, os.waitpid() with os.WNOHANG returns non-zero "exit
> status indication" on FreeBSD. This implies os.kill() with own pid or
> sys.exit() with non-zero exit code, even if no child process fails.
> 
> To ignore meaningless exit status indication returned by os.waitpid()
> with os.WNOHANG, this patch skips assignment to problem[0], if
> non-blocking os.waitpid() returns 0 as pid.
> 
> This patch also adds "assert not st", to detect this kind of breakage
> immediately. In this unexpected case, blocking os.waitpid() with
> explicit pid returns '(0, non-zero)' without any exception.
> 
> diff --git a/mercurial/worker.py b/mercurial/worker.py
> --- a/mercurial/worker.py
> +++ b/mercurial/worker.py
> @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a
>              if p:
>                  pids.discard(p)
>                  st = _exitstatus(st)
> +            elif not blocking:
> +                # ignore st in this case, because it might be non-zero
> +                # on some platforms, even though it should be zero
> +                # according to Python document
> +                #
> +                # See also https://bugs.python.org/issue27808 
> +                continue
> +            else:
> +                assert not st
>              if st and not problem[0]:
>                  problem[0] = st
>      def sigchldhandler(signum, frame):
Yuya Nishihara - Feb. 24, 2017, 1:40 p.m.
On Fri, 24 Feb 2017 06:05:29 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1487883654 -32400
> #      Fri Feb 24 06:00:54 2017 +0900
> # Branch stable
> # Node ID d879917b416a305a42ab92a6d3ac2121d6830560
> # Parent  aa25989b0658dcefbd4c1bce7c389f006f22af30
> worker: ignore meaningless exit status indication returned by os.waitpid()
> 
> Before this patch, worker implementation assumes that os.waitpid()
> with os.WNOHANG returns '(0, 0)' for still running child process. This
> is explicitly specified as below in Python API document.
> 
>     os.WNOHANG
>         The option for waitpid() to return immediately if no child
>         process status is available immediately. The function returns
>         (0, 0) in this case.
> 
> On the other hand, POSIX specification doesn't define the "stat_loc"
> value returned by waitpid() with WNOHANG for such child process.
> 
>     http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html
> 
> CPython implementation for os.waitpid() on POSIX doesn't take any care
> of this gap, and this may cause unexpected "exit status indication"
> even on POSIX conformance platform.
> 
> For example, os.waitpid() with os.WNOHANG returns non-zero "exit
> status indication" on FreeBSD. This implies os.kill() with own pid or
> sys.exit() with non-zero exit code, even if no child process fails.

Thanks for the detailed explanation. This looks good except for the assertion.

> This patch also adds "assert not st", to detect this kind of breakage
> immediately. In this unexpected case, blocking os.waitpid() with
> explicit pid returns '(0, non-zero)' without any exception.
> 
> diff --git a/mercurial/worker.py b/mercurial/worker.py
> --- a/mercurial/worker.py
> +++ b/mercurial/worker.py
> @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a
>              if p:
>                  pids.discard(p)
>                  st = _exitstatus(st)
> +            elif not blocking:
> +                # ignore st in this case, because it might be non-zero
> +                # on some platforms, even though it should be zero
> +                # according to Python document
> +                #
> +                # See also https://bugs.python.org/issue27808
> +                continue
> +            else:
> +                assert not st
>              if st and not problem[0]:
>                  problem[0] = st

This assertion is confusing. You said "blocking os.waitpid() returns
'(0, non-zero)'", where p == 0 should be invalid. But the code says st should
be zero (since we've tested p == 0 beforehand and we do nothing if st is zero.)

Moreover, we can take waitpid(pid, whatever) a function that may return
0 if child isn't finished yet. It makes zero sense to test the 'st' value if
p == 0. So I don't think there's a practical benefit to handle blocking and
non-blocking cases differently.
Katsunori FUJIWARA - Feb. 24, 2017, 2:52 p.m.
At Fri, 24 Feb 2017 22:40:14 +0900,
Yuya Nishihara wrote:
> 
> On Fri, 24 Feb 2017 06:05:29 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1487883654 -32400
> > #      Fri Feb 24 06:00:54 2017 +0900
> > # Branch stable
> > # Node ID d879917b416a305a42ab92a6d3ac2121d6830560
> > # Parent  aa25989b0658dcefbd4c1bce7c389f006f22af30
> > worker: ignore meaningless exit status indication returned by os.waitpid()
> > 
> > Before this patch, worker implementation assumes that os.waitpid()
> > with os.WNOHANG returns '(0, 0)' for still running child process. This
> > is explicitly specified as below in Python API document.
> > 
> >     os.WNOHANG
> >         The option for waitpid() to return immediately if no child
> >         process status is available immediately. The function returns
> >         (0, 0) in this case.
> > 
> > On the other hand, POSIX specification doesn't define the "stat_loc"
> > value returned by waitpid() with WNOHANG for such child process.
> > 
> >     http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html
> > 
> > CPython implementation for os.waitpid() on POSIX doesn't take any care
> > of this gap, and this may cause unexpected "exit status indication"
> > even on POSIX conformance platform.
> > 
> > For example, os.waitpid() with os.WNOHANG returns non-zero "exit
> > status indication" on FreeBSD. This implies os.kill() with own pid or
> > sys.exit() with non-zero exit code, even if no child process fails.
> 
> Thanks for the detailed explanation. This looks good except for the assertion.
> 
> > This patch also adds "assert not st", to detect this kind of breakage
> > immediately. In this unexpected case, blocking os.waitpid() with
> > explicit pid returns '(0, non-zero)' without any exception.
> > 
> > diff --git a/mercurial/worker.py b/mercurial/worker.py
> > --- a/mercurial/worker.py
> > +++ b/mercurial/worker.py
> > @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a
> >              if p:
> >                  pids.discard(p)
> >                  st = _exitstatus(st)
> > +            elif not blocking:
> > +                # ignore st in this case, because it might be non-zero
> > +                # on some platforms, even though it should be zero
> > +                # according to Python document
> > +                #
> > +                # See also https://bugs.python.org/issue27808
> > +                continue
> > +            else:
> > +                assert not st
> >              if st and not problem[0]:
> >                  problem[0] = st
> 
> This assertion is confusing. You said "blocking os.waitpid() returns
> '(0, non-zero)'", where p == 0 should be invalid. But the code says st should
> be zero (since we've tested p == 0 beforehand and we do nothing if st is zero.)
> 
> Moreover, we can take waitpid(pid, whatever) a function that may return
> 0 if child isn't finished yet. It makes zero sense to test the 'st' value if
> p == 0. So I don't think there's a practical benefit to handle blocking and
> non-blocking cases differently.

OK, I don't have strong opinion to add this assertion.

I'll post revised one.

Patch

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -123,6 +123,15 @@  def _posixworker(ui, func, staticargs, a
             if p:
                 pids.discard(p)
                 st = _exitstatus(st)
+            elif not blocking:
+                # ignore st in this case, because it might be non-zero
+                # on some platforms, even though it should be zero
+                # according to Python document
+                #
+                # See also https://bugs.python.org/issue27808
+                continue
+            else:
+                assert not st
             if st and not problem[0]:
                 problem[0] = st
     def sigchldhandler(signum, frame):