Submitter | Katsunori FUJIWARA |
---|---|
Date | Feb. 23, 2017, 9:05 p.m. |
Message ID | <d879917b416a305a42ab.1487883929@juju> |
Download | mbox | patch |
Permalink | /patch/18752/ |
State | Changes Requested |
Headers | show |
Comments
Looks great to me. Thanks for examining it carefully! I can confirm it from FreeBSD kernel code [1]: sys_wait4 -> kern_wait | sys_wait6 -> kern_wait6 -> proc_to_reap # write status [1]: https://github.com/freebsd/freebsd/blob/127162a8d9ad171b0f45cb8553384d013722a495/sys/kern/kern_exit.c Excerpts from FUJIWARA Katsunori's message of 2017-02-24 06:05:29 +0900: > # HG changeset patch > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > # Date 1487883654 -32400 > # Fri Feb 24 06:00:54 2017 +0900 > # Branch stable > # Node ID d879917b416a305a42ab92a6d3ac2121d6830560 > # Parent aa25989b0658dcefbd4c1bce7c389f006f22af30 > worker: ignore meaningless exit status indication returned by os.waitpid() > > Before this patch, worker implementation assumes that os.waitpid() > with os.WNOHANG returns '(0, 0)' for still running child process. This > is explicitly specified as below in Python API document. > > os.WNOHANG > The option for waitpid() to return immediately if no child > process status is available immediately. The function returns > (0, 0) in this case. > > On the other hand, POSIX specification doesn't define the "stat_loc" > value returned by waitpid() with WNOHANG for such child process. > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html > > CPython implementation for os.waitpid() on POSIX doesn't take any care > of this gap, and this may cause unexpected "exit status indication" > even on POSIX conformance platform. > > For example, os.waitpid() with os.WNOHANG returns non-zero "exit > status indication" on FreeBSD. This implies os.kill() with own pid or > sys.exit() with non-zero exit code, even if no child process fails. > > To ignore meaningless exit status indication returned by os.waitpid() > with os.WNOHANG, this patch skips assignment to problem[0], if > non-blocking os.waitpid() returns 0 as pid. > > This patch also adds "assert not st", to detect this kind of breakage > immediately. In this unexpected case, blocking os.waitpid() with > explicit pid returns '(0, non-zero)' without any exception. > > diff --git a/mercurial/worker.py b/mercurial/worker.py > --- a/mercurial/worker.py > +++ b/mercurial/worker.py > @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a > if p: > pids.discard(p) > st = _exitstatus(st) > + elif not blocking: > + # ignore st in this case, because it might be non-zero > + # on some platforms, even though it should be zero > + # according to Python document > + # > + # See also https://bugs.python.org/issue27808 > + continue > + else: > + assert not st > if st and not problem[0]: > problem[0] = st > def sigchldhandler(signum, frame):
On Fri, 24 Feb 2017 06:05:29 +0900, FUJIWARA Katsunori wrote: > # HG changeset patch > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > # Date 1487883654 -32400 > # Fri Feb 24 06:00:54 2017 +0900 > # Branch stable > # Node ID d879917b416a305a42ab92a6d3ac2121d6830560 > # Parent aa25989b0658dcefbd4c1bce7c389f006f22af30 > worker: ignore meaningless exit status indication returned by os.waitpid() > > Before this patch, worker implementation assumes that os.waitpid() > with os.WNOHANG returns '(0, 0)' for still running child process. This > is explicitly specified as below in Python API document. > > os.WNOHANG > The option for waitpid() to return immediately if no child > process status is available immediately. The function returns > (0, 0) in this case. > > On the other hand, POSIX specification doesn't define the "stat_loc" > value returned by waitpid() with WNOHANG for such child process. > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html > > CPython implementation for os.waitpid() on POSIX doesn't take any care > of this gap, and this may cause unexpected "exit status indication" > even on POSIX conformance platform. > > For example, os.waitpid() with os.WNOHANG returns non-zero "exit > status indication" on FreeBSD. This implies os.kill() with own pid or > sys.exit() with non-zero exit code, even if no child process fails. Thanks for the detailed explanation. This looks good except for the assertion. > This patch also adds "assert not st", to detect this kind of breakage > immediately. In this unexpected case, blocking os.waitpid() with > explicit pid returns '(0, non-zero)' without any exception. > > diff --git a/mercurial/worker.py b/mercurial/worker.py > --- a/mercurial/worker.py > +++ b/mercurial/worker.py > @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a > if p: > pids.discard(p) > st = _exitstatus(st) > + elif not blocking: > + # ignore st in this case, because it might be non-zero > + # on some platforms, even though it should be zero > + # according to Python document > + # > + # See also https://bugs.python.org/issue27808 > + continue > + else: > + assert not st > if st and not problem[0]: > problem[0] = st This assertion is confusing. You said "blocking os.waitpid() returns '(0, non-zero)'", where p == 0 should be invalid. But the code says st should be zero (since we've tested p == 0 beforehand and we do nothing if st is zero.) Moreover, we can take waitpid(pid, whatever) a function that may return 0 if child isn't finished yet. It makes zero sense to test the 'st' value if p == 0. So I don't think there's a practical benefit to handle blocking and non-blocking cases differently.
At Fri, 24 Feb 2017 22:40:14 +0900, Yuya Nishihara wrote: > > On Fri, 24 Feb 2017 06:05:29 +0900, FUJIWARA Katsunori wrote: > > # HG changeset patch > > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp> > > # Date 1487883654 -32400 > > # Fri Feb 24 06:00:54 2017 +0900 > > # Branch stable > > # Node ID d879917b416a305a42ab92a6d3ac2121d6830560 > > # Parent aa25989b0658dcefbd4c1bce7c389f006f22af30 > > worker: ignore meaningless exit status indication returned by os.waitpid() > > > > Before this patch, worker implementation assumes that os.waitpid() > > with os.WNOHANG returns '(0, 0)' for still running child process. This > > is explicitly specified as below in Python API document. > > > > os.WNOHANG > > The option for waitpid() to return immediately if no child > > process status is available immediately. The function returns > > (0, 0) in this case. > > > > On the other hand, POSIX specification doesn't define the "stat_loc" > > value returned by waitpid() with WNOHANG for such child process. > > > > http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitpid.html > > > > CPython implementation for os.waitpid() on POSIX doesn't take any care > > of this gap, and this may cause unexpected "exit status indication" > > even on POSIX conformance platform. > > > > For example, os.waitpid() with os.WNOHANG returns non-zero "exit > > status indication" on FreeBSD. This implies os.kill() with own pid or > > sys.exit() with non-zero exit code, even if no child process fails. > > Thanks for the detailed explanation. This looks good except for the assertion. > > > This patch also adds "assert not st", to detect this kind of breakage > > immediately. In this unexpected case, blocking os.waitpid() with > > explicit pid returns '(0, non-zero)' without any exception. > > > > diff --git a/mercurial/worker.py b/mercurial/worker.py > > --- a/mercurial/worker.py > > +++ b/mercurial/worker.py > > @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a > > if p: > > pids.discard(p) > > st = _exitstatus(st) > > + elif not blocking: > > + # ignore st in this case, because it might be non-zero > > + # on some platforms, even though it should be zero > > + # according to Python document > > + # > > + # See also https://bugs.python.org/issue27808 > > + continue > > + else: > > + assert not st > > if st and not problem[0]: > > problem[0] = st > > This assertion is confusing. You said "blocking os.waitpid() returns > '(0, non-zero)'", where p == 0 should be invalid. But the code says st should > be zero (since we've tested p == 0 beforehand and we do nothing if st is zero.) > > Moreover, we can take waitpid(pid, whatever) a function that may return > 0 if child isn't finished yet. It makes zero sense to test the 'st' value if > p == 0. So I don't think there's a practical benefit to handle blocking and > non-blocking cases differently. OK, I don't have strong opinion to add this assertion. I'll post revised one.
Patch
diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -123,6 +123,15 @@ def _posixworker(ui, func, staticargs, a if p: pids.discard(p) st = _exitstatus(st) + elif not blocking: + # ignore st in this case, because it might be non-zero + # on some platforms, even though it should be zero + # according to Python document + # + # See also https://bugs.python.org/issue27808 + continue + else: + assert not st if st and not problem[0]: problem[0] = st def sigchldhandler(signum, frame):