Patchwork [3,of,4] worker: kill workers after all zombie processes are reaped

login
register
mail settings
Submitter Yuya Nishihara
Date Nov. 17, 2016, 1:20 p.m.
Message ID <4be342413ddecd0eb6d7.1479388801@mimosa>
Download mbox | patch
Permalink /patch/17614/
State Accepted
Headers show

Comments

Yuya Nishihara - Nov. 17, 2016, 1:20 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1479384538 -32400
#      Thu Nov 17 21:08:58 2016 +0900
# Node ID 4be342413ddecd0eb6d76c4e5d8bb38fee28061d
# Parent  60c9574e29b43c6d9579b99249ba11ff19af77a0
worker: kill workers after all zombie processes are reaped

Since we now wait child processes in non-blocking way (changed by 7bc25549e084
and e8fb03cfbbde), we don't have to kill them in the middle of the waitpid()
loop. This change will help solving a possible race of waitpid()-pids.discard()
sequence and another SIGCHLD.

waitforworkers() is called by cleanup(), in which case we do killworkers()
beforehand so we can remove killworkers() from waitforworkers().
Jun Wu - Nov. 17, 2016, 4:51 p.m.
The 4 patches LGTM.

Upon closer investigation, I think this is a brilliant solution that solves
the redundant "os.kill" problem.

A minor issue is that "killworkers" becomes easier to be called multiple
times, there could be redundant "signal.signal" calls. But that does not
affect correctness and can be improved later.

Excerpts from Yuya Nishihara's message of 2016-11-17 22:20:01 +0900:
> # HG changeset patch
> # User Yuya Nishihara <yuya@tcha.org>
> # Date 1479384538 -32400
> #      Thu Nov 17 21:08:58 2016 +0900
> # Node ID 4be342413ddecd0eb6d76c4e5d8bb38fee28061d
> # Parent  60c9574e29b43c6d9579b99249ba11ff19af77a0
> worker: kill workers after all zombie processes are reaped
> 
> Since we now wait child processes in non-blocking way (changed by 7bc25549e084
> and e8fb03cfbbde), we don't have to kill them in the middle of the waitpid()
> loop. This change will help solving a possible race of waitpid()-pids.discard()
> sequence and another SIGCHLD.
> 
> waitforworkers() is called by cleanup(), in which case we do killworkers()
> beforehand so we can remove killworkers() from waitforworkers().
> 
> diff --git a/mercurial/worker.py b/mercurial/worker.py
> --- a/mercurial/worker.py
> +++ b/mercurial/worker.py
> @@ -119,9 +119,10 @@ def _posixworker(ui, func, staticargs, a
>                  st = _exitstatus(st)
>              if st and not problem[0]:
>                  problem[0] = st
> -                killworkers()
>      def sigchldhandler(signum, frame):
>          waitforworkers(blocking=False)
> +        if problem[0]:

With a traditional multi-thread mindset, I thought SIGCHLD happening here
could cause trouble, while it's actually okay.

> +            killworkers()
>      oldchldhandler = signal.signal(signal.SIGCHLD, sigchldhandler)
>      for pargs in partition(args, workers):
>          pid = os.fork()

Patch

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -119,9 +119,10 @@  def _posixworker(ui, func, staticargs, a
                 st = _exitstatus(st)
             if st and not problem[0]:
                 problem[0] = st
-                killworkers()
     def sigchldhandler(signum, frame):
         waitforworkers(blocking=False)
+        if problem[0]:
+            killworkers()
     oldchldhandler = signal.signal(signal.SIGCHLD, sigchldhandler)
     for pargs in partition(args, workers):
         pid = os.fork()