Patchwork [1,of,2] win32: spawndetached returns pid of detached process and not of cmd.exe

login
register
mail settings
Submitter Simon Heimberg
Date Feb. 10, 2014, 2:37 p.m.
Message ID <1277051454b64dfd5b34.1392043042@lapsi.heimberg.home>
Download mbox | patch
Permalink /patch/3540/
State Accepted
Commit ca6aa8362f33dddfe74efb03107b99f5947e21f6
Headers show

Comments

Simon Heimberg - Feb. 10, 2014, 2:37 p.m.
# HG changeset patch
# User Simon Heimberg <simohe@besonet.ch>
# Date 1391866507 -3600
#      Sat Feb 08 14:35:07 2014 +0100
# Node ID 1277051454b64dfd5b34d6fd8588f448be2ff47f
# Parent  12df27e151f1d72a180ab1e40222a9d88110e096
win32: spawndetached returns pid of detached process and not of cmd.exe

win32.spawndetached starts the detached process by `cmd.exe` (or COMSPEC). The
pid it returned was the one of cmd.exe and not the one of the detached process.
When this pid is used to kill the process, the detached process is not killed,
but only cmd.exe.

With this patch the pid of the detached process is written to the pid file.
Killing the process works as expected.

The pid is only evaluated on writing the pid file. It is unnecessary to search
the pid when it is not needed. And more important, it probably does not yet
exist right after the cmd.exe process was started. When the pid is written to
the file, waiting for the start of the detached process has already happened.
Use this functionality instead of writing a 2nd wait function.

Many tests on windows will not fail anymore, all those with the first failing
line "abort: child process failed to start". (The processes still hanging
around from previous test runs have to be killed first. They still block a
tcp port.)
A good test for the functionality of this patch is test-treediscovery.t,
because it starts and kills `hg serve -d` several times.
Mads Kiilerich - Feb. 10, 2014, 3:57 p.m.
On 02/10/2014 03:37 PM, Simon Heimberg wrote:
> # HG changeset patch
> # User Simon Heimberg <simohe@besonet.ch>
> # Date 1391866507 -3600
> #      Sat Feb 08 14:35:07 2014 +0100
> # Node ID 1277051454b64dfd5b34d6fd8588f448be2ff47f
> # Parent  12df27e151f1d72a180ab1e40222a9d88110e096
> win32: spawndetached returns pid of detached process and not of cmd.exe
>
> win32.spawndetached starts the detached process by `cmd.exe` (or COMSPEC). The
> pid it returned was the one of cmd.exe and not the one of the detached process.
> When this pid is used to kill the process, the detached process is not killed,
> but only cmd.exe.
>
> With this patch the pid of the detached process is written to the pid file.
> Killing the process works as expected.
>
> The pid is only evaluated on writing the pid file. It is unnecessary to search
> the pid when it is not needed. And more important, it probably does not yet
> exist right after the cmd.exe process was started. When the pid is written to
> the file, waiting for the start of the detached process has already happened.
> Use this functionality instead of writing a 2nd wait function.
>
> Many tests on windows will not fail anymore, all those with the first failing
> line "abort: child process failed to start". (The processes still hanging
> around from previous test runs have to be killed first. They still block a
> tcp port.)
> A good test for the functionality of this patch is test-treediscovery.t,
> because it starts and kills `hg serve -d` several times.

Assuming the analysis is correct and it actually works: Very nice! The 
windows process ids always seemed unreliable and unusable to me (and 
Patrick, IIRC). It sounds like it was very non-trivial but you nailed it.

With PID writing working reliable on windows, the main showstopper for 
_really_ supporting daemon mode on windows is to have some reliable 
'kill' command. I think it would make sense to have a cross platform 'hg 
serve --pid-file X --kill' option. That could also be used for a cross 
platform test: launch a daemon on $HGPORT, kill it, and verify that 
another one can be started on the same port.

(I agree that the ps test can be problematic. ps comes in completely 
different variants and for instance OS X do not have --no-heading.)

/Mads

> diff -r 12df27e151f1 -r 1277051454b6 mercurial/win32.py
> --- a/mercurial/win32.py	Sat Feb 08 17:48:57 2014 +0100
> +++ b/mercurial/win32.py	Sat Feb 08 14:35:07 2014 +0100
> @@ -119,6 +119,27 @@
>   
>   _STD_ERROR_HANDLE = _DWORD(-12).value
>   
> +# CreateToolhelp32Snapshot, Process32First, Process32Next
> +_TH32CS_SNAPPROCESS = 0x00000002
> +_MAX_PATH = 260
> +
> +class _tagPROCESSENTRY32(ctypes.Structure):
> +    _fields_ = [('dwsize', _DWORD),
> +                ('cntUsage', _DWORD),
> +                ('th32ProcessID', _DWORD),
> +                ('th32DefaultHeapID', ctypes.c_void_p),
> +                ('th32ModuleID', _DWORD),
> +                ('cntThreads', _DWORD),
> +                ('th32ParentProcessID', _DWORD),
> +                ('pcPriClassBase', _LONG),
> +                ('dwFlags', _DWORD),
> +                ('szExeFile', ctypes.c_char * _MAX_PATH)]
> +
> +    def __init__(self):
> +        super(_tagPROCESSENTRY32, self).__init__()
> +        self.dwsize = ctypes.sizeof(self)
> +
> +
>   # types of parameters of C functions used (required by pypy)
>   
>   _kernel32.CreateFileA.argtypes = [_LPCSTR, _DWORD, _DWORD, ctypes.c_void_p,
> @@ -186,6 +207,15 @@
>   _user32.EnumWindows.argtypes = [_WNDENUMPROC, _LPARAM]
>   _user32.EnumWindows.restype = _BOOL
>   
> +_kernel32.CreateToolhelp32Snapshot.argtypes = [_DWORD, _DWORD]
> +_kernel32.CreateToolhelp32Snapshot.restype = _BOOL
> +
> +_kernel32.Process32First.argtypes = [_HANDLE, ctypes.c_void_p]
> +_kernel32.Process32First.restype = _BOOL
> +
> +_kernel32.Process32Next.argtypes = [_HANDLE, ctypes.c_void_p]
> +_kernel32.Process32Next.restype = _BOOL
> +
>   def _raiseoserror(name):
>       err = ctypes.WinError()
>       raise OSError(err.errno, '%s: %s' % (name, err.strerror))
> @@ -309,6 +339,50 @@
>       width = csbi.srWindow.Right - csbi.srWindow.Left
>       return width
>   
> +def _1stchild(pid):
> +    '''return the 1st found child of the given pid
> +
> +    None is returned when no child is found'''
> +    pe = _tagPROCESSENTRY32()
> +
> +    # create handle to list all processes
> +    ph = _kernel32.CreateToolhelp32Snapshot(_TH32CS_SNAPPROCESS, 0)
> +    if ph == _INVALID_HANDLE_VALUE:
> +        raise ctypes.WinError
> +    try:
> +        r = _kernel32.Process32First(ph, ctypes.byref(pe))
> +        # loop over all processes
> +        while r:
> +            if pe.th32ParentProcessID == pid:
> +                # return first child found
> +                return pe.th32ProcessID
> +            r = _kernel32.Process32Next(ph, ctypes.byref(pe))
> +    finally:
> +        _kernel32.CloseHandle(ph)
> +    if _kernel32.GetLastError() != _ERROR_NO_MORE_FILES:
> +        raise ctypes.WinError
> +    return None # no child found
> +
> +class _tochildpid(int): # pid is _DWORD, which always matches in an int
> +    '''helper for spawndetached, returns the child pid on conversion to string
> +
> +    Does not resolve the child pid immediately because the child may not yet be
> +    started.
> +    '''
> +    def childpid(self):
> +        'returns the child pid of the first found child of the process with this pid'
> +        return _1stchild(self)
> +    def __str__(self):
> +        # run when the pid is written to the file
> +        ppid = self.childpid()
> +        if ppid is None:
> +            # race, child has exited since check
> +            # fall back to this pid. Its process will also have disappered,
> +            # raising the same error type later as when the child pid would
> +            # be returned.
> +            return " %d" % self
> +        return str(ppid)
> +
>   def spawndetached(args):
>       # No standard library function really spawns a fully detached
>       # process under win32 because they allocate pipes or other objects
> @@ -339,7 +413,8 @@
>       if not res:
>           raise ctypes.WinError
>   
> -    return pi.dwProcessId
> +    # _tochildpid because the process is the child of COMSPEC
> +    return _tochildpid(pi.dwProcessId)
>   
>   def unlink(f):
>       '''try to implement POSIX' unlink semantics on Windows'''
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel
Simon Heimberg - Feb. 10, 2014, 5:02 p.m.
Am 10.02.2014 16:57, schrieb Mads Kiilerich:
> On 02/10/2014 03:37 PM, Simon Heimberg wrote:
>> # HG changeset patch
>> # User Simon Heimberg <simohe@besonet.ch>
>> # Date 1391866507 -3600
>> #      Sat Feb 08 14:35:07 2014 +0100
>> # Node ID 1277051454b64dfd5b34d6fd8588f448be2ff47f
>> # Parent  12df27e151f1d72a180ab1e40222a9d88110e096
>> win32: spawndetached returns pid of detached process and not of cmd.exe
>>
>> win32.spawndetached starts the detached process by `cmd.exe` (or 
>> COMSPEC). The
>> pid it returned was the one of cmd.exe and not the one of the 
>> detached process.
>> When this pid is used to kill the process, the detached process is 
>> not killed,
>> but only cmd.exe.
>>
>> With this patch the pid of the detached process is written to the pid 
>> file.
>> Killing the process works as expected.
>>
>> The pid is only evaluated on writing the pid file. It is unnecessary 
>> to search
>> the pid when it is not needed. And more important, it probably does 
>> not yet
>> exist right after the cmd.exe process was started. When the pid is 
>> written to
>> the file, waiting for the start of the detached process has already 
>> happened.
>> Use this functionality instead of writing a 2nd wait function.
>>
>> Many tests on windows will not fail anymore, all those with the first 
>> failing
>> line "abort: child process failed to start". (The processes still 
>> hanging
>> around from previous test runs have to be killed first. They still 
>> block a
>> tcp port.)
>> A good test for the functionality of this patch is test-treediscovery.t,
>> because it starts and kills `hg serve -d` several times.
>
> Assuming the analysis is correct and it actually works: Very nice! The 
> windows process ids always seemed unreliable and unusable to me (and 
> Patrick, IIRC). It sounds like it was very non-trivial but you nailed it.
>
> With PID writing working reliable on windows, the main showstopper for 
> _really_ supporting daemon mode on windows is to have some reliable 
> 'kill' command. I think it would make sense to have a cross platform 
> 'hg serve --pid-file X --kill' option. That could also be used for a 
> cross platform test: launch a daemon on $HGPORT, kill it, and verify 
> that another one can be started on the same port.
Is this not is what test-treediscovery.t does? It starts `hg serve -d 
--pid-file hg.pid`, does some test, kills it, starts it again at the 
same port, ... It uses tests/killdaemons.py for killing, which seems to 
work reliable also on windows (as long as the right pid is provided).
I would recommend to the user to use the kill command that the system 
provides. (`kill PID` on linux, `taskkill /pid PID` or the task manager 
on windows, ...) Or did I misunderstand the use of `hg serve --kill 
--pid-file X`?

>
> (I agree that the ps test can be problematic. ps comes in completely 
> different variants and for instance OS X do not have --no-heading.)
>
> /Mads
>
>> diff -r 12df27e151f1 -r 1277051454b6 mercurial/win32.py
>> --- a/mercurial/win32.py    Sat Feb 08 17:48:57 2014 +0100
>> +++ b/mercurial/win32.py    Sat Feb 08 14:35:07 2014 +0100
>> @@ -119,6 +119,27 @@
>>     _STD_ERROR_HANDLE = _DWORD(-12).value
>>   +# CreateToolhelp32Snapshot, Process32First, Process32Next
>> +_TH32CS_SNAPPROCESS = 0x00000002
>> +_MAX_PATH = 260
>> +
>> +class _tagPROCESSENTRY32(ctypes.Structure):
>> +    _fields_ = [('dwsize', _DWORD),
>> +                ('cntUsage', _DWORD),
>> +                ('th32ProcessID', _DWORD),
>> +                ('th32DefaultHeapID', ctypes.c_void_p),
>> +                ('th32ModuleID', _DWORD),
>> +                ('cntThreads', _DWORD),
>> +                ('th32ParentProcessID', _DWORD),
>> +                ('pcPriClassBase', _LONG),
>> +                ('dwFlags', _DWORD),
>> +                ('szExeFile', ctypes.c_char * _MAX_PATH)]
>> +
>> +    def __init__(self):
>> +        super(_tagPROCESSENTRY32, self).__init__()
>> +        self.dwsize = ctypes.sizeof(self)
>> +
>> +
>>   # types of parameters of C functions used (required by pypy)
>>     _kernel32.CreateFileA.argtypes = [_LPCSTR, _DWORD, _DWORD, 
>> ctypes.c_void_p,
>> @@ -186,6 +207,15 @@
>>   _user32.EnumWindows.argtypes = [_WNDENUMPROC, _LPARAM]
>>   _user32.EnumWindows.restype = _BOOL
>>   +_kernel32.CreateToolhelp32Snapshot.argtypes = [_DWORD, _DWORD]
>> +_kernel32.CreateToolhelp32Snapshot.restype = _BOOL
>> +
>> +_kernel32.Process32First.argtypes = [_HANDLE, ctypes.c_void_p]
>> +_kernel32.Process32First.restype = _BOOL
>> +
>> +_kernel32.Process32Next.argtypes = [_HANDLE, ctypes.c_void_p]
>> +_kernel32.Process32Next.restype = _BOOL
>> +
>>   def _raiseoserror(name):
>>       err = ctypes.WinError()
>>       raise OSError(err.errno, '%s: %s' % (name, err.strerror))
>> @@ -309,6 +339,50 @@
>>       width = csbi.srWindow.Right - csbi.srWindow.Left
>>       return width
>>   +def _1stchild(pid):
>> +    '''return the 1st found child of the given pid
>> +
>> +    None is returned when no child is found'''
>> +    pe = _tagPROCESSENTRY32()
>> +
>> +    # create handle to list all processes
>> +    ph = _kernel32.CreateToolhelp32Snapshot(_TH32CS_SNAPPROCESS, 0)
>> +    if ph == _INVALID_HANDLE_VALUE:
>> +        raise ctypes.WinError
>> +    try:
>> +        r = _kernel32.Process32First(ph, ctypes.byref(pe))
>> +        # loop over all processes
>> +        while r:
>> +            if pe.th32ParentProcessID == pid:
>> +                # return first child found
>> +                return pe.th32ProcessID
>> +            r = _kernel32.Process32Next(ph, ctypes.byref(pe))
>> +    finally:
>> +        _kernel32.CloseHandle(ph)
>> +    if _kernel32.GetLastError() != _ERROR_NO_MORE_FILES:
>> +        raise ctypes.WinError
>> +    return None # no child found
>> +
>> +class _tochildpid(int): # pid is _DWORD, which always matches in an int
>> +    '''helper for spawndetached, returns the child pid on conversion 
>> to string
>> +
>> +    Does not resolve the child pid immediately because the child may 
>> not yet be
>> +    started.
>> +    '''
>> +    def childpid(self):
>> +        'returns the child pid of the first found child of the 
>> process with this pid'
>> +        return _1stchild(self)
>> +    def __str__(self):
>> +        # run when the pid is written to the file
>> +        ppid = self.childpid()
>> +        if ppid is None:
>> +            # race, child has exited since check
>> +            # fall back to this pid. Its process will also have 
>> disappered,
>> +            # raising the same error type later as when the child 
>> pid would
>> +            # be returned.
>> +            return " %d" % self
>> +        return str(ppid)
>> +
>>   def spawndetached(args):
>>       # No standard library function really spawns a fully detached
>>       # process under win32 because they allocate pipes or other objects
>> @@ -339,7 +413,8 @@
>>       if not res:
>>           raise ctypes.WinError
>>   -    return pi.dwProcessId
>> +    # _tochildpid because the process is the child of COMSPEC
>> +    return _tochildpid(pi.dwProcessId)
>>     def unlink(f):
>>       '''try to implement POSIX' unlink semantics on Windows'''
>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@selenic.com
>> http://selenic.com/mailman/listinfo/mercurial-devel
Mads Kiilerich - Feb. 10, 2014, 7:08 p.m.
On 02/10/2014 06:02 PM, Simon Heimberg wrote:
> I would recommend to the user to use the kill command that the system 
> provides. (`kill PID` on linux, `taskkill /pid PID` or the task 
> manager on windows, ...)

Sure - if taskkill works and is reliable now. I guess few users are 
aware of taskkill. It should probably be mentioned in hg help serve - 
together with kill for unix-ish systems.

Still, it would also be nice to have a cross platform way to terminate 
the daemon we started. Something that not only sent the signal, but also 
waited for the process to actually terminate...

/Mads
Simon Heimberg - Feb. 11, 2014, 8:39 a.m.
--On Montag, 10. Februar 2014 16:57 +0100 Mads Kiilerich 
<mads@kiilerich.com> wrote:
> On 02/10/2014 03:37 PM, Simon Heimberg wrote:
>> # HG changeset patch
>> # User Simon Heimberg <simohe@besonet.ch>
>> # Date 1391866507 -3600
>> #      Sat Feb 08 14:35:07 2014 +0100
>> # Node ID 1277051454b64dfd5b34d6fd8588f448be2ff47f
>> # Parent  12df27e151f1d72a180ab1e40222a9d88110e096
>> win32: spawndetached returns pid of detached process and not of cmd.exe
>>

<snip>

>
> (I agree that the ps test can be problematic. ps comes in completely
> different variants and for instance OS X do not have --no-heading.)

Indeed, it did break the test on OS X. I should probably have written an 
answer to [Patch 2 of 2] to prevent it from being applied. Or mention my 
concerns in the commit message itself?
I guess stripping the header with `grep -v PID` should work:
  $ ps -p `cat hg.pid` | grep -v PID

I will send a patch...

>
> /Mads
>
>> diff -r 12df27e151f1 -r 1277051454b6 mercurial/win32.py
>> --- a/mercurial/win32.py	Sat Feb 08 17:48:57 2014 +0100
>> +++ b/mercurial/win32.py	Sat Feb 08 14:35:07 2014 +0100
<snip>
Yuya Nishihara - Feb. 11, 2014, 1:09 p.m.
On Mon, 10 Feb 2014 20:08:01 +0100, Mads Kiilerich wrote:
> On 02/10/2014 06:02 PM, Simon Heimberg wrote:
> > I would recommend to the user to use the kill command that the system 
> > provides. (`kill PID` on linux, `taskkill /pid PID` or the task 
> > manager on windows, ...)
> 
> Sure - if taskkill works and is reliable now. I guess few users are 
> aware of taskkill. It should probably be mentioned in hg help serve - 
> together with kill for unix-ish systems.
> 
> Still, it would also be nice to have a cross platform way to terminate 
> the daemon we started. Something that not only sent the signal, but also 
> waited for the process to actually terminate...

IIRC, Windows has equivalent for `kill -KILL`, but not for `kill -SIGINT`.
`GenerateConsoleCtrlEvent(CTRL_C_EVENT)` only works if receiver processes
shares the same console.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms683155(v=vs.85).aspx

It seems `taskkill` sends WM_CLOSE message if `/F` option isn't specified,
but hg has no message loop.

Regards,

Patch

diff -r 12df27e151f1 -r 1277051454b6 mercurial/win32.py
--- a/mercurial/win32.py	Sat Feb 08 17:48:57 2014 +0100
+++ b/mercurial/win32.py	Sat Feb 08 14:35:07 2014 +0100
@@ -119,6 +119,27 @@ 
 
 _STD_ERROR_HANDLE = _DWORD(-12).value
 
+# CreateToolhelp32Snapshot, Process32First, Process32Next
+_TH32CS_SNAPPROCESS = 0x00000002
+_MAX_PATH = 260
+
+class _tagPROCESSENTRY32(ctypes.Structure):
+    _fields_ = [('dwsize', _DWORD),
+                ('cntUsage', _DWORD),
+                ('th32ProcessID', _DWORD),
+                ('th32DefaultHeapID', ctypes.c_void_p),
+                ('th32ModuleID', _DWORD),
+                ('cntThreads', _DWORD),
+                ('th32ParentProcessID', _DWORD),
+                ('pcPriClassBase', _LONG),
+                ('dwFlags', _DWORD),
+                ('szExeFile', ctypes.c_char * _MAX_PATH)]
+
+    def __init__(self):
+        super(_tagPROCESSENTRY32, self).__init__()
+        self.dwsize = ctypes.sizeof(self)
+
+
 # types of parameters of C functions used (required by pypy)
 
 _kernel32.CreateFileA.argtypes = [_LPCSTR, _DWORD, _DWORD, ctypes.c_void_p,
@@ -186,6 +207,15 @@ 
 _user32.EnumWindows.argtypes = [_WNDENUMPROC, _LPARAM]
 _user32.EnumWindows.restype = _BOOL
 
+_kernel32.CreateToolhelp32Snapshot.argtypes = [_DWORD, _DWORD]
+_kernel32.CreateToolhelp32Snapshot.restype = _BOOL
+
+_kernel32.Process32First.argtypes = [_HANDLE, ctypes.c_void_p]
+_kernel32.Process32First.restype = _BOOL
+
+_kernel32.Process32Next.argtypes = [_HANDLE, ctypes.c_void_p]
+_kernel32.Process32Next.restype = _BOOL
+
 def _raiseoserror(name):
     err = ctypes.WinError()
     raise OSError(err.errno, '%s: %s' % (name, err.strerror))
@@ -309,6 +339,50 @@ 
     width = csbi.srWindow.Right - csbi.srWindow.Left
     return width
 
+def _1stchild(pid):
+    '''return the 1st found child of the given pid
+
+    None is returned when no child is found'''
+    pe = _tagPROCESSENTRY32()
+
+    # create handle to list all processes
+    ph = _kernel32.CreateToolhelp32Snapshot(_TH32CS_SNAPPROCESS, 0)
+    if ph == _INVALID_HANDLE_VALUE:
+        raise ctypes.WinError
+    try:
+        r = _kernel32.Process32First(ph, ctypes.byref(pe))
+        # loop over all processes
+        while r:
+            if pe.th32ParentProcessID == pid:
+                # return first child found
+                return pe.th32ProcessID
+            r = _kernel32.Process32Next(ph, ctypes.byref(pe))
+    finally:
+        _kernel32.CloseHandle(ph)
+    if _kernel32.GetLastError() != _ERROR_NO_MORE_FILES:
+        raise ctypes.WinError
+    return None # no child found
+
+class _tochildpid(int): # pid is _DWORD, which always matches in an int
+    '''helper for spawndetached, returns the child pid on conversion to string
+
+    Does not resolve the child pid immediately because the child may not yet be
+    started.
+    '''
+    def childpid(self):
+        'returns the child pid of the first found child of the process with this pid'
+        return _1stchild(self)
+    def __str__(self):
+        # run when the pid is written to the file
+        ppid = self.childpid()
+        if ppid is None:
+            # race, child has exited since check
+            # fall back to this pid. Its process will also have disappered, 
+            # raising the same error type later as when the child pid would
+            # be returned.
+            return " %d" % self
+        return str(ppid)
+
 def spawndetached(args):
     # No standard library function really spawns a fully detached
     # process under win32 because they allocate pipes or other objects
@@ -339,7 +413,8 @@ 
     if not res:
         raise ctypes.WinError
 
-    return pi.dwProcessId
+    # _tochildpid because the process is the child of COMSPEC
+    return _tochildpid(pi.dwProcessId)
 
 def unlink(f):
     '''try to implement POSIX' unlink semantics on Windows'''