Patchwork Transient Windows test failures

login
register
mail settings
Submitter Matt Harbison
Date June 17, 2017, 2:02 a.m.
Message ID <op.y1yfyexa9lwrgf@envy>
Download mbox | patch
Permalink /patch/21439/
State Not Applicable
Headers show

Comments

Matt Harbison - June 17, 2017, 2:02 a.m.
On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com> wrote:

> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>> So apparently, this is a symptom of not having %SystemRoot% in the
>> environment when calling CreateProcess().
>>
>> https://bugs.python.org/issue13524
>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>>
>> I see that setup.py special cases this variable.  I did a search for  
>> 'env
>> =', and it looks like hooks and pager start with empty environments, so  
>> they
>> must not inherit this.  IDR if any recent changes were made that start  
>> with
>> an empty environment.
>>
>> The thing I can't get my mind around is the hit and miss nature of the
>> error, if this is really the problem.
>
> It sounds like it should be harmless to just always forward
> %SystemRoot% - should we just do that?

Seems reasonable, but run-tests._getenv() already does an  
os.environ.copy(), so it should be there?

It does seem like a good idea to do it for hooks and other things  
executed, where the environment is built from scratch.  The question is  
where?  There's util.popen[2-4](), plus some direct calls to  
subprocess.Popen(), and an os.system().  I considered util.shellenviron(),  
but there are far fewer calls to this than places where processes are  
spawned.

I just hit it here:
Augie Fackler - June 19, 2017, 2:19 a.m.
> On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com> wrote:
> 
> On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com> wrote:
> 
>> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>>> So apparently, this is a symptom of not having %SystemRoot% in the
>>> environment when calling CreateProcess().
>>> 
>>> https://bugs.python.org/issue13524
>>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>>> 
>>> I see that setup.py special cases this variable.  I did a search for 'env
>>> =', and it looks like hooks and pager start with empty environments, so they
>>> must not inherit this.  IDR if any recent changes were made that start with
>>> an empty environment.
>>> 
>>> The thing I can't get my mind around is the hit and miss nature of the
>>> error, if this is really the problem.
>> 
>> It sounds like it should be harmless to just always forward
>> %SystemRoot% - should we just do that?
> 
> Seems reasonable, but run-tests._getenv() already does an os.environ.copy(), so it should be there?
> 
> It does seem like a good idea to do it for hooks and other things executed, where the environment is built from scratch.  The question is where?  There's util.popen[2-4](), plus some direct calls to subprocess.Popen(), and an os.system().  I considered util.shellenviron(), but there are far fewer calls to this than places where processes are spawned.

My gut (without having gone through the calls) is that we should probably build up a subproc module in hg and standardize the various ways we call things - right now it's kind of messy, and I strongly suspect we only need 2 or 3 functions (naive call, call with stdin/stdout/stderr wired up, call with no stdin but stdout/stderr captured).

Absent that cleanup, yeah, just do the necessary cleanup on each callsite - the os.system() call should (I think) forward everything though, so that one at least should be fine...


> I just hit it here:
> 
> --- c:/Users/Matt/projects/hg/tests/test-log.t
> +++ c:/Users/Matt/projects/hg/tests/test-log.t.err
> @@ -1476,72 +1476,64 @@
>   $ hg resolve -m foo
>   (no more unresolved files)
>   $ hg ci -m "First merge, related"
> +  abort: Invalid Signature
> +  [255]
> 
> I was wondering if it's right on the edge of some limit, and paths or whatever being set by the test runner are (sometimes) pushing it out of the environment block.  But the docs say there's no limit to the block size:
> 
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms682653(v=vs.85).aspx
> 
>>> _______________________________________________
>>> Mercurial-devel mailing list
>>> Mercurial-devel@mercurial-scm.org
>>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - June 19, 2017, 3:30 p.m.
On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
> 
> > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com> wrote:
> > 
> > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com> wrote:
> > 
> >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
> >>> So apparently, this is a symptom of not having %SystemRoot% in the
> >>> environment when calling CreateProcess().
> >>> 
> >>> https://bugs.python.org/issue13524
> >>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
> >>> 
> >>> I see that setup.py special cases this variable.  I did a search for 'env
> >>> =', and it looks like hooks and pager start with empty environments, so they
> >>> must not inherit this.  IDR if any recent changes were made that start with
> >>> an empty environment.
> >>> 
> >>> The thing I can't get my mind around is the hit and miss nature of the
> >>> error, if this is really the problem.
> >> 
> >> It sounds like it should be harmless to just always forward
> >> %SystemRoot% - should we just do that?
> > 
> > Seems reasonable, but run-tests._getenv() already does an os.environ.copy(), so it should be there?
> > 
> > It does seem like a good idea to do it for hooks and other things executed, where the environment is built from scratch.  The question is where?  There's util.popen[2-4](), plus some direct calls to subprocess.Popen(), and an os.system().  I considered util.shellenviron(), but there are far fewer calls to this than places where processes are spawned.

(+CC foozy since he has Windows)

Is the problem only seen in tests? I don't think environment variables are
cleared in hg side.
Katsunori FUJIWARA - June 19, 2017, 6:47 p.m.
At Tue, 20 Jun 2017 00:30:28 +0900,
Yuya Nishihara wrote:
> 
> On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
> > 
> > > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com> wrote:
> > > 
> > > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com> wrote:
> > > 
> > >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
> > >>> So apparently, this is a symptom of not having %SystemRoot% in the
> > >>> environment when calling CreateProcess().
> > >>> 
> > >>> https://bugs.python.org/issue13524
> > >>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
> > >>> 
> > >>> I see that setup.py special cases this variable.  I did a search for 'env
> > >>> =', and it looks like hooks and pager start with empty environments, so they
> > >>> must not inherit this.  IDR if any recent changes were made that start with
> > >>> an empty environment.
> > >>> 
> > >>> The thing I can't get my mind around is the hit and miss nature of the
> > >>> error, if this is really the problem.
> > >> 
> > >> It sounds like it should be harmless to just always forward
> > >> %SystemRoot% - should we just do that?
> > > 
> > > Seems reasonable, but run-tests._getenv() already does an os.environ.copy(), so it should be there?
> > > 
> > > It does seem like a good idea to do it for hooks and other things executed, where the environment is built from scratch.  The question is where?  There's util.popen[2-4](), plus some direct calls to subprocess.Popen(), and an os.system().  I considered util.shellenviron(), but there are far fewer calls to this than places where processes are spawned.
> 
> (+CC foozy since he has Windows)
> 
> Is the problem only seen in tests? I don't think environment variables are
> cleared in hg side.
> 

I can't reproduce this issue on my Windows VM, yet :-<

I'm using 32bit Python 2.7.9 on 64bit Windows7, but is 64bit or more
recent Python (like as your environment) required to reproduce this ?

BTW, on hg side, there is no code path, which causes replacement or
removal of SystemRoot environment variable, AFAIK.

  - util.popen

    This involves os.popen(), and always inherits current environment
    variables.

  - util.popen2/3/4

    This involves subprocess.Popen() with 'env=env', but (1) default
    value of 'env' is None (= inherit current ones), and (2) all
    existing callers in sources below invoke without explicit env
    argument.

    - hgext/convert/*.py
    - mercurial/sshpeer.py

  - util.system

    This involves subprocess.Popen() with result of
    shellenviron(environ), which updates existing variables or adds
    new variables. And there is no replacement of SystemRoot on caller
    sides.

    Execution of external hook, editor, and so on uses this API via
    ui.system().

  - os.popen (for other than Mercurial APIs above)

    This always inherits current environment variables.

    (not used directly in mercurial/** and hgext/**)

  - os.system (for other than Mercurial APIs above)

    This always inherits current environment variables.

    - hgext/convert/gnuarch.py
    - hgext/factotum.py
    - mercurial/statprof.py
    - mercurial/util.py (for tempfilter)

  - subprocess.Popen (for other than Mercurial APIs above)

    - hgext/convert/common.py
    - hgext/fsmonitor/pywatchman/__init__.py
    - mercurial/util.py (for pipefilter)
      no explicit env

    - hgext/logtoprocess.py
    - mercurial/subrepo.py
      using copy of encoding.environ

    - mercurial/ui.py (for pager)
      using util.shellenviron() with rcutil.defaultpagerenv()

On run-tests.py side, subprocess.Popen() is sometimes used with
explicit env argument, but the result of Test._getenv() or None is
used in such case. This doesn't cause replacement or removal of
SystemRoot (at least, with current implementation).

Any of intermediate layers (MinGW or Python ?) causes issue ?
Matt Harbison - June 20, 2017, 2:20 a.m.
On Mon, 19 Jun 2017 11:30:28 -0400, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
>>
>> > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com>  
>> wrote:
>> >
>> > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com>  
>> wrote:
>> >
>> >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>> >>> So apparently, this is a symptom of not having %SystemRoot% in the
>> >>> environment when calling CreateProcess().
>> >>>
>> >>> https://bugs.python.org/issue13524
>> >>>  
>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>> >>>
>> >>> I see that setup.py special cases this variable.  I did a search  
>> for 'env
>> >>> =', and it looks like hooks and pager start with empty  
>> environments, so they
>> >>> must not inherit this.  IDR if any recent changes were made that  
>> start with
>> >>> an empty environment.
>> >>>
>> >>> The thing I can't get my mind around is the hit and miss nature of  
>> the
>> >>> error, if this is really the problem.
>> >>
>> >> It sounds like it should be harmless to just always forward
>> >> %SystemRoot% - should we just do that?
>> >
>> > Seems reasonable, but run-tests._getenv() already does an  
>> os.environ.copy(), so it should be there?
>> >
>> > It does seem like a good idea to do it for hooks and other things  
>> executed, where the environment is built from scratch.  The question is  
>> where?  There's util.popen[2-4](), plus some direct calls to  
>> subprocess.Popen(), and an os.system().  I considered  
>> util.shellenviron(), but there are far fewer calls to this than places  
>> where processes are spawned.
>
> (+CC foozy since he has Windows)
>
> Is the problem only seen in tests? I don't think environment variables  
> are
> cleared in hg side.

I've only ever seen it while running the tests on this one Win7 x64  
system.  But since it only started in the last few weeks and is rare, I'd  
say not enough data at this point.

I didn't think hg cleared anything, and the get-with-headers.py example in  
this thread pretty much skips hg code (it does import util.py).

I've got a second Windows machine setup (but it's much slower, and really  
can't deal with much more than --jobs=2), and I added code to run-tests.py  
and dispatch.py to complain if SystemRoot is not set.  I haven't seen it  
pop up in the last few days (~6-8 full runs).
Matt Harbison - June 20, 2017, 2:50 a.m.
On Mon, 19 Jun 2017 14:47:22 -0400, FUJIWARA Katsunori  
<foozy@lares.dti.ne.jp> wrote:

> At Tue, 20 Jun 2017 00:30:28 +0900,
> Yuya Nishihara wrote:
>>
>> On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
>> >
>> > > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com>  
>> wrote:
>> > >
>> > > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com>  
>> wrote:
>> > >
>> > >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>> > >>> So apparently, this is a symptom of not having %SystemRoot% in the
>> > >>> environment when calling CreateProcess().
>> > >>>
>> > >>> https://bugs.python.org/issue13524
>> > >>>  
>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

>> (+CC foozy since he has Windows)
>>
>> Is the problem only seen in tests? I don't think environment variables  
>> are
>> cleared in hg side.
>>
>
> I can't reproduce this issue on my Windows VM, yet :-<
>
> I'm using 32bit Python 2.7.9 on 64bit Windows7, but is 64bit or more
> recent Python (like as your environment) required to reproduce this ?

Not sure.  I've only tested with 2.7.13 x64.  I run with '-t9000 -j9' to  
burn through faster, which (I think) causes all kinds of occasional one  
off failures (port in use, etc).

> BTW, on hg side, there is no code path, which causes replacement or
> removal of SystemRoot environment variable, AFAIK.
>
>   - util.popen
>
>     This involves os.popen(), and always inherits current environment
>     variables.
>
>   - util.popen2/3/4
>
>     This involves subprocess.Popen() with 'env=env', but (1) default
>     value of 'env' is None (= inherit current ones), and (2) all
>     existing callers in sources below invoke without explicit env
>     argument.
>
>     - hgext/convert/*.py
>     - mercurial/sshpeer.py
>
>   - util.system
>
>     This involves subprocess.Popen() with result of
>     shellenviron(environ), which updates existing variables or adds
>     new variables. And there is no replacement of SystemRoot on caller
>     sides.
>
>     Execution of external hook, editor, and so on uses this API via
>     ui.system().
>
>   - os.popen (for other than Mercurial APIs above)
>
>     This always inherits current environment variables.
>
>     (not used directly in mercurial/** and hgext/**)
>
>   - os.system (for other than Mercurial APIs above)
>
>     This always inherits current environment variables.
>
>     - hgext/convert/gnuarch.py
>     - hgext/factotum.py
>     - mercurial/statprof.py
>     - mercurial/util.py (for tempfilter)
>
>   - subprocess.Popen (for other than Mercurial APIs above)
>
>     - hgext/convert/common.py
>     - hgext/fsmonitor/pywatchman/__init__.py
>     - mercurial/util.py (for pipefilter)
>       no explicit env
>
>     - hgext/logtoprocess.py
>     - mercurial/subrepo.py
>       using copy of encoding.environ
>
>     - mercurial/ui.py (for pager)
>       using util.shellenviron() with rcutil.defaultpagerenv()

Thanks for this.  I hadn't looked too closely at all of these call sites,  
and was a little suspicious about some of them.  But I see how they pick  
up the existing environment now.

> On run-tests.py side, subprocess.Popen() is sometimes used with
> explicit env argument, but the result of Test._getenv() or None is
> used in such case. This doesn't cause replacement or removal of
> SystemRoot (at least, with current implementation).
>
> Any of intermediate layers (MinGW or Python ?) causes issue ?

Not sure.  This is a regular Windows build (doesn't link against MSYS),  
and nothing has changed software-wise in a long time.
Matt Harbison - July 10, 2017, 3:15 a.m.
On Mon, 19 Jun 2017 11:30:28 -0400, Yuya Nishihara <yuya@tcha.org> wrote:

> On Sun, 18 Jun 2017 22:19:29 -0400, Augie Fackler wrote:
>>
>> > On Jun 16, 2017, at 22:02, Matt Harbison <mharbison72@gmail.com>  
>> wrote:
>> >
>> > On Fri, 16 Jun 2017 09:59:30 -0400, Augie Fackler <raf@durin42.com>  
>> wrote:
>> >
>> >> On Fri, Jun 16, 2017 at 12:18:18AM -0400, Matt Harbison wrote:
>> >>> So apparently, this is a symptom of not having %SystemRoot% in the
>> >>> environment when calling CreateProcess().
>> >>>
>> >>> https://bugs.python.org/issue13524
>> >>>  
>> https://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/
>> >>>
>> >>> I see that setup.py special cases this variable.  I did a search  
>> for 'env
>> >>> =', and it looks like hooks and pager start with empty  
>> environments, so they
>> >>> must not inherit this.  IDR if any recent changes were made that  
>> start with
>> >>> an empty environment.
>> >>>
>> >>> The thing I can't get my mind around is the hit and miss nature of  
>> the
>> >>> error, if this is really the problem.
>> >>
>> >> It sounds like it should be harmless to just always forward
>> >> %SystemRoot% - should we just do that?
>> >
>> > Seems reasonable, but run-tests._getenv() already does an  
>> os.environ.copy(), so it should be there?
>> >
>> > It does seem like a good idea to do it for hooks and other things  
>> executed, where the environment is built from scratch.  The question is  
>> where?  There's util.popen[2-4](), plus some direct calls to  
>> subprocess.Popen(), and an os.system().  I considered  
>> util.shellenviron(), but there are far fewer calls to this than places  
>> where processes are spawned.
>
> (+CC foozy since he has Windows)
>
> Is the problem only seen in tests? I don't think environment variables  
> are
> cleared in hg side.

I hit this problem again this weekend, after it was quiet for the past  
couple of weeks.  It looks like it might be an issue with nearly running  
low on memory.

When it happened this time, Windows popped up a dialog box saying memory  
was running low, offering to kill some programs.  I had TaskManager open,  
and saw the Performance > System > Commit (MB) line was running around  
5400/6076.  I closed thg, which exited thg.exe and hg.exe (listed at ~300  
MB each in the process list), and the issue stopped.  I was able to  
recreate it again after a day of quiet by opening up a bunch of tabs in  
FireFox, and pushing the memory usage around that threshold.  I tend to  
run the tests with -j9, and I've seen the first number bounce around  
between 4900 and 5700+ during these failures.  So I'm not sure what the  
exact problem threshold is, as tests start and exit.  Interestingly, the  
free memory number in Resource Monitor at the same time indicates only  
150MB-20MB free.

One of the "optimizations" of the SSD install software was to cap the page  
file, which is probably why I hadn't seen this until recently.  Kostia had  
mentioned to me that he was seeing errors saying “application failed to  
start 0xc0000142”, which I also saw (along with dialog box failures of  
various msys executables, like env.exe and grep.exe).  So maybe this is  
useful to others wanting to run Windows tests.  It seems unlikely that it  
would be seen in the wild (the page file usually isn't capped), and I  
doubt there's anything we can do about it anyway.

Patch

--- c:/Users/Matt/projects/hg/tests/test-log.t
+++ c:/Users/Matt/projects/hg/tests/test-log.t.err
@@ -1476,72 +1476,64 @@ 
    $ hg resolve -m foo
    (no more unresolved files)
    $ hg ci -m "First merge, related"
+  abort: Invalid Signature
+  [255]

I was wondering if it's right on the edge of some limit, and paths or  
whatever being set by the test runner are (sometimes) pushing it out of  
the environment block.  But the docs say there's no limit to the block  
size:

https://msdn.microsoft.com/en-us/library/windows/desktop/ms682653(v=vs.85).aspx

>> _______________________________________________
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel