Patchwork [STABLE] demandimport: avoid infinite recursion at actual module importing (issue5304)

login
register
mail settings
Submitter Katsunori FUJIWARA
Date July 29, 2016, 11:24 p.m.
Message ID <b0b24f8a06ffd2ca1dbd.1469834647@juju>
Download mbox | patch
Permalink /patch/16006/
State Superseded
Delegated to: Yuya Nishihara
Headers show

Comments

Katsunori FUJIWARA - July 29, 2016, 11:24 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1469834098 -32400
#      Sat Jul 30 08:14:58 2016 +0900
# Branch stable
# Node ID b0b24f8a06ffd2ca1dbd798f9c214466889939f8
# Parent  491ee264b9f6e32b6e4dfe34180fb48226fc1641
demandimport: avoid infinite recursion at actual module importing (issue5304)

Before this patch, importing C module on Windows environment causes
infinite recursion call, if py2exe is used with -b2 option.

At importing C module "a.b", extra hooking by zipextimporter of py2exe
causes:

  0. assumption before accessing "b" of "a":

     - built-in module object is created for "a",
       (= "a" is actually imported)
     - _demandmod is created for "a.b" as a proxy object, and
       (= "a.b" is not yet imported)
     - an attribute "b" of "a" is initialized by the latter

  1. invocation of __import__ via _hgextimport() in _demandmod._load()
     for "a.b" implies _demandimport() for "a.b"

     This is unintentional, because _demandmod might be returned by
     _hgextimport() instead of built-in module object.

  2. _demandimport() at (1) is invoked with not context of "a", but
     context of zipextimporter

     Just after invocation of _hgextimport() in _demandimport(), an
     attribute "b" of the built-in module object for "a" is still
     bound to the proxy object for "a.b", because context of "a" isn't
     updated by actual importing "a.b". even though the built-in
     module object for "a.b" already appears in sys.modules.

     Therefore, chainmodules() returns _demandmod for "a.b", which is
     gotten from the attribute "b" of "a".

  3. processfromitem() on "a.b" causes _demandmod._load() for "a.b"
     again

     _demandimport() takes context of "a" in this case.

     Therefore, attributes below are bound to built-in module object
     for "a.b", as expected:

     - "b" of built-in module object for "a"
     - _module of _demandmod for "a.b"

  4. but _demandimport() invoked at (1) returns _demandmod object

     because _demandimport() just returns the object returned by
     chainmodules() at (3) above.

  5. then, _demandmod._load() causes infinite recursion call

     _demandimport() returns _demandmod for "a.b", and it is "self" at
     _demandmod._load().

To avoid infinite recursion at actual module importing, this patch
uses self._module, if _hgextimport() returns _demandmod itself. If
_demandmod._module isn't yet bound at this point, execution should be
aborted, because actual importing failed.

In this patch, _demandmod._module is examined not on _demandimport()
side, but on _demandmod._load() side, because:

  - the former has some exit points
  - only the latter uses _hgextimport(), except for _demandimport()

BTW, this issue occurs only in the code path for non .py/.pyc files in
zipextimporter (strictly speaking, in _memimporter) of py2exe.

Even if zipextimporter is enabled, .py/.pyc files are handled by
zipimporter, and it doesn't imply unintentional _demandimport() at
invocation of __import__ via _hgextimport().
Yuya Nishihara - July 30, 2016, 11:59 a.m.
On Sat, 30 Jul 2016 08:24:07 +0900, FUJIWARA Katsunori wrote:
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1469834098 -32400
> #      Sat Jul 30 08:14:58 2016 +0900
> # Branch stable
> # Node ID b0b24f8a06ffd2ca1dbd798f9c214466889939f8
> # Parent  491ee264b9f6e32b6e4dfe34180fb48226fc1641
> demandimport: avoid infinite recursion at actual module importing (issue5304)
> 
> Before this patch, importing C module on Windows environment causes
> infinite recursion call, if py2exe is used with -b2 option.
> 
> At importing C module "a.b", extra hooking by zipextimporter of py2exe
> causes:
> 
>   0. assumption before accessing "b" of "a":
> 
>      - built-in module object is created for "a",
>        (= "a" is actually imported)
>      - _demandmod is created for "a.b" as a proxy object, and
>        (= "a.b" is not yet imported)
>      - an attribute "b" of "a" is initialized by the latter
> 
>   1. invocation of __import__ via _hgextimport() in _demandmod._load()
>      for "a.b" implies _demandimport() for "a.b"
> 
>      This is unintentional, because _demandmod might be returned by
>      _hgextimport() instead of built-in module object.
> 
>   2. _demandimport() at (1) is invoked with not context of "a", but
>      context of zipextimporter
> 
>      Just after invocation of _hgextimport() in _demandimport(), an
>      attribute "b" of the built-in module object for "a" is still
>      bound to the proxy object for "a.b", because context of "a" isn't
>      updated by actual importing "a.b". even though the built-in
>      module object for "a.b" already appears in sys.modules.
> 
>      Therefore, chainmodules() returns _demandmod for "a.b", which is
>      gotten from the attribute "b" of "a".
> 
>   3. processfromitem() on "a.b" causes _demandmod._load() for "a.b"
>      again
> 
>      _demandimport() takes context of "a" in this case.
> 
>      Therefore, attributes below are bound to built-in module object
>      for "a.b", as expected:
> 
>      - "b" of built-in module object for "a"
>      - _module of _demandmod for "a.b"
> 
>   4. but _demandimport() invoked at (1) returns _demandmod object
> 
>      because _demandimport() just returns the object returned by
>      chainmodules() at (3) above.
> 
>   5. then, _demandmod._load() causes infinite recursion call
> 
>      _demandimport() returns _demandmod for "a.b", and it is "self" at
>      _demandmod._load().
> 
> To avoid infinite recursion at actual module importing, this patch
> uses self._module, if _hgextimport() returns _demandmod itself. If
> _demandmod._module isn't yet bound at this point, execution should be
> aborted, because actual importing failed.
> 
> In this patch, _demandmod._module is examined not on _demandimport()
> side, but on _demandmod._load() side, because:
> 
>   - the former has some exit points
>   - only the latter uses _hgextimport(), except for _demandimport()
> 
> BTW, this issue occurs only in the code path for non .py/.pyc files in
> zipextimporter (strictly speaking, in _memimporter) of py2exe.
> 
> Even if zipextimporter is enabled, .py/.pyc files are handled by
> zipimporter, and it doesn't imply unintentional _demandimport() at
> invocation of __import__ via _hgextimport().

Thanks for the detailed investigation. The change looks okay. If new mod is
self, it should be fine to use self._module.

A few nits:

> diff --git a/mercurial/demandimport.py b/mercurial/demandimport.py
> --- a/mercurial/demandimport.py
> +++ b/mercurial/demandimport.py
> @@ -94,6 +94,11 @@ class _demandmod(object):
>          if not self._module:
>              head, globals, locals, after, level, modrefs = self._data
>              mod = _hgextimport(_import, head, globals, locals, None, level)
> +            if mod is self:
> +                # see issue5304 for detail

Can you add more details here?

> +                assert self._module and self._module is not self
> +                mod = self._module

Do we need to re-run the setup path of self._module?

My understanding is that _hgextimport() can recurse into self._load(), in
which case, subload() and modrefs should already be resolved.

> +
>              # load submodules
>              def subload(mod, p):
>                  h, t = p, None
Katsunori FUJIWARA - July 30, 2016, 3:22 p.m.
At Sat, 30 Jul 2016 20:59:05 +0900,
Yuya Nishihara wrote:
> 
> On Sat, 30 Jul 2016 08:24:07 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> > # Date 1469834098 -32400
> > #      Sat Jul 30 08:14:58 2016 +0900
> > # Branch stable
> > # Node ID b0b24f8a06ffd2ca1dbd798f9c214466889939f8
> > # Parent  491ee264b9f6e32b6e4dfe34180fb48226fc1641
> > demandimport: avoid infinite recursion at actual module importing (issue5304)
> > 
> > Before this patch, importing C module on Windows environment causes
> > infinite recursion call, if py2exe is used with -b2 option.
> > 
> > At importing C module "a.b", extra hooking by zipextimporter of py2exe
> > causes:
> > 
> >   0. assumption before accessing "b" of "a":
> > 
> >      - built-in module object is created for "a",
> >        (= "a" is actually imported)
> >      - _demandmod is created for "a.b" as a proxy object, and
> >        (= "a.b" is not yet imported)
> >      - an attribute "b" of "a" is initialized by the latter
> > 
> >   1. invocation of __import__ via _hgextimport() in _demandmod._load()
> >      for "a.b" implies _demandimport() for "a.b"
> > 
> >      This is unintentional, because _demandmod might be returned by
> >      _hgextimport() instead of built-in module object.
> > 
> >   2. _demandimport() at (1) is invoked with not context of "a", but
> >      context of zipextimporter
> > 
> >      Just after invocation of _hgextimport() in _demandimport(), an
> >      attribute "b" of the built-in module object for "a" is still
> >      bound to the proxy object for "a.b", because context of "a" isn't
> >      updated by actual importing "a.b". even though the built-in
> >      module object for "a.b" already appears in sys.modules.
> > 
> >      Therefore, chainmodules() returns _demandmod for "a.b", which is
> >      gotten from the attribute "b" of "a".
> > 
> >   3. processfromitem() on "a.b" causes _demandmod._load() for "a.b"
> >      again
> > 
> >      _demandimport() takes context of "a" in this case.
> > 
> >      Therefore, attributes below are bound to built-in module object
> >      for "a.b", as expected:
> > 
> >      - "b" of built-in module object for "a"
> >      - _module of _demandmod for "a.b"
> > 
> >   4. but _demandimport() invoked at (1) returns _demandmod object
> > 
> >      because _demandimport() just returns the object returned by
> >      chainmodules() at (3) above.
> > 
> >   5. then, _demandmod._load() causes infinite recursion call
> > 
> >      _demandimport() returns _demandmod for "a.b", and it is "self" at
> >      _demandmod._load().
> > 
> > To avoid infinite recursion at actual module importing, this patch
> > uses self._module, if _hgextimport() returns _demandmod itself. If
> > _demandmod._module isn't yet bound at this point, execution should be
> > aborted, because actual importing failed.
> > 
> > In this patch, _demandmod._module is examined not on _demandimport()
> > side, but on _demandmod._load() side, because:
> > 
> >   - the former has some exit points
> >   - only the latter uses _hgextimport(), except for _demandimport()
> > 
> > BTW, this issue occurs only in the code path for non .py/.pyc files in
> > zipextimporter (strictly speaking, in _memimporter) of py2exe.
> > 
> > Even if zipextimporter is enabled, .py/.pyc files are handled by
> > zipimporter, and it doesn't imply unintentional _demandimport() at
> > invocation of __import__ via _hgextimport().
> 
> Thanks for the detailed investigation. The change looks okay. If new mod is
> self, it should be fine to use self._module.
> 
> A few nits:
> 
> > diff --git a/mercurial/demandimport.py b/mercurial/demandimport.py
> > --- a/mercurial/demandimport.py
> > +++ b/mercurial/demandimport.py
> > @@ -94,6 +94,11 @@ class _demandmod(object):
> >          if not self._module:
> >              head, globals, locals, after, level, modrefs = self._data
> >              mod = _hgextimport(_import, head, globals, locals, None, level)
> > +            if mod is self:
> > +                # see issue5304 for detail
> 
> Can you add more details here?

OK, I'll add "why (1) self._module should be bound to other than self
and (2) subload() and modrefs aren't needed, in this situation" (+
"see also issue5304").


> > +                assert self._module and self._module is not self
> > +                mod = self._module
> 
> Do we need to re-run the setup path of self._module?
> 
> My understanding is that _hgextimport() can recurse into self._load(), in
> which case, subload() and modrefs should already be resolved.

Yes. you are right. _demandmod._load() can return immediately in this
case.

I'll send revised one.

> > +
> >              # load submodules
> >              def subload(mod, p):
> >                  h, t = p, None
> 

----------------------------------------------------------------------
[FUJIWARA Katsunori]                             foozy@lares.dti.ne.jp

Patch

diff --git a/mercurial/demandimport.py b/mercurial/demandimport.py
--- a/mercurial/demandimport.py
+++ b/mercurial/demandimport.py
@@ -94,6 +94,11 @@  class _demandmod(object):
         if not self._module:
             head, globals, locals, after, level, modrefs = self._data
             mod = _hgextimport(_import, head, globals, locals, None, level)
+            if mod is self:
+                # see issue5304 for detail
+                assert self._module and self._module is not self
+                mod = self._module
+
             # load submodules
             def subload(mod, p):
                 h, t = p, None