Patchwork [6,of,6,import-refactor,V3] mercurial: support loading modules from zipimporter

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 4, 2015, 5:51 a.m.
Message ID <782783ea499c7e03253b.1449208272@ubuntu-main>
Download mbox | patch
Permalink /patch/11804/
State Accepted
Delegated to: Yuya Nishihara
Headers show

Comments

Gregory Szorc - Dec. 4, 2015, 5:51 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1449206705 28800
#      Thu Dec 03 21:25:05 2015 -0800
# Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
# Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
mercurial: support loading modules from zipimporter

The previous refactor to module importing broke module loading when
mercurial.* modules were loaded from a zipfile (using a zipimporter).
This scenario is likely encountered when using py2exe.

Supporting zipimporter and the traditional importer side-by-side
turns out to be quite a pain. In Python 2.x, the standard, file-based
import mechanism is partially implemented in C. The sys.meta_path
and sys.path_hooks hook points exist to allow custom importers in
Python/userland. zipimport.zipimporter and our "hgimporter" class
from earlier in this patch series are 2 of these.

In a standard Python installation (no matter if running in py2exe
or similar or not), zipimport.zipimporter appears to be registered
in sys.path_hooks. This means that as each sys.path entry is
consulted, it will ask zipimporter if it supports that path and
zipimporter will be used if that entry is a zip file. In a
py2exe environment, sys.path contains an entry with the path to
the zip file containing the Python standard library along with
Mercurial's Python files.

The way the importer mechanism works is the first importer that
declares knowledge of a module (via find_module() returning an
object) gets to load it. Since our "hgimporter" is registered
in sys.meta_path and returns an interest in specific mercurial.*
modules, the zipimporter registered on sys.path_hooks never comes
into play for these modules. So, we need to be zipimporter aware
and call into zipimporter to load modules.

This patch teaches "hgimporter" how to call out into zipimporter
when necessary. We detect the necessity of zipimporter by looking
at the loader for the "mercurial" module. If it is a zipimporter
instance, we load via zipimporter.

The behavior of zipimporter is a bit wonky.

You appear to need separate zipimporter instances for each directory
in the zip file. I'm not sure why this is. I suspect it has
something to do with the low-level importing mechanism (implemented
in C) operating on a per-directory basis. PEP-302 makes some
references to this. I was not able to get a zipimporter to
import modules outside of its immediate directory no matter how
I specified the module name. This is why we use separate
zipimporter instances for the ".zip/mercurial" and
".zip/mercurial/pure" locations.

The zipimporter documentation for Python 2.7 explicitly states that
zipimporter does not import dynamic modules (C extensions). Yet from
a py2exe distribution on Windows - where the .pyd files are *not*
in the zip archive - zipimporter imported these dynamic modules
just fine! I'm not sure if dynamic modules can't be imported from
*inside* the zip archive or whether zipimporter looks for dynamic
modules outside the zip archive. All I know is zipimporter does
manage to import the .pyd files on Windows and this patch makes
our new importer compatible with py2exe.

In the ideal world, We'd probably reimplement or fall back to parts
of the built-in import mechanism instead of handling zipimporter
specially. After all, if we're loading Mercurial modules via
something that isn't the built-in file-based importer or zipimporter,
our custom importer will likely fail because it doesn't know how to
call into it. I'd like to think that we'll never encounter this
in the wild, but you never know. If we do encounter it, we can
come up with another solution.

It's worth nothing that Python 3 has moved a lot of the importing
code from C to Python. Python 3 gives you near total control over
the import mechanism. So in the very distant future when Mercurial
drops Python 2 support, it's likely that our custom importer code
can be refactored to something a bit saner.
Yuya Nishihara - Dec. 4, 2015, 1:06 p.m.
On Thu, 03 Dec 2015 21:51:12 -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1449206705 28800
> #      Thu Dec 03 21:25:05 2015 -0800
> # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
> # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
> mercurial: support loading modules from zipimporter
> 
> The previous refactor to module importing broke module loading when
> mercurial.* modules were loaded from a zipfile (using a zipimporter).
> This scenario is likely encountered when using py2exe.
> 
> Supporting zipimporter and the traditional importer side-by-side
> turns out to be quite a pain. In Python 2.x, the standard, file-based
> import mechanism is partially implemented in C. The sys.meta_path
> and sys.path_hooks hook points exist to allow custom importers in
> Python/userland. zipimport.zipimporter and our "hgimporter" class
> from earlier in this patch series are 2 of these.
> 
> In a standard Python installation (no matter if running in py2exe
> or similar or not), zipimport.zipimporter appears to be registered
> in sys.path_hooks. This means that as each sys.path entry is
> consulted, it will ask zipimporter if it supports that path and
> zipimporter will be used if that entry is a zip file. In a
> py2exe environment, sys.path contains an entry with the path to
> the zip file containing the Python standard library along with
> Mercurial's Python files.
> 
> The way the importer mechanism works is the first importer that
> declares knowledge of a module (via find_module() returning an
> object) gets to load it. Since our "hgimporter" is registered
> in sys.meta_path and returns an interest in specific mercurial.*
> modules, the zipimporter registered on sys.path_hooks never comes
> into play for these modules. So, we need to be zipimporter aware
> and call into zipimporter to load modules.
> 
> This patch teaches "hgimporter" how to call out into zipimporter
> when necessary. We detect the necessity of zipimporter by looking
> at the loader for the "mercurial" module. If it is a zipimporter
> instance, we load via zipimporter.
> 
> The behavior of zipimporter is a bit wonky.
> 
> You appear to need separate zipimporter instances for each directory
> in the zip file. I'm not sure why this is. I suspect it has
> something to do with the low-level importing mechanism (implemented
> in C) operating on a per-directory basis. PEP-302 makes some
> references to this. I was not able to get a zipimporter to
> import modules outside of its immediate directory no matter how
> I specified the module name. This is why we use separate
> zipimporter instances for the ".zip/mercurial" and
> ".zip/mercurial/pure" locations.
> 
> The zipimporter documentation for Python 2.7 explicitly states that
> zipimporter does not import dynamic modules (C extensions). Yet from
> a py2exe distribution on Windows - where the .pyd files are *not*
> in the zip archive - zipimporter imported these dynamic modules
> just fine! I'm not sure if dynamic modules can't be imported from
> *inside* the zip archive or whether zipimporter looks for dynamic
> modules outside the zip archive. All I know is zipimporter does
> manage to import the .pyd files on Windows and this patch makes
> our new importer compatible with py2exe.
> 
> In the ideal world, We'd probably reimplement or fall back to parts
> of the built-in import mechanism instead of handling zipimporter
> specially. After all, if we're loading Mercurial modules via
> something that isn't the built-in file-based importer or zipimporter,
> our custom importer will likely fail because it doesn't know how to
> call into it. I'd like to think that we'll never encounter this
> in the wild, but you never know. If we do encounter it, we can
> come up with another solution.
> 
> It's worth nothing that Python 3 has moved a lot of the importing
> code from C to Python. Python 3 gives you near total control over
> the import mechanism. So in the very distant future when Mercurial
> drops Python 2 support, it's likely that our custom importer code
> can be refactored to something a bit saner.
> 
> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> --- a/mercurial/__init__.py
> +++ b/mercurial/__init__.py
> @@ -5,16 +5,17 @@
>  # This software may be used and distributed according to the terms of the
>  # GNU General Public License version 2 or any later version.
>  
>  from __future__ import absolute_import
>  
>  import imp
>  import os
>  import sys
> +import zipimport
>  
>  __all__ = []
>  
>  # Rules for how modules can be loaded. Values are:
>  #
>  #    c - require C extensions
>  #    allow - allow pure Python implementation when C loading fails
>  #    py - only load pure Python modules
> @@ -55,16 +56,46 @@ class hgimporter(object):
>  
>      def load_module(self, name):
>          mod = sys.modules.get(name, None)
>          if mod:
>              return mod
>  
>          mercurial = sys.modules['mercurial']
>  
> +        # The zip importer behaves sufficiently differently from the default
> +        # importer to warrant its own code path.
> +        loader = getattr(mercurial, '__loader__', None)
> +        if isinstance(loader, zipimport.zipimporter):
> +            def ziploader(*paths):
> +                """Obtain a zipimporter for a directory under the main zip."""
> +                path = os.path.join(loader.archive, *paths)
> +                zl = sys.path_importer_cache.get(path)
> +                if not zl:
> +                    zl = zipimport.zipimporter(path)
> +                return zl
> +
> +            try:
> +                if modulepolicy == 'py':
> +                    raise ImportError()
> +
> +                zl = ziploader('mercurial')
> +                mod = zl.load_module(name)
> +                # Unlike imp, ziploader doesn't expose module metadata that
> +                # indicates the type of module. So just assume what we found
> +                # is OK (even though it could be a pure Python module).
> +            except ImportError:
> +                if modulepolicy == 'c':
> +                    raise
> +                zl = ziploader('mercurial', 'pure')
> +                mod = zl.load_module(name)
> +
> +            sys.modules[name] = mod
> +            return mod

Wow, great. My poor idea was to disable our importer if frozen assuming that
it would use the c modules.
Yuya Nishihara - Dec. 4, 2015, 2:03 p.m.
On Fri, 4 Dec 2015 22:06:15 +0900, Yuya Nishihara wrote:
> On Thu, 03 Dec 2015 21:51:12 -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1449206705 28800
> > #      Thu Dec 03 21:25:05 2015 -0800
> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
> > mercurial: support loading modules from zipimporter

Pushed the series to clowncopter, thanks!

I've ignored the following check-commit errors as they should be false
positive.

+  Revision 4374d819ccd5 does not comply to rules
+  ------------------------------------------------------
+  114: adds a function with foo_bar naming
+   +    def find_module(self, name, path=None):
+  
+  Revision 511a4384b033 does not comply to rules
+  ------------------------------------------------------
+  68: adds a function with foo_bar naming
+   +    def copy_file(self, *args, **kwargs):
Augie Fackler - Dec. 4, 2015, 3:13 p.m.
(+brett, who might have some idea about import behavior and is
probably going to be interested in this change anyway.)

On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1449206705 28800
> #      Thu Dec 03 21:25:05 2015 -0800
> # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
> # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
> mercurial: support loading modules from zipimporter
>
> The previous refactor to module importing broke module loading when
> mercurial.* modules were loaded from a zipfile (using a zipimporter).
> This scenario is likely encountered when using py2exe.
>
> Supporting zipimporter and the traditional importer side-by-side
> turns out to be quite a pain. In Python 2.x, the standard, file-based
> import mechanism is partially implemented in C. The sys.meta_path
> and sys.path_hooks hook points exist to allow custom importers in
> Python/userland. zipimport.zipimporter and our "hgimporter" class
> from earlier in this patch series are 2 of these.
>
> In a standard Python installation (no matter if running in py2exe
> or similar or not), zipimport.zipimporter appears to be registered
> in sys.path_hooks. This means that as each sys.path entry is
> consulted, it will ask zipimporter if it supports that path and
> zipimporter will be used if that entry is a zip file. In a
> py2exe environment, sys.path contains an entry with the path to
> the zip file containing the Python standard library along with
> Mercurial's Python files.
>
> The way the importer mechanism works is the first importer that
> declares knowledge of a module (via find_module() returning an
> object) gets to load it. Since our "hgimporter" is registered
> in sys.meta_path and returns an interest in specific mercurial.*
> modules, the zipimporter registered on sys.path_hooks never comes
> into play for these modules. So, we need to be zipimporter aware
> and call into zipimporter to load modules.
>
> This patch teaches "hgimporter" how to call out into zipimporter
> when necessary. We detect the necessity of zipimporter by looking
> at the loader for the "mercurial" module. If it is a zipimporter
> instance, we load via zipimporter.
>
> The behavior of zipimporter is a bit wonky.

Patch is already queued, but Brett, do you have any insight into why
zipimport works this way? (see below)

>
> You appear to need separate zipimporter instances for each directory
> in the zip file. I'm not sure why this is. I suspect it has
> something to do with the low-level importing mechanism (implemented
> in C) operating on a per-directory basis. PEP-302 makes some
> references to this. I was not able to get a zipimporter to
> import modules outside of its immediate directory no matter how
> I specified the module name. This is why we use separate
> zipimporter instances for the ".zip/mercurial" and
> ".zip/mercurial/pure" locations.
>
> The zipimporter documentation for Python 2.7 explicitly states that
> zipimporter does not import dynamic modules (C extensions). Yet from
> a py2exe distribution on Windows - where the .pyd files are *not*
> in the zip archive - zipimporter imported these dynamic modules
> just fine! I'm not sure if dynamic modules can't be imported from
> *inside* the zip archive or whether zipimporter looks for dynamic
> modules outside the zip archive. All I know is zipimporter does
> manage to import the .pyd files on Windows and this patch makes
> our new importer compatible with py2exe.
>
> In the ideal world, We'd probably reimplement or fall back to parts
> of the built-in import mechanism instead of handling zipimporter
> specially. After all, if we're loading Mercurial modules via
> something that isn't the built-in file-based importer or zipimporter,
> our custom importer will likely fail because it doesn't know how to
> call into it. I'd like to think that we'll never encounter this
> in the wild, but you never know. If we do encounter it, we can
> come up with another solution.
>
> It's worth nothing that Python 3 has moved a lot of the importing
> code from C to Python. Python 3 gives you near total control over
> the import mechanism. So in the very distant future when Mercurial
> drops Python 2 support, it's likely that our custom importer code
> can be refactored to something a bit saner.
>
> diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> --- a/mercurial/__init__.py
> +++ b/mercurial/__init__.py
> @@ -5,16 +5,17 @@
>  # This software may be used and distributed according to the terms of the
>  # GNU General Public License version 2 or any later version.
>
>  from __future__ import absolute_import
>
>  import imp
>  import os
>  import sys
> +import zipimport
>
>  __all__ = []
>
>  # Rules for how modules can be loaded. Values are:
>  #
>  #    c - require C extensions
>  #    allow - allow pure Python implementation when C loading fails
>  #    py - only load pure Python modules
> @@ -55,16 +56,46 @@ class hgimporter(object):
>
>      def load_module(self, name):
>          mod = sys.modules.get(name, None)
>          if mod:
>              return mod
>
>          mercurial = sys.modules['mercurial']
>
> +        # The zip importer behaves sufficiently differently from the default
> +        # importer to warrant its own code path.
> +        loader = getattr(mercurial, '__loader__', None)
> +        if isinstance(loader, zipimport.zipimporter):
> +            def ziploader(*paths):
> +                """Obtain a zipimporter for a directory under the main zip."""
> +                path = os.path.join(loader.archive, *paths)
> +                zl = sys.path_importer_cache.get(path)
> +                if not zl:
> +                    zl = zipimport.zipimporter(path)
> +                return zl
> +
> +            try:
> +                if modulepolicy == 'py':
> +                    raise ImportError()
> +
> +                zl = ziploader('mercurial')
> +                mod = zl.load_module(name)
> +                # Unlike imp, ziploader doesn't expose module metadata that
> +                # indicates the type of module. So just assume what we found
> +                # is OK (even though it could be a pure Python module).
> +            except ImportError:
> +                if modulepolicy == 'c':
> +                    raise
> +                zl = ziploader('mercurial', 'pure')
> +                mod = zl.load_module(name)
> +
> +            sys.modules[name] = mod
> +            return mod
> +
>          # Unlike the default importer which searches special locations and
>          # sys.path, we only look in the directory where "mercurial" was
>          # imported from.
>
>          # imp.find_module doesn't support submodules (modules with ".").
>          # Instead you have to pass the parent package's __path__ attribute
>          # as the path argument.
>          stem = name.split('.')[-1]
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> https://selenic.com/mailman/listinfo/mercurial-devel
Brett Cannon - Dec. 4, 2015, 6:27 p.m.
On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf@durin42.com> wrote:

> (+brett, who might have some idea about import behavior and is
> probably going to be interested in this change anyway.)
>
> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
> > # HG changeset patch
> > # User Gregory Szorc <gregory.szorc@gmail.com>
> > # Date 1449206705 28800
> > #      Thu Dec 03 21:25:05 2015 -0800
> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
> > mercurial: support loading modules from zipimporter
> >
> > The previous refactor to module importing broke module loading when
> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
> > This scenario is likely encountered when using py2exe.
> >
> > Supporting zipimporter and the traditional importer side-by-side
> > turns out to be quite a pain. In Python 2.x, the standard, file-based
> > import mechanism is partially implemented in C. The sys.meta_path
> > and sys.path_hooks hook points exist to allow custom importers in
> > Python/userland. zipimport.zipimporter and our "hgimporter" class
> > from earlier in this patch series are 2 of these.
> >
> > In a standard Python installation (no matter if running in py2exe
> > or similar or not), zipimport.zipimporter appears to be registered
> > in sys.path_hooks. This means that as each sys.path entry is
> > consulted, it will ask zipimporter if it supports that path and
> > zipimporter will be used if that entry is a zip file. In a
> > py2exe environment, sys.path contains an entry with the path to
> > the zip file containing the Python standard library along with
> > Mercurial's Python files.
> >
> > The way the importer mechanism works is the first importer that
> > declares knowledge of a module (via find_module() returning an
> > object) gets to load it. Since our "hgimporter" is registered
> > in sys.meta_path and returns an interest in specific mercurial.*
> > modules, the zipimporter registered on sys.path_hooks never comes
> > into play for these modules. So, we need to be zipimporter aware
> > and call into zipimporter to load modules.
> >
> > This patch teaches "hgimporter" how to call out into zipimporter
> > when necessary. We detect the necessity of zipimporter by looking
> > at the loader for the "mercurial" module. If it is a zipimporter
> > instance, we load via zipimporter.
> >
> > The behavior of zipimporter is a bit wonky.
>
> Patch is already queued, but Brett, do you have any insight into why
> zipimport works this way? (see below)
>
> >
> > You appear to need separate zipimporter instances for each directory
> > in the zip file. I'm not sure why this is.


sys.path_importer_cache is the most likely culprit for structuring it this
way, but otherwise it's just the way someone chose to implement it (the
other approach would have been to cache a single importer per zipfile in
the sys.path_hooks object and then have the finders and loaders be
basically a single instance per zipfile).

FYI we are going to rewrite zipimporter in 3.6 or 3.7.

I suspect it has
> > something to do with the low-level importing mechanism (implemented
> > in C) operating on a per-directory basis. PEP-302 makes some
> > references to this. I was not able to get a zipimporter to
> > import modules outside of its immediate directory no matter how
> > I specified the module name. This is why we use separate
> > zipimporter instances for the ".zip/mercurial" and
> > ".zip/mercurial/pure" locations.
> >
> > The zipimporter documentation for Python 2.7 explicitly states that
> > zipimporter does not import dynamic modules (C extensions). Yet from
> > a py2exe distribution on Windows - where the .pyd files are *not*
> > in the zip archive - zipimporter imported these dynamic modules
> > just fine!


That doesn't work on UNIX which is why the docs say extension modules are
not supported.


> I'm not sure if dynamic modules can't be imported from
> > *inside* the zip archive or whether zipimporter looks for dynamic
> > modules outside the zip archive. All I know is zipimporter does
> > manage to import the .pyd files on Windows and this patch makes
> > our new importer compatible with py2exe.
> >
> > In the ideal world, We'd probably reimplement or fall back to parts
> > of the built-in import mechanism instead of handling zipimporter
> > specially. After all, if we're loading Mercurial modules via
> > something that isn't the built-in file-based importer or zipimporter,
> > our custom importer will likely fail because it doesn't know how to
> > call into it. I'd like to think that we'll never encounter this
> > in the wild, but you never know. If we do encounter it, we can
> > come up with another solution.
>

Sticking yourselves on sys.meta_path does complicate things since
sys.meta_path is explicitly for handling alternative storage solutions for
modules, i.e., built-in modules, frozen modules, and file-based storage.
What it sounds like you're trying to do is wrap the normal file-based
import stuff which is tough in Python 2 since that's all implicitly done.


> >
> > It's worth nothing that Python 3 has moved a lot of the importing
> > code from C to Python. Python 3 gives you near total control over
> > the import mechanism. So in the very distant future when Mercurial
> > drops Python 2 support, it's likely that our custom importer code
> > can be refactored to something a bit saner.
>

Not sure what your custom importer does, but if it exists for lazy loading
then I already implemented it for you in Python 3.5:
https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader

-Brett


> >
> > diff --git a/mercurial/__init__.py b/mercurial/__init__.py
> > --- a/mercurial/__init__.py
> > +++ b/mercurial/__init__.py
> > @@ -5,16 +5,17 @@
> >  # This software may be used and distributed according to the terms of
> the
> >  # GNU General Public License version 2 or any later version.
> >
> >  from __future__ import absolute_import
> >
> >  import imp
> >  import os
> >  import sys
> > +import zipimport
> >
> >  __all__ = []
> >
> >  # Rules for how modules can be loaded. Values are:
> >  #
> >  #    c - require C extensions
> >  #    allow - allow pure Python implementation when C loading fails
> >  #    py - only load pure Python modules
> > @@ -55,16 +56,46 @@ class hgimporter(object):
> >
> >      def load_module(self, name):
> >          mod = sys.modules.get(name, None)
> >          if mod:
> >              return mod
> >
> >          mercurial = sys.modules['mercurial']
> >
> > +        # The zip importer behaves sufficiently differently from the
> default
> > +        # importer to warrant its own code path.
> > +        loader = getattr(mercurial, '__loader__', None)
> > +        if isinstance(loader, zipimport.zipimporter):
> > +            def ziploader(*paths):
> > +                """Obtain a zipimporter for a directory under the main
> zip."""
> > +                path = os.path.join(loader.archive, *paths)
> > +                zl = sys.path_importer_cache.get(path)
> > +                if not zl:
> > +                    zl = zipimport.zipimporter(path)
> > +                return zl
> > +
> > +            try:
> > +                if modulepolicy == 'py':
> > +                    raise ImportError()
> > +
> > +                zl = ziploader('mercurial')
> > +                mod = zl.load_module(name)
> > +                # Unlike imp, ziploader doesn't expose module metadata
> that
> > +                # indicates the type of module. So just assume what we
> found
> > +                # is OK (even though it could be a pure Python module).
> > +            except ImportError:
> > +                if modulepolicy == 'c':
> > +                    raise
> > +                zl = ziploader('mercurial', 'pure')
> > +                mod = zl.load_module(name)
> > +
> > +            sys.modules[name] = mod
> > +            return mod
> > +
> >          # Unlike the default importer which searches special locations
> and
> >          # sys.path, we only look in the directory where "mercurial" was
> >          # imported from.
> >
> >          # imp.find_module doesn't support submodules (modules with ".").
> >          # Instead you have to pass the parent package's __path__
> attribute
> >          # as the path argument.
> >          stem = name.split('.')[-1]
> > _______________________________________________
> > Mercurial-devel mailing list
> > Mercurial-devel@selenic.com
> > https://selenic.com/mailman/listinfo/mercurial-devel
>
Gregory Szorc - Dec. 4, 2015, 6:50 p.m.
On Fri, Dec 4, 2015 at 10:27 AM, Brett Cannon <brett@python.org> wrote:

>
>
> On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf@durin42.com> wrote:
>
>> (+brett, who might have some idea about import behavior and is
>> probably going to be interested in this change anyway.)
>>
>> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
>> > # HG changeset patch
>> > # User Gregory Szorc <gregory.szorc@gmail.com>
>> > # Date 1449206705 28800
>> > #      Thu Dec 03 21:25:05 2015 -0800
>> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
>> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
>> > mercurial: support loading modules from zipimporter
>> >
>> > The previous refactor to module importing broke module loading when
>> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
>> > This scenario is likely encountered when using py2exe.
>> >
>> > Supporting zipimporter and the traditional importer side-by-side
>> > turns out to be quite a pain. In Python 2.x, the standard, file-based
>> > import mechanism is partially implemented in C. The sys.meta_path
>> > and sys.path_hooks hook points exist to allow custom importers in
>> > Python/userland. zipimport.zipimporter and our "hgimporter" class
>> > from earlier in this patch series are 2 of these.
>> >
>> > In a standard Python installation (no matter if running in py2exe
>> > or similar or not), zipimport.zipimporter appears to be registered
>> > in sys.path_hooks. This means that as each sys.path entry is
>> > consulted, it will ask zipimporter if it supports that path and
>> > zipimporter will be used if that entry is a zip file. In a
>> > py2exe environment, sys.path contains an entry with the path to
>> > the zip file containing the Python standard library along with
>> > Mercurial's Python files.
>> >
>> > The way the importer mechanism works is the first importer that
>> > declares knowledge of a module (via find_module() returning an
>> > object) gets to load it. Since our "hgimporter" is registered
>> > in sys.meta_path and returns an interest in specific mercurial.*
>> > modules, the zipimporter registered on sys.path_hooks never comes
>> > into play for these modules. So, we need to be zipimporter aware
>> > and call into zipimporter to load modules.
>> >
>> > This patch teaches "hgimporter" how to call out into zipimporter
>> > when necessary. We detect the necessity of zipimporter by looking
>> > at the loader for the "mercurial" module. If it is a zipimporter
>> > instance, we load via zipimporter.
>> >
>> > The behavior of zipimporter is a bit wonky.
>>
>> Patch is already queued, but Brett, do you have any insight into why
>> zipimport works this way? (see below)
>>
>> >
>> > You appear to need separate zipimporter instances for each directory
>> > in the zip file. I'm not sure why this is.
>
>
> sys.path_importer_cache is the most likely culprit for structuring it this
> way, but otherwise it's just the way someone chose to implement it (the
> other approach would have been to cache a single importer per zipfile in
> the sys.path_hooks object and then have the finders and loaders be
> basically a single instance per zipfile).
>
> FYI we are going to rewrite zipimporter in 3.6 or 3.7.
>
> I suspect it has
>> > something to do with the low-level importing mechanism (implemented
>> > in C) operating on a per-directory basis. PEP-302 makes some
>> > references to this. I was not able to get a zipimporter to
>> > import modules outside of its immediate directory no matter how
>> > I specified the module name. This is why we use separate
>> > zipimporter instances for the ".zip/mercurial" and
>> > ".zip/mercurial/pure" locations.
>> >
>> > The zipimporter documentation for Python 2.7 explicitly states that
>> > zipimporter does not import dynamic modules (C extensions). Yet from
>> > a py2exe distribution on Windows - where the .pyd files are *not*
>> > in the zip archive - zipimporter imported these dynamic modules
>> > just fine!
>
>
> That doesn't work on UNIX which is why the docs say extension modules are
> not supported.
>
>
>> I'm not sure if dynamic modules can't be imported from
>> > *inside* the zip archive or whether zipimporter looks for dynamic
>> > modules outside the zip archive. All I know is zipimporter does
>> > manage to import the .pyd files on Windows and this patch makes
>> > our new importer compatible with py2exe.
>> >
>> > In the ideal world, We'd probably reimplement or fall back to parts
>> > of the built-in import mechanism instead of handling zipimporter
>> > specially. After all, if we're loading Mercurial modules via
>> > something that isn't the built-in file-based importer or zipimporter,
>> > our custom importer will likely fail because it doesn't know how to
>> > call into it. I'd like to think that we'll never encounter this
>> > in the wild, but you never know. If we do encounter it, we can
>> > come up with another solution.
>>
>
> Sticking yourselves on sys.meta_path does complicate things since
> sys.meta_path is explicitly for handling alternative storage solutions for
> modules, i.e., built-in modules, frozen modules, and file-based storage.
> What it sounds like you're trying to do is wrap the normal file-based
> import stuff which is tough in Python 2 since that's all implicitly done.
>
>
>> >
>> > It's worth nothing that Python 3 has moved a lot of the importing
>> > code from C to Python. Python 3 gives you near total control over
>> > the import mechanism. So in the very distant future when Mercurial
>> > drops Python 2 support, it's likely that our custom importer code
>> > can be refactored to something a bit saner.
>>
>
> Not sure what your custom importer does, but if it exists for lazy loading
> then I already implemented it for you in Python 3.5:
> https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader
>

Mercurial has C and Python implementations for a handful of modules. We
desire to run both forms from the same installation so e.g. CPython and
PyPy can share the same installation.

The custom importer (
http://hg.netv6.net/clowncopter/file/30a20167ae29/mercurial/__init__.py)
basically looks in multiple locations for these modules. Depending on the
environment, it will try to load modules from mercurial/* and
mercurial/pure/*.

We could probably move files around and use a custom function to obtain a
reference to the module (instead of "import" throughout all the code that
imports these modules). But this would have required rewriting a bunch of
code. I felt it was easier to employ importer hackery than to rewrite the
world.

I'm receptive to better ideas.
Brett Cannon - Dec. 4, 2015, 7:45 p.m.
On Fri, 4 Dec 2015 at 10:50 Gregory Szorc <gregory.szorc@gmail.com> wrote:

> On Fri, Dec 4, 2015 at 10:27 AM, Brett Cannon <brett@python.org> wrote:
>
>>
>>
>> On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf@durin42.com> wrote:
>>
>>> (+brett, who might have some idea about import behavior and is
>>> probably going to be interested in this change anyway.)
>>>
>>> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
>>> > # HG changeset patch
>>> > # User Gregory Szorc <gregory.szorc@gmail.com>
>>> > # Date 1449206705 28800
>>> > #      Thu Dec 03 21:25:05 2015 -0800
>>> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
>>> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
>>> > mercurial: support loading modules from zipimporter
>>> >
>>> > The previous refactor to module importing broke module loading when
>>> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
>>> > This scenario is likely encountered when using py2exe.
>>> >
>>> > Supporting zipimporter and the traditional importer side-by-side
>>> > turns out to be quite a pain. In Python 2.x, the standard, file-based
>>> > import mechanism is partially implemented in C. The sys.meta_path
>>> > and sys.path_hooks hook points exist to allow custom importers in
>>> > Python/userland. zipimport.zipimporter and our "hgimporter" class
>>> > from earlier in this patch series are 2 of these.
>>> >
>>> > In a standard Python installation (no matter if running in py2exe
>>> > or similar or not), zipimport.zipimporter appears to be registered
>>> > in sys.path_hooks. This means that as each sys.path entry is
>>> > consulted, it will ask zipimporter if it supports that path and
>>> > zipimporter will be used if that entry is a zip file. In a
>>> > py2exe environment, sys.path contains an entry with the path to
>>> > the zip file containing the Python standard library along with
>>> > Mercurial's Python files.
>>> >
>>> > The way the importer mechanism works is the first importer that
>>> > declares knowledge of a module (via find_module() returning an
>>> > object) gets to load it. Since our "hgimporter" is registered
>>> > in sys.meta_path and returns an interest in specific mercurial.*
>>> > modules, the zipimporter registered on sys.path_hooks never comes
>>> > into play for these modules. So, we need to be zipimporter aware
>>> > and call into zipimporter to load modules.
>>> >
>>> > This patch teaches "hgimporter" how to call out into zipimporter
>>> > when necessary. We detect the necessity of zipimporter by looking
>>> > at the loader for the "mercurial" module. If it is a zipimporter
>>> > instance, we load via zipimporter.
>>> >
>>> > The behavior of zipimporter is a bit wonky.
>>>
>>> Patch is already queued, but Brett, do you have any insight into why
>>> zipimport works this way? (see below)
>>>
>>> >
>>> > You appear to need separate zipimporter instances for each directory
>>> > in the zip file. I'm not sure why this is.
>>
>>
>> sys.path_importer_cache is the most likely culprit for structuring it
>> this way, but otherwise it's just the way someone chose to implement it
>> (the other approach would have been to cache a single importer per zipfile
>> in the sys.path_hooks object and then have the finders and loaders be
>> basically a single instance per zipfile).
>>
>> FYI we are going to rewrite zipimporter in 3.6 or 3.7.
>>
>> I suspect it has
>>> > something to do with the low-level importing mechanism (implemented
>>> > in C) operating on a per-directory basis. PEP-302 makes some
>>> > references to this. I was not able to get a zipimporter to
>>> > import modules outside of its immediate directory no matter how
>>> > I specified the module name. This is why we use separate
>>> > zipimporter instances for the ".zip/mercurial" and
>>> > ".zip/mercurial/pure" locations.
>>> >
>>> > The zipimporter documentation for Python 2.7 explicitly states that
>>> > zipimporter does not import dynamic modules (C extensions). Yet from
>>> > a py2exe distribution on Windows - where the .pyd files are *not*
>>> > in the zip archive - zipimporter imported these dynamic modules
>>> > just fine!
>>
>>
>> That doesn't work on UNIX which is why the docs say extension modules are
>> not supported.
>>
>>
>>> I'm not sure if dynamic modules can't be imported from
>>> > *inside* the zip archive or whether zipimporter looks for dynamic
>>> > modules outside the zip archive. All I know is zipimporter does
>>> > manage to import the .pyd files on Windows and this patch makes
>>> > our new importer compatible with py2exe.
>>> >
>>> > In the ideal world, We'd probably reimplement or fall back to parts
>>> > of the built-in import mechanism instead of handling zipimporter
>>> > specially. After all, if we're loading Mercurial modules via
>>> > something that isn't the built-in file-based importer or zipimporter,
>>> > our custom importer will likely fail because it doesn't know how to
>>> > call into it. I'd like to think that we'll never encounter this
>>> > in the wild, but you never know. If we do encounter it, we can
>>> > come up with another solution.
>>>
>>
>> Sticking yourselves on sys.meta_path does complicate things since
>> sys.meta_path is explicitly for handling alternative storage solutions for
>> modules, i.e., built-in modules, frozen modules, and file-based storage.
>> What it sounds like you're trying to do is wrap the normal file-based
>> import stuff which is tough in Python 2 since that's all implicitly done.
>>
>>
>>> >
>>> > It's worth nothing that Python 3 has moved a lot of the importing
>>> > code from C to Python. Python 3 gives you near total control over
>>> > the import mechanism. So in the very distant future when Mercurial
>>> > drops Python 2 support, it's likely that our custom importer code
>>> > can be refactored to something a bit saner.
>>>
>>
>> Not sure what your custom importer does, but if it exists for lazy
>> loading then I already implemented it for you in Python 3.5:
>> https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader
>>
>
> Mercurial has C and Python implementations for a handful of modules. We
> desire to run both forms from the same installation so e.g. CPython and
> PyPy can share the same installation.
>
> The custom importer (
> http://hg.netv6.net/clowncopter/file/30a20167ae29/mercurial/__init__.py)
> basically looks in multiple locations for these modules. Depending on the
> environment, it will try to load modules from mercurial/* and
> mercurial/pure/*.
>
> We could probably move files around and use a custom function to obtain a
> reference to the module (instead of "import" throughout all the code that
> imports these modules). But this would have required rewriting a bunch of
> code. I felt it was easier to employ importer hackery than to rewrite the
> world.
>
> I'm receptive to better ideas.
>

If they are drop-in replacements for each other why can't you just give
them the same name and then simply let the priority of extension modules
over source modules handle it?

The other option is to make the extension modules act as accelerators and
follow the common idiom of always importing the source module and at the
bottom of it doing:

  try:
    from _mod import *
  except ImportError:
    pass

where _mod is the extension module. This is how we make the stdlib not
require extension modules when they are used purely for speed purposes.
Gregory Szorc - Dec. 4, 2015, 8:08 p.m.
On Fri, Dec 4, 2015 at 11:45 AM, Brett Cannon <brett@python.org> wrote:

>
>
> On Fri, 4 Dec 2015 at 10:50 Gregory Szorc <gregory.szorc@gmail.com> wrote:
>
>> On Fri, Dec 4, 2015 at 10:27 AM, Brett Cannon <brett@python.org> wrote:
>>
>>>
>>>
>>> On Fri, 4 Dec 2015 at 07:13 Augie Fackler <raf@durin42.com> wrote:
>>>
>>>> (+brett, who might have some idea about import behavior and is
>>>> probably going to be interested in this change anyway.)
>>>>
>>>> On Thu, Dec 03, 2015 at 09:51:12PM -0800, Gregory Szorc wrote:
>>>> > # HG changeset patch
>>>> > # User Gregory Szorc <gregory.szorc@gmail.com>
>>>> > # Date 1449206705 28800
>>>> > #      Thu Dec 03 21:25:05 2015 -0800
>>>> > # Node ID 782783ea499c7e03253b55f0b0c221b9946356e2
>>>> > # Parent  dd8b23489d2a5f5580905c995333d139fe457d1c
>>>> > mercurial: support loading modules from zipimporter
>>>> >
>>>> > The previous refactor to module importing broke module loading when
>>>> > mercurial.* modules were loaded from a zipfile (using a zipimporter).
>>>> > This scenario is likely encountered when using py2exe.
>>>> >
>>>> > Supporting zipimporter and the traditional importer side-by-side
>>>> > turns out to be quite a pain. In Python 2.x, the standard, file-based
>>>> > import mechanism is partially implemented in C. The sys.meta_path
>>>> > and sys.path_hooks hook points exist to allow custom importers in
>>>> > Python/userland. zipimport.zipimporter and our "hgimporter" class
>>>> > from earlier in this patch series are 2 of these.
>>>> >
>>>> > In a standard Python installation (no matter if running in py2exe
>>>> > or similar or not), zipimport.zipimporter appears to be registered
>>>> > in sys.path_hooks. This means that as each sys.path entry is
>>>> > consulted, it will ask zipimporter if it supports that path and
>>>> > zipimporter will be used if that entry is a zip file. In a
>>>> > py2exe environment, sys.path contains an entry with the path to
>>>> > the zip file containing the Python standard library along with
>>>> > Mercurial's Python files.
>>>> >
>>>> > The way the importer mechanism works is the first importer that
>>>> > declares knowledge of a module (via find_module() returning an
>>>> > object) gets to load it. Since our "hgimporter" is registered
>>>> > in sys.meta_path and returns an interest in specific mercurial.*
>>>> > modules, the zipimporter registered on sys.path_hooks never comes
>>>> > into play for these modules. So, we need to be zipimporter aware
>>>> > and call into zipimporter to load modules.
>>>> >
>>>> > This patch teaches "hgimporter" how to call out into zipimporter
>>>> > when necessary. We detect the necessity of zipimporter by looking
>>>> > at the loader for the "mercurial" module. If it is a zipimporter
>>>> > instance, we load via zipimporter.
>>>> >
>>>> > The behavior of zipimporter is a bit wonky.
>>>>
>>>> Patch is already queued, but Brett, do you have any insight into why
>>>> zipimport works this way? (see below)
>>>>
>>>> >
>>>> > You appear to need separate zipimporter instances for each directory
>>>> > in the zip file. I'm not sure why this is.
>>>
>>>
>>> sys.path_importer_cache is the most likely culprit for structuring it
>>> this way, but otherwise it's just the way someone chose to implement it
>>> (the other approach would have been to cache a single importer per zipfile
>>> in the sys.path_hooks object and then have the finders and loaders be
>>> basically a single instance per zipfile).
>>>
>>> FYI we are going to rewrite zipimporter in 3.6 or 3.7.
>>>
>>> I suspect it has
>>>> > something to do with the low-level importing mechanism (implemented
>>>> > in C) operating on a per-directory basis. PEP-302 makes some
>>>> > references to this. I was not able to get a zipimporter to
>>>> > import modules outside of its immediate directory no matter how
>>>> > I specified the module name. This is why we use separate
>>>> > zipimporter instances for the ".zip/mercurial" and
>>>> > ".zip/mercurial/pure" locations.
>>>> >
>>>> > The zipimporter documentation for Python 2.7 explicitly states that
>>>> > zipimporter does not import dynamic modules (C extensions). Yet from
>>>> > a py2exe distribution on Windows - where the .pyd files are *not*
>>>> > in the zip archive - zipimporter imported these dynamic modules
>>>> > just fine!
>>>
>>>
>>> That doesn't work on UNIX which is why the docs say extension modules
>>> are not supported.
>>>
>>>
>>>> I'm not sure if dynamic modules can't be imported from
>>>> > *inside* the zip archive or whether zipimporter looks for dynamic
>>>> > modules outside the zip archive. All I know is zipimporter does
>>>> > manage to import the .pyd files on Windows and this patch makes
>>>> > our new importer compatible with py2exe.
>>>> >
>>>> > In the ideal world, We'd probably reimplement or fall back to parts
>>>> > of the built-in import mechanism instead of handling zipimporter
>>>> > specially. After all, if we're loading Mercurial modules via
>>>> > something that isn't the built-in file-based importer or zipimporter,
>>>> > our custom importer will likely fail because it doesn't know how to
>>>> > call into it. I'd like to think that we'll never encounter this
>>>> > in the wild, but you never know. If we do encounter it, we can
>>>> > come up with another solution.
>>>>
>>>
>>> Sticking yourselves on sys.meta_path does complicate things since
>>> sys.meta_path is explicitly for handling alternative storage solutions for
>>> modules, i.e., built-in modules, frozen modules, and file-based storage.
>>> What it sounds like you're trying to do is wrap the normal file-based
>>> import stuff which is tough in Python 2 since that's all implicitly done.
>>>
>>>
>>>> >
>>>> > It's worth nothing that Python 3 has moved a lot of the importing
>>>> > code from C to Python. Python 3 gives you near total control over
>>>> > the import mechanism. So in the very distant future when Mercurial
>>>> > drops Python 2 support, it's likely that our custom importer code
>>>> > can be refactored to something a bit saner.
>>>>
>>>
>>> Not sure what your custom importer does, but if it exists for lazy
>>> loading then I already implemented it for you in Python 3.5:
>>> https://docs.python.org/3/library/importlib.html#importlib.util.LazyLoader
>>>
>>
>> Mercurial has C and Python implementations for a handful of modules. We
>> desire to run both forms from the same installation so e.g. CPython and
>> PyPy can share the same installation.
>>
>> The custom importer (
>> http://hg.netv6.net/clowncopter/file/30a20167ae29/mercurial/__init__.py)
>> basically looks in multiple locations for these modules. Depending on the
>> environment, it will try to load modules from mercurial/* and
>> mercurial/pure/*.
>>
>> We could probably move files around and use a custom function to obtain a
>> reference to the module (instead of "import" throughout all the code that
>> imports these modules). But this would have required rewriting a bunch of
>> code. I felt it was easier to employ importer hackery than to rewrite the
>> world.
>>
>> I'm receptive to better ideas.
>>
>
> If they are drop-in replacements for each other why can't you just give
> them the same name and then simply let the priority of extension modules
> over source modules handle it?
>
> The other option is to make the extension modules act as accelerators and
> follow the common idiom of always importing the source module and at the
> bottom of it doing:
>
>   try:
>     from _mod import *
>   except ImportError:
>     pass
>
> where _mod is the extension module. This is how we make the stdlib not
> require extension modules when they are used purely for speed purposes.
>

That's a good idea. We should consider this for the next refactor, whenever
that may be.

Patch

diff --git a/mercurial/__init__.py b/mercurial/__init__.py
--- a/mercurial/__init__.py
+++ b/mercurial/__init__.py
@@ -5,16 +5,17 @@ 
 # This software may be used and distributed according to the terms of the
 # GNU General Public License version 2 or any later version.
 
 from __future__ import absolute_import
 
 import imp
 import os
 import sys
+import zipimport
 
 __all__ = []
 
 # Rules for how modules can be loaded. Values are:
 #
 #    c - require C extensions
 #    allow - allow pure Python implementation when C loading fails
 #    py - only load pure Python modules
@@ -55,16 +56,46 @@  class hgimporter(object):
 
     def load_module(self, name):
         mod = sys.modules.get(name, None)
         if mod:
             return mod
 
         mercurial = sys.modules['mercurial']
 
+        # The zip importer behaves sufficiently differently from the default
+        # importer to warrant its own code path.
+        loader = getattr(mercurial, '__loader__', None)
+        if isinstance(loader, zipimport.zipimporter):
+            def ziploader(*paths):
+                """Obtain a zipimporter for a directory under the main zip."""
+                path = os.path.join(loader.archive, *paths)
+                zl = sys.path_importer_cache.get(path)
+                if not zl:
+                    zl = zipimport.zipimporter(path)
+                return zl
+
+            try:
+                if modulepolicy == 'py':
+                    raise ImportError()
+
+                zl = ziploader('mercurial')
+                mod = zl.load_module(name)
+                # Unlike imp, ziploader doesn't expose module metadata that
+                # indicates the type of module. So just assume what we found
+                # is OK (even though it could be a pure Python module).
+            except ImportError:
+                if modulepolicy == 'c':
+                    raise
+                zl = ziploader('mercurial', 'pure')
+                mod = zl.load_module(name)
+
+            sys.modules[name] = mod
+            return mod
+
         # Unlike the default importer which searches special locations and
         # sys.path, we only look in the directory where "mercurial" was
         # imported from.
 
         # imp.find_module doesn't support submodules (modules with ".").
         # Instead you have to pass the parent package's __path__ attribute
         # as the path argument.
         stem = name.split('.')[-1]