Patchwork [1,of,6,import-refactor,V3] mercurial: implement import hook for handling C/Python modules

login
register
mail settings
Submitter Gregory Szorc
Date Dec. 4, 2015, 5:51 a.m.
Message ID <6639f74c47892ea8ba2f.1449208267@ubuntu-main>
Download mbox | patch
Permalink /patch/11799/
State Accepted
Delegated to: Yuya Nishihara
Headers show

Comments

Gregory Szorc - Dec. 4, 2015, 5:51 a.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1449207421 28800
#      Thu Dec 03 21:37:01 2015 -0800
# Node ID 6639f74c47892ea8ba2f4b1deb2100bf0d4a6454
# Parent  71aa5a26162d6e4a165b68f07b331e3e2eedc117
mercurial: implement import hook for handling C/Python modules

There are a handful of modules that have both pure Python and C
extension implementations. Currently, setup.py copies files from
mercurial/pure/*.py to mercurial/ during the install process if C
extensions are not available. This way, "import mercurial.X" will
work whether C extensions are available or not.

This approach has a few drawbacks. First, there aren't run-time checks
verifying the C extensions are loaded when they should be. This could
lead to accidental use of the slower pure Python modules. Second, the
C extensions aren't compatible with PyPy and running Mercurial with
PyPy requires installing Mercurial - you can't run ./hg from a source
checkout. This makes developing while running PyPy somewhat difficult.

This patch implements a PEP-302 import hook for finding and loading the
modules with both C and Python implementations. When a module with dual
implementations is requested for import, its import is handled by our
import hook.

The importer has a mechanism that controls what types of modules we
allow to load. We call this loading behavior the "module load policy."
There are 3 settings:

* Only load C extensions
* Only load pure Python
* Try to load C and fall back to Python

An environment variable allows overriding this policy at run time. This
is mainly useful for developers and for performing actions against the
source checkout (such as installing), which require overriding the
default (strict) policy about requiring C extensions.

The default mode for now is to allow both. This isn't proper and is
technically backwards incompatible. However, it is necessary to
implement a sane patch series that doesn't break the world during
future bisections. The behavior will be corrected in future patch.

We choose the main mercurial/__init__.py module for this code out of
necessity: in a future world, if the custom module importer isn't
registered, we'll fail to find/import certain modules when running
from a pure installation. Without the magical import-time side-effects,
*any* importer of mercurial.* modules would be required to call a
function to register our importer. I'm not a fan of import time side
effects and I initially attempted to do this. However, I was foiled by
our own test harness, which has numerous `python` invoked scripts that
"import mercurial" and fail because the importer isn't registered.
Realizing this problem is probably present in random Python scripts
that have been written over the years, I decided that sacrificing
purity for backwards compatibility is necessary. Plus, if you are
programming Python, "import" should probably "just work."

It's worth noting that now that we have a custom module loader, it
would be possible to hook up demand module proxies at this level
instead of replacing __import__. We leave this work for another time,
if it's even desired.

This patch breaks importing in environments where Mercurial modules
are loaded from a zip file (such as py2exe distributions). This will
be addressed in a subsequent patch.

Patch

diff --git a/mercurial/__init__.py b/mercurial/__init__.py
--- a/mercurial/__init__.py
+++ b/mercurial/__init__.py
@@ -0,0 +1,109 @@ 
+# __init__.py - Startup and module loading logic for Mercurial.
+#
+# Copyright 2015 Gregory Szorc <gregory.szorc@gmail.com>
+#
+# This software may be used and distributed according to the terms of the
+# GNU General Public License version 2 or any later version.
+
+from __future__ import absolute_import
+
+import imp
+import os
+import sys
+
+__all__ = []
+
+# Rules for how modules can be loaded. Values are:
+#
+#    c - require C extensions
+#    allow - allow pure Python implementation when C loading fails
+#    py - only load pure Python modules
+modulepolicy = '@MODULELOADPOLICY@'
+
+# By default, require the C extensions for performance reasons.
+if modulepolicy == '@' 'MODULELOADPOLICY' '@':
+    # TODO change to 'c' once installer is changed.
+    modulepolicy = 'allow'
+
+# Environment variable can always force settings.
+modulepolicy = os.environ.get('HGMODULEPOLICY', modulepolicy)
+
+# Modules that have both Python and C implementations. See also the
+# set of .py files under mercurial/pure/.
+_dualmodules = set([
+    'mercurial.base85',
+    'mercurial.bdiff',
+    'mercurial.diffhelpers',
+    'mercurial.mpatch',
+    'mercurial.osutil',
+    'mercurial.parsers',
+])
+
+class hgimporter(object):
+    """Object that conforms to import hook interface defined in PEP-302."""
+    def find_module(self, name, path=None):
+        # We only care about modules that have both C and pure implementations.
+        if name in _dualmodules:
+            return self
+        return None
+
+    def load_module(self, name):
+        mod = sys.modules.get(name, None)
+        if mod:
+            return mod
+
+        mercurial = sys.modules['mercurial']
+
+        # Unlike the default importer which searches special locations and
+        # sys.path, we only look in the directory where "mercurial" was
+        # imported from.
+
+        # imp.find_module doesn't support submodules (modules with ".").
+        # Instead you have to pass the parent package's __path__ attribute
+        # as the path argument.
+        stem = name.split('.')[-1]
+
+        try:
+            if modulepolicy == 'py':
+                raise ImportError()
+
+            modinfo = imp.find_module(stem, mercurial.__path__)
+
+            # The Mercurial installer used to copy files from
+            # mercurial/pure/*.py to mercurial/*.py. Therefore, it's possible
+            # for some installations to have .py files under mercurial/*.
+            # Loading Python modules when we expected C versions could result
+            # in a) poor performance b) loading a version from a previous
+            # Mercurial version, potentially leading to incompatibility. Either
+            # scenario is bad. So we verify that modules loaded from
+            # mercurial/* are C extensions. If the current policy allows the
+            # loading of .py modules, the module will be re-imported from
+            # mercurial/pure/* below.
+            # TODO uncomment once setup.py is updated to actually install
+            # into mercurial/pure.
+            #if modinfo[2][2] != imp.C_EXTENSION:
+            #    raise ImportError('.py version of %s found where C '
+            #                      'version should exist' % name)
+
+        except ImportError:
+            if modulepolicy == 'c':
+                raise
+
+            # Could not load the C extension and pure Python is allowed. So
+            # try to load them.
+            from . import pure
+            modinfo = imp.find_module(stem, pure.__path__)
+            if not modinfo:
+                raise ImportError('could not find mercurial module %s' %
+                                  name)
+
+        mod = imp.load_module(name, *modinfo)
+        sys.modules[name] = mod
+        return mod
+
+# We automagically register our custom importer as a side-effect of loading.
+# This is necessary to ensure that any entry points are able to import
+# mercurial.* modules without having to perform this registration themselves.
+if not any(isinstance(x, hgimporter) for x in sys.meta_path):
+    # meta_path is used before any implicit finders and before sys.path.
+    sys.meta_path.insert(0, hgimporter())
diff --git a/setup.py b/setup.py
--- a/setup.py
+++ b/setup.py
@@ -172,21 +172,19 @@  def runhg(cmd, env):
     if err:
         printf("stderr from '%s':" % (' '.join(cmd)), file=sys.stderr)
         printf(b('\n').join([b('  ') + e for e in err]), file=sys.stderr)
         return ''
     return out
 
 version = ''
 
-# Execute hg out of this directory with a custom environment which
-# includes the pure Python modules in mercurial/pure. We also take
-# care to not use any hgrc files and do no localization.
-pypath = ['mercurial', os.path.join('mercurial', 'pure')]
-env = {'PYTHONPATH': os.pathsep.join(pypath),
+# Execute hg out of this directory with a custom environment which takes care
+# to not use any hgrc files and do no localization.
+env = {'HGMODULEPOLICY': 'py',
        'HGRCPATH': '',
        'LANGUAGE': 'C'}
 if 'LD_LIBRARY_PATH' in os.environ:
     env['LD_LIBRARY_PATH'] = os.environ['LD_LIBRARY_PATH']
 if 'SystemRoot' in os.environ:
     # Copy SystemRoot into the custom environment for Python 2.6
     # under Windows. Otherwise, the subprocess will fail with
     # error 0xc0150004. See: http://bugs.python.org/issue3440