Patchwork [4,of,4,V3] demandimport: replace more references to _demandmod instances

login
register
mail settings
Submitter Gregory Szorc
Date Oct. 3, 2015, 10:56 p.m.
Message ID <6e46ca7b091914c1ab8d.1443912973@126.1.168.192.in-addr.arpa>
Download mbox | patch
Permalink /patch/10766/
State Superseded
Commit 7e81305092a05622fe952b3b5c5f3df70ae26a32
Headers show

Comments

Gregory Szorc - Oct. 3, 2015, 10:56 p.m.
# HG changeset patch
# User Gregory Szorc <gregory.szorc@gmail.com>
# Date 1443912785 25200
#      Sat Oct 03 15:53:05 2015 -0700
# Node ID 6e46ca7b091914c1ab8d9019e04f4c4792c5da12
# Parent  9c85b9664b324a18b78e252526285dfb0a39c161
demandimport: replace more references to _demandmod instances

_demandmod instances may be referenced by multiple importing modules.
Before this patch, the _demandmod instance only maintained a reference
to its first consumer when using the "from X import Y" syntax. This is
because we only created a single _demandmod instance (attached to the
parent X module). If multiple modules A and B performed
"from X import Y", we'd produce a single _demandmod instance
"demandmod" with the following references:

  X.Y = <demandmod>
  A.Y = <demandmod>
  B.Y = <demandmod>

The locals from the first consumer (A) would be stored in <demandmod1>.
When <demandmod1> was loaded, we'd look at the locals for the first
consumer and replace the symbol, if necessary. This resulted in state:

  X.Y = <module>
  A.Y = <module>
  B.Y = <demandmod>

B's reference to Y wasn't updated and was still using the proxy object
because we just didn't record that B had a reference to <demandmod> that
needed updating!

With this patch, we add support for tracking which modules in addition
to the initial importer have a reference to the _demandmod instance and
we replace those references at module load time.

In the case of posix.py, this fixes an issue where the "encoding" module
was being proxied, resulting in hundreds of thousands of
__getattribute__ lookups on the _demandmod instance during dirstate
operations on mozilla-central, speeding up execution by 50-100ms. There
are likely several other operation that benefit from this change as
well.

It's worth noting that this sub-optimal behavior was made worse by the
introduction of absolute_import and its recommended "from . import X"
syntax for importing modules from the "mercurial" package.
Pierre-Yves David - Oct. 3, 2015, 11:47 p.m.
On 10/03/2015 03:56 PM, Gregory Szorc wrote:
> # HG changeset patch
> # User Gregory Szorc <gregory.szorc@gmail.com>
> # Date 1443912785 25200
> #      Sat Oct 03 15:53:05 2015 -0700
> # Node ID 6e46ca7b091914c1ab8d9019e04f4c4792c5da12
> # Parent  9c85b9664b324a18b78e252526285dfb0a39c161
> demandimport: replace more references to _demandmod instances

It seems like the big win of this series are patch 3-4. Can we get them 
first and see if patch 1-2 is still valuable after that?

Patch

diff --git a/mercurial/demandimport.py b/mercurial/demandimport.py
--- a/mercurial/demandimport.py
+++ b/mercurial/demandimport.py
@@ -72,17 +72,25 @@  class _demandmod(object):
         else:
             head = name
             after = []
         object.__setattr__(self, "_data",
-                           (head, globals, locals, after, level))
+                           (head, globals, locals, after, level, set()))
         object.__setattr__(self, "_module", None)
     def _extend(self, name):
         """add to the list of submodules to load"""
         object.__getattribute__(self, '_data')[3].append(name)
+
+    def _addref(self, name):
+        """Record that the named module ``name`` imports this module.
+
+        References to this proxy class will be replaced at module load time.
+        """
+        object.__getattribute__(self, '_data')[5].add(name)
+
     def _load(self):
         mod = object.__getattribute__(self, '_module')
         if not mod:
-            head, globals, locals, after, level = \
+            head, globals, locals, after, level, modrefs = \
                     object.__getattribute__(self, '_data')
             mod = _hgextimport(_import, head, globals, locals, None, level)
             # load submodules
             def subload(mod, p):
@@ -96,11 +104,20 @@  class _demandmod(object):
 
             for x in after:
                 subload(mod, x)
 
-            # are we in the locals dictionary still?
+            # Replace references to this proxy instance with the actual module.
             if locals and locals.get(head) == self:
                 locals[head] = mod
+
+            for modname in modrefs:
+                # We use module names rather than objects because otherwise
+                # we'd have to update accounting on _demandmod instances that
+                # are tracking submodules. This is much simpler.
+                modref = sys.modules.get(modname, None)
+                if modref and getattr(modref, head, None) == self:
+                    setattr(modref, head, mod)
+
             object.__setattr__(self, "_module", mod)
 
         return mod
 
@@ -110,9 +127,9 @@  class _demandmod(object):
         return "<unloaded module '%s'>" % self._data[0]
     def __call__(self, *args, **kwargs):
         raise TypeError("%s object is not callable" % repr(self))
     def __getattribute__(self, attr):
-        if attr in ('_data', '_extend', '_load', '_module'):
+        if attr in ('_data', '_extend', '_load', '_module', '_addref'):
             return object.__getattribute__(self, attr)
         mod = object.__getattribute__(self, '_load')()
         return getattr(mod, attr)
     def __setattr__(self, attr, val):
@@ -146,14 +163,29 @@  def _demandimport(name, globals=None, lo
         # level >= 0: absolute only (Python 2 w/ absolute_import and Python 3).
         # The modern Mercurial convention is to use absolute_import everywhere,
         # so modern Mercurial code will have level >= 0.
 
+        # The name of the module the import statement is located in.
+        globalname = globals.get('__name__')
+
         def processfromitem(mod, attr, **kwargs):
             """Process an imported symbol in the import statement.
 
             If the symbol doesn't exist in the parent module, it must be a
             module. We set missing modules up as _demandmod instances.
             """
+            symbol = getattr(mod, attr, nothing)
+            if symbol is nothing:
+                symbol = _demandmod(attr, mod.__dict__, locals, **kwargs)
+                setattr(mod, attr, symbol)
+
+            # Record the importing module references this symbol so we can
+            # replace the symbol with the actual module instance at load
+            # time.
+            if globalname and isinstance(symbol, _demandmod):
+                object.__getattribute__(symbol, '_addref')(globalname)
+
+
             if getattr(mod, attr, nothing) is nothing:
                 setattr(mod, attr,
                         _demandmod(attr, mod.__dict__, locals, **kwargs))
 
@@ -167,9 +199,8 @@  def _demandimport(name, globals=None, lo
                 return _hgextimport(_origimport, name, globals, locals,
                                     fromlist, level)
 
             mod = _hgextimport(_origimport, name, globals, locals, level=level)
-
             for x in fromlist:
                 processfromitem(mod, x, level=level)
 
             return mod