Patchwork [2,of,4,V2] import-checker: make imported_modules yield absolute dotted_name_of_path

login
register
mail settings
Submitter Katsunori FUJIWARA
Date May 15, 2015, 2:13 p.m.
Message ID <ab820b3793cc9a2680cc.1431699226@juju>
Download mbox | patch
Permalink /patch/9095/
State Accepted
Commit e350a4f6c9746f28654ea542154a7008498a3f66
Delegated to: Augie Fackler
Headers show

Comments

Katsunori FUJIWARA - May 15, 2015, 2:13 p.m.
# HG changeset patch
# User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
# Date 1431699038 -32400
#      Fri May 15 23:10:38 2015 +0900
# Node ID ab820b3793cc9a2680cc90f11fd760750c276adb
# Parent  d7a3d39a03eccc94a164ba44704452d7233c71d5
import-checker: make imported_modules yield absolute dotted_name_of_path

This patch makes "imported_modules()" always yield absolute
"dotted_name_of_path()"-ed name by strict detection with
"fromlocal()".

This change improves circular detection in some points:

  - locally defined modules, of which name collides against one of
    standard library, can be examined correctly

    for example, circular import related to "commands" is overlooked
    before this patch.

  - names not useful for circular detection are ignored

    for example, names below are also yielded before this patch:

    - module names of standard library (= not locally defined one)
    - non-module names (e.g. "node.nullid" of "from node import nullid")

    these redundant names decrease performance of circular detection.

  - "__init__" can be handled correctly in "checkmod()"

    for example, current implementation has problems below:

      - "from xxx import yyy" doesn't recognize "xxx.__init__" as imported

      - "xxx.__init__" imported via "import xxx" is treated as "xxx",
        and circular detection is aborted, because "key" of such
        module name is not "xxx" but "xxx.__init__"

  - it is easy to enhance for "from . import xxx" style or so (in the
    future)

    module name detection in "imported_modules()" can use information
    in "ast.ImportFrom" fully.

It is assumed that all locally defined modules are correctly specified
to "import-checker.py" at once.

Strictly speaking, when "from foo.bar.baz import module1" imports
"foo.bar.baz.module1" module, current "imported_modules()" yields only
"foo.bar.baz.__init__", even though also "foo.__init__" and
"foo.bar.__init__" should be yielded to detect circular import
exactly.

But this limitation is reasonable one for improvement in this patch,
because current "__init__" files in Mercurial seems to be implemented
carefully.
Augie Fackler - May 15, 2015, 9:33 p.m.
> On May 15, 2015, at 10:13, FUJIWARA Katsunori <foozy@lares.dti.ne.jp> wrote:
> 
> # HG changeset patch
> # User FUJIWARA Katsunori <foozy@lares.dti.ne.jp>
> # Date 1431699038 -32400
> #      Fri May 15 23:10:38 2015 +0900
> # Node ID ab820b3793cc9a2680cc90f11fd760750c276adb
> # Parent  d7a3d39a03eccc94a164ba44704452d7233c71d5
> import-checker: make imported_modules yield absolute dotted_name_of_path
> 
> This patch makes "imported_modules()" always yield absolute
> "dotted_name_of_path()"-ed name by strict detection with
> "fromlocal()".
> 
> This change improves circular detection in some points:
> 
>  - locally defined modules, of which name collides against one of
>    standard library, can be examined correctly
> 
>    for example, circular import related to "commands" is overlooked
>    before this patch.
> 
>  - names not useful for circular detection are ignored
> 
>    for example, names below are also yielded before this patch:
> 
>    - module names of standard library (= not locally defined one)
>    - non-module names (e.g. "node.nullid" of "from node import nullid")
> 
>    these redundant names decrease performance of circular detection.

So this patch makes things faster? (I think I parsed that correctly.)

> 
>  - "__init__" can be handled correctly in "checkmod()"
> 
>    for example, current implementation has problems below:
> 
>      - "from xxx import yyy" doesn't recognize "xxx.__init__" as imported
> 
>      - "xxx.__init__" imported via "import xxx" is treated as "xxx",
>        and circular detection is aborted, because "key" of such
>        module name is not "xxx" but "xxx.__init__"
> 
>  - it is easy to enhance for "from . import xxx" style or so (in the
>    future)
> 
>    module name detection in "imported_modules()" can use information
>    in "ast.ImportFrom" fully.
> 
> It is assumed that all locally defined modules are correctly specified
> to "import-checker.py" at once.
> 
> Strictly speaking, when "from foo.bar.baz import module1" imports
> "foo.bar.baz.module1" module, current "imported_modules()" yields only
> "foo.bar.baz.__init__", even though also "foo.__init__" and
> "foo.bar.__init__" should be yielded to detect circular import
> exactly.
> 
> But this limitation is reasonable one for improvement in this patch,
> because current "__init__" files in Mercurial seems to be implemented
> carefully.
> 
> diff --git a/contrib/import-checker.py b/contrib/import-checker.py
> --- a/contrib/import-checker.py
> +++ b/contrib/import-checker.py
> @@ -156,38 +156,94 @@ def list_stdlib_modules():
> 
> stdlib_modules = set(list_stdlib_modules())
> 
> -def imported_modules(source, ignore_nested=False):
> +def imported_modules(source, modulename, localmods, ignore_nested=False):
>     """Given the source of a file as a string, yield the names
>     imported by that file.
> 
>     Args:
>       source: The python source to examine as a string.
> +      modulename: of specified python source (may have `__init__`)
> +      localmods: dict of locally defined module names (may have `__init__`)
>       ignore_nested: If true, import statements that do not start in
>                      column zero will be ignored.
> 
>     Returns:
> -      A list of module names imported by the given source.
> -
> -    >>> sorted(imported_modules(
> -    ...         'import foo ; from baz import bar; import foo.qux'))
> -    ['baz.bar', 'foo', 'foo.qux']
> +      A list of absolute module names imported by the given source.
> +
> +    >>> modulename = 'foo.xxx'
> +    >>> localmods = {'foo.__init__': True,
> +    ...              'foo.foo1': True, 'foo.foo2': True,
> +    ...              'foo.bar.__init__': True, 'foo.bar.bar1': True,
> +    ...              'baz.__init__': True, 'baz.baz1': True }
> +    >>> # standard library (= not locally defined ones)
> +    >>> sorted(imported_modules(
> +    ...        'from stdlib1 import foo, bar; import stdlib2',
> +    ...        modulename, localmods))
> +    []
> +    >>> # relative importing
> +    >>> sorted(imported_modules(
> +    ...        'import foo1; from bar import bar1',
> +    ...        modulename, localmods))
> +    ['foo.bar.__init__', 'foo.bar.bar1', 'foo.foo1']
> +    >>> sorted(imported_modules(
> +    ...        'from bar.bar1 import name1, name2, name3',
> +    ...        modulename, localmods))
> +    ['foo.bar.bar1']
> +    >>> # absolute importing
> +    >>> sorted(imported_modules(
> +    ...        'from baz import baz1, name1',
> +    ...        modulename, localmods))
> +    ['baz.__init__', 'baz.baz1']
> +    >>> # mixed importing, even though it shouldn't be recommended
> +    >>> sorted(imported_modules(
> +    ...        'import stdlib, foo1, baz',
> +    ...        modulename, localmods))
> +    ['baz.__init__', 'foo.foo1']
> +    >>> # ignore_nested
>>>> sorted(imported_modules(
>     ... '''import foo
>     ... def wat():
>     ...     import bar
> -    ... ''', ignore_nested=True))
> -    ['foo']
> -    """
> +    ... ''', modulename, localmods))
> +    ['foo.__init__', 'foo.bar.__init__']
> +    >>> sorted(imported_modules(
> +    ... '''import foo
> +    ... def wat():
> +    ...     import bar
> +    ... ''', modulename, localmods, ignore_nested=True))
> +    ['foo.__init__']
> +    """
> +    fromlocal = fromlocalfunc(modulename, localmods)
>     for node in ast.walk(ast.parse(source)):
>         if ignore_nested and getattr(node, 'col_offset', 0) > 0:
>             continue
>         if isinstance(node, ast.Import):
>             for n in node.names:
> -                yield n.name
> +                found = fromlocal(n.name)
> +                if not found:
> +                    # this should import standard library
> +                    continue
> +                yield found[1]
>         elif isinstance(node, ast.ImportFrom):
> -            prefix = node.module + '.'
> +            found = fromlocal(node.module)
> +            if not found:
> +                # this should import standard library
> +                continue
> +
> +            absname, dottedpath, hassubmod = found
> +            yield dottedpath
> +            if not hassubmod:
> +                # examination of "node.names" should be redundant
> +                # e.g.: from mercurial.node import nullid, nullrev
> +                continue
> +
> +            prefix = absname + '.'
>             for n in node.names:
> -                yield prefix + n.name
> +                found = fromlocal(prefix + n.name)
> +                if not found:
> +                    # this should be a function or a property of "node.module"
> +                    continue
> +                yield found[1]
> 
> def verify_stdlib_on_own_line(source):
>     """Given some python source, verify that stdlib imports are done
> @@ -283,7 +339,7 @@ def main(argv):
>         f = open(source_path)
>         src = f.read()
>         used_imports[modname] = sorted(
> -            imported_modules(src, ignore_nested=True))
> +            imported_modules(src, modname, localmods, ignore_nested=True))
>         for error in verify_stdlib_on_own_line(src):
>             any_errors = True
>             print source_path, error
> diff --git a/tests/test-module-imports.t b/tests/test-module-imports.t
> --- a/tests/test-module-imports.t
> +++ b/tests/test-module-imports.t
> @@ -37,3 +37,4 @@ these may expose other cycles.
>      stdlib:    formatter
>      relative:  config, error, scmutil, util
>   Import cycle: mercurial.cmdutil -> mercurial.context -> mercurial.subrepo -> mercurial.cmdutil
> +  Import cycle: mercurial.commands -> mercurial.commandserver -> mercurial.dispatch -> mercurial.commands
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@selenic.com
> http://selenic.com/mailman/listinfo/mercurial-devel

Patch

diff --git a/contrib/import-checker.py b/contrib/import-checker.py
--- a/contrib/import-checker.py
+++ b/contrib/import-checker.py
@@ -156,38 +156,94 @@  def list_stdlib_modules():
 
 stdlib_modules = set(list_stdlib_modules())
 
-def imported_modules(source, ignore_nested=False):
+def imported_modules(source, modulename, localmods, ignore_nested=False):
     """Given the source of a file as a string, yield the names
     imported by that file.
 
     Args:
       source: The python source to examine as a string.
+      modulename: of specified python source (may have `__init__`)
+      localmods: dict of locally defined module names (may have `__init__`)
       ignore_nested: If true, import statements that do not start in
                      column zero will be ignored.
 
     Returns:
-      A list of module names imported by the given source.
-
-    >>> sorted(imported_modules(
-    ...         'import foo ; from baz import bar; import foo.qux'))
-    ['baz.bar', 'foo', 'foo.qux']
+      A list of absolute module names imported by the given source.
+
+    >>> modulename = 'foo.xxx'
+    >>> localmods = {'foo.__init__': True,
+    ...              'foo.foo1': True, 'foo.foo2': True,
+    ...              'foo.bar.__init__': True, 'foo.bar.bar1': True,
+    ...              'baz.__init__': True, 'baz.baz1': True }
+    >>> # standard library (= not locally defined ones)
+    >>> sorted(imported_modules(
+    ...        'from stdlib1 import foo, bar; import stdlib2',
+    ...        modulename, localmods))
+    []
+    >>> # relative importing
+    >>> sorted(imported_modules(
+    ...        'import foo1; from bar import bar1',
+    ...        modulename, localmods))
+    ['foo.bar.__init__', 'foo.bar.bar1', 'foo.foo1']
+    >>> sorted(imported_modules(
+    ...        'from bar.bar1 import name1, name2, name3',
+    ...        modulename, localmods))
+    ['foo.bar.bar1']
+    >>> # absolute importing
+    >>> sorted(imported_modules(
+    ...        'from baz import baz1, name1',
+    ...        modulename, localmods))
+    ['baz.__init__', 'baz.baz1']
+    >>> # mixed importing, even though it shouldn't be recommended
+    >>> sorted(imported_modules(
+    ...        'import stdlib, foo1, baz',
+    ...        modulename, localmods))
+    ['baz.__init__', 'foo.foo1']
+    >>> # ignore_nested
     >>> sorted(imported_modules(
     ... '''import foo
     ... def wat():
     ...     import bar
-    ... ''', ignore_nested=True))
-    ['foo']
-    """
+    ... ''', modulename, localmods))
+    ['foo.__init__', 'foo.bar.__init__']
+    >>> sorted(imported_modules(
+    ... '''import foo
+    ... def wat():
+    ...     import bar
+    ... ''', modulename, localmods, ignore_nested=True))
+    ['foo.__init__']
+    """
+    fromlocal = fromlocalfunc(modulename, localmods)
     for node in ast.walk(ast.parse(source)):
         if ignore_nested and getattr(node, 'col_offset', 0) > 0:
             continue
         if isinstance(node, ast.Import):
             for n in node.names:
-                yield n.name
+                found = fromlocal(n.name)
+                if not found:
+                    # this should import standard library
+                    continue
+                yield found[1]
         elif isinstance(node, ast.ImportFrom):
-            prefix = node.module + '.'
+            found = fromlocal(node.module)
+            if not found:
+                # this should import standard library
+                continue
+
+            absname, dottedpath, hassubmod = found
+            yield dottedpath
+            if not hassubmod:
+                # examination of "node.names" should be redundant
+                # e.g.: from mercurial.node import nullid, nullrev
+                continue
+
+            prefix = absname + '.'
             for n in node.names:
-                yield prefix + n.name
+                found = fromlocal(prefix + n.name)
+                if not found:
+                    # this should be a function or a property of "node.module"
+                    continue
+                yield found[1]
 
 def verify_stdlib_on_own_line(source):
     """Given some python source, verify that stdlib imports are done
@@ -283,7 +339,7 @@  def main(argv):
         f = open(source_path)
         src = f.read()
         used_imports[modname] = sorted(
-            imported_modules(src, ignore_nested=True))
+            imported_modules(src, modname, localmods, ignore_nested=True))
         for error in verify_stdlib_on_own_line(src):
             any_errors = True
             print source_path, error
diff --git a/tests/test-module-imports.t b/tests/test-module-imports.t
--- a/tests/test-module-imports.t
+++ b/tests/test-module-imports.t
@@ -37,3 +37,4 @@  these may expose other cycles.
      stdlib:    formatter
      relative:  config, error, scmutil, util
   Import cycle: mercurial.cmdutil -> mercurial.context -> mercurial.subrepo -> mercurial.cmdutil
+  Import cycle: mercurial.commands -> mercurial.commandserver -> mercurial.dispatch -> mercurial.commands