Patchwork D7416: procutil: support for obtaining an importlib.abc.ResourceReader

login
register
mail settings
Submitter phabricator
Date Nov. 15, 2019, 3:58 a.m.
Message ID <differential-rev-PHID-DREV-7ddppokntusqp74r7vkn-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/43252/
State New
Headers show

Comments

phabricator - Nov. 15, 2019, 3:58 a.m.
indygreg created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Python 3.7's importing mechanism offers a new "resource reader"
  API that allows module loaders to provide an object which can be
  used for obtaining information and data about "resources" in a
  Python package. This API is the new recommended way for modules
  to load data for non-module entities, such as text files and
  other support files residing next to sources. This functionality
  is similar to pkg_resources (which has existed for a long time)
  and importlib.abc.ResourceLoader, which was introduced in Python 3
  and is deprecated in favor of ResourceReader.
  
  Using a "resource reader" is a superior approach to pkg_resources
  because it is newer and faster. And both are superior to __file__
  based filesystem reading because they allow the storage of
  resources in places that aren't the filesystem. This allows things
  like embedding resources in zip files or single file binaries.
  
  This commit introduces a "resource reader" API into the procutil
  module.
  
  We introduce an importlib.abc.ResourceReader compatible class
  which can load resources from the filesystem.
  
  We introduce another class that wraps an existing ResourceReader
  so callers don't need to worry about bytes/str differences.
  
  Finally, there is a new "resourcereader()" that returns a
  ResourceReader for a given Python package. The idea is that
  code will call ``pycompat.resourcereader(__package__)`` (or
  similar) to obtain an object conforming to
  importlib.abc.ResourceReader then from there on, things will
  look like they would as if the code were Python 3.7 native.
  
  The ultimate goal is to remove the dependence on __file__
  and util.datapath so non-module data file access is abstract
  and doesn't need to be serviced by a traditional filesystem.
  
  The new code will return bad results for py2exe and macOS
  application builds because resource files live in immediate
  subdirectories of the binary instead of next to the source.
  This will be addressed in future commits.
  
  1. no-check-commit because we must use foo_bar naming

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D7416

AFFECTED FILES
  mercurial/utils/procutil.py

CHANGE DETAILS




To: indygreg, #hg-reviewers
Cc: mercurial-devel
phabricator - Nov. 16, 2019, 7:27 p.m.
indygreg added a comment.
indygreg added a subscriber: martinvonz.


  @martinvonz do you want to pick up this series or do you want me to rebase it on top of `packageutil`? If you leave it up to me, I'm not sure when I'll get around to it. (I'd prefer to focus my time on shoring up PyOxidizer.)
  
  If you do pick this up, an alternative to this approach would be to vendor `importlib_resources` (https://importlib-resources.readthedocs.io/en/latest/index.html) or consider using its code/approach instead of rolling so much of a custom solution. That being said, I feel like I already wrote enough of the code to support the modern resources API and Mercurial would require sufficient changes for things to work in the domain of `bytes` that importing `importlib_resources` doesn't seem worthwhile.
  
  If you do take this series, something to consider for templates is the implication for users wanting to edit templates. If templates are embedded in the binary, how do users define their own custom versions of the build-in templates? We already have a permissions problem when the provided templates are read-only and the user can't edit them in place. But with PyOxidizer, if templates are embedded in the binary, the user won't even have access to their raw content! We would likely need to supply an `hg debugextracttemplates` command or something to allow the user to obtain a copy of the embedded templates.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7416/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7416

To: indygreg, #hg-reviewers
Cc: martinvonz, mercurial-devel

Patch

diff --git a/mercurial/utils/procutil.py b/mercurial/utils/procutil.py
--- a/mercurial/utils/procutil.py
+++ b/mercurial/utils/procutil.py
@@ -12,6 +12,7 @@ 
 import contextlib
 import errno
 import imp
+import importlib
 import io
 import os
 import signal
@@ -544,3 +545,125 @@ 
     else:
         return os.path.dirname(
             os.path.dirname(pycompat.fsencode(__file__)))
+
+class filesystemresourcereader(object):
+    """An importlib.abc.ResourceReader that loads from the filesystem.
+
+    This API and the loader's ``get_resource_loader()`` method were introduced
+    in Python 3.7. This class implements the interface for older Python versions
+    and for Python 3.7+ environments where ``module.__loader__`` doesn't define
+    ``get_resource_loader()``.
+
+    In addition to implementing the basic functionality, we also tweak the API
+    to operate in the domain of bytes instead of str. This is wrong. But since
+    Mercurial should be the only consumer of this API, it should be OK.
+    """
+
+    def __init__(self, basepath):
+        self._base = basepath
+
+    def _raise_file_not_found(self):
+        # Various methods are mandated to raise FileNotFoundError. But this
+        # exception doesn't exist on older Python.
+        try:
+            raise FileNotFoundError
+        except NameError:
+            raise OSError(errno.ENOENT, r'file not found')
+
+    def open_resource(self, resource):
+        # Subdirectory checking isn't performed in at least Python 3.7.2. It is
+        # weird that is_resource() does do this checking and open_resource()
+        # will allow paths with separators. See also
+        # https://bugs.python.org/issue36128. We keep the Python 3.7 behavior
+        # for compatibility.
+
+        path = os.path.join(self._base, resource)
+
+        # We should raise FileNotFoundError on missing resource. But since this
+        # exception doesn't exist on older Python, we simply let the original
+        # exception be raised.
+        return open(path, r'rb')
+
+    def resource_path(self, resource):
+        if not self.is_resource(resource):
+            self._raise_file_not_found()
+
+        return pycompat.fsencode(os.path.join(self._base, resource))
+
+    def is_resource(self, resource):
+        # See comment in open_resource() about the odd behavior of checking for
+        # path separators.
+        if b'/' in resource or b'\\' in resource:
+            return False
+
+        path = os.path.join(self._base, resource)
+        return os.path.isfile(path)
+
+    def contents(self):
+        return iter(
+            sorted(pycompat.fsencode(p) for p in os.listdir(self._base)))
+
+class bytesresourceloaderproxy(object):
+    """An importlib.abc.ResourceReader proxy that uses bytes instead of str.
+
+    The ResourceReader API is supposed to operate in the domain of str.
+    But all of Mercurial uses bytes. Rather than have all callers worry about
+    bytes/str, we use this class to wrap an existing ResourceReader so the
+    bytes<->str conversion happens automatically. This violates the API
+    contract. But there should be no consumers of ResourceReader outside
+    of Mercurial, so this should be fine. Once the source code is Python
+    3 native, we can consider removing this.
+
+    We assume this class is only ever used in Python 3[.7]+.
+    """
+    def __init__(self, orig):
+        self._orig = orig
+
+    def open_resource(self, resource):
+        return self._orig.open_resource(pycompat.fsdecode(resource))
+
+    def resource_path(self, resource):
+        return os.fsencode(
+            self._orig.resource_path(pycompat.fsdecode(resource)))
+
+    def is_resource(self, resource):
+        return self._orig.is_resource(pycompat.fsdecode(resource))
+
+    def contents(self):
+        # Throw in a sort for good measure because we like determinism.
+        return iter(sorted(pycompat.fsencode(p) for p in self._orig.contents()))
+
+def resourcereader(package):
+    """Obtain an importlib.abc.ResourceReader for a Python package."""
+    # Raising TypeError here is a little weird. But it is what the standard
+    # library does.
+    package = pycompat.sysstr(package)
+    module = importlib.import_module(package)
+
+    spec = getattr(module, '__spec__', None)
+
+    # __spec__ should only be present in Python 3.
+    if spec:
+        if spec.submodule_search_locations is None:
+            raise TypeError(r'%s is not a package' % package)
+
+        try:
+            orig = spec.loader.get_resource_reader(spec.name)
+
+            if orig:
+                return bytesresourceloaderproxy(orig)
+        except Exception:
+            pass
+
+        # Fall through to lack of __spec__ case.
+
+    # We couldn't load a ResourceReader from an existing importlib.abc.Loader.
+    # So return one based on the filesystem location.
+
+    # Namespace packages can't have ResourceReaders.
+    if module.__file__ is None:
+        raise TypeError(r'%s is a namespace package, which cannot have '
+                        r'resource readers' % package)
+
+    return filesystemresourcereader(
+        pycompat.fsencode(os.path.dirname(module.__file__)))