Patchwork [04,of,10] localrepo: move ui loading to baselocalrepository

login
register
mail settings
Submitter Jun Wu
Date Feb. 10, 2017, 1:46 a.m.
Message ID <5e47a19fae82f4659f1e.1486691178@localhost.localdomain>
Download mbox | patch
Permalink /patch/18378/
State Deferred
Headers show

Comments

Jun Wu - Feb. 10, 2017, 1:46 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1486675034 28800
#      Thu Feb 09 13:17:14 2017 -0800
# Node ID 5e47a19fae82f4659f1e7df37f8f26dd56c4f246
# Parent  2c1834e1e6b5f734c27199d47de9b2252b8f4913
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 5e47a19fae82
localrepo: move ui loading to baselocalrepository

See the previous patch for why.
Jun Wu - Feb. 14, 2017, 1:09 a.m.
Excerpts from Augie Fackler's message of 2017-02-13 18:58:59 -0500:
> On Fri, Feb 10, 2017 at 01:07:49PM -0800, Jun Wu wrote:
> > Side-effects are basically everything related to writes, like mutating
> > (wrapping) internal Python objects, etc. They are done by Python code
> > provided by an extension out of core's control. Usually that means
> > reposetup, uisetup, etc.
> 
> Usually, but not always - I've seen many extensions that call
> wrapfunction or similar outside of {repo,ui}setup.

Those extensions need to be fixed. The new chg will rely on the
side-effect-free assumption as well.

> > Reading things is fine - loading configs, changelog.i, obsstore, etc. as
> > long as no malicious extension changes the loading logic. Currently for chg
> > to work correctly, this assumption must be true.
> >
> > The difference of the above two is whether we can revert the effects. For
> > objects loaded by reading certain things, once we discard the object, the
> > effect is considered reverted. But for things like x.__class__ = y;
> > extensions.wrapfoo(...), etc., they are basically impossible to revert
> > cleanly (confidently). So we avoid them at all costs.
> 
> I'm having trouble wrapping my head around this series in a useful way
> - should we defer it to the sprint, or do you want to try and explain
> the arc of this a little more in a v2 with more details in the first
> commit message as to what you mean by side effects and what the
> expected boundary between these components is?
> 
> (I'm specifically a little perplexed by the inheritance relationship
> to avoid side effects, I just can't quite get my head around it.)

The end-goal is to get repo.vfs and repo.svfs available to the chg master
process, which runs preload functions. So preload functions could read files
properly, without side effects (like actually writing any files in the repo):

    def preloadchangelog(side-effect-free-repo):
        changelogfile = repo.svfs('changelog.i')
                        ^^^^^^^^^
                     just want this to be available, with the confidence
                     that constructing the repo object is side-effect free

Without a "repo" object, the preload code will only have "repopath":

    def preloadchangelog(repopath):
        # check repo requirements and confirm it's a format that we can
        # support, probably also check that vfs is not modified
        ...
        path = os.path.join(repopath, '.hg', 'store', 'changelog.i')
        changelogfile = open(path, 'rb')

Although that's possible, that shifts a lot of burden (like repo
requirements handling, vfs checks etc) from the preloading framework to the
author of preload functions.
 

Whatever we will end up with, I want to make sure:

  1. There is a "repo" object passed to preload functions, not a "repopath"

     "repopath" is just not practically useful as too many checks need to be
     done on it. The repo object does not have to be a full repo - a minimal
     version could only have vfs and svfs, so preload functions can read
     contents properly.

  2. Avoid unexpected reposetup() runs

    - In the future, chg does not run uisetup()s, for correctness. Because
      uisetup() runs only once and will take incorrect configs.
    - Extensions do not expect reposetup() runs when uisetup() does not run.
    - So repo.__init__ should not call reposetup()s.

    (this could be done alternatively by ignoring reposetup()s for
     extensions that haven't run uisetup()s)

  3. Avoid unwanted side effects

    - An extension starts a new SSH connection when repo.__init__ is called
    - An extension writes to some log file when repo.__init__ is called 

So I think moving part of "localrepository" to a smaller object is
reasonable. An alternative I can think of is to disable side-effect methods
in a subclass of localrepository:
  
    class purerepo(localrepository):
        def __init__(..., create=False):
            super(..., create=False)
          
        def lock(self):
            raise ProgrammingError('cannot lock')
  
        def transaction(self):
            raise ProgrammingError('cannot do transaction')

If that looks more reasonable, I can go that approach, which will probably
be more compatible, while less confident about the side-effect-free part.
Jun Wu - Feb. 23, 2017, 4:27 a.m.
Excerpts from Kevin Bullock's message of 2017-02-22 20:31:14 -0600:
> There's another alternative we might consider: can we move all of the
> "side effects" out of the localrepository class entirely? That is, instead
> of resorting to inheritance, could we create a lower-level "repo storage"
> class that has enough logic to make vfs and svfs available, but doesn't
> call reposetup()?

That's what I'm doing - moving "extensions.loadall()" out from
localrepository.__init__, and together with chg refactoring,
extensions.extensions will eventually return a empty list so things could be
seen as side-effect free.
Yuya Nishihara - Feb. 23, 2017, 1:05 p.m.
On Wed, 22 Feb 2017 20:27:39 -0800, Jun Wu wrote:
> Excerpts from Kevin Bullock's message of 2017-02-22 20:31:14 -0600:
> > There's another alternative we might consider: can we move all of the
> > "side effects" out of the localrepository class entirely? That is, instead
> > of resorting to inheritance, could we create a lower-level "repo storage"
> > class that has enough logic to make vfs and svfs available, but doesn't
> > call reposetup()?
> 
> That's what I'm doing - moving "extensions.loadall()" out from
> localrepository.__init__, and together with chg refactoring,
> extensions.extensions will eventually return a empty list so things could be
> seen as side-effect free.

Perhaps what Kevin suggests would be similar to my vague idea. Instead of
introducing doubtful inheritance tree, can we create a layer that has vfs,
svfs, etc., and have localrepo delegate storage operation to it?

https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-February/092552.html

Maybe that would require more changes than using inheritance, but would be
less complex.
Jun Wu - Feb. 23, 2017, 5:15 p.m.
Excerpts from Yuya Nishihara's message of 2017-02-23 22:05:24 +0900:
> On Wed, 22 Feb 2017 20:27:39 -0800, Jun Wu wrote:
> > Excerpts from Kevin Bullock's message of 2017-02-22 20:31:14 -0600:
> > > There's another alternative we might consider: can we move all of the
> > > "side effects" out of the localrepository class entirely? That is, instead
> > > of resorting to inheritance, could we create a lower-level "repo storage"
> > > class that has enough logic to make vfs and svfs available, but doesn't
> > > call reposetup()?
> > 
> > That's what I'm doing - moving "extensions.loadall()" out from
> > localrepository.__init__, and together with chg refactoring,
> > extensions.extensions will eventually return a empty list so things could be
> > seen as side-effect free.
> 
> Perhaps what Kevin suggests would be similar to my vague idea. Instead of
> introducing doubtful inheritance tree, can we create a layer that has vfs,
> svfs, etc., and have localrepo delegate storage operation to it?
> 
> https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-February/092552.html 
> 
> Maybe that would require more changes than using inheritance, but would be
> less complex.

vfs is independent. But svfs depends on repo requirements and sharedpath.
Will that layer include requirements and sharedpath too? If so, it will
be similar to the "baselocalrepository" here.

Patch

diff --git a/mercurial/localrepo.py b/mercurial/localrepo.py
--- a/mercurial/localrepo.py
+++ b/mercurial/localrepo.py
@@ -258,4 +258,16 @@  class baselocalrepository(object):
 
         self.baseui = baseui
+        self.ui = baseui.copy()
+        self.ui.copy = baseui.copy # prevent copying repo configuration
+        try:
+            self.ui.readconfig(self.join("hgrc"), self.root)
+            self._loadextensions()
+        except IOError:
+            pass
+
+    def _loadextensions(self):
+        # baselocalrepository is side-effect free, so "loading extensions" is a
+        # no-op. implement this in subclasses.
+        pass
 
     def join(self, f, *insidef):
@@ -287,11 +299,4 @@  class localrepository(baselocalrepositor
         self.nofsauditor = pathutil.pathauditor(self.root, self._checknested,
                                                 realfs=False)
-        self.ui = baseui.copy()
-        self.ui.copy = baseui.copy # prevent copying repo configuration
-        try:
-            self.ui.readconfig(self.join("hgrc"), self.root)
-            extensions.loadall(self.ui)
-        except IOError:
-            pass
 
         if self.featuresetupfuncs:
@@ -392,4 +397,7 @@  class localrepository(baselocalrepositor
         self._writecaches()
 
+    def _loadextensions(self):
+        extensions.loadall(self.ui)
+
     def _writecaches(self):
         if self._revbranchcache: