Patchwork [3,of,3,RFC] chgserve: preimport enabled extensions

login
register
mail settings
Submitter Jun Wu
Date Oct. 3, 2016, 6:11 a.m.
Message ID <6f38a7d259add12c4404.1475475080@x1c>
Download mbox | patch
Permalink /patch/16825/
State Changes Requested
Delegated to: Yuya Nishihara
Headers show

Comments

Jun Wu - Oct. 3, 2016, 6:11 a.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1475463449 -3600
#      Mon Oct 03 03:57:29 2016 +0100
# Node ID 6f38a7d259add12c44040535681c4b3ceef0a791
# Parent  7d106c3386e17746c9b49ab9c788535634b3e8f6
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6f38a7d259ad
chgserve: preimport enabled extensions

This patch pre-imports enabled extensions, so performance improvement is
visible. It's about 0.01 seconds slower than the old chg, which seems to be
acceptable.
Jun Wu - Oct. 3, 2016, 6:32 a.m.
This is the first 3 patches of the chg re-arch series. They may look simpler
than expected at first sight, but locally I have 7 more patches, and the
handling of "ui" is still not done yet.

I have chosen these 3 to start the discussion, as they reveal the plan to
some extent, and seem to be most relevant to the next steps.

  - Should we hack "hg" / "dispatch" instead so they become aware of chg?
    This leads to simpler code but since chg is experimental, I guess the
    answer is no for now. But we may move chg to dispatch eventually.
  - "chgserve" as an executable, will couple with "hgext/chgserver.py".
    With the current direction, "chgserver" will make less sense as an
    extension. I'm not sure if moving it out is a good idea or not.


Excerpts from Jun Wu's message of 2016-10-03 07:11:20 +0100:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1475463449 -3600
> #      Mon Oct 03 03:57:29 2016 +0100
> # Node ID 6f38a7d259add12c44040535681c4b3ceef0a791
> # Parent  7d106c3386e17746c9b49ab9c788535634b3e8f6
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #              hg pull https://bitbucket.org/quark-zju/hg-draft  -r 6f38a7d259ad
> chgserve: preimport enabled extensions
> 
> This patch pre-imports enabled extensions, so performance improvement is
> visible. It's about 0.01 seconds slower than the old chg, which seems to be
> acceptable.
> 
> diff --git a/contrib/chg/chgserve b/contrib/chg/chgserve
> --- a/contrib/chg/chgserve
> +++ b/contrib/chg/chgserve
> @@ -11,4 +11,5 @@ from mercurial import (
>      commandserver,
>      dispatch,
> +    extensions,
>      fancyopts,
>      ui as uimod,
> @@ -24,4 +25,27 @@ def _confighash(ui):
>      return envhash[:12]
>  
> +_preimported = {} # {(name, path): mod}
> +
> +def _importext(orig, name, path=None, reportfunc=None):
> +    mod = _preimported.get((name, path))
> +    if mod:
> +        return mod
> +    else:
> +        return orig(name, path, reportfunc)
> +
> +def _preimportextensions(ui):
> +    for name, path in ui.configitems("extensions"):
> +        # only enabled extensions are pre-imported - assuming importing an
> +        # extension is side-effect free.
> +        if path.startswith('!'):
> +            continue
> +        try:
> +            mod = extensions._importext(name, path)
> +        except ImportError:
> +            pass
> +        else:
> +            _preimported[(name, path)] = mod
> +            ui.debug('chg: preimported %s\n' % name)
> +
>  def _startchgservice():
>      # patch chgserver's confighash to only hash environment variables
> @@ -35,4 +59,7 @@ def _startchgservice():
>      fancyopts.fancyopts(args, commands.table['^serve'][1], opts)
>  
> +    _preimportextensions(ui)
> +    extensions.wrapfunction(extensions, '_importext', _importext)
> +
>      service = chgserver.chgunixservice(ui, repo, opts)
>      cmdutil.service(opts, initfn=service.init, runfn=service.run)
Yuya Nishihara - Oct. 3, 2016, 12:59 p.m.
On Mon, 3 Oct 2016 07:11:20 +0100, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1475463449 -3600
> #      Mon Oct 03 03:57:29 2016 +0100
> # Node ID 6f38a7d259add12c44040535681c4b3ceef0a791
> # Parent  7d106c3386e17746c9b49ab9c788535634b3e8f6
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #              hg pull https://bitbucket.org/quark-zju/hg-draft -r 6f38a7d259ad
> chgserve: preimport enabled extensions
> 
> This patch pre-imports enabled extensions, so performance improvement is
> visible. It's about 0.01 seconds slower than the old chg, which seems to be
> acceptable.

(I haven't read the patch 1 and 2 yet.)

Regarding this, I was thinking about 'extensions.<name>:enabled' syntax so
users (or sysadmins) can define a set of conditionally-enabled extensions
globally.

  [extensions]
  rebase =
  topic = /path/to/topic.py
  topic:enabled = False

chg daemon will import all extensions listed in ~/.hgrc. And if :enabled = True
is flagged by repo/.hg/hgrc, ui/reposetup() will be run.

> +def _importext(orig, name, path=None, reportfunc=None):
> +    mod = _preimported.get((name, path))
> +    if mod:
> +        return mod
> +    else:
> +        return orig(name, path, reportfunc)
> +
> +def _preimportextensions(ui):
> +    for name, path in ui.configitems("extensions"):
> +        # only enabled extensions are pre-imported - assuming importing an
> +        # extension is side-effect free.
> +        if path.startswith('!'):
> +            continue
> +        try:
> +            mod = extensions._importext(name, path)
> +        except ImportError:
> +            pass
> +        else:
> +            _preimported[(name, path)] = mod

Perhaps path needs to be normalized.
Jun Wu - Oct. 3, 2016, 2:39 p.m.
Excerpts from Yuya Nishihara's message of 2016-10-03 21:59:02 +0900:
> Regarding this, I was thinking about 'extensions.<name>:enabled' syntax so
> users (or sysadmins) can define a set of conditionally-enabled extensions
> globally.
> 
>   [extensions]
>   rebase =
>   topic = /path/to/topic.py
>   topic:enabled = False
> 
> chg daemon will import all extensions listed in ~/.hgrc. And if :enabled = True
> is flagged by repo/.hg/hgrc, ui/reposetup() will be run.

This may make the implementation more complex and fragile. One area that chg
currently cannot handle 100% correct is things like "-r bundles", shared
repo, or inferrepo - confighash may mismatch forever because of differences
between chg's config loading logic and the non-chg one. If that happens, chg
may redirect forever. I have seen this but haven't got enough time to find
out what's going on exactly. In comparison, the new code can load a random
hgrc and do not worry about if the extensions listed will be enabled or
disabled because of the nice side-effect-free assumption.

Running "uisetup" is not safe for extensions reading configs from "ui". I
think controlling them via configs is not very friendly for average users.
They may mess up and get surprised.

Because of the above two reasons, I prefer no uisetup for all extensions. It
leads to a simpler and safer implementation.

That said, I understand "uisetup" can be fat and expensive. I think if that
happens, the extension author can do the slow logic at import-time. So
instead of:

  def uisetup(ui):
    x = expensive()
    ...

They can write:

  x = expensive()

  def uisetup(ui):
    global x
    ...

So uisetup would remain cheap. If "expensive()" relies on "ui", it cannot
be pre-calculated, since chg cannot provide a "correct" "ui" because of 
possible future config changes.

Regarding on reposetup, I'm aware that some repo state like the radix tree
index needs to be persistent in memory. In my opinion, it would be better
solved by other IPC means, like shared memory or a background daemon
speaking some protocol.

> 
> > +def _importext(orig, name, path=None, reportfunc=None):
> > +    mod = _preimported.get((name, path))
> > +    if mod:
> > +        return mod
> > +    else:
> > +        return orig(name, path, reportfunc)
> > +
> > +def _preimportextensions(ui):
> > +    for name, path in ui.configitems("extensions"):
> > +        # only enabled extensions are pre-imported - assuming importing an
> > +        # extension is side-effect free.
> > +        if path.startswith('!'):
> > +            continue
> > +        try:
> > +            mod = extensions._importext(name, path)
> > +        except ImportError:
> > +            pass
> > +        else:
> > +            _preimported[(name, path)] = mod
> 
> Perhaps path needs to be normalized.
Yuya Nishihara - Oct. 3, 2016, 3:29 p.m.
On Mon, 3 Oct 2016 15:39:24 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-10-03 21:59:02 +0900:
> > Regarding this, I was thinking about 'extensions.<name>:enabled' syntax so
> > users (or sysadmins) can define a set of conditionally-enabled extensions
> > globally.
> > 
> >   [extensions]
> >   rebase =
> >   topic = /path/to/topic.py
> >   topic:enabled = False
> > 
> > chg daemon will import all extensions listed in ~/.hgrc. And if :enabled = True
> > is flagged by repo/.hg/hgrc, ui/reposetup() will be run.
> 
> This may make the implementation more complex and fragile. One area that chg
> currently cannot handle 100% correct is things like "-r bundles", shared
> repo, or inferrepo - confighash may mismatch forever because of differences
> between chg's config loading logic and the non-chg one. If that happens, chg
> may redirect forever.

Isn't the goal of this series to get rid of the confighash?

> Because of the above two reasons, I prefer no uisetup for all extensions. It
> leads to a simpler and safer implementation.

Yep. No ui/reposetup() in the main server process. My idea is how to tell
chg server to pre-import extensions which can't be enabled globally.

> Regarding on reposetup, I'm aware that some repo state like the radix tree
> index needs to be persistent in memory. In my opinion, it would be better
> solved by other IPC means, like shared memory or a background daemon
> speaking some protocol.

Like memcached in some ways.
Jun Wu - Oct. 3, 2016, 5:57 p.m.
Excerpts from Yuya Nishihara's message of 2016-10-04 00:29:09 +0900:
> On Mon, 3 Oct 2016 15:39:24 +0100, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2016-10-03 21:59:02 +0900:
> > > Regarding this, I was thinking about 'extensions.<name>:enabled' syntax so
> > > users (or sysadmins) can define a set of conditionally-enabled extensions
> > > globally.
> > > 
> > >   [extensions]
> > >   rebase =
> > >   topic = /path/to/topic.py
> > >   topic:enabled = False
> > > 
> > > chg daemon will import all extensions listed in ~/.hgrc. And if :enabled = True
> > > is flagged by repo/.hg/hgrc, ui/reposetup() will be run.
> > 
> > This may make the implementation more complex and fragile. One area that chg
> > currently cannot handle 100% correct is things like "-r bundles", shared
> > repo, or inferrepo - confighash may mismatch forever because of differences
> > between chg's config loading logic and the non-chg one. If that happens, chg
> > may redirect forever.
> 
> Isn't the goal of this series to get rid of the confighash?

Yes. "confighash" will only contain sensitive environment variables like
PYTHONPATH, LD_PRELOAD etc. ui.config won't affect confighash.

> > Because of the above two reasons, I prefer no uisetup for all extensions. It
> > leads to a simpler and safer implementation.
> 
> Yep. No ui/reposetup() in the main server process. My idea is how to tell
> chg server to pre-import extensions which can't be enabled globally.

That has been taken care of in patch 4, which I haven't sent yet. [1]

[1]: https://bitbucket.org/quark-zju/hg-draft/commits/51278775

> > Regarding on reposetup, I'm aware that some repo state like the radix tree
> > index needs to be persistent in memory. In my opinion, it would be better
> > solved by other IPC means, like shared memory or a background daemon
> > speaking some protocol.
> 
> Like memcached in some ways.
Yuya Nishihara - Oct. 4, 2016, 2:11 p.m.
On Mon, 3 Oct 2016 18:57:58 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-10-04 00:29:09 +0900:
> > On Mon, 3 Oct 2016 15:39:24 +0100, Jun Wu wrote:
> > > Excerpts from Yuya Nishihara's message of 2016-10-03 21:59:02 +0900:
> > > > Regarding this, I was thinking about 'extensions.<name>:enabled' syntax so
> > > > users (or sysadmins) can define a set of conditionally-enabled extensions
> > > > globally.
> > > > 
> > > >   [extensions]
> > > >   rebase =
> > > >   topic = /path/to/topic.py
> > > >   topic:enabled = False
> > > > 
> > > > chg daemon will import all extensions listed in ~/.hgrc. And if :enabled = True
> > > > is flagged by repo/.hg/hgrc, ui/reposetup() will be run.
> > > 
> > > This may make the implementation more complex and fragile. One area that chg
> > > currently cannot handle 100% correct is things like "-r bundles", shared
> > > repo, or inferrepo - confighash may mismatch forever because of differences
> > > between chg's config loading logic and the non-chg one. If that happens, chg
> > > may redirect forever.
> > 
> > Isn't the goal of this series to get rid of the confighash?
> 
> Yes. "confighash" will only contain sensitive environment variables like
> PYTHONPATH, LD_PRELOAD etc. ui.config won't affect confighash.

Nice. :)

> > > Because of the above two reasons, I prefer no uisetup for all extensions. It
> > > leads to a simpler and safer implementation.
> > 
> > Yep. No ui/reposetup() in the main server process. My idea is how to tell
> > chg server to pre-import extensions which can't be enabled globally.
> 
> That has been taken care of in patch 4, which I haven't sent yet. [1]
> 
> [1]: https://bitbucket.org/quark-zju/hg-draft/commits/51278775

Yeah, my initial idea was something like that, and I found :enabled would
work well for both chg and non-chg use cases. For instance, I can manage all
paths of third-party extensions in my ~/.hgrc, and enable them selectively
by repo/.hg/hgrc. Which seems an improvement over the current "!" syntax.
Jun Wu - Oct. 4, 2016, 7:17 p.m.
Excerpts from Yuya Nishihara's message of 2016-10-04 23:11:39 +0900:
> Yeah, my initial idea was something like that, and I found :enabled would
> work well for both chg and non-chg use cases. For instance, I can manage all
> paths of third-party extensions in my ~/.hgrc, and enable them selectively
> by repo/.hg/hgrc. Which seems an improvement over the current "!" syntax.

I agree it's better than the current syntax. Users would be able to use
"extname:enabled = True" without setting "extname" to enable a builtin
extension. As "configbool" defaults to False, we may want to use ":disabled"
instead.

When I was new to hg, the syntax of setting "extname=" to enable an
extension looked pretty strange. I guess it does confuse users.

To make things even cleaner, it may be worthwhile to explicitly append
":path" to override the path setting, for example, you can have both or
either or none of the two configs:

  extname:path    = path/to/ext.py
  extname:enabled = True

"!" cannot be used in "extname:path". The idea is inspired by CSS: you can
have "border: 1px red", and then "border-width: 2px", "border-color: blue"
will override the "border" property.

That said, the issue I want to solve is different: I think it can be tricky
for some strange extensions to be 100% side-effect free when being imported.
So we need a way to deal with that, like blacklisting those extensions.

Config is an option. But ideally, it's the extension itself telling chg to
not import it, like having some magic string in the code, because the author
of the extension knows best about the behavior, not the user. Having it in
code also means we need a customized module loader, which is not what I
really want to do because it's much more complex than a simple "__import__"
call.

Alternatively, we can do a check afterwards - import the extension as mod,
and check mod.importsideeffectfree - if it's "False", just abort and let the
user change their configs. This is not very pretty but should simply work.
What do you think?
Yuya Nishihara - Oct. 7, 2016, 5:55 a.m.
On Tue, 4 Oct 2016 20:17:06 +0100, Jun Wu wrote:
> Excerpts from Yuya Nishihara's message of 2016-10-04 23:11:39 +0900:
> > Yeah, my initial idea was something like that, and I found :enabled would
> > work well for both chg and non-chg use cases. For instance, I can manage all
> > paths of third-party extensions in my ~/.hgrc, and enable them selectively
> > by repo/.hg/hgrc. Which seems an improvement over the current "!" syntax.
> 
> I agree it's better than the current syntax. Users would be able to use
> "extname:enabled = True" without setting "extname" to enable a builtin
> extension. As "configbool" defaults to False, we may want to use ":disabled"
> instead.

We have some default=True, which should be okay.

> When I was new to hg, the syntax of setting "extname=" to enable an
> extension looked pretty strange. I guess it does confuse users.
> 
> To make things even cleaner, it may be worthwhile to explicitly append
> ":path" to override the path setting, for example, you can have both or
> either or none of the two configs:
> 
>   extname:path    = path/to/ext.py
>   extname:enabled = True
> 
> "!" cannot be used in "extname:path".

Seems fine, but I have no preference about that because [paths] have a
similar syntax, paths.<name> and paths.<name>:<subopt>.

> That said, the issue I want to solve is different: I think it can be tricky
> for some strange extensions to be 100% side-effect free when being imported.
> So we need a way to deal with that, like blacklisting those extensions.
>
> Config is an option. But ideally, it's the extension itself telling chg to
> not import it, like having some magic string in the code, because the author
> of the extension knows best about the behavior, not the user. Having it in
> code also means we need a customized module loader, which is not what I
> really want to do because it's much more complex than a simple "__import__"
> call.
> 
> Alternatively, we can do a check afterwards - import the extension as mod,
> and check mod.importsideeffectfree - if it's "False", just abort and let the
> user change their configs. This is not very pretty but should simply work.
> What do you think?

Do we have no option to make these extensions import-free? Do they have
something should be done before uisetup() ?

If I had to implement one of them, I would choose the magic string for
simplicity. But I don't like it because it is different from what we currently
have, testedwith, buglink, etc.
Yuya Nishihara - Oct. 7, 2016, 6:16 a.m.
On Mon, 3 Oct 2016 07:32:47 +0100, Jun Wu wrote:
> This is the first 3 patches of the chg re-arch series. They may look simpler
> than expected at first sight, but locally I have 7 more patches, and the
> handling of "ui" is still not done yet.
> 
> I have chosen these 3 to start the discussion, as they reveal the plan to
> some extent, and seem to be most relevant to the next steps.
> 
>   - Should we hack "hg" / "dispatch" instead so they become aware of chg?
>     This leads to simpler code but since chg is experimental, I guess the
>     answer is no for now. But we may move chg to dispatch eventually.

If we can get rid of the entry-point script by the "dispatch" hack, I want to
go that way just because that "looks" cleaner than installing new script into
PATH.

>   - "chgserve" as an executable, will couple with "hgext/chgserver.py".
>     With the current direction, "chgserver" will make less sense as an
>     extension. I'm not sure if moving it out is a good idea or not.

Anyway I want to move chgserver.py to the core. Only reason I kept it as an
extension is I couldn't address circular imports between chgserver.py an
commandserver.py.
Pierre-Yves David - Oct. 8, 2016, 7:08 a.m.
On 10/07/2016 08:16 AM, Yuya Nishihara wrote:
> On Mon, 3 Oct 2016 07:32:47 +0100, Jun Wu wrote:
>>   - "chgserve" as an executable, will couple with "hgext/chgserver.py".
>>     With the current direction, "chgserver" will make less sense as an
>>     extension. I'm not sure if moving it out is a good idea or not.
>
> Anyway I want to move chgserver.py to the core. Only reason I kept it as an
> extension is I couldn't address circular imports between chgserver.py an
> commandserver.py.

+1 for moving it into core now the chg integration is close to completion.

Cheers,

Patch

diff --git a/contrib/chg/chgserve b/contrib/chg/chgserve
--- a/contrib/chg/chgserve
+++ b/contrib/chg/chgserve
@@ -11,4 +11,5 @@  from mercurial import (
     commandserver,
     dispatch,
+    extensions,
     fancyopts,
     ui as uimod,
@@ -24,4 +25,27 @@  def _confighash(ui):
     return envhash[:12]
 
+_preimported = {} # {(name, path): mod}
+
+def _importext(orig, name, path=None, reportfunc=None):
+    mod = _preimported.get((name, path))
+    if mod:
+        return mod
+    else:
+        return orig(name, path, reportfunc)
+
+def _preimportextensions(ui):
+    for name, path in ui.configitems("extensions"):
+        # only enabled extensions are pre-imported - assuming importing an
+        # extension is side-effect free.
+        if path.startswith('!'):
+            continue
+        try:
+            mod = extensions._importext(name, path)
+        except ImportError:
+            pass
+        else:
+            _preimported[(name, path)] = mod
+            ui.debug('chg: preimported %s\n' % name)
+
 def _startchgservice():
     # patch chgserver's confighash to only hash environment variables
@@ -35,4 +59,7 @@  def _startchgservice():
     fancyopts.fancyopts(args, commands.table['^serve'][1], opts)
 
+    _preimportextensions(ui)
+    extensions.wrapfunction(extensions, '_importext', _importext)
+
     service = chgserver.chgunixservice(ui, repo, opts)
     cmdutil.service(opts, initfn=service.init, runfn=service.run)