Patchwork [5,of,5] chgserver: add a config option to preload repo

login
register
mail settings
Submitter Jun Wu
Date March 14, 2016, 8:33 p.m.
Message ID <a51d7eb2e998920b2f4c.1457987631@x1c>
Download mbox | patch
Permalink /patch/13888/
State Superseded
Delegated to: Yuya Nishihara
Headers show

Comments

Jun Wu - March 14, 2016, 8:33 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1457986958 0
#      Mon Mar 14 20:22:38 2016 +0000
# Node ID a51d7eb2e998920b2f4c4f7114cb8450df05ec44
# Parent  5569c3510113e4224cd630cf157a02646ce238d4
chgserver: add a config option to preload repo

chgserver was designed to load repo per request to minimize the number of
server processes and to be more confident. But it may also work with repo
preloaded as well, which should be good for performance but is more risky
in current situation due to repo.invalidateall and ui/config handling is
not perfect yet.

This patch adds an experimental config option "chgserver.preloadrepo" and
changes confighash and mtimehash to optionally include the repo directory.
It will also change the default behavior to drop the repo object, which
will solve the big issue that chg uses a wrong repo for users.
Jun Wu - March 14, 2016, 8:58 p.m.
On 03/14/2016 08:33 PM, Jun Wu wrote:
> +                return (st.st_ino, )

Sorry it fails check-code due to the extra space.
I probably will hook patchbomb to make sure at least test-check* are passed.
Yuya Nishihara - March 15, 2016, 4:22 p.m.
On Mon, 14 Mar 2016 20:33:51 +0000, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1457986958 0
> #      Mon Mar 14 20:22:38 2016 +0000
> # Node ID a51d7eb2e998920b2f4c4f7114cb8450df05ec44
> # Parent  5569c3510113e4224cd630cf157a02646ce238d4
> chgserver: add a config option to preload repo
> 
> chgserver was designed to load repo per request to minimize the number of
> server processes and to be more confident. But it may also work with repo
> preloaded as well, which should be good for performance but is more risky
> in current situation due to repo.invalidateall and ui/config handling is
> not perfect yet.
> 
> This patch adds an experimental config option "chgserver.preloadrepo" and
> changes confighash and mtimehash to optionally include the repo directory.
> It will also change the default behavior to drop the repo object, which
> will solve the big issue that chg uses a wrong repo for users.

Can you split patches to first fix the issue spotted by dropping "--cwd /" ?

This isn't a trivial change, and I doubt if repo per server would go well.
I think chgserver will need own cache of multiple repositories per server.
Jun Wu - March 16, 2016, 12:12 p.m.
On 03/15/2016 04:22 PM, Yuya Nishihara wrote:
> This isn't a trivial change, and I doubt if repo per server would go well.
> I think chgserver will need own cache of multiple repositories per server.

Right. Currently color does not work. But I do believe it is the direction
if we need future improve performance. It will save about 40ms in our setup.

The patch changes the hash functions a bit so it's not necessary to keep
multiple repo objects (actually it's also hard since the renewui stuff
happens in the forked process).
Jun Wu - March 18, 2016, 9:50 p.m.
According to the experiment suggested by Yuya, the patch does not work as
expected if the server keeps an outdated repo object. For example, after

   hg bookmark foo
   echo foo > foo
   hg add foo
   hg commit -m foo

the server will have an outdated repo object and commands are even slower
than "preloadrepo = False".

Therefore we decide to just drop this patch now. We may try look for other
ways to improve the performance.

On 03/14/2016 01:33 PM, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1457986958 0
> #      Mon Mar 14 20:22:38 2016 +0000
> # Node ID a51d7eb2e998920b2f4c4f7114cb8450df05ec44
> # Parent  5569c3510113e4224cd630cf157a02646ce238d4
> chgserver: add a config option to preload repo
>
> chgserver was designed to load repo per request to minimize the number of
> server processes and to be more confident. But it may also work with repo
> preloaded as well, which should be good for performance but is more risky
> in current situation due to repo.invalidateall and ui/config handling is
> not perfect yet.
>
> This patch adds an experimental config option "chgserver.preloadrepo" and
> changes confighash and mtimehash to optionally include the repo directory.
> It will also change the default behavior to drop the repo object, which
> will solve the big issue that chg uses a wrong repo for users.
>
> diff --git a/hgext/chgserver.py b/hgext/chgserver.py
> --- a/hgext/chgserver.py
> +++ b/hgext/chgserver.py
> @@ -34,8 +34,11 @@
>   ::
>
>     [chgserver]
> -  idletimeout = 3600 # seconds, after which an idle server will exit
> -  skiphash = False   # whether to skip config or env change checks
> +  idletimeout = 3600  # seconds, after which an idle server will exit
> +  preloadrepo = False # whether to preload repo object (experimental)
> +                      # set it to true may increase the number of server
> +                      # processes but will make things a bit faster
> +  skiphash = False    # whether to skip config or env change checks
>   """
>
>   from __future__ import absolute_import
> @@ -45,6 +48,7 @@
>   import inspect
>   import os
>   import re
> +import stat
>   import struct
>   import sys
>   import threading
> @@ -95,10 +99,12 @@
>                       |TZ
>                       )\Z''', re.X)
>
> -def _confighash(ui):
> +def _confighash(ui, reporoot=None):
>       """return a quick hash for detecting config/env changes
>
>       confighash is the hash of sensitive config items and environment variables.
> +    optionally it can include repo root path, which is useful if repo needs to
> +    be preloaded as well.
>
>       for chgserver, it is designed that once confighash changes, the server is
>       not qualified to serve its client and should redirect the client to a new
> @@ -112,15 +118,19 @@
>       sectionhash = _hashlist(sectionitems)
>       envitems = [(k, v) for k, v in os.environ.iteritems() if _envre.match(k)]
>       envhash = _hashlist(sorted(envitems))
> -    return sectionhash[:6] + envhash[:6]
> +    result = sectionhash[:6] + envhash[:6]
> +    if reporoot:
> +        result += _hashlist([os.path.abspath(reporoot)])[:6]
> +    return result
>
> -def _getmtimepaths(ui):
> +def _getmtimepaths(ui, reporoot=None):
>       """get a list of paths that should be checked to detect change
>
>       The list will include:
>       - extensions (will not cover all files for complex extensions)
>       - mercurial/__version__.py
>       - python binary
> +    - reporoot, reporoot/.hg (if reporoot is present)
>       """
>       modules = [m for n, m in extensions.extensions(ui)]
>       try:
> @@ -134,6 +144,8 @@
>               files.append(inspect.getabsfile(m))
>           except TypeError:
>               pass
> +    if reporoot:
> +        files += [reporoot, os.path.join(reporoot, '.hg')]
>       return sorted(set(files))
>
>   def _mtimehash(paths):
> @@ -145,6 +157,7 @@
>       it's possible to return different hashes for same file contents.
>       it's also possible to return a same hash for different file contents for
>       some carefully crafted situation.
> +    for directories, inode number is used instead of size and mtime.
>
>       for chgserver, it is designed that once mtimehash changes, the server is
>       considered outdated immediately and should no longer provide service.
> @@ -152,7 +165,10 @@
>       def trystat(path):
>           try:
>               st = os.stat(path)
> -            return (st.st_mtime, st.st_size)
> +            if stat.S_ISDIR(st.st_mode):
> +                return (st.st_ino, )
> +            else:
> +                return (st.st_mtime, st.st_size)
>           except OSError:
>               # could be ENOENT, EPERM etc. not fatal in any case
>               pass
> @@ -166,10 +182,10 @@
>           self.mtimepaths = mtimepaths
>
>       @staticmethod
> -    def fromui(ui, mtimepaths=None):
> +    def fromui(ui, reporoot=None, mtimepaths=None):
>           if mtimepaths is None:
> -            mtimepaths = _getmtimepaths(ui)
> -        confighash = _confighash(ui)
> +            mtimepaths = _getmtimepaths(ui, reporoot)
> +        confighash = _confighash(ui, reporoot)
>           mtimehash = _mtimehash(mtimepaths)
>           _log('confighash = %s mtimehash = %s\n' % (confighash, mtimehash))
>           return hashstate(confighash, mtimehash, mtimepaths)
> @@ -258,6 +274,7 @@
>       return chgui(srcui)
>
>   def _renewui(srcui, args=None):
> +    """Return (newui, reporoot)"""
>       if not args:
>           args = []
>
> @@ -291,7 +308,7 @@
>               # ui.configsource returns 'none' by default
>               source = ''
>           newui.setconfig(section, name, value, source)
> -    return newui
> +    return (newui, path)
>
>   class channeledsystem(object):
>       """Propagate ui.system() request in the following format:
> @@ -439,14 +456,17 @@
>           the instructions.
>           """
>           args = self._readlist()
> +        reporoot = None
>           try:
> -            self.ui = _renewui(self.ui, args)
> +            self.ui, reporoot = _renewui(self.ui, args)
> +            reporoot = os.path.realpath(reporoot)
>           except error.ParseError as inst:
>               dispatch._formatparse(self.ui.warn, inst)
>               self.ui.flush()
>               self.cresult.write('exit 255')
>               return
> -        newhash = hashstate.fromui(self.ui, self.hashstate.mtimepaths)
> +        newhash = hashstate.fromui(self.ui, self.repo and reporoot,
> +                                   self.hashstate.mtimepaths)
>           insts = []
>           if newhash.mtimehash != self.hashstate.mtimehash:
>               addr = _hashaddress(self.baseaddress, self.hashstate.confighash)
> @@ -522,7 +542,7 @@
>
>           if set(['HGPLAIN', 'HGPLAINEXCEPT']) & diffkeys:
>               # reload config so that ui.plain() takes effect
> -            self.ui = _renewui(self.ui)
> +            self.ui = _renewui(self.ui)[0]
>
>           _clearenvaliases(commands.table)
>
> @@ -569,6 +589,11 @@
>               traceback.print_exc(file=cerr)
>               raise
>
> +def _reporoot(repo):
> +    if not repo:
> +        return None
> +    return os.path.realpath(getattr(repo, 'root', None))
> +
>   def _tempaddress(address):
>       return '%s.%d.tmp' % (address, os.getpid())
>
> @@ -616,9 +641,9 @@
>
>       def issocketowner(self):
>           try:
> -            stat = os.stat(self.server_address)
> -            return (stat.st_ino == self._socketstat.st_ino and
> -                    stat.st_mtime == self._socketstat.st_mtime)
> +            st = os.stat(self.server_address)
> +            return (st.st_ino == self._socketstat.st_ino and
> +                    st.st_mtime == self._socketstat.st_mtime)
>           except OSError:
>               return False
>
> @@ -637,6 +662,11 @@
>
>   class chgunixservice(commandserver.unixservice):
>       def init(self):
> +        if not self.ui.configbool('chgserver', 'preloadrepo', False) \
> +           or not _reporoot(self.repo):
> +            # do not preload repo if we cannot get repo root (therefore cannot
> +            # hash it), or the user choose not to
> +            self.repo = None
>           self._inithashstate()
>           self._checkextensions()
>           class cls(AutoExitMixIn, SocketServer.ForkingMixIn,
> @@ -665,7 +695,7 @@
>           if self.ui.configbool('chgserver', 'skiphash', False):
>               self.hashstate = None
>               return
> -        self.hashstate = hashstate.fromui(self.ui)
> +        self.hashstate = hashstate.fromui(self.ui, _reporoot(self.repo))
>           self.address = _hashaddress(self.address, self.hashstate.confighash)
>
>       def _createsymlink(self):
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.mercurial-2Dscm.org_mailman_listinfo_mercurial-2Ddevel&d=CwIGaQ&c=5VD0RTtNlTh3ycd41b3MUw&r=goC_KEU1hoLmVFdaD4QgHA&m=P03fAXr9iH8RETF1KxnqqPb8zd3bPlJxdw01ngBF2w8&s=Zeo48CRNoQ7UjQGsjee_byFGbmR5A0CsAMLME_gPt4Q&e=
>

Patch

diff --git a/hgext/chgserver.py b/hgext/chgserver.py
--- a/hgext/chgserver.py
+++ b/hgext/chgserver.py
@@ -34,8 +34,11 @@ 
 ::
 
   [chgserver]
-  idletimeout = 3600 # seconds, after which an idle server will exit
-  skiphash = False   # whether to skip config or env change checks
+  idletimeout = 3600  # seconds, after which an idle server will exit
+  preloadrepo = False # whether to preload repo object (experimental)
+                      # set it to true may increase the number of server
+                      # processes but will make things a bit faster
+  skiphash = False    # whether to skip config or env change checks
 """
 
 from __future__ import absolute_import
@@ -45,6 +48,7 @@ 
 import inspect
 import os
 import re
+import stat
 import struct
 import sys
 import threading
@@ -95,10 +99,12 @@ 
                     |TZ
                     )\Z''', re.X)
 
-def _confighash(ui):
+def _confighash(ui, reporoot=None):
     """return a quick hash for detecting config/env changes
 
     confighash is the hash of sensitive config items and environment variables.
+    optionally it can include repo root path, which is useful if repo needs to
+    be preloaded as well.
 
     for chgserver, it is designed that once confighash changes, the server is
     not qualified to serve its client and should redirect the client to a new
@@ -112,15 +118,19 @@ 
     sectionhash = _hashlist(sectionitems)
     envitems = [(k, v) for k, v in os.environ.iteritems() if _envre.match(k)]
     envhash = _hashlist(sorted(envitems))
-    return sectionhash[:6] + envhash[:6]
+    result = sectionhash[:6] + envhash[:6]
+    if reporoot:
+        result += _hashlist([os.path.abspath(reporoot)])[:6]
+    return result
 
-def _getmtimepaths(ui):
+def _getmtimepaths(ui, reporoot=None):
     """get a list of paths that should be checked to detect change
 
     The list will include:
     - extensions (will not cover all files for complex extensions)
     - mercurial/__version__.py
     - python binary
+    - reporoot, reporoot/.hg (if reporoot is present)
     """
     modules = [m for n, m in extensions.extensions(ui)]
     try:
@@ -134,6 +144,8 @@ 
             files.append(inspect.getabsfile(m))
         except TypeError:
             pass
+    if reporoot:
+        files += [reporoot, os.path.join(reporoot, '.hg')]
     return sorted(set(files))
 
 def _mtimehash(paths):
@@ -145,6 +157,7 @@ 
     it's possible to return different hashes for same file contents.
     it's also possible to return a same hash for different file contents for
     some carefully crafted situation.
+    for directories, inode number is used instead of size and mtime.
 
     for chgserver, it is designed that once mtimehash changes, the server is
     considered outdated immediately and should no longer provide service.
@@ -152,7 +165,10 @@ 
     def trystat(path):
         try:
             st = os.stat(path)
-            return (st.st_mtime, st.st_size)
+            if stat.S_ISDIR(st.st_mode):
+                return (st.st_ino, )
+            else:
+                return (st.st_mtime, st.st_size)
         except OSError:
             # could be ENOENT, EPERM etc. not fatal in any case
             pass
@@ -166,10 +182,10 @@ 
         self.mtimepaths = mtimepaths
 
     @staticmethod
-    def fromui(ui, mtimepaths=None):
+    def fromui(ui, reporoot=None, mtimepaths=None):
         if mtimepaths is None:
-            mtimepaths = _getmtimepaths(ui)
-        confighash = _confighash(ui)
+            mtimepaths = _getmtimepaths(ui, reporoot)
+        confighash = _confighash(ui, reporoot)
         mtimehash = _mtimehash(mtimepaths)
         _log('confighash = %s mtimehash = %s\n' % (confighash, mtimehash))
         return hashstate(confighash, mtimehash, mtimepaths)
@@ -258,6 +274,7 @@ 
     return chgui(srcui)
 
 def _renewui(srcui, args=None):
+    """Return (newui, reporoot)"""
     if not args:
         args = []
 
@@ -291,7 +308,7 @@ 
             # ui.configsource returns 'none' by default
             source = ''
         newui.setconfig(section, name, value, source)
-    return newui
+    return (newui, path)
 
 class channeledsystem(object):
     """Propagate ui.system() request in the following format:
@@ -439,14 +456,17 @@ 
         the instructions.
         """
         args = self._readlist()
+        reporoot = None
         try:
-            self.ui = _renewui(self.ui, args)
+            self.ui, reporoot = _renewui(self.ui, args)
+            reporoot = os.path.realpath(reporoot)
         except error.ParseError as inst:
             dispatch._formatparse(self.ui.warn, inst)
             self.ui.flush()
             self.cresult.write('exit 255')
             return
-        newhash = hashstate.fromui(self.ui, self.hashstate.mtimepaths)
+        newhash = hashstate.fromui(self.ui, self.repo and reporoot,
+                                   self.hashstate.mtimepaths)
         insts = []
         if newhash.mtimehash != self.hashstate.mtimehash:
             addr = _hashaddress(self.baseaddress, self.hashstate.confighash)
@@ -522,7 +542,7 @@ 
 
         if set(['HGPLAIN', 'HGPLAINEXCEPT']) & diffkeys:
             # reload config so that ui.plain() takes effect
-            self.ui = _renewui(self.ui)
+            self.ui = _renewui(self.ui)[0]
 
         _clearenvaliases(commands.table)
 
@@ -569,6 +589,11 @@ 
             traceback.print_exc(file=cerr)
             raise
 
+def _reporoot(repo):
+    if not repo:
+        return None
+    return os.path.realpath(getattr(repo, 'root', None))
+
 def _tempaddress(address):
     return '%s.%d.tmp' % (address, os.getpid())
 
@@ -616,9 +641,9 @@ 
 
     def issocketowner(self):
         try:
-            stat = os.stat(self.server_address)
-            return (stat.st_ino == self._socketstat.st_ino and
-                    stat.st_mtime == self._socketstat.st_mtime)
+            st = os.stat(self.server_address)
+            return (st.st_ino == self._socketstat.st_ino and
+                    st.st_mtime == self._socketstat.st_mtime)
         except OSError:
             return False
 
@@ -637,6 +662,11 @@ 
 
 class chgunixservice(commandserver.unixservice):
     def init(self):
+        if not self.ui.configbool('chgserver', 'preloadrepo', False) \
+           or not _reporoot(self.repo):
+            # do not preload repo if we cannot get repo root (therefore cannot
+            # hash it), or the user choose not to
+            self.repo = None
         self._inithashstate()
         self._checkextensions()
         class cls(AutoExitMixIn, SocketServer.ForkingMixIn,
@@ -665,7 +695,7 @@ 
         if self.ui.configbool('chgserver', 'skiphash', False):
             self.hashstate = None
             return
-        self.hashstate = hashstate.fromui(self.ui)
+        self.hashstate = hashstate.fromui(self.ui, _reporoot(self.repo))
         self.address = _hashaddress(self.address, self.hashstate.confighash)
 
     def _createsymlink(self):