Patchwork [1,of,2,V3] chgserver: add utilities to calculate mtimehash

Submitter: Jun Wu
Date: Feb. 28, 2016, 12:17 p.m.
Message ID: <94fd1cecd935b7ce05a5.1456661845@x1c>
Permalink: /patch/13458/
State: Accepted
Delegated to: Yuya Nishihara

Comments

Jun Wu - Feb. 28, 2016, 12:17 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1456498779 0
#      Fri Feb 26 14:59:39 2016 +0000
# Node ID 94fd1cecd935b7ce05a5b435ddb82a1eb610ec85
# Parent  10eaae94523c790d7b77c52a4cb4bfcf406021ef
chgserver: add utilities to calculate mtimehash

mtimehash is designed to detect changes to the files chgserver depends on.
These files include:
- extensions (the module file of a single-file extension; only __init__.py
  for a package extension)
- mercurial/__version__.py
- hg (util.hgcmd())
- python (sys.executable)
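The path-gathering step can be sketched outside Mercurial. This is a minimal, hypothetical `collectpaths()` that takes plain module objects instead of calling `extensions.extensions(ui)`, and it leaves out the hg binary; like the patch, it relies on `inspect.getabsfile()` and skips modules for which that call raises `TypeError`.

```python
import inspect
import json
import sys


def collectpaths(modules):
    """Return a sorted, de-duplicated list of files whose stat data to watch.

    `modules` is any iterable of imported module objects (hypothetical
    stand-in for the loaded extensions plus mercurial.__version__).
    """
    files = [sys.executable]  # the python binary itself
    for m in modules:
        try:
            # Absolute, normcase'd path; for a package this is its __init__.py
            files.append(inspect.getabsfile(m))
        except TypeError:
            # Built-in modules such as sys have no file; skip them
            pass
    return sorted(set(files))


if __name__ == '__main__':
    for path in collectpaths([inspect, json, sys]):
        print(path)
```

Passing the `json` package shows the limitation noted in the list: only `json/__init__.py` is returned, so the other files of a package are not watched.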

mtimehash only calls stat on each file (using size and mtime) and never reads
file contents, so it is fast but not 100% accurate. However, it should be good
enough for our use case.

For chgserver, once mtimehash changes, the server is considered outdated
immediately and should no longer provide service.
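A self-contained sketch of the stat-only hash described above. The name `mtimehash` mirrors the patch, but the hashing step is an assumption: SHA-1 over the `repr()` of the stat tuples stands in for chgserver's `_hashlist()`, whose body is not part of this patch.

```python
import hashlib
import os
import tempfile


def trystat(path):
    """Return (mtime, size) for path, or None if stat fails."""
    try:
        st = os.stat(path)
        return (st.st_mtime, st.st_size)
    except OSError:
        # ENOENT, EPERM etc. are not fatal; the entry just hashes as None
        return None


def mtimehash(paths):
    """Hash stat metadata only; file contents are never read."""
    stats = [trystat(p) for p in paths]
    # Assumed stand-in for chgserver's _hashlist(): SHA-1 over the list's repr
    return hashlib.sha1(repr(stats).encode('utf-8')).hexdigest()[:12]


if __name__ == '__main__':
    d = tempfile.mkdtemp()
    path = os.path.join(d, 'ext.py')
    with open(path, 'w') as f:
        f.write('x = 1\n')
    before = mtimehash([path])
    os.utime(path, (0, 0))  # change mtime only; content stays identical
    print(before != mtimehash([path]))  # hash changes anyway
```

The demo shows the accepted inaccuracy: touching a file changes the hash even though its content is the same. For chgserver that errs on the safe side, since the worst case is that a still-valid server is treated as outdated.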

Yuya Nishihara - Feb. 28, 2016, 3:20 p.m.
On Sun, 28 Feb 2016 12:17:25 +0000, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1456498779 0
> #      Fri Feb 26 14:59:39 2016 +0000
> # Node ID 94fd1cecd935b7ce05a5b435ddb82a1eb610ec85
> # Parent  10eaae94523c790d7b77c52a4cb4bfcf406021ef
> chgserver: add utilities to calculate mtimehash

Pushed modified version to the clowncopter, thanks!

> +def _getmtimepaths(ui):
> +    """get a list of paths that should be checked to detect change
> +
> +    The list will include:
> +    - extensions (will not cover all files for complex extensions)
> +    - mercurial/__version__.py
> +    - hg binary
> +    - python binary
> +    """
> +    modules = [m for n, m in extensions.extensions(ui)]
> +    try:
> +        from mercurial import __version__
> +        modules.append(__version__)
> +    except ImportError:
> +        pass
> +    files = [sys.executable, util.hgcmd()]
> +    for m in modules:
> +        try:
> +            files.append(inspect.getabsfile(m))
> +        except TypeError:
> +            pass
> +    return sorted(set(files))

I've removed util.hgcmd() because:

 a) we have __version__.py
 b) util.hgcmd() returns a list of command args, which isn't hashable
    (e.g. ['/usr/bin/hg'] on Unix, and ['python', '\\path\\to\\hg'] on Windows)
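Point (b) is easy to reproduce without Mercurial. In this sketch `hgcmd` is a hypothetical list shaped like the Unix value quoted above, and `dedupe()` is a hypothetical helper doing the same `sorted(set(...))` as the end of `_getmtimepaths()`.

```python
import sys


def dedupe(files):
    """Sort and de-duplicate paths, as _getmtimepaths does on return."""
    return sorted(set(files))


# Hypothetical value shaped like util.hgcmd() output on Unix
hgcmd = ['/usr/bin/hg']

try:
    dedupe([sys.executable, hgcmd])
except TypeError as exc:
    # set() needs hashable items, and a list is not hashable
    print('TypeError: %s' % exc)

# With the hgcmd entry removed, only strings remain and set() works
print(dedupe([sys.executable, sys.executable]))
```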

I noticed that inspect.getabsfile() is undocumented. But using getabsfile()
won't cause problems, since it exists in Python 2.6, 2.7, and 3.5.

https://bugs.python.org/issue12317

Jun Wu - Feb. 28, 2016, 4:39 p.m.
On 02/28/2016 03:20 PM, Yuya Nishihara wrote:
> I noticed that inspect.getabsfile() is undocumented. But using
> getabsfile() won't cause problems, since it exists in Python 2.6,
> 2.7, and 3.5.

Right. There is an issue (https://bugs.python.org/issue12317)
about adding documentation for it.

Patch

diff --git a/hgext/chgserver.py b/hgext/chgserver.py
--- a/hgext/chgserver.py
+++ b/hgext/chgserver.py
@@ -30,10 +30,12 @@ 
 
 import SocketServer
 import errno
+import inspect
 import os
 import re
 import signal
 import struct
+import sys
 import threading
 import time
 import traceback
@@ -46,6 +48,7 @@ 
     commandserver,
     dispatch,
     error,
+    extensions,
     osutil,
     util,
 )
@@ -97,6 +100,51 @@ 
     envhash = _hashlist(sorted(envitems))
     return sectionhash[:6] + envhash[:6]
 
+def _getmtimepaths(ui):
+    """get a list of paths that should be checked to detect change
+
+    The list will include:
+    - extensions (will not cover all files for complex extensions)
+    - mercurial/__version__.py
+    - hg binary
+    - python binary
+    """
+    modules = [m for n, m in extensions.extensions(ui)]
+    try:
+        from mercurial import __version__
+        modules.append(__version__)
+    except ImportError:
+        pass
+    files = [sys.executable, util.hgcmd()]
+    for m in modules:
+        try:
+            files.append(inspect.getabsfile(m))
+        except TypeError:
+            pass
+    return sorted(set(files))
+
+def _mtimehash(paths):
+    """return a quick hash for detecting file changes
+
+    mtimehash calls stat on given paths and calculate a hash based on size and
+    mtime of each file. mtimehash does not read file content because reading is
+    expensive. therefore it's not 100% reliable for detecting content changes.
+    it's possible to return different hashes for same file contents.
+    it's also possible to return a same hash for different file contents for
+    some carefully crafted situation.
+
+    for chgserver, it is designed that once mtimehash changes, the server is
+    considered outdated immediately and should no longer provide service.
+    """
+    def trystat(path):
+        try:
+            st = os.stat(path)
+            return (st.st_mtime, st.st_size)
+        except OSError:
+            # could be ENOENT, EPERM etc. not fatal in any case
+            pass
+    return _hashlist(map(trystat, paths))[:12]
+
 # copied from hgext/pager.py:uisetup()
 def _setuppagercmd(ui, options, cmd):
     if not ui.formatted():