Patchwork D8023: chg: read CHGORIG_ values from env and handle these appropriately

login
register
mail settings
Submitter phabricator
Date Jan. 28, 2020, 12:57 a.m.
Message ID <differential-rev-PHID-DREV-krmxofdsolvqbnjebca2-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44705/
State Superseded
Headers show

Comments

phabricator - Jan. 28, 2020, 12:57 a.m.
spectral created this revision.
Herald added subscribers: mercurial-devel, mjpieters.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Python3.7+ will "coerce" the locale (specifically LC_CTYPE) in many situations,
  and this can cause chg to not start. My previous fix in D7550 <https://phab.mercurial-scm.org/D7550> did not cover all
  situations correctly, but hopefully this fix does.
  
  The C side of chg will set CHGORIG_LC_CTYPE in its environment before starting
  the command server and before calling setenv on the command server.
  
  When calculating the environment hash, we use the value from CHGORIG_LC_CTYPE to
  calculate the hash - intentionally ignoring the modifications that Python may
  have done during command server startup.
  
  When chg calls setenv on the command server, the command server will see
  CHGORIG_LC_CTYPE in the environment-to-set, and NOT modify LC_CTYPE to be the
  same as in the environment-to-set. This preserves the modifications that Python
  has done during startup. We'll still calculate the hash using the
  CHGORIG_LC_CTYPE variables appropriately, so we'll detect environment changes
  (even if they don't cause a change in the actual value). Example:
  
  - LC_CTYPE=invalid_1 chg cmd
    - Py3.7 sets LC_CTYPE=C.UTF-8 on Linux
    - CHGORIG_LC_CTYPE=1invalid_1
    - Environment hash is as-if 'LC_CTYPE=invalid_1', even though it really is LC_CTYPE=C.UTF-8
  - LC_CTYPE=invalid_2 chg cmd
    - Connect to the existing server, call setenv
    - Calculate hash as-if 'LC_CTYPE=invalid_2', even though it is identical to the other command server (C.UTF-8)
  
  This isn't a huge issue in practice. It can cause two separate command servers
  that are functionally identical to be executed. This should not be considered an
  observable/intentional effect, and is something that may change in the future.
  
  This is hopefully a more future-proof fix than the original one in D7550 <https://phab.mercurial-scm.org/D7550>: we
  won't have to worry about behavior changes (or incorrect readings of the current
  behavior) in future versions of Python. If more environment variables end up
  being modified, it's a simple one line fix in chg.c to also preserve those.
  
  Important Caveat: if something causes one of these variables to change *inside*
  the hg serve process, we're going to end up persisting that value. Example:
  
  - Command server starts up, Python sets LC_CTYPE=C.UTF-8
  - Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  - The next invocation of chg will call setenv, saying via CHGORIG_LC_CTYPE that the variable should not be in the environment
  - chgserver.py will preserve LC_CTYPE=en_US.UTF-8
  
  This is quite unlikely and would previously have caused a different problem:
  
  - Command server starts up, let's assume py2 and so LC_CTYPE is unmodified
  - Some extension sets LC_CTYPE=en_US.UTF-8 in the environment
  - The next invocation of chg will call setenv, saying LC_CTYPE shouldn't be in the environment
  - chgserver.py will say that the environment hash doesn't match and redirect chg to a new server
  - chg will create that server and use that, but it'll have an identical hash to the previous one (since at startup LC_CTYPE isn't modified by the extension yet). This should be fine, it'll then run the command like normal.
  - Every time chg is run, it restarts the command server due to this issue, slowing everything down :)

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8023

AFFECTED FILES
  mercurial/chgserver.py
  tests/test-chg.t

CHANGE DETAILS




To: spectral, #hg-reviewers
Cc: mjpieters, mercurial-devel
phabricator - Feb. 10, 2020, 7:29 p.m.
spectral added a comment.
spectral abandoned this revision.


  We decided to use a different method to resolve this issue.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8023/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8023

To: spectral, #hg-reviewers
Cc: mjpieters, mercurial-devel

Patch

diff --git a/tests/test-chg.t b/tests/test-chg.t
--- a/tests/test-chg.t
+++ b/tests/test-chg.t
@@ -332,8 +332,7 @@ 
   YYYY/MM/DD HH:MM:SS (PID)> log -R cached
   YYYY/MM/DD HH:MM:SS (PID)> loaded repo into cache: $TESTTMP/cached (in  ...s)
 
-Test that chg works even when python "coerces" the locale (py3.7+, which is done
-by default if none of LC_ALL, LC_CTYPE, or LANG are set in the environment)
+Test that chg works even when python "coerces" the locale (py3.7+)
 
   $ cat > $TESTTMP/debugenv.py <<EOF
   > from mercurial import encoding
@@ -345,11 +344,31 @@ 
   >     for k in [b'LC_ALL', b'LC_CTYPE', b'LANG']:
   >         v = encoding.environ.get(k)
   >         if v is not None:
-  >             ui.write(b'%s=%s\n' % (k, encoding.environ[k]))
+  >             ui.write(b'debugenv: %s=%s\n' % (k, encoding.environ[k]))
   > EOF
-  $ LANG= LC_ALL= LC_CTYPE= chg \
-  >    --config extensions.debugenv=$TESTTMP/debugenv.py debugenv
-  LC_ALL=
-  LC_CTYPE=C.UTF-8 (py37 !)
-  LC_CTYPE= (no-py37 !)
-  LANG=
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE= chg \
+  >    --config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >    | egrep 'debugenv|start'
+  chg: debug: * start cmdserver at * (glob)
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE= (no-py37 !)
+  debugenv: LANG=
+(Should not trigger a command server restart)
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE= chg \
+  >    --config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >    | egrep 'debugenv|start'
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE= (no-py37 !)
+  debugenv: LANG=
+(Should trigger a command server restart, even though LC_CTYPE on py37 ended up
+identical on py37)
+  $ CHGDEBUG= LANG= LC_ALL= LC_CTYPE=C chg \
+  >    --config extensions.debugenv=$TESTTMP/debugenv.py debugenv 2>&1 \
+  >    | egrep 'debugenv|start'
+  chg: debug: * start cmdserver at * (glob)
+  debugenv: LC_ALL=
+  debugenv: LC_CTYPE=C.UTF-8 (py37 !)
+  debugenv: LC_CTYPE=C (no-py37 !)
+  debugenv: LANG=
diff --git a/mercurial/chgserver.py b/mercurial/chgserver.py
--- a/mercurial/chgserver.py
+++ b/mercurial/chgserver.py
@@ -129,12 +129,37 @@ 
         ignored = {b'HG'}
     else:
         ignored = set()
-    envitems = [
-        (k, v)
+    envitems = {
+        k: v
         for k, v in pycompat.iteritems(encoding.environ)
         if _envre.match(k) and k not in ignored
-    ]
-    envhash = _hashlist(sorted(envitems))
+    }
+    # Py3 may modify environment variables before `hg serve` starts. When chg
+    # connects to us, it passes in its environment (without the modifications),
+    # and we'll generate a different config hash.
+    #
+    # To handle this situation, chg will pass the variables through using
+    # CHGORIG_<envname>=<0 if not in env, 1 if in env><value, if 1>, and we use
+    # *that* value only when calculating the hash. Examples:
+    #   CHGORIG_LC_CTYPE=0   <-- not in chg's env, delete for hash
+    #   CHGORIG_LC_CTYPE=1   <-- in chg's env, but empty. Set to empty for hash.
+    #   CHGORIG_LC_CTYPE=1UTF-8  <-- set to UTF-8 for hash.
+    #
+    # This does NOT cause us to lose Py3's modifications to these variables (we
+    # aren't copying this into our environment), so chg and hg should continue
+    # to behave the same - both will operate with the modifications Py3 made.
+    for k, v in pycompat.iteritems(encoding.environ):
+        if k.startswith(b'CHGORIG_'):
+            origname = k[len(b'CHGORIG_') :]
+            if not _envre.match(origname) or origname in ignored:
+                continue
+
+            if v.startswith(b'0'):
+                del envitems[origname]
+            else:
+                envitems[origname] = v[1:]
+
+    envhash = _hashlist(sorted(pycompat.iteritems(envitems)))
     return sectionhash[:6] + envhash[:6]
 
 
@@ -550,39 +575,12 @@ 
             raise ValueError(b'unexpected value in setenv request')
         self.ui.log(b'chgserver', b'setenv: %r\n', sorted(newenv.keys()))
 
-        # Python3 has some logic to "coerce" the C locale to a UTF-8 capable
-        # one, and it sets LC_CTYPE in the environment to C.UTF-8 if none of
-        # 'LC_CTYPE', 'LC_ALL' or 'LANG' are set (to any value). This can be
-        # disabled with PYTHONCOERCECLOCALE=0 in the environment.
-        #
-        # When fromui is called via _inithashstate, python has already set
-        # this, so that's in the environment right when we start up the hg
-        # process. Then chg will call us and tell us to set the environment to
-        # the one it has; this might NOT have LC_CTYPE, so we'll need to
-        # carry-forward the LC_CTYPE that was coerced in these situations.
-        #
-        # If this is not handled, we will fail config+env validation and fail
-        # to start chg. If this is just ignored instead of carried forward, we
-        # may have different behavior between chg and non-chg.
-        if pycompat.ispy3:
-            # Rename for wordwrapping purposes
-            oldenv = encoding.environ
-            if not any(
-                e.get(b'PYTHONCOERCECLOCALE') == b'0' for e in [oldenv, newenv]
-            ):
-                keys = [b'LC_CTYPE', b'LC_ALL', b'LANG']
-                old_keys = [k for k, v in oldenv.items() if k in keys and v]
-                new_keys = [k for k, v in newenv.items() if k in keys and v]
-                # If the user's environment (from chg) doesn't have ANY of the
-                # keys that python looks for, and the environment (from
-                # initialization) has ONLY LC_CTYPE and it's set to C.UTF-8,
-                # carry it forward.
-                if (
-                    not new_keys
-                    and old_keys == [b'LC_CTYPE']
-                    and oldenv[b'LC_CTYPE'] == b'C.UTF-8'
-                ):
-                    newenv[b'LC_CTYPE'] = oldenv[b'LC_CTYPE']
+        # We want to *preserve* these variables in the environment across the
+        # clear, since Python may have modified them during startup, and we want
+        # to preserve that modification.
+        preserve = [k[8:] for k in newenv.keys() if k.startswith(b'CHGORIG_')]
+        for k in preserve:
+            newenv[k] = encoding.environ[k]
 
         encoding.environ.clear()
         encoding.environ.update(newenv)