Patchwork D8022: chg: pass copies of some envvars so we can detect py37+ modifications

login
register
mail settings
Submitter phabricator
Date Jan. 28, 2020, 12:57 a.m.
Message ID <differential-rev-PHID-DREV-jzopkrfnbul7ilxvekvq-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44703/
State Superseded
Headers show

Comments

phabricator - Jan. 28, 2020, 12:57 a.m.
spectral created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Python3.7+ will "coerce" some variables in the environment to different values.
  Currently this is just LC_CTYPE, which will be coerced to the first of
  ["C.UTF-8", "C.utf8", "UTF-8"] that works on the platform if LC_CTYPE is
  interpreted as "C" (such as by being set to "C", or having LC_ALL set to "C" or
  other mechanisms) or is set to an invalid value for the platform (such as
  "LC_CTYPE=UTF-8" on Linux).
  
  This is a second attempt at the fix I did in D7550 <https://phab.mercurial-scm.org/D7550>. That fix was based off of an
  incorrect reading of the code in Python - I had misinterpreted a #if block that
  dealt with Android as applying in all situations, and missed some cases where
  this would happen and cause issues.

REPOSITORY
  rHG Mercurial

BRANCH
  default

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

AFFECTED FILES
  contrib/chg/chg.c

CHANGE DETAILS




To: spectral, #hg-reviewers
Cc: mercurial-devel
Yuya Nishihara - Jan. 28, 2020, 4:08 p.m.
Another idea. How about sending the true LC_CTYPE value as a command
argument so the env mangling can be reverted?

 1. spawn `hg serve --cmdserver chgunix ...` with
    `--daemon-postexec setenv:LC_CTYPE=$LC_CTYPE`
 2. `runservice()` overwrites `LC_CTYPE` to the specified value (undo coercing)
 3. chgserver hashes the restored environ

Since the "coercing" is 100% useless for Mercurial, restoring LC_CTYPE
should be perfectly fine. And unlike setting `PYTHONCOERCECLOCALE=0`,
Python subprocesses can still see its own "coercing" results for better
or worse.
phabricator - Jan. 28, 2020, 4:10 p.m.
yuja added a comment.


  Another idea. How about sending the true LC_CTYPE value as a command
  argument so the env mangling can be reverted?
  
  1. spawn `hg serve --cmdserver chgunix ...` with `--daemon-postexec setenv:LC_CTYPE=$LC_CTYPE`
  2. `runservice()` overwrites `LC_CTYPE` to the specified value (undo coercing)
  3. chgserver hashes the restored environ
  
  Since the "coercing" is 100% useless for Mercurial, restoring LC_CTYPE
  should be perfectly fine. And unlike setting `PYTHONCOERCECLOCALE=0`,
  Python subprocesses can still see its own "coercing" results for better
  or worse.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - Jan. 28, 2020, 6:57 p.m.
spectral added a comment.


  In D8022#118325 <https://phab.mercurial-scm.org/D8022#118325>, @yuja wrote:
  
  > Another idea. How about sending the true LC_CTYPE value as a command
  > argument so the env mangling can be reverted?
  >
  > 1. spawn `hg serve --cmdserver chgunix ...` with `--daemon-postexec setenv:LC_CTYPE=$LC_CTYPE`
  > 2. `runservice()` overwrites `LC_CTYPE` to the specified value (undo coercing)
  > 3. chgserver hashes the restored environ
  >
  > Since the "coercing" is 100% useless for Mercurial, restoring LC_CTYPE
  > should be perfectly fine. And unlike setting `PYTHONCOERCECLOCALE=0`,
  > Python subprocesses can still see its own "coercing" results for better
  > or worse.
  
  This would cause a difference in behavior between hg and chg. I don't know how big of an issue that would be.
  
  hg: starts up, python coerces LC_CTYPE, hg spawns a non-python subprocess, LC_CTYPE is set to the coerced value
  chg: starts up, python coerces LC_CTYPE, chg fixes it, hg spawns a non-python subprocess, LC_CTYPE is set to the original value (or unset).
  
  Another case where it matters is: LC_CTYPE *is* important for hg when using any curses programs (or really anything that uses the locale module). If you are on a mac and set your region settings to have a region and a primary language that aren't representable using the locale settings (such as region = Brazil, language = English), then LC_CTYPE starts off as "UTF-8" (not "C.UTF-8"). This is allowed on macOS and I think other BSDs, but Linux doesn't like it. If the LC_CTYPE variable is forwarded when sshing to a Linux machine, this breaks curses. I'm sure this wasn't intentional, but the way that Python3.7 coerces the locale, it also happens to fix this particular issue and set it to C.UTF-8, which *does* work on Linux and makes `hg commit --interactive` with ui.interface=curses work; otherwise it gets a traceback.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel
Yuya Nishihara - Jan. 28, 2020, 11:54 p.m.
>   This would cause a difference in behavior between hg and chg. I don't know how big of an issue that would be.
>
>   hg: starts up, python coerces LC_CTYPE, hg spawns a non-python subprocess, LC_CTYPE is set to the coerced value
>   chg: starts up, python coerces LC_CTYPE, chg fixes it, hg spawns a non-python subprocess, LC_CTYPE is set to the original value (or unset).
>   

I think a minor behavior difference is acceptable. I see this is the problem
of Python 3 design "the world is unicode", and IMHO we want to work around
the problem with minimal changes.

`PYTHONCOERCECLOCALE=0` was rejected because it may break Python subprocesses
that expects the "coercing" will occur, but restoring the cheated `LC_CTYPE`
should be fine. The chg behavior will be more correct, and non-Python
subprocesses should work in the original "C" environment.

>   Another case where it matters is: LC_CTYPE *is* important for hg when using any curses programs (or really anything that uses the locale module). If you are on a mac and set your region settings to have a region and a primary language that aren't representable using the locale settings (such as region = Brazil, language = English), then LC_CTYPE starts off as "UTF-8" (not "C.UTF-8"). This is allowed on macOS and I think other BSDs, but Linux doesn't like it. If the LC_CTYPE variable is forwarded when sshing to a Linux machine, this breaks curses. I'm sure this wasn't intentional, but the way that Python3.7 coerces the locale, it also happens to fix this particular issue and set it to C.UTF-8, which *does* work on Linux and makes `hg commit --interactive` with ui.interface=curses work; otherwise it gets a traceback.

Does the curses depend on the locale environment variable, not on the
`setlocale()`-ed process state?

Anyway, I think this is an environment issue and it's totally fine for
chg to crash (or abort) if the environment variable is incorrectly set.
phabricator - Jan. 29, 2020, 12:31 a.m.
yuja added a comment.


  >   This would cause a difference in behavior between hg and chg. I don't know how big of an issue that would be.
  >   hg: starts up, python coerces LC_CTYPE, hg spawns a non-python subprocess, LC_CTYPE is set to the coerced value
  >   chg: starts up, python coerces LC_CTYPE, chg fixes it, hg spawns a non-python subprocess, LC_CTYPE is set to the original value (or unset).
  
  I think a minor behavior difference is acceptable. I see this is the problem
  of Python 3 design "the world is unicode", and IMHO we want to work around
  the problem with minimal changes.
  
  `PYTHONCOERCECLOCALE=0` was rejected because it may break Python subprocesses
  that expects the "coercing" will occur, but restoring the cheated `LC_CTYPE`
  should be fine. The chg behavior will be more correct, and non-Python
  subprocesses should work in the original "C" environment.
  
  >   Another case where it matters is: LC_CTYPE *is* important for hg when using any curses programs (or really anything that uses the locale module). If you are on a mac and set your region settings to have a region and a primary language that aren't representable using the locale settings (such as region = Brazil, language = English), then LC_CTYPE starts off as "UTF-8" (not "C.UTF-8"). This is allowed on macOS and I think other BSDs, but Linux doesn't like it. If the LC_CTYPE variable is forwarded when sshing to a Linux machine, this breaks curses. I'm sure this wasn't intentional, but the way that Python3.7 coerces the locale, it also happens to fix this particular issue and set it to C.UTF-8, which *does* work on Linux and makes `hg commit --interactive` with ui.interface=curses work; otherwise it gets a traceback.
  
  Does the curses depend on the locale environment variable, not on the
  `setlocale()`-ed process state?
  
  Anyway, I think this is an environment issue and it's totally fine for
  chg to crash (or abort) if the environment variable is incorrectly set.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - Jan. 29, 2020, 7:52 p.m.
spectral added a comment.


  In D8022#118428 <https://phab.mercurial-scm.org/D8022#118428>, @yuja wrote:
  
  >>   This would cause a difference in behavior between hg and chg. I don't know how big of an issue that would be.
  >>   hg: starts up, python coerces LC_CTYPE, hg spawns a non-python subprocess, LC_CTYPE is set to the coerced value
  >>   chg: starts up, python coerces LC_CTYPE, chg fixes it, hg spawns a non-python subprocess, LC_CTYPE is set to the original value (or unset).
  >
  > I think a minor behavior difference is acceptable. I see this is the problem
  > of Python 3 design "the world is unicode", and IMHO we want to work around
  > the problem with minimal changes.
  
  Getting this implemented "correctly" using daemon_postexec is getting relatively complicated - there's no way to backwards- and forwards-compatibly do this since there's no capabilities mechanism for daemon_postexec. If I blindly do it, it fails when connecting to an older hg saying it doesn't know how to do handle that command.
  
  Much of the stuff being done in this stack is to handle the case with the setenv command (after the chg daemon is started). If we don't care about overwriting the python-modified LC_CTYPE, we can do this quite a bit easier:
  
  In execcmdserver, we copy LC_CTYPE to CHGORIG_LC_CTYPE, export that when starting
  In chgserver.py, when doing the initial startup process, check if CHGORIG_LC_CTYPE is in the environment, and overwrite LC_CTYPE with its value.
  
  I'll see how easy that is to implement and send a separate review request once I have it working.
  
  > `PYTHONCOERCECLOCALE=0` was rejected because it may break Python subprocesses
  > that expects the "coercing" will occur, but restoring the cheated `LC_CTYPE`
  > should be fine. The chg behavior will be more correct, and non-Python
  > subprocesses should work in the original "C" environment.
  >
  >>   Another case where it matters is: LC_CTYPE *is* important for hg when using any curses programs (or really anything that uses the locale module). If you are on a mac and set your region settings to have a region and a primary language that aren't representable using the locale settings (such as region = Brazil, language = English), then LC_CTYPE starts off as "UTF-8" (not "C.UTF-8"). This is allowed on macOS and I think other BSDs, but Linux doesn't like it. If the LC_CTYPE variable is forwarded when sshing to a Linux machine, this breaks curses. I'm sure this wasn't intentional, but the way that Python3.7 coerces the locale, it also happens to fix this particular issue and set it to C.UTF-8, which *does* work on Linux and makes `hg commit --interactive` with ui.interface=curses work; otherwise it gets a traceback.
  >
  > Does the curses depend on the locale environment variable, not on the
  > `setlocale()`-ed process state?
  
  chunkselector (crecord.py:578) calls `locale.setlocale(locale.LC_ALL, '')`, which calls `_setlocale(category, locale)`, which raises `locale.Error: unsupported locale setting`. :(
  
  > Anyway, I think this is an environment issue and it's totally fine for
  > chg to crash (or abort) if the environment variable is incorrectly set.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - Jan. 29, 2020, 10:24 p.m.
spectral added a comment.


  > I'll see how easy that is to implement and send a separate review request once I have it working.
  
  D8039 <https://phab.mercurial-scm.org/D8039>

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel
Yuya Nishihara - Jan. 30, 2020, 2:34 p.m.
>   Getting this implemented "correctly" using daemon_postexec is getting relatively complicated - there's no way to backwards- and forwards-compatibly do this since there's no capabilities mechanism for daemon_postexec. If I blindly do it, it fails when connecting to an older hg saying it doesn't know how to do handle that command.

For the record, chg doesn't need to support old hg versions. We did once
add `--daemon-postexec chdir:/` for example.

>   In execcmdserver, we copy LC_CTYPE to CHGORIG_LC_CTYPE, export that when starting
>   In chgserver.py, when doing the initial startup process, check if CHGORIG_LC_CTYPE is in the environment, and overwrite LC_CTYPE with its value.
>   
>   I'll see how easy that is to implement and send a separate review request once I have it working.

Thanks. I'll review it soon.
phabricator - Jan. 30, 2020, 2:41 p.m.
yuja added a comment.


  >   Getting this implemented "correctly" using daemon_postexec is getting relatively complicated - there's no way to backwards- and forwards-compatibly do this since there's no capabilities mechanism for daemon_postexec. If I blindly do it, it fails when connecting to an older hg saying it doesn't know how to do handle that command.
  
  For the record, chg doesn't need to support old hg versions. We did once
  add `--daemon-postexec chdir:/` for example.
  
  >   In execcmdserver, we copy LC_CTYPE to CHGORIG_LC_CTYPE, export that when starting
  >   In chgserver.py, when doing the initial startup process, check if CHGORIG_LC_CTYPE is in the environment, and overwrite LC_CTYPE with its value.
  >   I'll see how easy that is to implement and send a separate review request once I have it working.
  
  Thanks. I'll review it soon.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D8022/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D8022

To: spectral, #hg-reviewers
Cc: yuja, mercurial-devel

Patch

diff --git a/contrib/chg/chg.c b/contrib/chg/chg.c
--- a/contrib/chg/chg.c
+++ b/contrib/chg/chg.c
@@ -203,6 +203,39 @@ 
 	return hgcmd;
 }
 
+static void putsidechannelenv(const char *envname)
+{
+	char *buf, *bufp;
+	const char *existing_val = getenv(envname);
+	size_t existing_val_len = 0;
+	size_t envname_len = strlen(envname);
+	size_t buf_len;
+
+	if (existing_val != NULL) {
+		existing_val_len = strlen(existing_val);
+	}
+
+        /* 11: 8 for CHGORIG_, 1 for the =, 1 for the "exists" flag, 1 for the
+         * nul */
+	buf_len = envname_len + existing_val_len + 11;
+	bufp = buf = mallocx(buf_len);
+	strlcpy(bufp, "CHGORIG_", 8);
+	bufp += 8;  /* strlen("CHGORIG_") */
+	strlcpy(bufp, envname, envname_len);
+	bufp += envname_len;
+	*bufp++ = '=';
+	*bufp++ = (existing_val == NULL) ? '0' : '1';
+	if (existing_val != NULL && existing_val_len > 0) {
+		strlcpy(bufp, existing_val, existing_val_len);
+		bufp += existing_val_len;
+	}
+	*bufp = '\0';
+	if (putenv(buf) != 0) {
+		abortmsgerrno("failed to putenv for stored vars");
+        }
+        /* don't free `buf`, putenv does NOT copy the string! */
+}
+
 static void execcmdserver(const struct cmdserveropts *opts)
 {
 	const char *hgcmd = gethgcmd();
@@ -410,6 +443,14 @@ 
 		         "wrapper to chg. Alternatively, set $CHGHG to the "
 		         "path of real hg.");
 
+	char *coerce = getenv("PYTHONCOERCECLOCALE");
+	if (coerce == NULL || coerce[0] != '0') {
+		/* Py3 modifies the environment on execution, we need a side
+		 * channel to get an unmodified environment to `hg serve` for
+		 * some variables */
+		putsidechannelenv("LC_CTYPE");
+	}
+
 	if (isunsupported(argc - 1, argv + 1))
 		execoriginalhg(argv);