Patchwork [1,of,2,stable] compat: initialize LC_CTYPE locale on all Python versions and platforms

Submitter Manuel Jacob
Date June 26, 2020, 7:39 a.m.
Message ID <314f194471625fe63adb.1593157194@tmp>
Permalink /patch/46576/
State New

Comments

Manuel Jacob - June 26, 2020, 7:39 a.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1593137270 -7200
#      Fri Jun 26 04:07:50 2020 +0200
# Branch stable
# Node ID 314f194471625fe63adb3e76f9315e82e7f2994b
# Parent  2fd8a8c1127378bce6e6a4205fe7e3c62134cb99
# EXP-Topic init_locale
compat: initialize LC_CTYPE locale on all Python versions and platforms

Previously, the LC_CTYPE locale was not initialized from the user's settings
on every Python version and platform: it was never initialized on Python 2,
and it was not initialized on Windows with some Python versions before 3.8.

This broke, for example, non-ASCII filenames passed to the Subversion bindings
on Python 2, resulting in error messages like "file:///tmp/a%C3%A4 does not
look like a Subversion repository to libsvn version 1.14.0".
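The URL in that error message is the percent-encoded UTF-8 form of a path
containing "ä" (presumably /tmp/aä; the exact filename is an assumption
inferred from the escape sequence). The encoding step itself looks correct,
as this small Python 3 sketch using the standard urllib.parse.quote shows; what
goes wrong is how the bindings interpret the path while LC_CTYPE is still "C".

```python
from urllib.parse import quote

# Hypothetical path reconstructed from the escapes in the error message.
path = '/tmp/a\u00e4'  # "/tmp/aä"
url = 'file://' + quote(path)  # '/' is kept, "ä" becomes its UTF-8 escapes
print(url)  # file:///tmp/a%C3%A4
```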

The following command could be used to test this functionality. Adding it to the
test suite would be pointless, as the locale is always set to "C" during test
runs.

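import locale
from mercurial import registrar
cmdtable = {}
command = registrar.command(cmdtable)

@command(b'check_initial_codeset', norepo=True)
def check_initial_codeset(ui):
    # The codeset set up at startup must match the one from the environment.
    codeset1 = locale.nl_langinfo(locale.CODESET)
    locale.setlocale(locale.LC_ALL, '')
    codeset2 = locale.nl_langinfo(locale.CODESET)
    assert codeset1 == codeset2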
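locale.nl_langinfo() is only available on Unix-like systems, so
check_initial_codeset cannot run on Windows, one of the platforms this change
targets. A minimal sketch of a portable variant, assuming it is acceptable to
compare LC_CTYPE locale names instead of codesets; the helper name
ctype_matches_environment is hypothetical and not part of Mercurial.

```python
import locale


def ctype_matches_environment():
    """Return True if LC_CTYPE already reflects the user's environment.

    Hypothetical helper: compares locale names via setlocale() rather than
    codesets via nl_langinfo(), which does not exist on Windows.
    """
    # Passing None queries the current LC_CTYPE setting without changing it.
    before = locale.setlocale(locale.LC_CTYPE, None)
    try:
        # Passing '' applies the locale named by the environment variables.
        after = locale.setlocale(locale.LC_CTYPE, '')
    except locale.Error:
        # The environment names a locale the C library does not support;
        # a failed setlocale() leaves the current setting unchanged.
        return False
    # Restore the previous setting so the check has no lasting side effect.
    locale.setlocale(locale.LC_CTYPE, before)
    return before == after


if __name__ == '__main__':
    print(ctype_matches_environment())
```

As with the command above, running this under the test suite says little,
because the environment there selects the "C" locale.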

Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -13,6 +13,7 @@ 
 import getopt
 import inspect
 import json
+import locale
 import os
 import shlex
 import sys
@@ -93,6 +94,26 @@ 
     return _rapply(f, xs)
 
 
+# Passing the '' locale means that the locale should be set according to the
+# user settings (environment variables).
+# Python does not always initialize the global locale settings. When
+# interfacing with C code (e.g. the curses module or the Subversion bindings),
+# the global locale settings must be initialized correctly. Python 2 does not
+# initialize them on interpreter startup. Python 3 sometimes initializes
+# LC_CTYPE, but not consistently (at least not on Windows). Therefore we
+# explicitly initialize it, if it is not already initialized, to get
+# consistent behavior. Since CPython commit
+# 177d921c8c03d30daa32994362023f777624b10d, LC_CTYPE is always initialized.
+# Once we require Python 3.8+, we should re-check whether this can be removed.
+if locale.setlocale(locale.LC_CTYPE, None) == 'C':
+    try:
+        locale.setlocale(locale.LC_CTYPE, '')
+    except locale.Error:
+        # The likely case is that the locale from the environment variables is
+        # unknown.
+        pass
+
+
 if ispy3:
     import builtins
     import codecs