Patchwork [stable] pycompat: fix crash when default locale is unknown

login
register
mail settings
Submitter Manuel Jacob
Date June 24, 2020, 2:25 a.m.
Message ID <5d2b9295f7607e6f1bc6.1592965557@tmp>
Download mbox | patch
Permalink /patch/46554/
State Accepted
Headers show

Comments

Manuel Jacob - June 24, 2020, 2:25 a.m.
# HG changeset patch
# User Manuel Jacob <me@manueljacob.de>
# Date 1592965534 -7200
#      Wed Jun 24 04:25:34 2020 +0200
# Branch stable
# Node ID 5d2b9295f7607e6f1bc6004af2948f95c83ec4ee
# Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
# EXP-Topic locale
pycompat: fix crash when default locale is unknown

Instead, fall back to the filesystem encoding if the default locale is unknown.
Yuya Nishihara - June 24, 2020, 11:36 a.m.
On Wed, 24 Jun 2020 04:25:57 +0200, Manuel Jacob wrote:
> # HG changeset patch
> # User Manuel Jacob <me@manueljacob.de>
> # Date 1592965534 -7200
> #      Wed Jun 24 04:25:34 2020 +0200
> # Branch stable
> # Node ID 5d2b9295f7607e6f1bc6004af2948f95c83ec4ee
> # Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
> # EXP-Topic locale
> pycompat: fix crash when default locale is unknown
> 
> Instead, fall back to the filesystem encoding if the default locale is unknown.
> 
> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
> --- a/mercurial/pycompat.py
> +++ b/mercurial/pycompat.py
> @@ -178,9 +178,16 @@
>          if os.name == r'nt':
>              sysargv = [a.encode("mbcs", "ignore") for a in sys.argv]
>          else:
> +
> +            def getdefaultlocale_if_known():
> +                try:
> +                    return locale.getdefaultlocale()
> +                except ValueError:
> +                    return None, None
> +
>              encoding = (
>                  locale.getlocale()[1]
> -                or locale.getdefaultlocale()[1]
> +                or getdefaultlocale_if_known()[1]

Seems fine, but it might be more correct to remove getdefaultlocale() at all.

The comment mentions mbstowcs(), which is a locale-dependent function.
If setlocale(LC_CTYPE, '') wouldn't be called yet, I think mbstowcs()
would behave as if the locale were C. So locale.getlocale(LC_CTYPE) should
return the encoding which mbstowcs() would see.
Manuel Jacob - June 24, 2020, 12:26 p.m.
On 2020-06-24 13:36, Yuya Nishihara wrote:
> On Wed, 24 Jun 2020 04:25:57 +0200, Manuel Jacob wrote:
>> # HG changeset patch
>> # User Manuel Jacob <me@manueljacob.de>
>> # Date 1592965534 -7200
>> #      Wed Jun 24 04:25:34 2020 +0200
>> # Branch stable
>> # Node ID 5d2b9295f7607e6f1bc6004af2948f95c83ec4ee
>> # Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
>> # EXP-Topic locale
>> pycompat: fix crash when default locale is unknown
>> 
>> Instead, fall back to the filesystem encoding if the default locale is 
>> unknown.
>> 
>> diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
>> --- a/mercurial/pycompat.py
>> +++ b/mercurial/pycompat.py
>> @@ -178,9 +178,16 @@
>>          if os.name == r'nt':
>>              sysargv = [a.encode("mbcs", "ignore") for a in sys.argv]
>>          else:
>> +
>> +            def getdefaultlocale_if_known():
>> +                try:
>> +                    return locale.getdefaultlocale()
>> +                except ValueError:
>> +                    return None, None
>> +
>>              encoding = (
>>                  locale.getlocale()[1]
>> -                or locale.getdefaultlocale()[1]
>> +                or getdefaultlocale_if_known()[1]
> 
> Seems fine, but it might be more correct to remove getdefaultlocale() 
> at all.

It would be more correct to use os.fsencode(). I am preparing a patch 
for that. However, this patch should probably go to default to not 
introduce regressions.

> The comment mentions mbstowcs(), which is a locale-dependent function.
> If setlocale(LC_CTYPE, '') wouldn't be called yet, I think mbstowcs()
> would behave as if the locale were C. So locale.getlocale(LC_CTYPE) 
> should
> return the encoding which mbstowcs() would see.

Python 3 sets `setlocale(LC_CTYPE, '')` on interpreter startup on 
non-Windows systems, which is the case we’re interested in here. (As an 
unrelated side node, another in-preparation patch calls 
`setlocale(LC_CTYPE, '')` on all platforms.)

I also think that calling `locale.getdefaultlocale()` is unnecessary. 
But as mentioned above, I didn’t want to introduce a regression.

I’d still like to have this patch go to stable, as the tests for 
upcoming patches (fixing the problem that using curses messes with the 
locales) will depend on it.
Yuya Nishihara - June 24, 2020, 12:54 p.m.
On Wed, 24 Jun 2020 14:26:48 +0200, Manuel Jacob wrote:
> On 2020-06-24 13:36, Yuya Nishihara wrote:
> > On Wed, 24 Jun 2020 04:25:57 +0200, Manuel Jacob wrote:
> >> # HG changeset patch
> >> # User Manuel Jacob <me@manueljacob.de>
> >> # Date 1592965534 -7200
> >> #      Wed Jun 24 04:25:34 2020 +0200
> >> # Branch stable
> >> # Node ID 5d2b9295f7607e6f1bc6004af2948f95c83ec4ee
> >> # Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
> >> # EXP-Topic locale
> >> pycompat: fix crash when default locale is unknown

> > Seems fine, but it might be more correct to remove getdefaultlocale() 
> > at all.
> 
> It would be more correct to use os.fsencode(). I am preparing a patch 
> for that.

It was previously fsencode(). I'm not convinced that the use of getlocale()
is more correct, but I didn't wanna follow every detail of the Py3 unicode
shit. No idea why they can't simply add sys.argvb.

> However, this patch should probably go to default to not 
> introduce regressions.

Okay, queued for stable, thanks.
Manuel Jacob - June 24, 2020, 1 p.m.
On 2020-06-24 14:54, Yuya Nishihara wrote:
> On Wed, 24 Jun 2020 14:26:48 +0200, Manuel Jacob wrote:
>> On 2020-06-24 13:36, Yuya Nishihara wrote:
>> > On Wed, 24 Jun 2020 04:25:57 +0200, Manuel Jacob wrote:
>> >> # HG changeset patch
>> >> # User Manuel Jacob <me@manueljacob.de>
>> >> # Date 1592965534 -7200
>> >> #      Wed Jun 24 04:25:34 2020 +0200
>> >> # Branch stable
>> >> # Node ID 5d2b9295f7607e6f1bc6004af2948f95c83ec4ee
>> >> # Parent  3d41172f2ac9ae7b644f3e5f239c28e895f1c4fa
>> >> # EXP-Topic locale
>> >> pycompat: fix crash when default locale is unknown
> 
>> > Seems fine, but it might be more correct to remove getdefaultlocale()
>> > at all.
>> 
>> It would be more correct to use os.fsencode(). I am preparing a patch
>> for that.
> 
> It was previously fsencode(). I'm not convinced that the use of 
> getlocale()
> is more correct, but I didn't wanna follow every detail of the Py3 
> unicode
> shit. No idea why they can't simply add sys.argvb.

I seems like the move away from os.fsencode() (in 00e0c5c06ed5e) was to 
fix the Windows case. See the other patch. :)

>> However, this patch should probably go to default to not
>> introduce regressions.
> 
> Okay, queued for stable, thanks.

Patch

diff --git a/mercurial/pycompat.py b/mercurial/pycompat.py
--- a/mercurial/pycompat.py
+++ b/mercurial/pycompat.py
@@ -178,9 +178,16 @@ 
         if os.name == r'nt':
             sysargv = [a.encode("mbcs", "ignore") for a in sys.argv]
         else:
+
+            def getdefaultlocale_if_known():
+                try:
+                    return locale.getdefaultlocale()
+                except ValueError:
+                    return None, None
+
             encoding = (
                 locale.getlocale()[1]
-                or locale.getdefaultlocale()[1]
+                or getdefaultlocale_if_known()[1]
                 or sys.getfilesystemencoding()
             )
             sysargv = [a.encode(encoding, "surrogateescape") for a in sys.argv]
diff --git a/tests/test-locale.t b/tests/test-locale.t
new file mode 100644
--- /dev/null
+++ b/tests/test-locale.t
@@ -0,0 +1,2 @@ 
+  $ LANG=nonexistent LC_ALL=nonexistent LANGUAGE=nonexistent hg version -q
+  Mercurial Distributed SCM (version *) (glob)