Submitter | Manuel Jacob |
---|---|
Date | June 26, 2020, 8:35 a.m. |
Message ID | <36dcc87c0c15d8757ef0.1593160524@tmp> |
Download | mbox | patch |
Permalink | /patch/46578/ |
State | Accepted |
Headers | show |
Comments
On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote: > # HG changeset patch > # User Manuel Jacob <me@manueljacob.de> > # Date 1593157054 -7200 > # Fri Jun 26 09:37:34 2020 +0200 > # Branch stable > # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1 > # Parent 9bb2fc4ac22b0b986d56fd13be4e6b524af7b648 > # EXP-Topic init_locale > curses: do not initialize LC_ALL to user settings (issue6358) Queued these to default, thanks. The issue to be fixed is relatively small, but the setlocale() change might have unwanted side effect.
On 2020-06-26 13:47, Yuya Nishihara wrote: > On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote: >> # HG changeset patch >> # User Manuel Jacob <me@manueljacob.de> >> # Date 1593157054 -7200 >> # Fri Jun 26 09:37:34 2020 +0200 >> # Branch stable >> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1 >> # Parent 9bb2fc4ac22b0b986d56fd13be4e6b524af7b648 >> # EXP-Topic init_locale >> curses: do not initialize LC_ALL to user settings (issue6358) > > Queued these to default, thanks. The issue to be fixed is relatively > small, > but the setlocale() change might have unwanted side effect. As mentioned in the first patch, non-ASCII filenames in hgext.convert.subversion are fixed by it. If I make further non-ASCII fixes to hgext.convert.subversion, should do they go to stable (for testing, I can ensure that LC_CTYPE is initialized otherwise on Python 2) or default? For me as a user, it doesn’t make a difference, as I don’t use Subversion (especially not with non-ASCII filenames).
On Fri, 26 Jun 2020 14:42:58 +0200, Manuel Jacob wrote: > On 2020-06-26 13:47, Yuya Nishihara wrote: > > On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote: > >> # HG changeset patch > >> # User Manuel Jacob <me@manueljacob.de> > >> # Date 1593157054 -7200 > >> # Fri Jun 26 09:37:34 2020 +0200 > >> # Branch stable > >> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1 > >> # Parent 9bb2fc4ac22b0b986d56fd13be4e6b524af7b648 > >> # EXP-Topic init_locale > >> curses: do not initialize LC_ALL to user settings (issue6358) > > > > Queued these to default, thanks. The issue to be fixed is relatively > > small, > > but the setlocale() change might have unwanted side effect. > > As mentioned in the first patch, non-ASCII filenames in > hgext.convert.subversion are fixed by it. If I make further non-ASCII > fixes to hgext.convert.subversion, should do they go to stable (for > testing, I can ensure that LC_CTYPE is initialized otherwise on Python > 2) or default? For me as a user, it doesn’t make a difference, as I > don’t use Subversion (especially not with non-ASCII filenames). If the scope of the change is narrowed down to the conversion extension, I think it may go stable. As a user, I never use non-ASCII filenames, too.
On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote: > # HG changeset patch > # User Manuel Jacob <me@manueljacob.de> > # Date 1593157054 -7200 > # Fri Jun 26 09:37:34 2020 +0200 > # Branch stable > # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1 > # Parent 9bb2fc4ac22b0b986d56fd13be4e6b524af7b648 > # EXP-Topic init_locale > curses: do not initialize LC_ALL to user settings (issue6358) > > 701341f57ceb moved the setlocale() call to right before curses was used. This > didn’t fully solve the problem it was supposed to solve (locale-dependent > functions, like date formatting/parsing), but only postponed it. > > Initializing LC_CTYPE seems to be sufficient for curses to work correctly. > Luckily this is already done at interpreter startup on modern Python versions > and, since recently, by Mercurial in the pycompat module in all other cases. Sorry to be a bit late, but I noticed a problem on Python 2. LC_CTYPE changes the character classification rules, which unfortunately changes the behavior of bytes.is*() functions on Python 2. https://man7.org/linux/man-pages/man7/locale.7.html https://docs.python.org/2.7/library/stdtypes.html?highlight=isalpha#str.isalnum % python -c 'import locale; locale.setlocale(locale.LC_CTYPE, "de_DE"); print(b"\xe4".isalpha())' True % python3 -c 'import locale; locale.setlocale(locale.LC_CTYPE, "de_DE"); print(b"\xe4".isalpha())' False So, on Python 2, this patch would make template syntax locale-dependent for example.
On 2020-06-27 05:27, Yuya Nishihara wrote: > On Fri, 26 Jun 2020 10:35:24 +0200, Manuel Jacob wrote: >> # HG changeset patch >> # User Manuel Jacob <me@manueljacob.de> >> # Date 1593157054 -7200 >> # Fri Jun 26 09:37:34 2020 +0200 >> # Branch stable >> # Node ID 36dcc87c0c15d8757ef0322c820861b1d10350c1 >> # Parent 9bb2fc4ac22b0b986d56fd13be4e6b524af7b648 >> # EXP-Topic init_locale >> curses: do not initialize LC_ALL to user settings (issue6358) >> >> 701341f57ceb moved the setlocale() call to right before curses was >> used. This >> didn’t fully solve the problem it was supposed to solve >> (locale-dependent >> functions, like date formatting/parsing), but only postponed it. >> >> Initializing LC_CTYPE seems to be sufficient for curses to work >> correctly. >> Luckily this is already done at interpreter startup on modern Python >> versions >> and, since recently, by Mercurial in the pycompat module in all other >> cases. > > Sorry to be a bit late, but I noticed a problem on Python 2. > > LC_CTYPE changes the character classification rules, which > unfortunately > changes the behavior of bytes.is*() functions on Python 2. > > https://man7.org/linux/man-pages/man7/locale.7.html > https://docs.python.org/2.7/library/stdtypes.html?highlight=isalpha#str.isalnum > > % python -c 'import locale; locale.setlocale(locale.LC_CTYPE, > "de_DE"); print(b"\xe4".isalpha())' > True > % python3 -c 'import locale; locale.setlocale(locale.LC_CTYPE, > "de_DE"); print(b"\xe4".isalpha())' > False > > So, on Python 2, this patch would make template syntax locale-dependent > for > example. I spent quite some time trying to find out why Python 2 doesn’t initialize LC_CTYPE. Now I know! For curses, we can probably work around the problem by resetting LC_CTYPE after calling initscr() or after we close the curses UI. For Subversion it seems like LC_CTYPE must be set all the time instead of only during initialization, but I have to double-check. I can prepare more patches during this weekend. If that’s too long, feel free to backout the existing changes.
Patch
diff --git a/hgext/histedit.py b/hgext/histedit.py --- a/hgext/histedit.py +++ b/hgext/histedit.py @@ -201,7 +201,6 @@ termios = None import functools -import locale import os import struct @@ -1710,10 +1709,6 @@ ctxs = [] for i, r in enumerate(revs): ctxs.append(histeditrule(ui, repo[r], i)) - # Curses requires setting the locale or it will default to the C - # locale. This sets the locale to the user's default system - # locale. - locale.setlocale(locale.LC_ALL, '') rc = curses.wrapper(functools.partial(_chisteditmain, repo, ctxs)) curses.echo() curses.endwin() diff --git a/mercurial/crecord.py b/mercurial/crecord.py --- a/mercurial/crecord.py +++ b/mercurial/crecord.py @@ -10,7 +10,6 @@ from __future__ import absolute_import -import locale import os import re import signal @@ -574,9 +573,6 @@ """ ui.write(_(b'starting interactive selection\n')) chunkselector = curseschunkselector(headerlist, ui, operation) - # This is required for ncurses to display non-ASCII characters in - # default user locale encoding correctly. --immerrr - locale.setlocale(locale.LC_ALL, '') origsigtstp = sentinel = object() if util.safehasattr(signal, b'SIGTSTP'): origsigtstp = signal.getsignal(signal.SIGTSTP)