Patchwork D6052: global: use raw string for setlocale() argument

login
register
mail settings
Submitter phabricator
Date March 2, 2019, 9:26 p.m.
Message ID <differential-rev-PHID-DREV-ckkwrgxlp2cl4pgxqjzd-req@phab.mercurial-scm.org>
Download mbox | patch
Permalink /patch/39003/
State Superseded
Headers show

Comments

phabricator - March 2, 2019, 9:26 p.m.
indygreg created this revision.
Herald added a reviewer: durin42.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Otherwise Python 2 will coerce a unicode to str, which fails
  on HGUNICODEPEDANTRY=1.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6052

AFFECTED FILES
  hgext/histedit.py
  mercurial/crecord.py

CHANGE DETAILS




To: indygreg, durin42, #hg-reviewers
Cc: mercurial-devel
phabricator - March 2, 2019, 9:32 p.m.
indygreg added a subscriber: yuja.
indygreg added a comment.


  The context for this series is that `HGUNICODEPEDANTRY` is completely broken without it.
  
  For those not aware, on Python 2, if you call `sys.setdefaultencoding('undefined')`, it loads a special codec (https://docs.python.org/2/library/codecs.html#python-specific-encodings) which makes automatic coercion raise an exception instead of silently proceeding. It essentially finds all the places where you are using `str`/`unicode` improperly.
  
  After this series, the number of test failures in that mode is ~70. Most of the remaining failures are in `stringutil.wrap()`. Essentially `textwrap` from the standard library wants to operate on native `str`. Our version is feeding in `unicode`. We get into trouble when the standard library code does a `''.join()` and some `unicode` is coerced into `str`. TBH I'm not sure how we're not seeing bugs due to this on Python 2. I'm guessing we're lacking test coverage for something passing non-ASCII into `stringutil.wrap()`?
  
  @yuja you may be interested in the text wrapping issue.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6052

To: indygreg, durin42, #hg-reviewers
Cc: yuja, mercurial-devel
Yuya Nishihara - March 3, 2019, 10 a.m.
>   After this series, the number of test failures in that mode is ~70. Most of the remaining failures are in `stringutil.wrap()`. Essentially `textwrap` from the standard library wants to operate on native `str`. Our version is feeding in `unicode`. We get into trouble when the standard library code does a `''.join()` and some `unicode` is coerced into `str`. TBH I'm not sure how we're not seeing bugs due to this on Python 2. I'm guessing we're lacking test coverage for something passing non-ASCII into `stringutil.wrap()`?

Can you share some example of the test failure?

I'm not sure what's wrong with the textwrap. AFAIK, it's string-neutral.
`''.join()` would return a unicode string if one of the inputs is a unicode,
and the others are all ASCII bytes.
phabricator - March 3, 2019, 10:01 a.m.
yuja added a comment.


  >   After this series, the number of test failures in that mode is ~70. Most of the remaining failures are in `stringutil.wrap()`. Essentially `textwrap` from the standard library wants to operate on native `str`. Our version is feeding in `unicode`. We get into trouble when the standard library code does a `''.join()` and some `unicode` is coerced into `str`. TBH I'm not sure how we're not seeing bugs due to this on Python 2. I'm guessing we're lacking test coverage for something passing non-ASCII into `stringutil.wrap()`?
  
  Can you share some example of the test failure?
  
  I'm not sure what's wrong with the textwrap. AFAIK, it's string-neutral.
  `''.join()` would return a unicode string if one of the inputs is a unicode,
  and the others are all ASCII bytes.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6052

To: indygreg, durin42, #hg-reviewers
Cc: yuja, mercurial-devel
phabricator - March 3, 2019, 4:54 p.m.
indygreg added a comment.


  Yeah, `''.join([u''])` will coerce the `str` to `unicode`, so we're fine on that front. The problem is only with `HGUNICODEPEDANTRY=1`, which will raise during implicit `str` <-> `unicode` coercion.
  
  One can reproduce a failure with `HGUNICODEPEDANTRY=1 ./run-tests.py -l test-record.t`.
  
  I doubt the test suite has been remotely close to passing with `HGUNICODEPEDANTRY` for possibly years. I'm not sure it is even worth keeping around, since Python 3 has very similar functionality for free. I dunno.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D6052

To: indygreg, durin42, #hg-reviewers
Cc: yuja, mercurial-devel

Patch

diff --git a/mercurial/crecord.py b/mercurial/crecord.py
--- a/mercurial/crecord.py
+++ b/mercurial/crecord.py
@@ -30,7 +30,7 @@ 
 
 # This is required for ncurses to display non-ASCII characters in default user
 # locale encoding correctly.  --immerrr
-locale.setlocale(locale.LC_ALL, u'')
+locale.setlocale(locale.LC_ALL, r'')
 
 # patch comments based on the git one
 diffhelptext = _("""# To remove '-' lines, make them ' ' lines (context).
diff --git a/hgext/histedit.py b/hgext/histedit.py
--- a/hgext/histedit.py
+++ b/hgext/histedit.py
@@ -953,7 +953,7 @@ 
     # locale. This sets the locale to the user's default system
     # locale.
     import locale
-    locale.setlocale(locale.LC_ALL, u'')
+    locale.setlocale(locale.LC_ALL, r'')
 except ImportError:
     curses = None