Patchwork [STABLE] encoding: alias cp65001 to utf-8 on Windows

Submitter Yuya Nishihara
Date July 1, 2018, 3 p.m.
Message ID <b0f8bd19072c425b76e7.1530457248@mimosa>
Permalink /patch/32549/
State New

Comments

Yuya Nishihara - July 1, 2018, 3 p.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1530455813 -32400
#      Sun Jul 01 23:36:53 2018 +0900
# Branch stable
# Node ID b0f8bd19072c425b76e7df268b4c25ae41f701bc
# Parent  0b63a6743010dfdbf8a8154186e119949bdaa1cc
encoding: alias cp65001 to utf-8 on Windows

As far as I can tell, cp65001 is the Windows name for UTF-8. I don't know
how it differs from standard UTF-8, but Python 3 appears to have introduced
a new codec for cp65001, so the alias is enabled only on Python 2.

https://bugs.python.org/issue13216

This patch is untested, but it will hopefully fix the following issue:

https://bitbucket.org/tortoisehg/thg/issues/5127/
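Whether a given interpreter already recognizes the cp65001 name can be probed with the standard codecs module. This is only a sketch and is not part of the patch; the helper name codec_available is made up for illustration, and the result for 'cp65001' depends on the Python version and platform, so no output is claimed for it here.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can resolve the codec name.

    Hypothetical helper for illustration; not a Mercurial function.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


if __name__ == '__main__':
    # The answer for 'cp65001' varies by Python version and platform.
    for candidate in ('utf-8', 'cp65001'):
        print(candidate, codec_available(candidate))
```

If the probe reports False for 'cp65001', any attempt to encode or decode with that name fails with LookupError, which is the situation the alias is meant to avoid.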

Patch

diff --git a/mercurial/encoding.py b/mercurial/encoding.py
--- a/mercurial/encoding.py
+++ b/mercurial/encoding.py
@@ -72,6 +72,11 @@  else:
     '646': lambda: 'ascii',
     'ANSI_X3.4-1968': lambda: 'ascii',
 }
+# cp65001 is a Windows variant of utf-8, which isn't supported on Python 2.
+# No idea if it should be rewritten to the canonical name 'utf-8' on Python 3.
+# https://bugs.python.org/issue13216
+if pycompat.iswindows and not pycompat.ispy3:
+    _encodingfixers['cp65001'] = lambda: 'utf-8'
 
 try:
     encoding = environ.get("HGENCODING")
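The table extended by the hunk above maps a reported encoding name to a callable that returns a replacement name. A minimal, self-contained sketch of that idea follows. The names fixers, normalize and resolve are hypothetical stand-ins rather than Mercurial's actual code, and the platform and Python-version guard from the patch is left out to keep the example portable.

```python
import codecs

# Hypothetical stand-in for the fixer table in mercurial/encoding.py.
# Each value is a callable returning the name to use instead.
fixers = {
    '646': lambda: 'ascii',
    'ANSI_X3.4-1968': lambda: 'ascii',
    # The patch adds this entry only on Windows under Python 2;
    # it is unconditional here purely for illustration.
    'cp65001': lambda: 'utf-8',
}


def normalize(name):
    """Return the replacement name if a fixer exists, else the name itself."""
    fixer = fixers.get(name)
    return fixer() if fixer is not None else name


def resolve(name):
    """Normalize the name, then ask Python for the codec's canonical name."""
    return codecs.lookup(normalize(name)).name


if __name__ == '__main__':
    print(normalize('cp65001'))
    print(resolve('646'))
```

Storing callables rather than plain strings mirrors the shape of the entries visible in the diff context; plain strings would work equally well for fixed aliases like these.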