Patchwork [v3] mail: take --encoding and HGENCODING into account

Submitter Gábor Stefanik
Date Oct. 8, 2016, 10:48 a.m.
Message ID <d7125caa00dcc6036d3d.1475923712@waste.org>
Permalink /patch/16915/
State Accepted

Comments

Gábor Stefanik - Oct. 8, 2016, 10:48 a.m.
# HG changeset patch
# User Gábor Stefanik <gabor.stefanik@nng.com>
# Date 1475667922 -7200
#      Wed Oct 05 13:45:22 2016 +0200
# Node ID d7125caa00dcc6036d3d5aee3e4d524211b95e02
# Parent  dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39
mail: take --encoding and HGENCODING into account

Fall back to our encoding strategy for sending MIME text
that's neither ASCII nor UTF-8.
Yuya Nishihara - Oct. 9, 2016, 5:49 a.m.
On Sat, 08 Oct 2016 05:48:32 -0500, Gábor Stefanik wrote:
> # HG changeset patch
> # User Gábor Stefanik <gabor.stefanik@nng.com>
> # Date 1475667922 -7200
> #      Wed Oct 05 13:45:22 2016 +0200
> # Node ID d7125caa00dcc6036d3d5aee3e4d524211b95e02
> # Parent  dbcef8918bbdd8a64d9f79a37bcfa284a26f3a39
> mail: take --encoding and HGENCODING into account
> 
> Fall back to our encoding strategy for sending MIME text
> that's neither ASCII nor UTF-8.

Queued this, thanks.

Patch

diff --git a/mercurial/mail.py b/mercurial/mail.py
--- a/mercurial/mail.py
+++ b/mercurial/mail.py
@@ -8,6 +8,7 @@ 
 from __future__ import absolute_import, print_function
 
 import email
+import email.charset
 import os
 import quopri
 import smtplib
@@ -203,24 +204,33 @@ 
             raise error.Abort(_('%r specified as email transport, '
                                'but not in PATH') % method)
 
+def codec2iana(cs):
+    '''Convert a Python codec name to an IANA charset name.'''
+    cs = email.charset.Charset(cs).input_charset.lower()
+
+    # "latin1" normalizes to "iso8859-1", standard calls for "iso-8859-1"
+    if cs.startswith("iso") and not cs.startswith("iso-"):
+        return "iso-" + cs[3:]
+    return cs
+
 def mimetextpatch(s, subtype='plain', display=False):
     '''Return MIME message suitable for a patch.
-    Charset will be detected as utf-8 or (possibly fake) us-ascii.
+    Charset will be detected by first trying to decode as us-ascii, then utf-8,
+    and finally the global encodings. If all those fail, fall back to
+    ISO-8859-1, an encoding that allows all byte sequences.
     Transfer encodings will be used if necessary.'''
 
-    cs = 'us-ascii'
-    if not display:
+    cs = ['us-ascii', 'utf-8', encoding.encoding, encoding.fallbackencoding]
+    if display:
+        return mimetextqp(s, subtype, 'us-ascii')
+    for charset in cs:
         try:
-            s.decode('us-ascii')
+            s.decode(charset)
+            return mimetextqp(s, subtype, codec2iana(charset))
         except UnicodeDecodeError:
-            try:
-                s.decode('utf-8')
-                cs = 'utf-8'
-            except UnicodeDecodeError:
-                # We'll go with us-ascii as a fallback.
-                pass
+            pass
 
-    return mimetextqp(s, subtype, cs)
+    return mimetextqp(s, subtype, "iso-8859-1")
 
 def mimetextqp(body, subtype, charset):
     '''Return MIME message.
diff --git a/tests/test-patchbomb.t b/tests/test-patchbomb.t
--- a/tests/test-patchbomb.t
+++ b/tests/test-patchbomb.t
@@ -632,7 +632,7 @@ 
   $ hg commit -A -d '5 0' -m 'isolatin 8-bit encoding'
   adding isolatin
 
-fake ascii mbox:
+iso-8859-1 mbox:
   $ hg email --date '1970-1-1 0:5' -f quux -t foo -c bar -r tip -m mbox
   this patch series consists of 1 patches.
   
@@ -640,9 +640,9 @@ 
   sending [PATCH] isolatin 8-bit encoding ...
   $ cat mbox
   From quux ... ... .. ..:..:.. .... (re)
-  Content-Type: text/plain; charset="us-ascii"
+  Content-Type: text/plain; charset="iso-8859-1"
   MIME-Version: 1.0
-  Content-Transfer-Encoding: 8bit
+  Content-Transfer-Encoding: quoted-printable
   Subject: [PATCH] isolatin 8-bit encoding
   X-Mercurial-Node: 240fb913fc1b7ff15ddb9f33e73d82bf5277c720
   X-Mercurial-Series-Index: 1
@@ -667,7 +667,7 @@ 
   --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
   +++ b/isolatin	Thu Jan 01 00:00:05 1970 +0000
   @@ -0,0 +1,1 @@
-  +h\xf6mma! (esc)
+  +h=F6mma!
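
For readers who want to try the detection order outside Mercurial, here is a minimal standalone sketch in Python 3. It is not the patched code: `to_iana` and `detect_charset` are hypothetical names, the `extra` argument stands in for `encoding.encoding` and `encoding.fallbackencoding`, and the name normalization uses `codecs.lookup` rather than `email.charset.Charset`, so its results may differ from `codec2iana` for some codecs.

```python
import codecs


def to_iana(name):
    """Normalize a Python codec name to an IANA-style charset label.

    Hypothetical helper: uses codecs.lookup, not email.charset.Charset
    as the patch does.
    """
    cs = codecs.lookup(name).name.lower()
    if cs == "ascii":
        # Python reports "ascii"; MIME headers conventionally say "us-ascii".
        return "us-ascii"
    # Python normalizes "latin1" to "iso8859-1"; the standard label
    # is "iso-8859-1".
    if cs.startswith("iso") and not cs.startswith("iso-"):
        return "iso-" + cs[3:]
    return cs


def detect_charset(data, extra=()):
    """Return the first charset that decodes `data` without error.

    Tries us-ascii, then utf-8, then the caller-supplied `extra`
    charsets (an assumption standing in for Mercurial's global
    encodings). Falls back to iso-8859-1, which accepts any bytes.
    """
    for name in ["us-ascii", "utf-8", *extra]:
        try:
            data.decode(name)
        except (UnicodeDecodeError, LookupError):
            continue
        return to_iana(name)
    return "iso-8859-1"


if __name__ == "__main__":
    print(detect_charset(b"plain text"))
    print(detect_charset("h\u00f6mma!".encode("utf-8")))
    print(detect_charset(b"h\xf6mma!"))
```

The last call mirrors the test case above: the byte 0xf6 is valid in neither ASCII nor UTF-8, so the sketch lands on the iso-8859-1 fallback instead of labelling the text as us-ascii.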