Patchwork [5,of,7] py3: work around weird handling of bytes/unicode in decode_header()

Submitter Yuya Nishihara
Date April 8, 2018, 9:09 a.m.
Message ID <d08cb276664c86807fc4.1523178588@mimosa>
Permalink /patch/30561/
State Accepted

Comments

Yuya Nishihara - April 8, 2018, 9:09 a.m.
# HG changeset patch
# User Yuya Nishihara <yuya@tcha.org>
# Date 1523168550 -32400
#      Sun Apr 08 15:22:30 2018 +0900
# Node ID d08cb276664c86807fc497b0c7b3980d3752b613
# Parent  323e34785dacaad98d06dd3b15c2370779149e97
py3: work around weird handling of bytes/unicode in decode_header()

decode_header() basically works as follows, and on Python 3, email headers
are unicode, so the returned parts may be either unicode or bytes.

  def decode_header(header):
      if not ecre.search(header):  # ecre is unicode regexp
          return [(header, None)]  # so header is unicode string
      # ... decode header into [(bytes_data, unicode_charset_name)]
      return collapsed
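
The two return paths outlined above can be observed directly with the
standard library on Python 3. This is a minimal sketch; the sample header
values are made up for illustration and do not come from the patch.

```python
from email.header import decode_header

# No =?charset?...?= encoded word: the header is returned unchanged,
# so the part is a str (unicode) and the charset is None.
plain = decode_header("hello world")

# With an encoded word: the part is decoded to bytes and paired with
# the charset name as a str.
encoded = decode_header("=?utf-8?q?caf=C3=A9?=")

for part, charset in plain + encoded:
    print(type(part).__name__, repr(part), charset)
```

A caller therefore cannot assume that every part has a decode() method,
which is the case the patch guards against.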

Patch

diff --git a/mercurial/mail.py b/mercurial/mail.py
--- a/mercurial/mail.py
+++ b/mercurial/mail.py
@@ -332,6 +332,11 @@  def headdecode(s):
                 continue
             except UnicodeDecodeError:
                 pass
+        # On Python 3, decode_header() may return either bytes or unicode
+        # depending on whether the header has =?<charset>? or not
+        if isinstance(part, type(u'')):
+            uparts.append(part)
+            continue
         try:
             uparts.append(part.decode('UTF-8'))
             continue
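
For readers who want to try the isinstance() guard outside Mercurial, the
following is a standalone sketch of a decoding loop in the same spirit. The
name to_unicode() is hypothetical, and the charset-first attempt, the
ISO-8859-1 fallback, and the space join are assumptions made to keep the
example complete; only the unicode check and the UTF-8 attempt are taken
from the hunk above.

```python
from email.header import decode_header


def to_unicode(header):
    """Hypothetical helper; not Mercurial's headdecode()."""
    uparts = []
    for part, charset in decode_header(header):
        # Python 3: a header without encoded words comes back as str,
        # so there is nothing to decode.
        if isinstance(part, str):
            uparts.append(part)
            continue
        # Assumption: try the declared charset first, if any.
        if charset is not None:
            try:
                uparts.append(part.decode(charset))
                continue
            except (UnicodeDecodeError, LookupError):
                pass
        try:
            uparts.append(part.decode("utf-8"))
            continue
        except UnicodeDecodeError:
            pass
        # Assumption: ISO-8859-1 as a last resort, since it accepts any byte.
        uparts.append(part.decode("iso-8859-1"))
    return " ".join(uparts)


print(to_unicode("hello world"))
print(to_unicode("=?utf-8?q?caf=C3=A9?="))
```

Without the isinstance() check, the first call would fail on Python 3
because str has no decode() method.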