Submitter | Julien Cristau |
---|---|
Date | March 3, 2016, 8:44 p.m. |
Message ID | <20160303204426.GL6200@betterave.cristau.org> |
Permalink | /patch/13585/ |
State | Accepted |
Delegated to: | Yuya Nishihara |
Comments
Julien Cristau <jcristau@debian.org> wrote:
>
>> > Reported at https://bugs.debian.org/737498
>>
>> You should probably immediately relay such reports upstream.

You should actually file a bug in bz.mercurial-scm.org for this issue, you should be able to change the bts bug to link to it. Once you do that, you should use the issue number in the commit description (see check-commit).
On Thu, 3 Mar 2016 21:44:26 +0100, Julien Cristau wrote:
> # HG changeset patch
> # User Julien Cristau <julien.cristau@logilab.fr>
> # Date 1457026459 -3600
> #      Thu Mar 03 18:34:19 2016 +0100
> # Node ID 981e5fd56a9973e0069173b5f6c03639d9e176aa
> # Parent  e00e57d836535aadcb13337613d2f891492d8e04
> patch: when importing from email, RFC2047-decode From/Subject headers

Looks good. Pushed to the clowncopter, thanks.

> +def headdecode(s):
> +    '''Decodes RFC-2047 header'''
> +    uparts = []
> +    for part, charset in email.Header.decode_header(s):
> +        if charset is not None:
> +            try:
> +                uparts.append(part.decode(charset))
> +                continue
> +            except UnicodeDecodeError:
> +                pass
> +        try:
> +            uparts.append(part.decode('UTF-8'))
> +            continue
> +        except UnicodeDecodeError:
> +            pass
> +        uparts.append(part.decode('ISO-8859-1'))

FWIW, email.charsets might be useful as a fallback charset.

https://www.selenic.com/mercurial/hgrc.5.html#email
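The fallback idea raised in the review can be sketched roughly as follows. This is a minimal Python 3 illustration, not Mercurial's code: `decode_with_fallbacks` and its `fallbacks` parameter are hypothetical names, and the tuple passed in only stands in for whatever charset list a user might configure.

```python
from email.header import decode_header


def decode_with_fallbacks(value, fallbacks=("utf-8", "iso-8859-1")):
    """Decode an RFC 2047 header, trying the declared charset first.

    Hypothetical helper: `fallbacks` stands in for a user-configured
    charset list and is not an existing Mercurial parameter.
    """
    pieces = []
    for part, charset in decode_header(value):
        if isinstance(part, str):
            # Python 3 yields str when the header has no encoded words.
            pieces.append(part)
            continue
        candidates = ([charset] if charset else []) + list(fallbacks)
        for candidate in candidates:
            try:
                pieces.append(part.decode(candidate))
                break
            except (UnicodeDecodeError, LookupError):
                # Wrong or unknown charset label: try the next candidate.
                continue
        else:
            # No candidate fit; keep the text readable instead of failing.
            pieces.append(part.decode("ascii", "replace"))
    return "".join(pieces)
```

Trying the declared charset before the fallbacks mirrors the order used in the patch; the only change is that the fallback list becomes a parameter instead of being hard-coded.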
Patch
diff --git a/mercurial/mail.py b/mercurial/mail.py
--- a/mercurial/mail.py
+++ b/mercurial/mail.py
@@ -332,3 +332,21 @@ def mimeencode(ui, s, charsets=None, dis
     if not display:
         s, cs = _encode(ui, s, charsets)
     return mimetextqp(s, 'plain', cs)
+
+def headdecode(s):
+    '''Decodes RFC-2047 header'''
+    uparts = []
+    for part, charset in email.Header.decode_header(s):
+        if charset is not None:
+            try:
+                uparts.append(part.decode(charset))
+                continue
+            except UnicodeDecodeError:
+                pass
+        try:
+            uparts.append(part.decode('UTF-8'))
+            continue
+        except UnicodeDecodeError:
+            pass
+        uparts.append(part.decode('ISO-8859-1'))
+    return encoding.tolocal(u' '.join(uparts).encode('UTF-8'))
diff --git a/mercurial/patch.py b/mercurial/patch.py
--- a/mercurial/patch.py
+++ b/mercurial/patch.py
@@ -31,6 +31,7 @@ from . import (
     diffhelpers,
     encoding,
     error,
+    mail,
     mdiff,
     pathutil,
     scmutil,
@@ -210,8 +211,8 @@ def extract(ui, fileobj):
     try:
         msg = email.Parser.Parser().parse(fileobj)
 
-        subject = msg['Subject']
-        data['user'] = msg['From']
+        subject = msg['Subject'] and mail.headdecode(msg['Subject'])
+        data['user'] = msg['From'] and mail.headdecode(msg['From'])
         if not subject and not data['user']:
             # Not an email, restore parsed headers if any
             subject = '\n'.join(': '.join(h) for h in msg.items()) + '\n'
diff --git a/tests/test-import-git.t b/tests/test-import-git.t
--- a/tests/test-import-git.t
+++ b/tests/test-import-git.t
@@ -822,4 +822,27 @@ Test corner case involving copies and mu
   > EOF
   applying patch from stdin
 
+Test email metadata
+
+  $ hg revert -qa
+  $ hg --encoding utf-8 import - <<EOF
+  > From: =?UTF-8?q?Rapha=C3=ABl=20Hertzog?= <hertzog@debian.org>
+  > Subject: [PATCH] =?UTF-8?q?=C5=A7=E2=82=AC=C3=9F=E1=B9=AA?=
+  >
+  > diff --git a/a b/a
+  > --- a/a
+  > +++ b/a
+  > @@ -1,1 +1,2 @@
+  >  a
+  > +a
+  > EOF
+  applying patch from stdin
+  $ hg --encoding utf-8 log -r .
+  changeset:   2:* (glob)
+  tag:         tip
+  user:        Rapha\xc3\xabl Hertzog <hertzog@debian.org> (esc)
+  date:        * (glob)
+  summary:     \xc5\xa7\xe2\x82\xac\xc3\x9f\xe1\xb9\xaa (esc)
+
+
   $ cd ..
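The `msg['Subject'] and mail.headdecode(...)` guard in the `extract` hunk keeps a missing header as `None` instead of passing it to the decoder. A small Python 3 sketch of the same pattern follows, under stated assumptions: `decoded_header` and `SAMPLE` are hypothetical names, and the standard library's strict `make_header` decoding is swapped in for the patch's fallback chain.

```python
from email import message_from_string
from email.header import decode_header, make_header


def decoded_header(msg, name):
    """Return the decoded header as str, or None when it is absent."""
    raw = msg[name]
    # Same guard as in the patch: only decode when the header exists.
    return raw and str(make_header(decode_header(raw)))


# Hypothetical input modelled on the test case in the patch, minus From.
SAMPLE = """\
Subject: [PATCH] =?UTF-8?q?=C5=A7=E2=82=AC=C3=9F=E1=B9=AA?=

diff --git a/a b/a
"""

msg = message_from_string(SAMPLE)
subject = decoded_header(msg, "Subject")  # decoded text
user = decoded_header(msg, "From")        # None: no From header present
```

With both values falsy, the original code falls through to its "Not an email" branch, so the guard preserves that behaviour for inputs that are plain patches rather than mail messages.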