Patchwork email encoding help needed

Submitter Augie Fackler
Date Feb. 15, 2019, 8:41 p.m.
Message ID <A31D43B5-43D9-4E86-8986-3B661F50DC65@durin42.com>
Permalink /patch/38769/
State New

Comments

Augie Fackler - Feb. 15, 2019, 8:41 p.m.
Howdy folks! We're down to only a few (single digits!) failing tests on Python 3, but one in particular has us stuck:

cd tests && python3 run-tests.py test-notify.t
running 1 tests using 1 parallel processes

Gregory Szorc - Feb. 15, 2019, 9:28 p.m.
https://www.mercurial-scm.org/repo/hg/rev/9b3be572ff0c documented my
findings when I looked at this a few days back.

Something feels "off" with regards to our handling of encodings here. But
I'm not sure exactly what we should change.

On Fri, Feb 15, 2019 at 12:41 PM Augie Fackler <raf@durin42.com> wrote:

> Howdy folks! We're down to only a few (single digits!) failing tests on
> Python 3, but one in particular has us stuck:
>
> cd tests && python3 run-tests.py test-notify.t
> running 1 tests using 1 parallel processes
>
> --- tests/test-notify.t
> +++ /tests/test-notify.t.err
> @@ -415,36 +415,28 @@
>    >   -m `"$PYTHON" -c
> 'print("\xc3\xa0\xc3\xa1\xc3\xa2\xc3\xa3\xc3\xa4")'`
>    $ hg --traceback --cwd b --encoding utf-8 pull ../a | \
>    >   "$PYTHON" $TESTTMP/filter.py
> +  error: incoming.notify hook raised an exception: 'ascii' codec can't
> encode characters in position 42-51: ordinal not in range(128)
> +  Traceback (most recent call last):
> +    File "hgtests.fckrh2v2/install/lib/python/mercurial/hook.py", line
> 98, in pythonhook
> +      r = obj(ui=ui, repo=repo, hooktype=htype,
> **pycompat.strkwargs(args))
> +    File "/hgtests.fckrh2v2/install/lib/python/hgext/notify.py", line
> 519, in hook
> +      n.send(ctx, count, data)
> +    File "hgtests.fckrh2v2/install/lib/python/hgext/notify.py", line 384,
> in send
> +      msg = mail.mimeencode(self.ui, payload, self.charsets, self.test)
> +    File "hgtests.fckrh2v2/install/lib/python/mercurial/mail.py", line
> 366, in mimeencode
> +      return mimetextqp(s, 'plain', cs)
> +    File "hgtests.fckrh2v2/install/lib/python/mercurial/mail.py", line
> 253, in mimetextqp
> +      msg.set_payload(body, cs)
> +    File "lib/python3.7/email/message.py", line 315, in set_payload
> +      payload = payload.encode(charset.output_charset)
> +  UnicodeEncodeError: 'ascii' codec can't encode characters in position
> 42-51: ordinal not in range(128)
>
>
> The wrinkle is that the commit message comes from this:
>
>   $ hg --cwd a --encoding utf-8 commit -A -d '0 0' \
>   >   -m `"$PYTHON" -c 'print("\xc3\xa0\xc3\xa1\xc3\xa2\xc3\xa3\xc3\xa4")'`
>
>
> IOW, it's intentionally some UTF-8. For commit messages we can expect
> UTF-8, but for patch bodies we're not so lucky, so I'm curious what we
> should do. Does anyone have an informed opinion on an encoding we should
> (or should not!) use for plain-text patches in message bodies? I'm pretty
> convinced at this point that we're doing invalid things in our emails
> today, and they're largely working by good fortune.
>
> Thanks,
> Augie

Patch

--- tests/test-notify.t
+++ /tests/test-notify.t.err
@@ -415,36 +415,28 @@ 
   >   -m `"$PYTHON" -c 'print("\xc3\xa0\xc3\xa1\xc3\xa2\xc3\xa3\xc3\xa4")'`
   $ hg --traceback --cwd b --encoding utf-8 pull ../a | \
   >   "$PYTHON" $TESTTMP/filter.py
+  error: incoming.notify hook raised an exception: 'ascii' codec can't encode characters in position 42-51: ordinal not in range(128)
+  Traceback (most recent call last):
+    File "hgtests.fckrh2v2/install/lib/python/mercurial/hook.py", line 98, in pythonhook
+      r = obj(ui=ui, repo=repo, hooktype=htype, **pycompat.strkwargs(args))
+    File "/hgtests.fckrh2v2/install/lib/python/hgext/notify.py", line 519, in hook
+      n.send(ctx, count, data)
+    File "hgtests.fckrh2v2/install/lib/python/hgext/notify.py", line 384, in send
+      msg = mail.mimeencode(self.ui, payload, self.charsets, self.test)
+    File "hgtests.fckrh2v2/install/lib/python/mercurial/mail.py", line 366, in mimeencode
+      return mimetextqp(s, 'plain', cs)
+    File "hgtests.fckrh2v2/install/lib/python/mercurial/mail.py", line 253, in mimetextqp
+      msg.set_payload(body, cs)
+    File "lib/python3.7/email/message.py", line 315, in set_payload
+      payload = payload.encode(charset.output_charset)
+  UnicodeEncodeError: 'ascii' codec can't encode characters in position 42-51: ordinal not in range(128)
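
For readers who want to poke at the failure outside the test harness, here is a minimal sketch using only the standard library. It reuses the string literal from the `-m` argument in the diff above. The helper name `try_set_payload` is made up for illustration, and the sketch does not go through Mercurial's `mail.py` at all; it only exercises the `email.message.Message.set_payload` call at the bottom of the traceback. One thing it makes visible: on Python 3 the literal is a str of ten code points rather than ten raw bytes, and the reported range 42-51 also spans ten positions, which may or may not be related.

```python
from email.message import Message

# The literal from the test's -m argument. On Python 3 this is a str of ten
# code points (U+00C3, U+00A0, ...), not a sequence of ten raw bytes.
literal = "\xc3\xa0\xc3\xa1\xc3\xa2\xc3\xa3\xc3\xa4"
print(len(literal))

# Treating those code points as bytes and decoding them as UTF-8 gives the
# five accented characters the test presumably intends.
intended = literal.encode("latin-1").decode("utf-8")
print(len(intended))


def try_set_payload(body, charset):
    """Hypothetical helper: return the UnicodeEncodeError raised, or None."""
    msg = Message()
    try:
        msg.set_payload(body, charset)
    except UnicodeEncodeError as exc:
        return exc
    return None


# A str body that the requested charset cannot represent fails inside
# set_payload; a charset that covers the text does not.
print(try_set_payload(intended, "us-ascii"))
print(try_set_payload(intended, "utf-8"))
```

Either form of the string (ten code points or five) fails the same way with `us-ascii`, since none of the characters fall inside the ASCII range.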


The wrinkle is that the commit message comes from this:

  $ hg --cwd a --encoding utf-8 commit -A -d '0 0' \
  >   -m `"$PYTHON" -c 'print("\xc3\xa0\xc3\xa1\xc3\xa2\xc3\xa3\xc3\xa4")'`


IOW, it's intentionally some UTF-8. For commit messages we can expect UTF-8, but for patch bodies we're not so lucky, so I'm curious what we should do. Does anyone have an informed opinion on an encoding we should (or should not!) use for plain-text patches in message bodies? I'm pretty convinced at this point that we're doing invalid things in our emails today, and they're largely working by good fortune.
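
As one possible direction (a sketch, not a description of what `mail.py` does), the body charset could be chosen by trying an ordered list of candidates and taking the first one that can encode the text. The names `pick_charset` and `build_plain_message` and the default candidate list are assumptions made up for this example. It also assumes the body is already a str; patch content whose bytes are in an unknown encoding is exactly the hard case raised above, and this sketch does not solve it.

```python
from email.mime.text import MIMEText


def pick_charset(text, candidates=("us-ascii", "utf-8")):
    """Return the first candidate charset that can encode text (hypothetical)."""
    for cs in candidates:
        try:
            text.encode(cs)
        except UnicodeEncodeError:
            continue
        return cs
    raise ValueError("no candidate charset can encode the text")


def build_plain_message(text, candidates=("us-ascii", "utf-8")):
    """Build a text/plain MIME part labelled with the charset actually used."""
    return MIMEText(text, "plain", pick_charset(text, candidates))


msg = build_plain_message("\u00e0\u00e1\u00e2\u00e3\u00e4")
print(msg["Content-Type"])
```

Keeping `us-ascii` first leaves plain ASCII mail unchanged, and ending the list with `utf-8` means any str body can be encoded.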

Thanks,
Augie