Patchwork [2,of,2,STABLE] py3: use native strings when forming email headers in patchbomb

login
register
mail settings
Submitter Denis Laxalde
Date Oct. 25, 2019, 1:02 p.m.
Message ID <6058175493d062dbaec2.1572008531@steppe.local>
Download mbox | patch
Permalink /patch/42591/
State New
Headers show

Comments

Denis Laxalde - Oct. 25, 2019, 1:02 p.m.
# HG changeset patch
# User Denis Laxalde <denis@laxalde.org>
# Date 1572008488 -7200
#      Fri Oct 25 15:01:28 2019 +0200
# Branch stable
# Node ID 6058175493d062dbaec2cb0e897e49da4596d147
# Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
py3: use native strings when forming email headers in patchbomb

We use raw strings for headers' keys and native strings for values. This
fixes "hg email" with a changeset with a message containing non-ascii
characters. This also removes the "if pycompat.ispy3:" TODO.
Yuya Nishihara - Oct. 26, 2019, 1:37 a.m.
On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> # HG changeset patch
> # User Denis Laxalde <denis@laxalde.org>
> # Date 1572008488 -7200
> #      Fri Oct 25 15:01:28 2019 +0200
> # Branch stable
> # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> py3: use native strings when forming email headers in patchbomb

I feel this change is too big to be in stable.

> -        # Fix up all headers to be native strings.
> -        # TODO(durin42): this should probably be cleaned up above in the future.
> -        if pycompat.ispy3:
> -            for hdr, val in list(m.items()):

Can't we somehow get around the issue here?
Denis Laxalde - Oct. 28, 2019, 9:09 a.m.
Yuya Nishihara a écrit :
> On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > # HG changeset patch
> > # User Denis Laxalde <denis@laxalde.org>
> > # Date 1572008488 -7200
> > #      Fri Oct 25 15:01:28 2019 +0200
> > # Branch stable
> > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > py3: use native strings when forming email headers in patchbomb
> 
> I feel this change is too big to be in stable.
> 
> > -        # Fix up all headers to be native strings.
> > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > -        if pycompat.ispy3:
> > -            for hdr, val in list(m.items()):
> 
> Can't we somehow get around the issue here?

Well, I'm not sure I'd be interested in implementing a workaround,
especially here since the change aims at removing this TODO while making
the code Python 3 compatible. Fixing the issue is a side effect somehow.
Yuya Nishihara - Oct. 28, 2019, 2:57 p.m.
On Mon, 28 Oct 2019 10:09:22 +0100, Denis Laxalde wrote:
> Yuya Nishihara a écrit :
> > On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > > # HG changeset patch
> > > # User Denis Laxalde <denis@laxalde.org>
> > > # Date 1572008488 -7200
> > > #      Fri Oct 25 15:01:28 2019 +0200
> > > # Branch stable
> > > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > > py3: use native strings when forming email headers in patchbomb
> > 
> > I feel this change is too big to be in stable.
> > 
> > > -        # Fix up all headers to be native strings.
> > > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > > -        if pycompat.ispy3:
> > > -            for hdr, val in list(m.items()):
> > 
> > Can't we somehow get around the issue here?
> 
> Well, I'm not sure I'd be interested in implementing a workaround,
> especially here since the change aims at removing this TODO while making
> the code Python 3 compatible. Fixing the issue is a side effect somehow.

Okay, then let's move the series to default branch if you aren't interested
in minimizing the stable changes.

There are too much py3 changes after the freeze.
Augie Fackler - Oct. 28, 2019, 7:16 p.m.
On Mon, Oct 28, 2019 at 11:57:38PM +0900, Yuya Nishihara wrote:
> On Mon, 28 Oct 2019 10:09:22 +0100, Denis Laxalde wrote:
> > Yuya Nishihara a écrit :
> > > On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > > > # HG changeset patch
> > > > # User Denis Laxalde <denis@laxalde.org>
> > > > # Date 1572008488 -7200
> > > > #      Fri Oct 25 15:01:28 2019 +0200
> > > > # Branch stable
> > > > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > > > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > > > py3: use native strings when forming email headers in patchbomb
> > >
> > > I feel this change is too big to be in stable.
> > >
> > > > -        # Fix up all headers to be native strings.
> > > > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > > > -        if pycompat.ispy3:
> > > > -            for hdr, val in list(m.items()):
> > >
> > > Can't we somehow get around the issue here?
> >
> > Well, I'm not sure I'd be interested in implementing a workaround,
> > especially here since the change aims at removing this TODO while making
> > the code Python 3 compatible. Fixing the issue is a side effect somehow.
>
> Okay, then let's move the series to default branch if you aren't interested
> in minimizing the stable changes.
>
> There are too much py3 changes after the freeze.

I don't love this for stable, but I think I'd rather have it (in 5.2.1
maybe?) as part of our "works on py3" release than punt it to 5.3.

Thoughts?

> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Yuya Nishihara - Oct. 29, 2019, 12:36 p.m.
On Mon, 28 Oct 2019 15:16:49 -0400, Augie Fackler wrote:
> On Mon, Oct 28, 2019 at 11:57:38PM +0900, Yuya Nishihara wrote:
> > On Mon, 28 Oct 2019 10:09:22 +0100, Denis Laxalde wrote:
> > > Yuya Nishihara a écrit :
> > > > On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > > > > # HG changeset patch
> > > > > # User Denis Laxalde <denis@laxalde.org>
> > > > > # Date 1572008488 -7200
> > > > > #      Fri Oct 25 15:01:28 2019 +0200
> > > > > # Branch stable
> > > > > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > > > > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > > > > py3: use native strings when forming email headers in patchbomb
> > > >
> > > > I feel this change is too big to be in stable.
> > > >
> > > > > -        # Fix up all headers to be native strings.
> > > > > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > > > > -        if pycompat.ispy3:
> > > > > -            for hdr, val in list(m.items()):
> > > >
> > > > Can't we somehow get around the issue here?
> > >
> > > Well, I'm not sure I'd be interested in implementing a workaround,
> > > especially here since the change aims at removing this TODO while making
> > > the code Python 3 compatible. Fixing the issue is a side effect somehow.
> >
> > Okay, then let's move the series to default branch if you aren't interested
> > in minimizing the stable changes.
> >
> > There are too much py3 changes after the freeze.
> 
> I don't love this for stable, but I think I'd rather have it (in 5.2.1
> maybe?) as part of our "works on py3" release than punt it to 5.3.

I looked this patch more closely, and I'm getting to think patchbomb works
on py3 without this patch. Most (all?) header values are "encoded", which
should be ASCII and strurl() wouldn't crash.

So, -1 for this change. Can you at least add some tests that would fail
without this change?
Denis Laxalde - Oct. 29, 2019, 5:09 p.m.
Yuya Nishihara a écrit :
> On Mon, 28 Oct 2019 15:16:49 -0400, Augie Fackler wrote:
> > On Mon, Oct 28, 2019 at 11:57:38PM +0900, Yuya Nishihara wrote:
> > > On Mon, 28 Oct 2019 10:09:22 +0100, Denis Laxalde wrote:
> > > > Yuya Nishihara a écrit :
> > > > > On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > > > > > # HG changeset patch
> > > > > > # User Denis Laxalde <denis@laxalde.org>
> > > > > > # Date 1572008488 -7200
> > > > > > #      Fri Oct 25 15:01:28 2019 +0200
> > > > > > # Branch stable
> > > > > > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > > > > > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > > > > > py3: use native strings when forming email headers in patchbomb
> > > > >
> > > > > I feel this change is too big to be in stable.
> > > > >
> > > > > > -        # Fix up all headers to be native strings.
> > > > > > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > > > > > -        if pycompat.ispy3:
> > > > > > -            for hdr, val in list(m.items()):
> > > > >
> > > > > Can't we somehow get around the issue here?
> > > >
> > > > Well, I'm not sure I'd be interested in implementing a workaround,
> > > > especially here since the change aims at removing this TODO while making
> > > > the code Python 3 compatible. Fixing the issue is a side effect somehow.
> > >
> > > Okay, then let's move the series to default branch if you aren't interested
> > > in minimizing the stable changes.
> > >
> > > There are too much py3 changes after the freeze.
> > 
> > I don't love this for stable, but I think I'd rather have it (in 5.2.1
> > maybe?) as part of our "works on py3" release than punt it to 5.3.
> 
> I looked this patch more closely, and I'm getting to think patchbomb works
> on py3 without this patch. Most (all?) header values are "encoded", which
> should be ASCII and strurl() wouldn't crash.
> 
> So, -1 for this change. Can you at least add some tests that would fail
> without this change?

The test introduced in patch 1 of the series fails on Python 3 without
this change (a subject header with non-ascii characters). This crashes
in strurl(). On second thought, I can fix that with minimal changes, so
I'll send a follow up on patch 1.

Nevertheless, I still think this change here is worth. I'm fine if it
goes into default or if it waits after the release. Just let me know the
plan.
Yuya Nishihara - Oct. 29, 2019, 11:29 p.m.
On Tue, 29 Oct 2019 18:09:11 +0100, Denis Laxalde wrote:
> Yuya Nishihara a écrit :
> > On Mon, 28 Oct 2019 15:16:49 -0400, Augie Fackler wrote:
> > > On Mon, Oct 28, 2019 at 11:57:38PM +0900, Yuya Nishihara wrote:
> > > > On Mon, 28 Oct 2019 10:09:22 +0100, Denis Laxalde wrote:
> > > > > Yuya Nishihara a écrit :
> > > > > > On Fri, 25 Oct 2019 15:02:11 +0200, Denis Laxalde wrote:
> > > > > > > # HG changeset patch
> > > > > > > # User Denis Laxalde <denis@laxalde.org>
> > > > > > > # Date 1572008488 -7200
> > > > > > > #      Fri Oct 25 15:01:28 2019 +0200
> > > > > > > # Branch stable
> > > > > > > # Node ID 6058175493d062dbaec2cb0e897e49da4596d147
> > > > > > > # Parent  17fad3f1b4742d75db23e17a04c85223bcc239a2
> > > > > > > py3: use native strings when forming email headers in patchbomb
> > > > > >
> > > > > > I feel this change is too big to be in stable.
> > > > > >
> > > > > > > -        # Fix up all headers to be native strings.
> > > > > > > -        # TODO(durin42): this should probably be cleaned up above in the future.
> > > > > > > -        if pycompat.ispy3:
> > > > > > > -            for hdr, val in list(m.items()):
> > > > > >
> > > > > > Can't we somehow get around the issue here?
> > > > >
> > > > > Well, I'm not sure I'd be interested in implementing a workaround,
> > > > > especially here since the change aims at removing this TODO while making
> > > > > the code Python 3 compatible. Fixing the issue is a side effect somehow.
> > > >
> > > > Okay, then let's move the series to default branch if you aren't interested
> > > > in minimizing the stable changes.
> > > >
> > > > There are too much py3 changes after the freeze.
> > > 
> > > I don't love this for stable, but I think I'd rather have it (in 5.2.1
> > > maybe?) as part of our "works on py3" release than punt it to 5.3.
> > 
> > I looked this patch more closely, and I'm getting to think patchbomb works
> > on py3 without this patch. Most (all?) header values are "encoded", which
> > should be ASCII and strurl() wouldn't crash.
> > 
> > So, -1 for this change. Can you at least add some tests that would fail
> > without this change?
> 
> The test introduced in patch 1 of the series fails on Python 3 without
> this change (a subject header with non-ascii characters). This crashes
> in strurl(). On second thought, I can fix that with minimal changes, so
> I'll send a follow up on patch 1.

Ah, right. The test fails because of --test, which disables headencode().
Then, I think s/pycompat.strurl(val)/encoding.strfromlocal(val)/ is the
simplest fix.

> Nevertheless, I still think this change here is worth. I'm fine if it
> goes into default or if it waits after the release. Just let me know the
> plan.

I also want to remove this TODO, but I don't think inserting many
strfromlocal()s is the best way to do that. Maybe we'll need some helper
to cope with py3's new mail API?
Denis Laxalde - Oct. 30, 2019, 6:10 p.m.
Yuya Nishihara a écrit :
> > Nevertheless, I still think this change here is worth. I'm fine if it
> > goes into default or if it waits after the release. Just let me know the
> > plan.
> 
> I also want to remove this TODO, but I don't think inserting many
> strfromlocal()s is the best way to do that. Maybe we'll need some helper
> to cope with py3's new mail API?

I agree, all these strfromlocal() are not nice. At a first glance, I'd
say we should use native strings whenever possible, but there probably
are details here and there...

Patch

diff --git a/hgext/patchbomb.py b/hgext/patchbomb.py
--- a/hgext/patchbomb.py
+++ b/hgext/patchbomb.py
@@ -321,10 +321,12 @@  def makepatch(
         subj = b' '.join([prefix, opts.get(b'subject') or subj])
     else:
         subj = b' '.join([prefix, subj])
-    msg[b'Subject'] = mail.headencode(ui, subj, _charsets, opts.get(b'test'))
-    msg[b'X-Mercurial-Node'] = node
-    msg[b'X-Mercurial-Series-Index'] = b'%i' % idx
-    msg[b'X-Mercurial-Series-Total'] = b'%i' % total
+    msg[r'Subject'] = encoding.strfromlocal(
+        mail.headencode(ui, subj, _charsets, opts.get(b'test'))
+    )
+    msg[r'X-Mercurial-Node'] = pycompat.sysstr(node)
+    msg[r'X-Mercurial-Series-Index'] = '%i' % idx
+    msg[r'X-Mercurial-Series-Total'] = '%i' % total
     return msg, subj, ds
 
 
@@ -421,7 +423,9 @@  def _getbundlemsgs(repo, sender, bundle,
     )
     emailencoders.encode_base64(datapart)
     msg.attach(datapart)
-    msg[b'Subject'] = mail.headencode(ui, subj, _charsets, opts.get(r'test'))
+    msg[r'Subject'] = encoding.strfromlocal(
+        mail.headencode(ui, subj, _charsets, opts.get(r'test'))
+    )
     return [(msg, subj, None)]
 
 
@@ -454,7 +458,9 @@  def _makeintro(repo, sender, revs, patch
 
     body = _getdescription(repo, body, sender, **opts)
     msg = mail.mimeencode(ui, body, _charsets, opts.get(r'test'))
-    msg[b'Subject'] = mail.headencode(ui, subj, _charsets, opts.get(r'test'))
+    msg[r'Subject'] = encoding.strfromlocal(
+        mail.headencode(ui, subj, _charsets, opts.get(r'test'))
+    )
     return (msg, subj, diffstat)
 
 
@@ -523,8 +529,10 @@  def _getoutgoing(repo, dest, revs):
 
 def _msgid(node, timestamp):
     hostname = encoding.strtolocal(socket.getfqdn())
-    hostname = encoding.environ.get(b'HGHOSTNAME', hostname)
-    return b'<%s.%d@%s>' % (node, timestamp, hostname)
+    hostname = encoding.strfromlocal(
+        encoding.environ.get(b'HGHOSTNAME', hostname)
+    )
+    return '<%s.%d@%s>' % (node, timestamp, hostname)
 
 
 emailopts = [
@@ -854,7 +862,7 @@  def email(ui, repo, *revs, **opts):
 
     showaddrs = []
 
-    def getaddrs(header, ask=False, default=None):
+    def _getaddrs(header, ask=False, default=None):
         configkey = header.lower()
         opt = header.replace(b'-', b'_').lower()
         addrs = opts.get(opt)
@@ -881,6 +889,12 @@  def email(ui, repo, *revs, **opts):
             )
         return []
 
+    def getaddrs(header, ask=False, default=None):
+        return [
+            encoding.strfromlocal(a)
+            for a in _getaddrs(header, ask=ask, default=default)
+        ]
+
     to = getaddrs(b'To', ask=True)
     if not to:
         # we can get here in non-interactive mode
@@ -912,10 +926,11 @@  def email(ui, repo, *revs, **opts):
     parent = opts.get(b'in_reply_to') or None
     # angle brackets may be omitted, they're not semantically part of the msg-id
     if parent is not None:
-        if not parent.startswith(b'<'):
-            parent = b'<' + parent
-        if not parent.endswith(b'>'):
-            parent += b'>'
+        parent = encoding.strfromlocal(parent)
+        if not parent.startswith('<'):
+            parent = '<' + parent
+        if not parent.endswith('>'):
+            parent += '>'
 
     sender_addr = eutil.parseaddr(encoding.strfromlocal(sender))[1]
     sender = mail.addressencode(ui, sender, _charsets, opts.get(b'test'))
@@ -926,47 +941,32 @@  def email(ui, repo, *revs, **opts):
     )
     for i, (m, subj, ds) in enumerate(msgs):
         try:
-            m[b'Message-Id'] = genmsgid(m[b'X-Mercurial-Node'])
+            m[r'Message-Id'] = genmsgid(m[r'X-Mercurial-Node'])
             if not firstpatch:
-                firstpatch = m[b'Message-Id']
-            m[b'X-Mercurial-Series-Id'] = firstpatch
+                firstpatch = m[r'Message-Id']
+            m[r'X-Mercurial-Series-Id'] = firstpatch
         except TypeError:
-            m[b'Message-Id'] = genmsgid(b'patchbomb')
+            m[r'Message-Id'] = genmsgid('patchbomb')
         if parent:
-            m[b'In-Reply-To'] = parent
-            m[b'References'] = parent
-        if not parent or b'X-Mercurial-Node' not in m:
-            parent = m[b'Message-Id']
+            m[r'In-Reply-To'] = parent
+            m[r'References'] = parent
+        if not parent or r'X-Mercurial-Node' not in m:
+            parent = m[r'Message-Id']
 
-        m[b'User-Agent'] = b'Mercurial-patchbomb/%s' % util.version()
-        m[b'Date'] = eutil.formatdate(start_time[0], localtime=True)
+        m[r'User-Agent'] = encoding.strfromlocal(
+            b'Mercurial-patchbomb/%s' % util.version()
+        )
+        m[r'Date'] = eutil.formatdate(start_time[0], localtime=True)
 
         start_time = (start_time[0] + 1, start_time[1])
-        m[b'From'] = sender
-        m[b'To'] = b', '.join(to)
+        m[r'From'] = encoding.strfromlocal(sender)
+        m[r'To'] = ', '.join(to)
         if cc:
-            m[b'Cc'] = b', '.join(cc)
+            m[r'Cc'] = ', '.join(cc)
         if bcc:
-            m[b'Bcc'] = b', '.join(bcc)
+            m[r'Bcc'] = ', '.join(bcc)
         if replyto:
-            m[b'Reply-To'] = b', '.join(replyto)
-        # Fix up all headers to be native strings.
-        # TODO(durin42): this should probably be cleaned up above in the future.
-        if pycompat.ispy3:
-            for hdr, val in list(m.items()):
-                change = False
-                if isinstance(hdr, bytes):
-                    del m[hdr]
-                    hdr = pycompat.strurl(hdr)
-                    change = True
-                if isinstance(val, bytes):
-                    val = pycompat.strurl(val)
-                    if not change:
-                        # prevent duplicate headers
-                        del m[hdr]
-                    change = True
-                if change:
-                    m[hdr] = val
+            m[r'Reply-To'] = ', '.join(replyto)
         if opts.get(b'test'):
             ui.status(_(b'displaying '), subj, b' ...\n')
             ui.pager(b'email')
@@ -984,12 +984,11 @@  def email(ui, repo, *revs, **opts):
             progress.update(i, item=subj)
             if not mbox:
                 # Exim does not remove the Bcc field
-                del m[b'Bcc']
+                del m[r'Bcc']
             fp = stringio()
             generator = mail.Generator(fp, mangle_from_=False)
             generator.flatten(m, 0)
             alldests = to + bcc + cc
-            alldests = [encoding.strfromlocal(d) for d in alldests]
             sendmail(sender_addr, alldests, fp.getvalue())
 
     progress.complete()