Patchwork [STABLE] phabricator: convert unicode to binary when writing patches

login
register
mail settings
Submitter Jun Wu
Date July 27, 2017, 7:03 p.m.
Message ID <e3b08c0b6d7ca92f0d8b.1501182221@x1c>
Download mbox | patch
Permalink /patch/22571/
State Rejected
Headers show

Comments

Jun Wu - July 27, 2017, 7:03 p.m.
# HG changeset patch
# User Jun Wu <quark@fb.com>
# Date 1501182181 25200
#      Thu Jul 27 12:03:01 2017 -0700
# Branch stable
# Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48
# Parent  c5607b65fcb8cf5b789c49a8cf4fecfe83931727
phabricator: convert unicode to binary when writing patches

This is a quick fix to make `hg phabread D189` work.

It seems we might want to replace all `r''` to `u''`, and add more
`encoding.*to*` to be more explicit when interacting with `json` module.
Gregory Szorc - July 28, 2017, 6:54 a.m.
On Thu, Jul 27, 2017 at 12:03 PM, Jun Wu <quark@fb.com> wrote:

> # HG changeset patch
> # User Jun Wu <quark@fb.com>
> # Date 1501182181 25200
> #      Thu Jul 27 12:03:01 2017 -0700
> # Branch stable
> # Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48
> # Parent  c5607b65fcb8cf5b789c49a8cf4fecfe83931727
> phabricator: convert unicode to binary when writing patches
>
> This is a quick fix to make `hg phabread D189` work.
>
> It seems we might want to replace all `r''` to `u''`, and add more
> `encoding.*to*` to be more explicit when interacting with `json` module.
>

This brings back bad memories for me:
https://bugzilla.mozilla.org/show_bug.cgi?id=1249004#c5

Essentially, since pretty much everything in the world tries to ensure that
JSON is UTF-8, it is not uncommon to have non-UTF-8 byte sequences mangled
somewhere in a JSON exchange. After trying to coerce Python's stdlib json
module on both the client and server, I figured it was best to side-step
the issue by base64 encoding the diff bytes before they hit the JSON layer.
JSON just isn't a good format for shipping binary around. The problem is
compounded by Python's stdlib implementation making certain choices.

I'm curious what will happen if you attempt to import
https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator.
The bytes between the quotes (0x22) are supposed to be 0xffff7e.





> diff --git a/contrib/phabricator.py b/contrib/phabricator.py
> --- a/contrib/phabricator.py
> +++ b/contrib/phabricator.py
> @@ -561,5 +561,6 @@ def readpatch(repo, params, write, stack
>                  header += '# %s %s\n' % (_metanamemap[k], meta[k])
>
> -        write(('%s%s\n%s') % (header, desc, body))
> +        content = '%s%s\n%s' % (header, desc, body)
> +        write(encoding.unitolocal(content))
>
>  @command('phabread',
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
Yuya Nishihara - July 28, 2017, 1:21 p.m.
On Thu, 27 Jul 2017 23:54:34 -0700, Gregory Szorc wrote:
> On Thu, Jul 27, 2017 at 12:03 PM, Jun Wu <quark@fb.com> wrote:
> 
> > # HG changeset patch
> > # User Jun Wu <quark@fb.com>
> > # Date 1501182181 25200
> > #      Thu Jul 27 12:03:01 2017 -0700
> > # Branch stable
> > # Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48
> > # Parent  c5607b65fcb8cf5b789c49a8cf4fecfe83931727
> > phabricator: convert unicode to binary when writing patches
> >
> > This is a quick fix to make `hg phabread D189` work.
> >
> > It seems we might want to replace all `r''` to `u''`, and add more
> > `encoding.*to*` to be more explicit when interacting with `json` module.
> >
> 
> This brings back bad memories for me:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1249004#c5
> 
> Essentially, since pretty much everything in the world tries to ensure that
> JSON is UTF-8, it is not uncommon to have non-UTF-8 byte sequences mangled
> somewhere in a JSON exchange. After trying to coerce Python's stdlib json
> module on both the client and server, I figured it was best to side-step
> the issue by base64 encoding the diff bytes before they hit the JSON layer.
> JSON just isn't a good format for shipping binary around. The problem is
> compounded by Python's stdlib implementation making certain choices.
> 
> I'm curious what will happen if you attempt to import
> https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator.
> The bytes between the quotes (0x22) are supposed to be 0xffff7e.

Phabricator has an open issue for that.

https://secure.phabricator.com/T8669
https://secure.phabricator.com/T5955

JSON is horrible choice for serializing a text (in Unix sense.) So is
Python 3.

> > -        write(('%s%s\n%s') % (header, desc, body))
> > +        content = '%s%s\n%s' % (header, desc, body)
> > +        write(encoding.unitolocal(content))

Strictly speaking, encoding.unitolocal(header + desc) makes sense, but
body doesn't.
Jun Wu - July 28, 2017, 3:26 p.m.
Excerpts from Gregory Szorc's message of 2017-07-27 23:54:34 -0700:
> I'm curious what will happen if you attempt to import
> https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98  into Phabricator.
> The bytes between the quotes (0x22) are supposed to be 0xffff7e.

It seems arc will just treat the file as binary [1]. `phabsend` would behave
wrong in this case. But that might be fixable by replacing `isbinary`
temporarily.

[1]: https://secure.phabricator.com/D325

Patch

diff --git a/contrib/phabricator.py b/contrib/phabricator.py
--- a/contrib/phabricator.py
+++ b/contrib/phabricator.py
@@ -561,5 +561,6 @@  def readpatch(repo, params, write, stack
                 header += '# %s %s\n' % (_metanamemap[k], meta[k])
 
-        write(('%s%s\n%s') % (header, desc, body))
+        content = '%s%s\n%s' % (header, desc, body)
+        write(encoding.unitolocal(content))
 
 @command('phabread',