Submitter | Jun Wu |
---|---|
Date | July 27, 2017, 7:03 p.m. |
Message ID | <e3b08c0b6d7ca92f0d8b.1501182221@x1c> |
Download | mbox | patch |
Permalink | /patch/22571/ |
State | Rejected |
Headers | show |
Comments
On Thu, Jul 27, 2017 at 12:03 PM, Jun Wu <quark@fb.com> wrote: > # HG changeset patch > # User Jun Wu <quark@fb.com> > # Date 1501182181 25200 > # Thu Jul 27 12:03:01 2017 -0700 > # Branch stable > # Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48 > # Parent c5607b65fcb8cf5b789c49a8cf4fecfe83931727 > phabricator: convert unicode to binary when writing patches > > This is a quick fix to make `hg phabread D189` work. > > It seems we might want to replace all `r''` to `u''`, and add more > `encoding.*to*` to be more explicit when interacting with `json` module. > This brings back bad memories for me: https://bugzilla.mozilla.org/show_bug.cgi?id=1249004#c5 Essentially, since pretty much everything in the world tries to ensure that JSON is UTF-8, it is not uncommon to have non-UTF-8 byte sequences mangled somewhere in a JSON exchange. After trying to coerce Python's stdlib json module on both the client and server, I figured it was best to side-step the issue by base64 encoding the diff bytes before they hit the JSON layer. JSON just isn't a good format for shipping binary around. The problem is compounded by Python's stdlib implementation making certain choices. I'm curious what will happen if you attempt to import https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator. The bytes between the quotes (0x22) are supposed to be 0xffff7e. > diff --git a/contrib/phabricator.py b/contrib/phabricator.py > --- a/contrib/phabricator.py > +++ b/contrib/phabricator.py > @@ -561,5 +561,6 @@ def readpatch(repo, params, write, stack > header += '# %s %s\n' % (_metanamemap[k], meta[k]) > > - write(('%s%s\n%s') % (header, desc, body)) > + content = '%s%s\n%s' % (header, desc, body) > + write(encoding.unitolocal(content)) > > @command('phabread', > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel >
On Thu, 27 Jul 2017 23:54:34 -0700, Gregory Szorc wrote: > On Thu, Jul 27, 2017 at 12:03 PM, Jun Wu <quark@fb.com> wrote: > > > # HG changeset patch > > # User Jun Wu <quark@fb.com> > > # Date 1501182181 25200 > > # Thu Jul 27 12:03:01 2017 -0700 > > # Branch stable > > # Node ID e3b08c0b6d7ca92f0d8b741585338b68ed0d8c48 > > # Parent c5607b65fcb8cf5b789c49a8cf4fecfe83931727 > > phabricator: convert unicode to binary when writing patches > > > > This is a quick fix to make `hg phabread D189` work. > > > > It seems we might want to replace all `r''` to `u''`, and add more > > `encoding.*to*` to be more explicit when interacting with `json` module. > > > > This brings back bad memories for me: > https://bugzilla.mozilla.org/show_bug.cgi?id=1249004#c5 > > Essentially, since pretty much everything in the world tries to ensure that > JSON is UTF-8, it is not uncommon to have non-UTF-8 byte sequences mangled > somewhere in a JSON exchange. After trying to coerce Python's stdlib json > module on both the client and server, I figured it was best to side-step > the issue by base64 encoding the diff bytes before they hit the JSON layer. > JSON just isn't a good format for shipping binary around. The problem is > compounded by Python's stdlib implementation making certain choices. > > I'm curious what will happen if you attempt to import > https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator. > The bytes between the quotes (0x22) are supposed to be 0xffff7e. Phabricator has an open issue for that. https://secure.phabricator.com/T8669 https://secure.phabricator.com/T5955 JSON is horrible choice for serializing a text (in Unix sense.) So is Python 3. > > - write(('%s%s\n%s') % (header, desc, body)) > > + content = '%s%s\n%s' % (header, desc, body) > > + write(encoding.unitolocal(content)) Strictly speaking, encoding.unitolocal(header + desc) makes sense, but body doesn't.
Excerpts from Gregory Szorc's message of 2017-07-27 23:54:34 -0700: > I'm curious what will happen if you attempt to import > https://reviewboard-hg.mozilla.org/gecko/rev/4bf2ecabfd98 into Phabricator. > The bytes between the quotes (0x22) are supposed to be 0xffff7e. It seems arc will just treat the file as binary [1]. `phabsend` would behave wrong in this case. But that might be fixable by replacing `isbinary` temporarily. [1]: https://secure.phabricator.com/D325
Patch
diff --git a/contrib/phabricator.py b/contrib/phabricator.py --- a/contrib/phabricator.py +++ b/contrib/phabricator.py @@ -561,5 +561,6 @@ def readpatch(repo, params, write, stack header += '# %s %s\n' % (_metanamemap[k], meta[k]) - write(('%s%s\n%s') % (header, desc, body)) + content = '%s%s\n%s' % (header, desc, body) + write(encoding.unitolocal(content)) @command('phabread',