Patchwork [07,of,11] py3: byteify the decoded JSON responses upon receipt in the LFS blobstore

login
register
mail settings
Submitter Matt Harbison
Date Jan. 28, 2019, 5:20 a.m.
Message ID <b98988169d4a9c7890b9.1548652853@Envy>
Download mbox | patch
Permalink /patch/38133/
State Accepted
Headers show

Comments

Matt Harbison - Jan. 28, 2019, 5:20 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1548629295 18000
#      Sun Jan 27 17:48:15 2019 -0500
# Node ID b98988169d4a9c7890b93091683fa4ec38d61a47
# Parent  1d6f4c32abc28ea54e3d1d8487a1d773033aedf0
py3: byteify the decoded JSON responses upon receipt in the LFS blobstore

It got too confusing juggling r'' vs b'' across several functions.
Yuya Nishihara - Jan. 28, 2019, 11:59 a.m.
On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1548629295 18000
> #      Sun Jan 27 17:48:15 2019 -0500
> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47
> # Parent  1d6f4c32abc28ea54e3d1d8487a1d773033aedf0
> py3: byteify the decoded JSON responses upon receipt in the LFS blobstore

> -        return response
> +        def encodestr(x):
> +            if isinstance(x, unicode):

Fixed s/unicode/pycompat.unicode/ in flight.

> +                return x.encode(u'utf-8')
> +            return x
> +
> +        return pycompat.rapply(encodestr, response)

I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the JSON
had a filename for example, it wouldn't be always correct to encode data as
UTF-8.
Matt Harbison - Feb. 2, 2019, 6:18 p.m.
On Mon, 28 Jan 2019 06:59:44 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1548629295 18000
>> #      Sun Jan 27 17:48:15 2019 -0500
>> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47
>> # Parent  1d6f4c32abc28ea54e3d1d8487a1d773033aedf0
>> py3: byteify the decoded JSON responses upon receipt in the LFS  
>> blobstore
>
>> -        return response
>> +        def encodestr(x):
>> +            if isinstance(x, unicode):
>
> Fixed s/unicode/pycompat.unicode/ in flight.
>
>> +                return x.encode(u'utf-8')
>> +            return x
>> +
>> +        return pycompat.rapply(encodestr, response)
>
> I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the  
> JSON
> had a filename for example, it wouldn't be always correct to encode data  
> as
> UTF-8.

Somewhere along the line, I got it in my head that the spec explicitly  
said the JSON payload was utf-8 encoded.  Of course, I can't find that  
now, and the test-lfs-test-server.t#git-server output doesn't have a  
charset in the header for the JSON exchange.  That said, some  
clients/servers do[1]:

   > POST /systec_bin4.git/objects/batch HTTP/1.1
   > Host: 172.17.31.252:9999
   > Accept: application/vnd.git-lfs+json; charset=utf-8
   > Authorization: RemoteAuth  ........secret key...................
   > Content-Length: 216
   > Content-Type: application/vnd.git-lfs+json; charset=utf-8
   > User-Agent: git-lfs/2.2.1 (GitHub; windows amd64; go 1.8.3; git  
621d1f82)
   >
   {... JSON ...}trace git-lfs: HTTP: 401


   < HTTP/1.1 401 Unauthorized
   < Content-Length: 26
   < Content-Type: text/plain; charset=utf-8
   < Date: Wed, 27 Sep 2017 06:55:39 GMT
   < Www-Authenticate: Basic realm=git-lfs-server

Is it worth trying to use that header info, if it exists?

There wouldn't be any real file names in the payload- just URLs containing  
'.hg/lfs/objects' and the oid.  The schemas are here:

https://github.com/git-lfs/git-lfs/blob/master/tq/schemas/http-batch-request-schema.json
https://github.com/git-lfs/git-lfs/blob/master/tq/schemas/http-batch-response-schema.json

[1] https://github.com/git-lfs/git-lfs/issues/2618
Yuya Nishihara - Feb. 3, 2019, 1:52 a.m.
On Sat, 02 Feb 2019 13:18:37 -0500, Matt Harbison wrote:
> On Mon, 28 Jan 2019 06:59:44 -0500, Yuya Nishihara <yuya@tcha.org> wrote:
> 
> > On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1548629295 18000
> >> #      Sun Jan 27 17:48:15 2019 -0500
> >> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47
> >> # Parent  1d6f4c32abc28ea54e3d1d8487a1d773033aedf0
> >> py3: byteify the decoded JSON responses upon receipt in the LFS  
> >> blobstore
> >
> >> -        return response
> >> +        def encodestr(x):
> >> +            if isinstance(x, unicode):
> >
> > Fixed s/unicode/pycompat.unicode/ in flight.
> >
> >> +                return x.encode(u'utf-8')
> >> +            return x
> >> +
> >> +        return pycompat.rapply(encodestr, response)
> >
> > I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the  
> > JSON
> > had a filename for example, it wouldn't be always correct to encode data  
> > as
> > UTF-8.
> 
> Somewhere along the line, I got it in my head that the spec explicitly  
> said the JSON payload was utf-8 encoded.  Of course, I can't find that  
> now, and the test-lfs-test-server.t#git-server output doesn't have a  
> charset in the header for the JSON exchange.

JSON should be utf-8, but IIUC, we're translating JSON back to Mercurial
world. So if the response had a unicode string which Mercurial thinks is a
platform byte string, x.encode(u'utf-8') shouldn't be used. That's the point.

Patch

diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
--- a/hgext/lfs/blobstore.py
+++ b/hgext/lfs/blobstore.py
@@ -307,7 +307,7 @@  class _gitlfsremote(object):
         except util.urlerr.httperror as ex:
             hints = {
                 400: _(b'check that lfs serving is enabled on %s and "%s" is '
-                       'supported') % (self.baseurl, action),
+                       b'supported') % (self.baseurl, action),
                 404: _(b'the "lfs.url" config may be used to override %s')
                        % self.baseurl,
             }
@@ -342,7 +342,12 @@  class _gitlfsremote(object):
                                          separators=(r'', r': '),
                                          sort_keys=True)))
 
-        return response
+        def encodestr(x):
+            if isinstance(x, unicode):
+                return x.encode(u'utf-8')
+            return x
+
+        return pycompat.rapply(encodestr, response)
 
     def _checkforservererror(self, pointers, responses, action):
         """Scans errors from objects
@@ -410,12 +415,11 @@  class _gitlfsremote(object):
         See https://github.com/git-lfs/git-lfs/blob/master/docs/api/\
         basic-transfers.md
         """
-        oid = pycompat.bytestr(obj['oid'])
+        oid = obj[b'oid']
+        href = obj[b'actions'][action].get(b'href')
+        headers = obj[b'actions'][action].get(b'header', {}).items()
 
-        href = pycompat.bytestr(obj['actions'][action].get('href'))
-        headers = obj['actions'][action].get('header', {}).items()
-
-        request = util.urlreq.request(href)
+        request = util.urlreq.request(pycompat.strurl(href))
         if action == b'upload':
             # If uploading blobs, read data from local blobstore.
             if not localstore.verify(oid):
@@ -426,7 +430,7 @@  class _gitlfsremote(object):
             request.add_header(r'Content-Type', r'application/octet-stream')
 
         for k, v in headers:
-            request.add_header(k, v)
+            request.add_header(pycompat.strurl(k), pycompat.strurl(v))
 
         response = b''
         try: