Submitter | Matt Harbison |
---|---|
Date | Jan. 28, 2019, 5:20 a.m. |
Message ID | <b98988169d4a9c7890b9.1548652853@Envy> |
Download | mbox | patch |
Permalink | /patch/38133/ |
State | Accepted |
Headers | show |
Comments
On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote: > # HG changeset patch > # User Matt Harbison <matt_harbison@yahoo.com> > # Date 1548629295 18000 > # Sun Jan 27 17:48:15 2019 -0500 > # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47 > # Parent 1d6f4c32abc28ea54e3d1d8487a1d773033aedf0 > py3: byteify the decoded JSON responses upon receipt in the LFS blobstore > - return response > + def encodestr(x): > + if isinstance(x, unicode): Fixed s/unicode/pycompat.unicode/ in flight. > + return x.encode(u'utf-8') > + return x > + > + return pycompat.rapply(encodestr, response) I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the JSON had a filename for example, it wouldn't be always correct to encode data as UTF-8.
On Mon, 28 Jan 2019 06:59:44 -0500, Yuya Nishihara <yuya@tcha.org> wrote: > On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote: >> # HG changeset patch >> # User Matt Harbison <matt_harbison@yahoo.com> >> # Date 1548629295 18000 >> # Sun Jan 27 17:48:15 2019 -0500 >> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47 >> # Parent 1d6f4c32abc28ea54e3d1d8487a1d773033aedf0 >> py3: byteify the decoded JSON responses upon receipt in the LFS >> blobstore > >> - return response >> + def encodestr(x): >> + if isinstance(x, unicode): > > Fixed s/unicode/pycompat.unicode/ in flight. > >> + return x.encode(u'utf-8') >> + return x >> + >> + return pycompat.rapply(encodestr, response) > > I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the > JSON > had a filename for example, it wouldn't be always correct to encode data > as > UTF-8. Somewhere along the line, I got it in my head that the spec explicitly said the JSON payload was utf-8 encoded. Of course, I can't find that now, and the test-lfs-test-server.t#git-server output doesn't have a charset in the header for the JSON exchange. That said, some clients/servers do[1]: > POST /systec_bin4.git/objects/batch HTTP/1.1 > Host: 172.17.31.252:9999 > Accept: application/vnd.git-lfs+json; charset=utf-8 > Authorization: RemoteAuth ........secret key................... > Content-Length: 216 > Content-Type: application/vnd.git-lfs+json; charset=utf-8 > User-Agent: git-lfs/2.2.1 (GitHub; windows amd64; go 1.8.3; git 621d1f82) > {... JSON ...}trace git-lfs: HTTP: 401 < HTTP/1.1 401 Unauthorized < Content-Length: 26 < Content-Type: text/plain; charset=utf-8 < Date: Wed, 27 Sep 2017 06:55:39 GMT < Www-Authenticate: Basic realm=git-lfs-server Is it worth trying to use that header info, if it exists? There wouldn't be any real file names in the payload- just URLs containing '.hg/lfs/objects' and the oid. The schemas are here: https://github.com/git-lfs/git-lfs/blob/master/tq/schemas/http-batch-request-schema.json https://github.com/git-lfs/git-lfs/blob/master/tq/schemas/http-batch-response-schema.json [1] https://github.com/git-lfs/git-lfs/issues/2618
On Sat, 02 Feb 2019 13:18:37 -0500, Matt Harbison wrote: > On Mon, 28 Jan 2019 06:59:44 -0500, Yuya Nishihara <yuya@tcha.org> wrote: > > > On Mon, 28 Jan 2019 00:20:53 -0500, Matt Harbison wrote: > >> # HG changeset patch > >> # User Matt Harbison <matt_harbison@yahoo.com> > >> # Date 1548629295 18000 > >> # Sun Jan 27 17:48:15 2019 -0500 > >> # Node ID b98988169d4a9c7890b93091683fa4ec38d61a47 > >> # Parent 1d6f4c32abc28ea54e3d1d8487a1d773033aedf0 > >> py3: byteify the decoded JSON responses upon receipt in the LFS > >> blobstore > > > >> - return response > >> + def encodestr(x): > >> + if isinstance(x, unicode): > > > > Fixed s/unicode/pycompat.unicode/ in flight. > > > >> + return x.encode(u'utf-8') > >> + return x > >> + > >> + return pycompat.rapply(encodestr, response) > > > > I assume here JSON strings are encoding agnostic (i.e. ASCII.) If the > > JSON > > had a filename for example, it wouldn't be always correct to encode data > > as > > UTF-8. > > Somewhere along the line, I got it in my head that the spec explicitly > said the JSON payload was utf-8 encoded. Of course, I can't find that > now, and the test-lfs-test-server.t#git-server output doesn't have a > charset in the header for the JSON exchange. JSON should be utf-8, but IIUC, we're translating JSON back to Mercurial world. So if the response had a unicode string which Mercurial thinks is a platform byte string, x.encode(u'utf-8') shouldn't be used. That's the point.
Patch
diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py --- a/hgext/lfs/blobstore.py +++ b/hgext/lfs/blobstore.py @@ -307,7 +307,7 @@ class _gitlfsremote(object): except util.urlerr.httperror as ex: hints = { 400: _(b'check that lfs serving is enabled on %s and "%s" is ' - 'supported') % (self.baseurl, action), + b'supported') % (self.baseurl, action), 404: _(b'the "lfs.url" config may be used to override %s') % self.baseurl, } @@ -342,7 +342,12 @@ class _gitlfsremote(object): separators=(r'', r': '), sort_keys=True))) - return response + def encodestr(x): + if isinstance(x, unicode): + return x.encode(u'utf-8') + return x + + return pycompat.rapply(encodestr, response) def _checkforservererror(self, pointers, responses, action): """Scans errors from objects @@ -410,12 +415,11 @@ class _gitlfsremote(object): See https://github.com/git-lfs/git-lfs/blob/master/docs/api/\ basic-transfers.md """ - oid = pycompat.bytestr(obj['oid']) + oid = obj[b'oid'] + href = obj[b'actions'][action].get(b'href') + headers = obj[b'actions'][action].get(b'header', {}).items() - href = pycompat.bytestr(obj['actions'][action].get('href')) - headers = obj['actions'][action].get('header', {}).items() - - request = util.urlreq.request(href) + request = util.urlreq.request(pycompat.strurl(href)) if action == b'upload': # If uploading blobs, read data from local blobstore. if not localstore.verify(oid): @@ -426,7 +430,7 @@ class _gitlfsremote(object): request.add_header(r'Content-Type', r'application/octet-stream') for k, v in headers: - request.add_header(k, v) + request.add_header(pycompat.strurl(k), pycompat.strurl(v)) response = b'' try: