Patchwork [RFC] lfs: clean up a variety of strings/bytes issues

login
register
mail settings
Submitter Augie Fackler
Date Jan. 25, 2019, 10:33 p.m.
Message ID <d59571aba933f16db8e5.1548455634@augie-macbookpro2.roam.corp.google.com>
Download mbox | patch
Permalink /patch/38040/
State New
Headers show

Comments

Augie Fackler - Jan. 25, 2019, 10:33 p.m.
# HG changeset patch
# User Augie Fackler <augie@google.com>
# Date 1548455589 18000
#      Fri Jan 25 17:33:09 2019 -0500
# Node ID d59571aba933f16db8e58ec9b61365ea4db0e2fa
# Parent  ffdac0067a147f878ac45dc2c4b3b3e490865252
lfs: clean up a variety of strings/bytes issues

Mostly these are around json encoding/decoding, which is a hassle for
us. After this change there are fewer tracebacks in the lfs tests on
Python 3, but I'm very confused by the remaining failures: it looks
very much like destrepo.ui.setconfig() in mercurial.hg.clone() isn't
working: I can see a repo object with a stable repr() (which includes
the object's id) that appears to _lose_ the setconfig()ed
paths.default by the time we get into the lfs blobstore setup.

I'm not sure where to debug from here, but I _suspect_ this code is
reasonable (or at least partially so) as a starting place to try and
get the remaining lfs tests working on Python 3.
Matt Harbison - Jan. 26, 2019, 6 a.m.
On Fri, 25 Jan 2019 17:33:54 -0500, Augie Fackler <raf@durin42.com> wrote:

> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1548455589 18000
> #      Fri Jan 25 17:33:09 2019 -0500
> # Node ID d59571aba933f16db8e58ec9b61365ea4db0e2fa
> # Parent  ffdac0067a147f878ac45dc2c4b3b3e490865252
> lfs: clean up a variety of strings/bytes issues
>
> Mostly these are around json encoding/decoding, which is a hassle for
> us. After this change there are fewer tracebacks in the lfs tests on
> Python 3, but I'm very confused by the remaining failures: it looks
> very much like destrepo.ui.setconfig() in mercurial.hg.clone() isn't
> working: I can see a repo object with a stable repr() (which includes
> the object's id) that appears to _lose_ the setconfig()ed
> paths.default by the time we get into the lfs blobstore setup.

I don't think it is losing `paths.default`.  There's a typo in the  
followup hacks patch, such that it is looking up 'path.default' (not  
plural).  I adjusted that, and it prints the path fine in the prompt  
assertion.

The blobstore is selected during reposetup(), and cached.  A local copy is  
recreated in the exchange.push() and prefetch() wrappers, which is  
happening here.  Therefore, the `if not store.has(oid)` in readfromstore()  
must be busted somehow.
Matt Harbison - Jan. 26, 2019, 6:22 a.m.
On Sat, 26 Jan 2019 01:00:03 -0500, Matt Harbison <mharbison72@gmail.com>  
wrote:

> On Fri, 25 Jan 2019 17:33:54 -0500, Augie Fackler <raf@durin42.com>  
> wrote:
>
>> # HG changeset patch
>> # User Augie Fackler <augie@google.com>
>> # Date 1548455589 18000
>> #      Fri Jan 25 17:33:09 2019 -0500
>> # Node ID d59571aba933f16db8e58ec9b61365ea4db0e2fa
>> # Parent  ffdac0067a147f878ac45dc2c4b3b3e490865252
>> lfs: clean up a variety of strings/bytes issues
>>
>> Mostly these are around json encoding/decoding, which is a hassle for
>> us. After this change there are fewer tracebacks in the lfs tests on
>> Python 3, but I'm very confused by the remaining failures: it looks
>> very much like destrepo.ui.setconfig() in mercurial.hg.clone() isn't
>> working: I can see a repo object with a stable repr() (which includes
>> the object's id) that appears to _lose_ the setconfig()ed
>> paths.default by the time we get into the lfs blobstore setup.
>
> I don't think it is losing `paths.default`.  There's a typo in the  
> followup hacks patch, such that it is looking up 'path.default' (not  
> plural).  I adjusted that, and it prints the path fine in the prompt  
> assertion.
>
> The blobstore is selected during reposetup(), and cached.  A local copy  
> is recreated in the exchange.push() and prefetch() wrappers, which is  
> happening here.  Therefore, the `if not store.has(oid)` in  
> readfromstore() must be busted somehow.

Looking more closely at the debug output, the prefetch request is asking  
to download an empty list of objects.  You can see it if you add `--debug`  
to `hg clone -q http://localhost:$HGPORT $TESTTMP/client3_clone` around  
test-lfs-serve.t:236.  Therefore prefetch is doing nothing, so then it  
wants to call out individually using the original blobstore as the  
filelogs are accessed.
Matt Harbison - Jan. 27, 2019, 3:35 a.m.
On Fri, 25 Jan 2019 17:33:54 -0500, Augie Fackler <raf@durin42.com> wrote:

> # HG changeset patch
> # User Augie Fackler <augie@google.com>
> # Date 1548455589 18000
> #      Fri Jan 25 17:33:09 2019 -0500
> # Node ID d59571aba933f16db8e58ec9b61365ea4db0e2fa
> # Parent  ffdac0067a147f878ac45dc2c4b3b3e490865252
> lfs: clean up a variety of strings/bytes issues

I think I have this mostly stabilized now.  But I need to break it up into  
digestible chunks, and flag some things for review.  Hopefully I'll get to  
it in the next day or so.
Augie Fackler - Jan. 27, 2019, 4:11 a.m.
> On Jan 26, 2019, at 10:35 PM, Matt Harbison <mharbison72@gmail.com> wrote:
> 
> On Fri, 25 Jan 2019 17:33:54 -0500, Augie Fackler <raf@durin42.com> wrote:
> 
>> # HG changeset patch
>> # User Augie Fackler <augie@google.com>
>> # Date 1548455589 18000
>> #      Fri Jan 25 17:33:09 2019 -0500
>> # Node ID d59571aba933f16db8e58ec9b61365ea4db0e2fa
>> # Parent  ffdac0067a147f878ac45dc2c4b3b3e490865252
>> lfs: clean up a variety of strings/bytes issues
> 
> I think I have this mostly stabilized now.  But I need to break it up into digestible chunks, and flag some things for review.  Hopefully I'll get to it in the next day or so.

Awesome. I’m also happy to try and split it up for you since you took a mess I started and finished it...

Patch

diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
--- a/hgext/lfs/blobstore.py
+++ b/hgext/lfs/blobstore.py
@@ -79,7 +79,8 @@  class nullvfs(lfsvfs):
         # self.vfs.  Raise the same error as a normal vfs when asked to read a
         # file that doesn't exist.  The only difference is the full file path
         # isn't available in the error.
-        raise IOError(errno.ENOENT, '%s: No such file or directory' % oid)
+        raise IOError(errno.ENOENT,
+                      pycompat.sysstr('%s: No such file or directory' % oid))
 
     def walk(self, path=None, onerror=None):
         return ('', [], [])
@@ -289,15 +290,16 @@  class _gitlfsremote(object):
         Return decoded JSON object like {'objects': [{'oid': '', 'size': 1}]}
         See https://github.com/git-lfs/git-lfs/blob/master/docs/api/batch.md
         """
-        objects = [{'oid': p.oid(), 'size': p.size()} for p in pointers]
-        requestdata = json.dumps({
-            'objects': objects,
-            'operation': action,
-        })
-        url = '%s/objects/batch' % self.baseurl
+        objects = [{r'oid': pycompat.strurl(p.oid()),
+                    r'size': p.size()} for p in pointers]
+        requestdata = pycompat.bytesurl(json.dumps({
+            r'objects': objects,
+            r'operation': pycompat.strurl(action),
+        }))
+        url = pycompat.strurl('%s/objects/batch' % self.baseurl)
         batchreq = util.urlreq.request(url, data=requestdata)
-        batchreq.add_header('Accept', 'application/vnd.git-lfs+json')
-        batchreq.add_header('Content-Type', 'application/vnd.git-lfs+json')
+        batchreq.add_header(r'Accept', r'application/vnd.git-lfs+json')
+        batchreq.add_header(r'Content-Type', r'application/vnd.git-lfs+json')
         try:
             with contextlib.closing(self.urlopener.open(batchreq)) as rsp:
                 rawjson = rsp.read()
@@ -308,8 +310,10 @@  class _gitlfsremote(object):
                 404: _('the "lfs.url" config may be used to override %s')
                        % self.baseurl,
             }
-            hint = hints.get(ex.code, _('api=%s, action=%s') % (url, action))
-            raise LfsRemoteError(_('LFS HTTP error: %s') % ex, hint=hint)
+            hint = hints.get(ex.code, _('api=%s, action=%s') % (
+                pycompat.bytesurl(url), action))
+            raise LfsRemoteError(
+                _('LFS HTTP error: %s') % pycompat.bytestr(ex), hint=hint)
         except util.urlerr.urlerror as ex:
             hint = (_('the "lfs.url" config may be used to override %s')
                     % self.baseurl)
@@ -325,14 +329,17 @@  class _gitlfsremote(object):
             self.ui.debug('Status: %d\n' % rsp.status)
             # lfs-test-server and hg serve return headers in different order
             self.ui.debug('%s\n'
-                          % '\n'.join(sorted(str(rsp.info()).splitlines())))
+                          % '\n'.join(sorted(
+                              pycompat.bytestr(rsp.info()).splitlines())))
 
             if 'objects' in response:
                 response['objects'] = sorted(response['objects'],
                                              key=lambda p: p['oid'])
             self.ui.debug('%s\n'
-                          % json.dumps(response, indent=2,
-                                       separators=('', ': '), sort_keys=True))
+                          % pycompat.bytesurl(
+                              json.dumps(response, indent=2,
+                                         separators=(r'', r': '),
+                                         sort_keys=True)))
 
         return response
 
diff --git a/hgext/lfs/wireprotolfsserver.py b/hgext/lfs/wireprotolfsserver.py
--- a/hgext/lfs/wireprotolfsserver.py
+++ b/hgext/lfs/wireprotolfsserver.py
@@ -138,11 +138,14 @@  def _processbatchrequest(repo, req, res)
                       b'Only the basic LFS transfer handler is supported')
         return True
 
-    operation = lfsreq.get('operation')
-    if operation not in ('upload', 'download'):
-        _sethttperror(res, HTTP_BAD_REQUEST,
-                      b'Unsupported LFS transfer operation: %s' % operation)
+    operation = lfsreq.get(r'operation')
+    if operation not in (r'upload', r'download'):
+        _sethttperror(
+            res, HTTP_BAD_REQUEST,
+            b'Unsupported LFS transfer operation: %s' % pycompat.bytestr(
+                operation))
         return True
+    operation = pycompat.bytesurl(operation)
 
     localstore = repo.svfs.lfslocalblobstore
 
@@ -150,8 +153,8 @@  def _processbatchrequest(repo, req, res)
                                                 operation, localstore)]
 
     rsp = {
-        'transfer': 'basic',
-        'objects': objects,
+        r'transfer': r'basic',
+        r'objects': objects,
     }
 
     res.status = hgwebcommon.statusmessage(HTTP_OK)
@@ -190,10 +193,11 @@  def _batchresponseobjects(req, objects, 
 
     for obj in objects:
         # Convert unicode to ASCII to create a filesystem path
-        oid = obj.get('oid').encode('ascii')
+        soid = obj.get(r'oid')
+        oid = soid.encode('ascii')
         rsp = {
-            'oid': oid,
-            'size': obj.get('size'),  # XXX: should this check the local size?
+            r'oid': oid,
+            r'size': obj.get(r'size'),  # XXX: should this check the local size?
             #'authenticated': True,
         }
 
@@ -217,9 +221,9 @@  def _batchresponseobjects(req, objects, 
             if inst.errno != errno.ENOENT:
                 _logexception(req)
 
-                rsp['error'] = {
-                    'code': 500,
-                    'message': inst.strerror or 'Internal Server Server'
+                rsp[r'error'] = {
+                    r'code': 500,
+                    r'message': inst.strerror or r'Internal Server Server'
                 }
                 yield rsp
                 continue
@@ -230,17 +234,17 @@  def _batchresponseobjects(req, objects, 
         # IFF they already exist locally.
         if action == 'download':
             if not exists:
-                rsp['error'] = {
-                    'code': 404,
-                    'message': "The object does not exist"
+                rsp[r'error'] = {
+                    r'code': 404,
+                    r'message': r"The object does not exist"
                 }
                 yield rsp
                 continue
 
             elif not verifies:
-                rsp['error'] = {
-                    'code': 422,   # XXX: is this the right code?
-                    'message': "The object is corrupt"
+                rsp[r'error'] = {
+                    r'code': 422,   # XXX: is this the right code?
+                    r'message': r"The object is corrupt"
                 }
                 yield rsp
                 continue
@@ -256,22 +260,22 @@  def _batchresponseobjects(req, objects, 
             # a gratuitous deviation from lfs-test-server in the test
             # output.
             hdr = {
-                'Accept': 'application/vnd.git-lfs'
+                r'Accept': r'application/vnd.git-lfs'
             }
 
-            auth = req.headers.get('Authorization', '')
-            if auth.startswith('Basic '):
-                hdr['Authorization'] = auth
+            auth = req.headers.get(r'Authorization', r'')
+            if auth.startswith(r'Basic '):
+                hdr[r'Authorization'] = auth
 
             return hdr
 
-        rsp['actions'] = {
-            '%s' % action: {
-                'href': '%s%s/.hg/lfs/objects/%s'
-                    % (req.baseurl, req.apppath, oid),
+        rsp[r'actions'] = {
+            r'%s' % pycompat.strurl(action): {
+                r'href': pycompat.strurl('%s%s/.hg/lfs/objects/%s'
+                    % (req.baseurl, req.apppath, oid)),
                 # datetime.isoformat() doesn't include the 'Z' suffix
-                "expires_at": expiresat.strftime('%Y-%m-%dT%H:%M:%SZ'),
-                'header': _buildheader(),
+                r"expires_at": expiresat.strftime(r'%Y-%m-%dT%H:%M:%SZ'),
+                r'header': _buildheader(),
             }
         }