Patchwork [2,of,6] lfs: add the ability to disable the usercache

login
register
mail settings
Submitter Matt Harbison
Date April 9, 2018, 4:26 a.m.
Message ID <f4381233ecb960307d39.1523248002@Envy>
Download mbox | patch
Permalink /patch/30581/
State Accepted
Headers show

Comments

Matt Harbison - April 9, 2018, 4:26 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1523155211 14400
#      Sat Apr 07 22:40:11 2018 -0400
# Node ID f4381233ecb960307d39459ea961a0af03df442b
# Parent  69ead78af94459ca41e529d0ebfb24bd1d11c32c
lfs: add the ability to disable the usercache

While the usercache is important for real world uses, I've been tripped up more
than a couple of times by it in tests- thinking a file was being downloaded, but
it was simply linked from the local cache.  The syntax for setting it is the
same as for setting a null remote endpoint, and like that endpoint, is left
undocumented.

This may or may not be a useful feature in the real world (I'd expect any sane
filesystem to support hardlinks at this point).
Yuya Nishihara - April 9, 2018, 1:21 p.m.
On Mon, 09 Apr 2018 00:26:42 -0400, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1523155211 14400
> #      Sat Apr 07 22:40:11 2018 -0400
> # Node ID f4381233ecb960307d39459ea961a0af03df442b
> # Parent  69ead78af94459ca41e529d0ebfb24bd1d11c32c
> lfs: add the ability to disable the usercache

> +class nullvfs(lfsvfs):
> +    def __init__(self):
> +        pass
> +
> +    def exists(self, oid):
> +        return False
> +
> +    def read(self, oid):
> +        raise IOError('%s: No such file or directory' % self.join(oid))
> +
> +    def walk(self, path=None, onerror=None):
> +        return ('', [], [])
> +
> +    def write(self, oid, data):
> +        pass

I don't think this abstraction would work nicely. For example, local.read()
expects something exists in usercache.

    def read(self, oid, verify=True):
        """Read blob from local blobstore."""
        if not self.vfs.exists(oid):
            blob = self._read(self.cachevfs, oid, verify)
Matt Harbison - April 9, 2018, 3:28 p.m.
> On Apr 9, 2018, at 9:21 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> On Mon, 09 Apr 2018 00:26:42 -0400, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1523155211 14400
>> #      Sat Apr 07 22:40:11 2018 -0400
>> # Node ID f4381233ecb960307d39459ea961a0af03df442b
>> # Parent  69ead78af94459ca41e529d0ebfb24bd1d11c32c
>> lfs: add the ability to disable the usercache
> 
>> +class nullvfs(lfsvfs):
>> +    def __init__(self):
>> +        pass
>> +
>> +    def exists(self, oid):
>> +        return False
>> +
>> +    def read(self, oid):
>> +        raise IOError('%s: No such file or directory' % self.join(oid))
>> +
>> +    def walk(self, path=None, onerror=None):
>> +        return ('', [], [])
>> +
>> +    def write(self, oid, data):
>> +        pass
> 
> I don't think this abstraction would work nicely. For example, local.read()
> expects something exists in usercache.
> 
>    def read(self, oid, verify=True):
>        """Read blob from local blobstore."""
>        if not self.vfs.exists(oid):
>            blob = self._read(self.cachevfs, oid, verify)

That initially gave me pause too. But vfs.read(oid) inside _read() will also fail if the file doesn’t exist.  So before reading from the store, all code calls store.exists(oid).  Since the nullvfs always says the file doesn’t exist, the only way for a store to say it exists is if self.vfs (the repo internal store) has it.  That’s the desired behavior.

The only thing I was puzzling over after seeing that was if nullvfs.read() should raise an error or return an empty string- it should never be called.
Yuya Nishihara - April 9, 2018, 3:43 p.m.
On Mon, 9 Apr 2018 11:28:54 -0400, Matt Harbison wrote:
> 
> > On Apr 9, 2018, at 9:21 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > 
> >> On Mon, 09 Apr 2018 00:26:42 -0400, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1523155211 14400
> >> #      Sat Apr 07 22:40:11 2018 -0400
> >> # Node ID f4381233ecb960307d39459ea961a0af03df442b
> >> # Parent  69ead78af94459ca41e529d0ebfb24bd1d11c32c
> >> lfs: add the ability to disable the usercache
> > 
> >> +class nullvfs(lfsvfs):
> >> +    def __init__(self):
> >> +        pass
> >> +
> >> +    def exists(self, oid):
> >> +        return False
> >> +
> >> +    def read(self, oid):
> >> +        raise IOError('%s: No such file or directory' % self.join(oid))
> >> +
> >> +    def walk(self, path=None, onerror=None):
> >> +        return ('', [], [])
> >> +
> >> +    def write(self, oid, data):
> >> +        pass
> > 
> > I don't think this abstraction would work nicely. For example, local.read()
> > expects something exists in usercache.
> > 
> >    def read(self, oid, verify=True):
> >        """Read blob from local blobstore."""
> >        if not self.vfs.exists(oid):
> >            blob = self._read(self.cachevfs, oid, verify)
> 
> That initially gave me pause too. But vfs.read(oid) inside _read() will also fail if the file doesn’t exist.  So before reading from the store, all code calls store.exists(oid).  Since the nullvfs always says the file doesn’t exist, the only way for a store to say it exists is if self.vfs (the repo internal store) has it.  That’s the desired behavior.
> 
> The only thing I was puzzling over after seeing that was if nullvfs.read() should raise an error or return an empty string- it should never be called.

Can you run all lfs tests with the nullvfs to see if nullvfs is mostly
working? I got some AttributeError.
Matt Harbison - April 9, 2018, 7:26 p.m.
> On Apr 9, 2018, at 11:43 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> On Mon, 9 Apr 2018 11:28:54 -0400, Matt Harbison wrote:
>> 
>>>> On Apr 9, 2018, at 9:21 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>>>> 
>>>> On Mon, 09 Apr 2018 00:26:42 -0400, Matt Harbison wrote:
>>>> # HG changeset patch
>>>> # User Matt Harbison <matt_harbison@yahoo.com>
>>>> # Date 1523155211 14400
>>>> #      Sat Apr 07 22:40:11 2018 -0400
>>>> # Node ID f4381233ecb960307d39459ea961a0af03df442b
>>>> # Parent  69ead78af94459ca41e529d0ebfb24bd1d11c32c
>>>> lfs: add the ability to disable the usercache
>>> 
>>>> +class nullvfs(lfsvfs):
>>>> +    def __init__(self):
>>>> +        pass
>>>> +
>>>> +    def exists(self, oid):
>>>> +        return False
>>>> +
>>>> +    def read(self, oid):
>>>> +        raise IOError('%s: No such file or directory' % self.join(oid))
>>>> +
>>>> +    def walk(self, path=None, onerror=None):
>>>> +        return ('', [], [])
>>>> +
>>>> +    def write(self, oid, data):
>>>> +        pass
>>> 
>>> I don't think this abstraction would work nicely. For example, local.read()
>>> expects something exists in usercache.
>>> 
>>>   def read(self, oid, verify=True):
>>>       """Read blob from local blobstore."""
>>>       if not self.vfs.exists(oid):
>>>           blob = self._read(self.cachevfs, oid, verify)
>> 
>> That initially gave me pause too. But vfs.read(oid) inside _read() will also fail if the file doesn’t exist.  So before reading from the store, all code calls store.exists(oid).  Since the nullvfs always says the file doesn’t exist, the only way for a store to say it exists is if self.vfs (the repo internal store) has it.  That’s the desired behavior.
>> 
>> The only thing I was puzzling over after seeing that was if nullvfs.read() should raise an error or return an empty string- it should never be called.
> 
> Can you run all lfs tests with the nullvfs to see if nullvfs is mostly
> working? I got some AttributeError.

Good catch. The problem is the self.join reference in the read() call when generating the IOError- self.base that it uses doesn’t exist.  The upload case is not checking that the blob exists beforehand, because it wants an error if it doesn’t.  The revlog read will fetch it first if it doesn’t exist in the store.

I think the abstraction still works though because if I point self.cachevfs to self.vfs, the test case for not having the local blob (where this was blowing up) fails in exactly the same way.  The only difference is the full path isn’t available, and the error code for a real vfs may be ENOENT or ENOTDIR because of the path sharding.

I’ll fix up the IOError call.  test-lfs.t and test-lfs-test-server.t rely on the cache, but nothing else looked wrong and the other lfs tests run cleanly.

Patch

diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
--- a/hgext/lfs/blobstore.py
+++ b/hgext/lfs/blobstore.py
@@ -59,6 +59,22 @@  class lfsvfs(vfsmod.vfs):
 
         yield ('', [], oids)
 
+class nullvfs(lfsvfs):
+    def __init__(self):
+        pass
+
+    def exists(self, oid):
+        return False
+
+    def read(self, oid):
+        raise IOError('%s: No such file or directory' % self.join(oid))
+
+    def walk(self, path=None, onerror=None):
+        return ('', [], [])
+
+    def write(self, oid, data):
+        pass
+
 class filewithprogress(object):
     """a file-like object that supports __len__ and read.
 
@@ -97,8 +113,14 @@  class local(object):
     def __init__(self, repo):
         fullpath = repo.svfs.join('lfs/objects')
         self.vfs = lfsvfs(fullpath)
-        usercache = lfutil._usercachedir(repo.ui, 'lfs')
-        self.cachevfs = lfsvfs(usercache)
+        usercache = util.url(lfutil._usercachedir(repo.ui, 'lfs'))
+        if usercache.scheme in (None, 'file'):
+            self.cachevfs = lfsvfs(usercache.localpath())
+        elif usercache.scheme == 'null':
+            self.cachevfs = nullvfs()
+        else:
+            raise error.Abort(_('unknown lfs cache scheme: %s')
+                              % usercache.scheme)
         self.ui = repo.ui
 
     def open(self, oid):
@@ -129,11 +151,7 @@  class local(object):
             if realoid != oid:
                 raise error.Abort(_('corrupt remote lfs object: %s') % oid)
 
-        # XXX: should we verify the content of the cache, and hardlink back to
-        # the local store on success, but truncate, write and link on failure?
-        if not self.cachevfs.exists(oid):
-            self.ui.note(_('lfs: adding %s to the usercache\n') % oid)
-            lfutil.link(self.vfs.join(oid), self.cachevfs.join(oid))
+        self._linktousercache(oid)
 
     def write(self, oid, data):
         """Write blob to local blobstore.
@@ -144,9 +162,13 @@  class local(object):
         with self.vfs(oid, 'wb', atomictemp=True) as fp:
             fp.write(data)
 
+        self._linktousercache(oid)
+
+    def _linktousercache(self, oid):
         # XXX: should we verify the content of the cache, and hardlink back to
         # the local store on success, but truncate, write and link on failure?
-        if not self.cachevfs.exists(oid):
+        if (not self.cachevfs.exists(oid)
+            and not isinstance(self.cachevfs, nullvfs)):
             self.ui.note(_('lfs: adding %s to the usercache\n') % oid)
             lfutil.link(self.vfs.join(oid), self.cachevfs.join(oid))
 
diff --git a/tests/test-lfs-serve.t b/tests/test-lfs-serve.t
--- a/tests/test-lfs-serve.t
+++ b/tests/test-lfs-serve.t
@@ -35,6 +35,7 @@  masked by the Internal Server Error mess
   $ cat >> $HGRCPATH <<EOF
   > [lfs]
   > url=file:$TESTTMP/dummy-remote/
+  > usercache = null://
   > threshold=10
   > [web]
   > allow_push=*