Patchwork [1,of,2,V2] lfs: add a local store method for opening a blob

login
register
mail settings
Submitter Matt Harbison
Date Jan. 5, 2018, 2:59 a.m.
Message ID <b8a794103bbb0506069f.1515121149@Envy>
Download mbox | patch
Permalink /patch/26547/
State Accepted
Headers show

Comments

Matt Harbison - Jan. 5, 2018, 2:59 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1514945910 18000
#      Tue Jan 02 21:18:30 2018 -0500
# Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
# Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
lfs: add a local store method for opening a blob

The has() and read() methods already dynamically switch between the usercache
and local store.  This should generally be preferred to directly accessing the
vfs instances outside of the store.

The file is now explicitly opened in binary mode for clarity.  (It was also
being opened in binary mode before, but only because vfs.__call__() appends 'b'
if needed when not opening with 'text=True'.)
Yuya Nishihara - Jan. 5, 2018, 6:10 a.m.
On Thu, 04 Jan 2018 21:59:09 -0500, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1514945910 18000
> #      Tue Jan 02 21:18:30 2018 -0500
> # Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
> # Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
> lfs: add a local store method for opening a blob
> 
> The has() and read() methods already dynamically switch between the usercache
> and local store.  This should generally be preferred to directly accessing the
> vfs instances outside of the store.
> 
> The file is now explicitly opened in binary mode for clarity.  (It was also
> being opened in binary mode before, but only because vfs.__call__() appends 'b'
> if needed when not opening with 'text=True'.)
> 
> diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
> --- a/hgext/lfs/blobstore.py
> +++ b/hgext/lfs/blobstore.py
> @@ -100,6 +100,14 @@
>          self.cachevfs = lfsvfs(usercache)
>          self.ui = repo.ui
>  
> +    def open(self, oid):
> +        """Open a read-only file descriptor to the named blob, in either the
> +        usercache or the local store."""
> +        if self.cachevfs.exists(oid):
> +            return self.cachevfs(oid, 'rb')
> +
> +        return self.vfs(oid, 'rb')

Oops, one more nit. cachevfs vs vfs, which should precede the other? I think
vfs is more "near" as it must belong to the same filesystem.
Matt Harbison - Jan. 5, 2018, 1:04 p.m.
> On Jan 5, 2018, at 1:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> 
>> On Thu, 04 Jan 2018 21:59:09 -0500, Matt Harbison wrote:
>> # HG changeset patch
>> # User Matt Harbison <matt_harbison@yahoo.com>
>> # Date 1514945910 18000
>> #      Tue Jan 02 21:18:30 2018 -0500
>> # Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
>> # Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
>> lfs: add a local store method for opening a blob
>> 
>> The has() and read() methods already dynamically switch between the usercache
>> and local store.  This should generally be preferred to directly accessing the
>> vfs instances outside of the store.
>> 
>> The file is now explicitly opened in binary mode for clarity.  (It was also
>> being opened in binary mode before, but only because vfs.__call__() appends 'b'
>> if needed when not opening with 'text=True'.)
>> 
>> diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
>> --- a/hgext/lfs/blobstore.py
>> +++ b/hgext/lfs/blobstore.py
>> @@ -100,6 +100,14 @@
>>         self.cachevfs = lfsvfs(usercache)
>>         self.ui = repo.ui
>> 
>> +    def open(self, oid):
>> +        """Open a read-only file descriptor to the named blob, in either the
>> +        usercache or the local store."""
>> +        if self.cachevfs.exists(oid):
>> +            return self.cachevfs(oid, 'rb')
>> +
>> +        return self.vfs(oid, 'rb')
> 
> Oops, one more nit. cachevfs vs vfs, which should precede the other? I think
> vfs is more "near" as it must belong to the same filesystem.

The only reason I ordered it like that was so if the file doesn’t exist in either place, the “abort: no such file” message points to the local repo, not somewhere in ~/ or some other place.

It’s hard to say what’s more likely to be populated. I’ve been hitting a lot of cases lately where it is only in vfscache, but that’s because I’m working with multiple clones. However, the blob will also be in vfscache because of commits if there’s only one clone, so that seems like the most likely place to be populated?
Yuya Nishihara - Jan. 5, 2018, 1:20 p.m.
On Fri, 5 Jan 2018 08:04:54 -0500, Matt Harbison wrote:
> 
> > On Jan 5, 2018, at 1:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > 
> >> On Thu, 04 Jan 2018 21:59:09 -0500, Matt Harbison wrote:
> >> # HG changeset patch
> >> # User Matt Harbison <matt_harbison@yahoo.com>
> >> # Date 1514945910 18000
> >> #      Tue Jan 02 21:18:30 2018 -0500
> >> # Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
> >> # Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
> >> lfs: add a local store method for opening a blob
> >> 
> >> The has() and read() methods already dynamically switch between the usercache
> >> and local store.  This should generally be preferred to directly accessing the
> >> vfs instances outside of the store.
> >> 
> >> The file is now explicitly opened in binary mode for clarity.  (It was also
> >> being opened in binary mode before, but only because vfs.__call__() appends 'b'
> >> if needed when not opening with 'text=True'.)
> >> 
> >> diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
> >> --- a/hgext/lfs/blobstore.py
> >> +++ b/hgext/lfs/blobstore.py
> >> @@ -100,6 +100,14 @@
> >>         self.cachevfs = lfsvfs(usercache)
> >>         self.ui = repo.ui
> >> 
> >> +    def open(self, oid):
> >> +        """Open a read-only file descriptor to the named blob, in either the
> >> +        usercache or the local store."""
> >> +        if self.cachevfs.exists(oid):
> >> +            return self.cachevfs(oid, 'rb')
> >> +
> >> +        return self.vfs(oid, 'rb')
> > 
> > Oops, one more nit. cachevfs vs vfs, which should precede the other? I think
> > vfs is more "near" as it must belong to the same filesystem.
> 
> The only reason I ordered it like that was so if the file doesn’t exist in either place, the “abort: no such file” message points to the local repo, not somewhere in ~/ or some other place.
> 
> It’s hard to say what’s more likely to be populated. I’ve been hitting a lot of cases lately where it is only in vfscache, but that’s because I’m working with multiple clones. However, the blob will also be in vfscache because of commits if there’s only one clone, so that seems like the most likely place to be populated?

Can you add inline comments? I'm already getting confused with lfs, but we're
responsible for making lfs less bad than largefiles. :)
Yuya Nishihara - Jan. 6, 2018, 1:58 a.m.
On Fri, 5 Jan 2018 22:20:44 +0900, Yuya Nishihara wrote:
> On Fri, 5 Jan 2018 08:04:54 -0500, Matt Harbison wrote:
> > 
> > > On Jan 5, 2018, at 1:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
> > > 
> > >> On Thu, 04 Jan 2018 21:59:09 -0500, Matt Harbison wrote:
> > >> # HG changeset patch
> > >> # User Matt Harbison <matt_harbison@yahoo.com>
> > >> # Date 1514945910 18000
> > >> #      Tue Jan 02 21:18:30 2018 -0500
> > >> # Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
> > >> # Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
> > >> lfs: add a local store method for opening a blob
> > >> 
> > >> The has() and read() methods already dynamically switch between the usercache
> > >> and local store.  This should generally be preferred to directly accessing the
> > >> vfs instances outside of the store.
> > >> 
> > >> The file is now explicitly opened in binary mode for clarity.  (It was also
> > >> being opened in binary mode before, but only because vfs.__call__() appends 'b'
> > >> if needed when not opening with 'text=True'.)
> > >> 
> > >> diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
> > >> --- a/hgext/lfs/blobstore.py
> > >> +++ b/hgext/lfs/blobstore.py
> > >> @@ -100,6 +100,14 @@
> > >>         self.cachevfs = lfsvfs(usercache)
> > >>         self.ui = repo.ui
> > >> 
> > >> +    def open(self, oid):
> > >> +        """Open a read-only file descriptor to the named blob, in either the
> > >> +        usercache or the local store."""
> > >> +        if self.cachevfs.exists(oid):
> > >> +            return self.cachevfs(oid, 'rb')
> > >> +
> > >> +        return self.vfs(oid, 'rb')
> > > 
> > > Oops, one more nit. cachevfs vs vfs, which should precede the other? I think
> > > vfs is more "near" as it must belong to the same filesystem.
> > 
> > The only reason I ordered it like that was so if the file doesn’t exist in either place, the “abort: no such file” message points to the local repo, not somewhere in ~/ or some other place.

I've queued this series. Please send a follow up describing why vfs is tested
last.
Matt Harbison - Jan. 6, 2018, 2:15 a.m.
On Fri, 05 Jan 2018 08:20:44 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Fri, 5 Jan 2018 08:04:54 -0500, Matt Harbison wrote:
>>
>> > On Jan 5, 2018, at 1:10 AM, Yuya Nishihara <yuya@tcha.org> wrote:
>> >
>> >> On Thu, 04 Jan 2018 21:59:09 -0500, Matt Harbison wrote:
>> >> # HG changeset patch
>> >> # User Matt Harbison <matt_harbison@yahoo.com>
>> >> # Date 1514945910 18000
>> >> #      Tue Jan 02 21:18:30 2018 -0500
>> >> # Node ID b8a794103bbb0506069f2b0ad101ba6b0455dc26
>> >> # Parent  31fe397f2bdab5fea4752947e70549e7a7d04f75
>> >> lfs: add a local store method for opening a blob
>> >>
>> >> The has() and read() methods already dynamically switch between the  
>> usercache
>> >> and local store.  This should generally be preferred to directly  
>> accessing the
>> >> vfs instances outside of the store.
>> >>
>> >> The file is now explicitly opened in binary mode for clarity.  (It  
>> was also
>> >> being opened in binary mode before, but only because vfs.__call__()  
>> appends 'b'
>> >> if needed when not opening with 'text=True'.)
>> >>
>> >> diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
>> >> --- a/hgext/lfs/blobstore.py
>> >> +++ b/hgext/lfs/blobstore.py
>> >> @@ -100,6 +100,14 @@
>> >>         self.cachevfs = lfsvfs(usercache)
>> >>         self.ui = repo.ui
>> >>
>> >> +    def open(self, oid):
>> >> +        """Open a read-only file descriptor to the named blob, in  
>> either the
>> >> +        usercache or the local store."""
>> >> +        if self.cachevfs.exists(oid):
>> >> +            return self.cachevfs(oid, 'rb')
>> >> +
>> >> +        return self.vfs(oid, 'rb')
>> >
>> > Oops, one more nit. cachevfs vs vfs, which should precede the other?  
>> I think
>> > vfs is more "near" as it must belong to the same filesystem.
>>
>> The only reason I ordered it like that was so if the file doesn’t exist  
>> in either place, the “abort: no such file” message points to the local  
>> repo, not somewhere in ~/ or some other place.
>>
>> It’s hard to say what’s more likely to be populated. I’ve been hitting  
>> a lot of cases lately where it is only in vfscache, but that’s because  
>> I’m working with multiple clones. However, the blob will also be in  
>> vfscache because of commits if there’s only one clone, so that seems  
>> like the most likely place to be populated?
>
> Can you add inline comments? I'm already getting confused with lfs, but  
> we're
> responsible for making lfs less bad than largefiles. :)

Lol.  Will do.

Which parts are confusing?  (I've been a little confused by some things  
too, but I'm wondering how much overlap there is, what can be clarified,  
and how much is me juggling too many projects at once.)
Yuya Nishihara - Jan. 6, 2018, 7:39 a.m.
On Fri, 05 Jan 2018 21:15:54 -0500, Matt Harbison wrote:
> Which parts are confusing?  (I've been a little confused by some things  
> too, but I'm wondering how much overlap there is, what can be clarified,  
> and how much is me juggling too many projects at once.)

When to verify=True or not, for example. FWIW, I'm reviewing lfs patches without
understanding how lfs was designed, so please disregard my pointless comments.
Matt Harbison - Jan. 6, 2018, 7:53 p.m.
On Sat, 06 Jan 2018 02:39:08 -0500, Yuya Nishihara <yuya@tcha.org> wrote:

> On Fri, 05 Jan 2018 21:15:54 -0500, Matt Harbison wrote:
>> Which parts are confusing?  (I've been a little confused by some things
>> too, but I'm wondering how much overlap there is, what can be clarified,
>> and how much is me juggling too many projects at once.)
>
> When to verify=True or not, for example.

Yeah, that took me a bit to puzzle through as well.  There are only a few  
uses, and I tried describing them in 417e8e040102.  Basically, if we are  
getting data from core (e.g. commit) or giving it back (e.g. cat), there's  
no need to verify because core does that.  Otherwise (e.g. getting stuff  
 from remote stores or writing to remote stores), we do verify.  I've been  
sitting on a series with an explicit local.download() method to help  
clarify writes from an external source vs commits.  But I haven't figured  
out a way to give upload the same treatment (read -> write to file vs open  
file and create a urlreq.request() seems too different to pack into one  
method).

I'll clean that up and submit, because it has other benefits.

> FWIW, I'm reviewing lfs patches without
> understanding how lfs was designed, so please disregard my pointless  
> comments.

All I know about it is from the wiki page[1], and past experience from  
largefiles.  It's good to know what isn't clear to others, so that can be  
fixed, and make it easier for others to help maintain this in the future.

[1] https://www.mercurial-scm.org/wiki/LfsPlan

Patch

diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
--- a/hgext/lfs/blobstore.py
+++ b/hgext/lfs/blobstore.py
@@ -100,6 +100,14 @@ 
         self.cachevfs = lfsvfs(usercache)
         self.ui = repo.ui
 
+    def open(self, oid):
+        """Open a read-only file descriptor to the named blob, in either the
+        usercache or the local store."""
+        if self.cachevfs.exists(oid):
+            return self.cachevfs(oid, 'rb')
+
+        return self.vfs(oid, 'rb')
+
     def write(self, oid, data, verify=True):
         """Write blob to local blobstore."""
         if verify: