Patchwork lfs: ensure the blob is linked to the remote store on skipped uploads

Submitter Matt Harbison
Date Sept. 6, 2018, 5:02 a.m.
Message ID <f8b494a211f85738d924.1536210144@Envy>
Permalink /patch/34363/
State Accepted

Comments

Matt Harbison - Sept. 6, 2018, 5:02 a.m.
# HG changeset patch
# User Matt Harbison <matt_harbison@yahoo.com>
# Date 1536209481 14400
#      Thu Sep 06 00:51:21 2018 -0400
# Node ID f8b494a211f85738d924f2869102a89c29ecb309
# Parent  53ed650843c5c1e1b7a829be525ca8087c68906f
lfs: ensure the blob is linked to the remote store on skipped uploads

I noticed a "missing" blob when pushing two repositories with common blobs to a
fresh server, and then running `hg verify` as a user different from the one
running the web server.  When pushing the second repo, several of the blobs
already existed in the user cache, so the server indicated to the client that it
didn't need to upload them.  That's good enough for the web server process
to serve up in the future.  But a different user has a different cache by
default, so verify complains that `lfs.url` needs to be set, because it wants to
fetch the missing blobs.

Aside from that corner case, it's better to keep all of the blobs in the repo
whenever possible, especially since the largefiles wiki says the user cache can
be deleted at any time to reclaim disk space; users switching over may have the
same expectations.
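The fix below hardlinks a blob from the server's user cache into the repo store
whenever the client is told to skip an upload. As a standalone illustration of
that idea (the function and directory names here are hypothetical, not the lfs
extension's API; the real code uses `lfutil.link` on vfs paths):

```python
import os
import shutil

def link_from_usercache(cache_dir, store_dir, oid):
    """Hardlink a blob from a user cache into a repo store if it is missing.

    Sketch of the linkfromusercache() idea: when the server already has the
    blob in the per-user cache, make sure the repo store gets its own link,
    since the client will never send the bytes again.
    """
    src = os.path.join(cache_dir, oid)
    dst = os.path.join(store_dir, oid)
    if os.path.exists(dst) or not os.path.exists(src):
        return False
    os.makedirs(store_dir, exist_ok=True)
    try:
        # Hardlink so the blob is not duplicated on disk.
        os.link(src, dst)
    except OSError:
        # Fall back to a copy, e.g. across filesystem boundaries.
        shutil.copyfile(src, dst)
    return True
```

With this in place, deleting the user cache (as the largefiles wiki says is
safe) no longer loses the only copy of a skipped-upload blob.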
Yuya Nishihara - Sept. 6, 2018, 11:44 a.m.
On Thu, 06 Sep 2018 01:02:24 -0400, Matt Harbison wrote:
> # HG changeset patch
> # User Matt Harbison <matt_harbison@yahoo.com>
> # Date 1536209481 14400
> #      Thu Sep 06 00:51:21 2018 -0400
> # Node ID f8b494a211f85738d924f2869102a89c29ecb309
> # Parent  53ed650843c5c1e1b7a829be525ca8087c68906f
> lfs: ensure the blob is linked to the remote store on skipped uploads

Seems fine. Queued, thanks.

Patch

diff --git a/hgext/lfs/blobstore.py b/hgext/lfs/blobstore.py
--- a/hgext/lfs/blobstore.py
+++ b/hgext/lfs/blobstore.py
@@ -168,6 +168,20 @@  class local(object):
 
         self._linktousercache(oid)
 
+    def linkfromusercache(self, oid):
+        """Link blobs found in the user cache into this store.
+
+        The server module needs to do this when it lets the client know not to
+        upload the blob, to ensure it is always available in this store.
+        Normally this is done implicitly when the client reads or writes the
+        blob, but that doesn't happen when the server tells the client that it
+        already has the blob.
+        """
+        if (not isinstance(self.cachevfs, nullvfs)
+            and not self.vfs.exists(oid)):
+            self.ui.note(_('lfs: found %s in the usercache\n') % oid)
+            lfutil.link(self.cachevfs.join(oid), self.vfs.join(oid))
+
     def _linktousercache(self, oid):
         # XXX: should we verify the content of the cache, and hardlink back to
         # the local store on success, but truncate, write and link on failure?
diff --git a/hgext/lfs/wireprotolfsserver.py b/hgext/lfs/wireprotolfsserver.py
--- a/hgext/lfs/wireprotolfsserver.py
+++ b/hgext/lfs/wireprotolfsserver.py
@@ -204,6 +204,10 @@  def _batchresponseobjects(req, objects, 
         # verified as the file is streamed to the caller.
         try:
             verifies = store.verify(oid)
+            if verifies and action == 'upload':
+                # The client will skip this upload, but make sure it remains
+                # available locally.
+                store.linkfromusercache(oid)
         except IOError as inst:
             if inst.errno != errno.ENOENT:
                 _logexception(req)
diff --git a/tests/test-lfs-serve-access.t b/tests/test-lfs-serve-access.t
--- a/tests/test-lfs-serve-access.t
+++ b/tests/test-lfs-serve-access.t
@@ -150,6 +150,33 @@  Blob URIs are correct when --prefix is u
   $LOCALIP - - [$LOGDATE$] "POST /subdir/mount/point/.git/info/lfs/objects/batch HTTP/1.1" 200 - (glob)
   $LOCALIP - - [$LOGDATE$] "GET /subdir/mount/point/.hg/lfs/objects/f03217a32529a28a42d03b1244fe09b6e0f9fd06d7b966d4d50567be2abe6c0e HTTP/1.1" 200 - (glob)
 
+Blobs that already exist in the usercache are linked into the repo store, even
+though the client doesn't send the blob.
+
+  $ hg init server2
+  $ hg --config "lfs.usercache=$TESTTMP/servercache" -R server2 serve -d \
+  >    -p $HGPORT --pid-file=hg.pid \
+  >    -A $TESTTMP/access.log -E $TESTTMP/errors.log
+  $ cat hg.pid >> $DAEMON_PIDS
+
+  $ hg --config "lfs.usercache=$TESTTMP/servercache" -R cloned2 --debug \
+  >    push http://localhost:$HGPORT | grep '^[{} ]'
+  {
+    "objects": [
+      {
+        "oid": "f03217a32529a28a42d03b1244fe09b6e0f9fd06d7b966d4d50567be2abe6c0e"
+        "size": 20
+      }
+    ]
+    "transfer": "basic"
+  }
+  $ find server2/.hg/store/lfs/objects | sort
+  server2/.hg/store/lfs/objects
+  server2/.hg/store/lfs/objects/f0
+  server2/.hg/store/lfs/objects/f0/3217a32529a28a42d03b1244fe09b6e0f9fd06d7b966d4d50567be2abe6c0e
+  $ $PYTHON $RUNTESTDIR/killdaemons.py $DAEMON_PIDS
+  $ cat $TESTTMP/errors.log
+
   $ cat >> $TESTTMP/lfsstoreerror.py <<EOF
   > import errno
   > from hgext.lfs import blobstore