Patchwork D7708: lfs: add a switch to `hg verify` to ignore the content of blobs

login
register
mail settings
Submitter phabricator
Date Dec. 23, 2019, 7:47 a.m.
Message ID <differential-rev-PHID-DREV-6gf7j27k5l2jvjrh2k64-req@mercurial-scm.org>
Download mbox | patch
Permalink /patch/44017/
State Superseded
Headers show

Comments

phabricator - Dec. 23, 2019, 7:47 a.m.
mharbison72 created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  Trying to validate the fulltext of an external revision causes missing blobs to
  be downloaded and cached.  Since the downloads aren't batch prefetched[1] and
  aren't compressed, this can be expensive both in terms of time and space.
  
  I made this a tri-state instead of a simple bool because there's an existing
  (undocumented) config to handle this, and it would be weird if `hg verify` were
  to suddenly start ignoring that config but an `hg recover` initiated verify
  honors it.  Since this uses the same config setting, it too will skip
  rename verification (which requires fulltext, but not for LFS).
  
  [1] https://www.mercurial-scm.org/pipermail/mercurial-devel/2018-April/116118.html

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D7708

AFFECTED FILES
  hgext/lfs/__init__.py
  tests/test-lfs.t

CHANGE DETAILS




To: mharbison72, #hg-reviewers
Cc: mercurial-devel
phabricator - Dec. 23, 2019, 5:06 p.m.
pulkit added inline comments.

INLINE COMMENTS

> __init__.py:408
> +@eh.wrapcommand(
> +    b'verify', opts=[(b'', b'no-lfs', None, _(b'skip all lfs blob content'))]
> +)

IIUC, we can have a `--lfs` flag and `--no-lfs` will work.

Also by default, we may not want to verify large files content?

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7708/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7708

To: mharbison72, #hg-reviewers
Cc: pulkit, mercurial-devel
phabricator - Dec. 23, 2019, 5:32 p.m.
This revision now requires changes to proceed.
indygreg added a comment.
indygreg requested changes to this revision.


  Per Pulkit's review, please change the argument to `--lfs`, as `--no-lfs` should automatically work.
  
  Also, I didn't realize `verify.skipflags` existed and exposes low-level bit flags of revlog storage. That's a massive abstraction leak and I'd be in favor of reinventing that feature so it worked based on string feature names.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7708/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7708

To: mharbison72, #hg-reviewers, indygreg
Cc: indygreg, pulkit, mercurial-devel
phabricator - Dec. 23, 2019, 6:30 p.m.
mharbison72 added a comment.


  In D7708#113600 <https://phab.mercurial-scm.org/D7708#113600>, @indygreg wrote:
  
  > Per Pulkit's review, please change the argument to `--lfs`, as `--no-lfs` should automatically work.
  
  Does my response to @pulkit change your mind any?  (Not sure how familiar you are with the lfs internals- I apparently forgot a lot myself).
  
  > Also, I didn't realize `verify.skipflags` existed and exposes low-level bit flags of revlog storage. That's a massive abstraction leak and I'd be in favor of reinventing that feature so it worked based on string feature names.
  
  Any thumbnail sketch of what this should look like?  Having to know that things are externally stored or that rename checking has special requirements for example, also seems leaky to a lesser degree.  I was surprised that revlog bits were being passed around like this too.  My plan is/was to add an lfs configbool that's documented, which could toggle this on or off, so users don't need to know bit patterns.  (I was also surprised that hex numbers in the config don't work, and it pukes when you fetch the value.)  Obviously I wouldn't want the lack of `--lfs` or `--no-lfs` to override this.  IDK if it's better to have an LFS config for it's bit, and then another for that other bit... or some setting that is a list of skip names.  I assume whatever pattern is designed here would influence the internal replacement.

INLINE COMMENTS

> pulkit wrote in __init__.py:408
> IIUC, we can have a `--lfs` flag and `--no-lfs` will work.
> 
> Also by default, we may not want to verify large files content?

The way it is now, both `--lfs` and `--no-lfs` are accepted.  I actually started with `--lfs`, defaulting to on, but then changed it and forget why.  I'll try to figure that out today, but it may have been the code was simpler when defaulting to verifying the blobs and honoring the config setting.  It may have also had to do with shifting to verifying the locally accessible blobs later in the series.  I didn't want to over complicate things with `--none`, `--local-only`, and `--all`.  The options with largefiles are kind of annoying.  I guess the question is, "What would an average user who doesn't know hg internals want/understand?"

As far as defaulting to *not* verifying blobs, I'm uncertain.  I'd think that verifying local blobs wouldn't add much overhead.  But the deciding factor for me was that there's no way to verify the content in the revlog without verifying the blob itself- the stored hash in the revlog is for the blob, not the stored JSON data.  That's why I asked in IRC the other day about how strong the guarantees are about repo integrity after running a verify.  IIRC, largefiles doesn't verify content by default, but it will at least make sure the revlog for the standin file is OK.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7708/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7708

To: mharbison72, #hg-reviewers, indygreg
Cc: indygreg, pulkit, mercurial-devel
phabricator - Jan. 9, 2020, 4:37 a.m.
mharbison72 added a comment.
mharbison72 requested review of this revision.


  Setting this back to wanting a review because I suspect my response questions were overlooked with vacations, and it won't show up in anybody's dashboard otherwise.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST ACTION
  https://phab.mercurial-scm.org/D7708/new/

REVISION DETAIL
  https://phab.mercurial-scm.org/D7708

To: mharbison72, #hg-reviewers, indygreg
Cc: indygreg, pulkit, mercurial-devel

Patch

diff --git a/tests/test-lfs.t b/tests/test-lfs.t
--- a/tests/test-lfs.t
+++ b/tests/test-lfs.t
@@ -796,6 +796,27 @@ 
   $ test -f fromcorrupt/.hg/store/lfs/objects/66/100b384bf761271b407d79fc30cdd0554f3b2c5d944836e936d584b88ce88e
   [1]
 
+Verify will not try to download lfs blobs, if told not to process lfs content
+
+  $ hg -R fromcorrupt --config lfs.usercache=emptycache verify -v --no-lfs
+  repository uses revlog format 1
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  checked 5 changesets with 10 changes to 4 files
+
+Verify will not try to download lfs blobs, if told not to by the config option
+
+  $ hg -R fromcorrupt --config lfs.usercache=emptycache verify -v \
+  >                   --config verify.skipflags=8192
+  repository uses revlog format 1
+  checking changesets
+  checking manifests
+  crosschecking files in changesets and manifests
+  checking files
+  checked 5 changesets with 10 changes to 4 files
+
 Verify will copy/link all lfs objects into the local store that aren't already
 present.  Bypass the corrupted usercache to show that verify works when fed by
 the (uncorrupted) remote store.
diff --git a/hgext/lfs/__init__.py b/hgext/lfs/__init__.py
--- a/hgext/lfs/__init__.py
+++ b/hgext/lfs/__init__.py
@@ -402,3 +402,24 @@ 
     revs = opts.get('rev', [])
     pointers = wrapper.extractpointers(repo, scmutil.revrange(repo, revs))
     wrapper.uploadblobs(repo, pointers)
+
+
+@eh.wrapcommand(
+    b'verify', opts=[(b'', b'no-lfs', None, _(b'skip all lfs blob content'))]
+)
+def verify(orig, ui, repo, **opts):
+    skipflags = repo.ui.configint(b'verify', b'skipflags')
+    no_lfs = opts.pop('no_lfs')
+
+    if skipflags:
+        # --lfs overrides the config bit, if set.
+        if no_lfs is False:
+            skipflags &= ~repository.REVISION_FLAG_EXTSTORED
+    else:
+        skipflags = 0
+
+    if no_lfs is True:
+        skipflags |= repository.REVISION_FLAG_EXTSTORED
+
+    with ui.configoverride({(b'verify', b'skipflags'): skipflags}):
+        return orig(ui, repo, **opts)